Protecting PHI: How to Handle HIPAA’s 18 Identifiers During Data Sharing
Protecting PHI during data sharing starts with understanding HIPAA’s 18 identifiers and how they trigger privacy obligations. When you know what must be removed or controlled, you can apply PHI de-identification strategies that preserve data utility while minimizing re-identification risk.
This guide explains the identifiers, compares the Safe Harbor Method with the Expert Determination Method, clarifies when a Limited Data Set fits, and shows how to manage coded data and Data Use Agreements effectively.
Overview of HIPAA's 18 Identifiers
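Under HIPAA, information held by a covered entity or business associate about an individual's health, health care, or payment for care is PHI when it includes one or more of these identifiers or can otherwise identify the individual. Removing or tightly safeguarding them is central to compliant data sharing.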
- Names.
- Geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and geocodes).
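- All elements of dates (except year) directly related to an individual, such as birth, admission, discharge, and death dates; and all ages over 89 (and date elements, including year, that reveal such ages) unless aggregated into a single 90-or-older category.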
- Telephone numbers.
- Fax numbers.
- Email addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Account numbers.
- Certificate/license numbers.
- Vehicle identifiers and serial numbers, including license plates.
- Device identifiers and serial numbers.
- Web URLs.
- IP addresses.
- Biometric identifiers (e.g., finger and voice prints).
- Full-face photographs and comparable images.
- Any other unique identifying number, characteristic, or code (except permitted re-identification codes).
Note: Under the Safe Harbor Method, the first three digits of a ZIP code may remain only if the geographic unit formed by combining all ZIP codes that share those three digits contains more than 20,000 people; otherwise, replace them with 000.
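The three-digit ZIP rule is mechanical enough to sketch in code. The function below is a minimal illustration, assuming you have already built the set of restricted three-digit prefixes (those covering 20,000 or fewer people) from current Census data; the function name and the example prefix are hypothetical.

```python
def safe_harbor_zip(zip_code: str, restricted_prefixes: set[str]) -> str:
    """Return the Safe Harbor form of a ZIP code: its first three digits,
    or "000" when that prefix covers 20,000 or fewer people.

    restricted_prefixes is assumed to be built from current Census data.
    """
    digits = zip_code.strip()[:5]
    prefix = digits[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        return "000"  # unparseable input: fall back to the most conservative value
    return "000" if prefix in restricted_prefixes else prefix
```

For example, with an illustrative restricted set of {"123"}, `safe_harbor_zip("94107-1234", {"123"})` returns "941", while `safe_harbor_zip("12345", {"123"})` returns "000".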
De-identification Methods for PHI
Safe Harbor Method
The Safe Harbor Method removes all 18 identifiers of the individual and of the individual's relatives, employers, and household members, and requires that you have no actual knowledge that the remaining information could identify an individual, alone or in combination with other information. It’s straightforward, repeatable, and well-suited to standardized releases, but it can reduce granularity (for example, detailed dates and precise locations are removed).
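Because Safe Harbor is rule-based, part of it can be expressed as a record transform. The sketch below assumes records are Python dicts and that identifier fields use the hypothetical names in `IDENTIFIER_FIELDS`; it drops those fields, reduces dates to the year, and collapses ages over 89. It does not handle ZIP codes (apply the three-digit rule separately), free text, the "any other unique identifying number" category, or the actual-knowledge condition.

```python
from datetime import date

# Hypothetical field names standing in for the direct identifiers in the list above.
IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id", "photo",
}

def apply_safe_harbor(record: dict) -> dict:
    """Drop identifier fields, keep only the year of each date, and collapse ages over 89."""
    out = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # remove the identifier entirely
        # datetime is a subclass of date, so timestamps are reduced to the year too
        out[field] = value.year if isinstance(value, date) else value
    age = out.get("age")
    if isinstance(age, int) and age > 89:
        out["age"] = "90+"
        out.pop("birth_date", None)  # even the birth year would reveal the exact age
    return out
```

Removing the birth year for patients over 89 reflects the rule that date elements indicating such ages, including year, must also go.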
Expert Determination Method
The Expert Determination Method uses a qualified expert to apply accepted statistical or scientific principles to conclude that the risk of re-identification is very small. Techniques can include generalization, suppression, perturbation, k-anonymity, l-diversity, and differential privacy. You must keep documentation of methods, assumptions, and results.
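One measurement an expert might use is k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. The sketch below computes k for a list of dict records; which fields count as quasi-identifiers, and what k is acceptable, are assumptions the expert must justify, and a k value alone is not an Expert Determination.

```python
from collections import Counter

def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return k: the size of the smallest group of records that share
    identical values on every quasi-identifier (0 for an empty dataset)."""
    if not records:
        return 0
    groups = Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)
    return min(groups.values())
```

If k falls below the threshold the expert has justified, generalize or suppress further and measure again.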
Choosing a method
Use Safe Harbor when speed, simplicity, and consistency are priorities. Choose Expert Determination when you need more data utility (e.g., keeping month-level dates or broader geographies) and can invest in a documented, risk-based assessment.
Using Limited Data Sets
A Limited Data Set (LDS) is PHI from which direct identifiers are removed but certain fields—like city, state, ZIP code, and elements of dates—may remain. An LDS may be used or disclosed only for research, public health, or health care operations and always requires a Data Use Agreement.
What an LDS may retain
- City, state, and ZIP code.
- Dates such as admission, discharge, service, birth, and death.
- Other non-direct identifiers needed for the approved purpose.
What an LDS must exclude
- Names and full postal addresses (beyond city, state, ZIP).
- Telephone, fax, and email.
- SSN, medical record, health plan beneficiary, and account numbers.
- Certificate/license numbers; vehicle and device identifiers.
- URLs, IP addresses, biometric identifiers.
- Full-face photographs and comparable images.
Compared with fully de-identified data, an LDS offers higher utility but carries more compliance obligations, including strict use limits, safeguards, and DUA enforcement.
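The exclusion list above can be enforced in a pipeline by refusing to build an LDS that requests a direct identifier. This is a minimal sketch: the field names in `LDS_EXCLUDED_FIELDS` are hypothetical, and the set of needed fields is assumed to come from the approved purpose in the DUA.

```python
# Hypothetical field names for the direct identifiers an LDS must exclude.
LDS_EXCLUDED_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn", "mrn",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "full_face_photo",
}

def to_limited_data_set(record: dict, needed_fields: set[str]) -> dict:
    """Keep only the fields needed for the approved purpose, and refuse
    any request that includes a field on the LDS exclusion list."""
    blocked = needed_fields & LDS_EXCLUDED_FIELDS
    if blocked:
        raise ValueError(f"LDS cannot include direct identifiers: {sorted(blocked)}")
    return {k: v for k, v in record.items() if k in needed_fields}
```

Failing loudly, rather than silently dropping a blocked field, surfaces requests that need a different pathway.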
Managing Coded Data
Coding (pseudonymization) replaces direct identifiers with a code so records can be linked without exposing identity. Under HIPAA, a re-identification code must not be derived from or related to information about the individual (for example, not an unsalted hash of an SSN), and the code key must be kept separately with appropriate safeguards.
Coded Data Safeguards
- Separate the code key from the coded dataset; restrict and audit access to both.
- Generate codes with secret salts or keys; avoid reversible transformations and obvious keys.
- Encrypt data in transit and at rest; maintain tamper-evident logs for all linkages.
- Define re-identification procedures, approvals, and purpose limitations in advance.
Remember that coded data is not automatically de-identified; its status depends on the presence, control, and risk of the code and key.
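A simple way to keep codes unrelated to PHI is to generate them randomly and hold the mapping in a separate key. The sketch below uses Python's `secrets` module; the class and function names are hypothetical, and persistence, encryption, access control, and audit logging for the key are left out. Note that the coded output may still contain other identifiers, so it is not de-identified on its own.

```python
import secrets

class CodeKey:
    """The code key: maps source identifiers to random codes.
    Store it apart from the coded dataset and restrict access to it."""

    def __init__(self) -> None:
        self._code_for: dict[str, str] = {}
        self._id_for: dict[str, str] = {}

    def code(self, identifier: str) -> str:
        """Return a stable random code for an identifier, creating one if needed."""
        if identifier not in self._code_for:
            new_code = secrets.token_hex(8)  # 16 hex chars, unrelated to the identifier
            while new_code in self._id_for:  # guard against a (very unlikely) collision
                new_code = secrets.token_hex(8)
            self._code_for[identifier] = new_code
            self._id_for[new_code] = identifier
        return self._code_for[identifier]

    def reidentify(self, code: str) -> str:
        """Look up the original identifier; call only under an approved procedure."""
        return self._id_for[code]

def code_records(records: list[dict], id_field: str, key: CodeKey) -> list[dict]:
    """Return a coded copy of records with id_field replaced by 'subject_code'."""
    coded = []
    for record in records:
        row = {k: v for k, v in record.items() if k != id_field}
        row["subject_code"] = key.code(record[id_field])
        coded.append(row)
    return coded
```

Random codes avoid the question of whether a salted or keyed hash counts as "derived from" the identifier, at the cost of having to store and protect the mapping.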
Ready to simplify HIPAA compliance?
Join thousands of organizations that trust Accountable to manage their compliance needs.
Assessing Risks of Re-identification
Re-identification risk arises when data can be linked back to an individual via unique combinations, external datasets, or residual identifiers. Assess risk from both data-intrinsic properties (rarity, outliers) and external threats (public registers, news, social media).
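The external-dataset threat can be approximated with an exact-match linkage test: join the candidate release to an identified external list on shared quasi-identifiers and count released records that match exactly one named person. This is a simplified sketch; the field names and the external list are hypothetical, and real attacks also use fuzzy matching and multiple sources.

```python
from collections import Counter

def unique_linkage_matches(released: list[dict], external: list[dict], keys: list[str]) -> int:
    """Count released records whose quasi-identifier values match exactly one
    external record, a rough proxy for records an attacker could re-identify."""
    external_counts = Counter(tuple(e.get(k) for k in keys) for e in external)
    return sum(
        1 for r in released
        if external_counts[tuple(r.get(k) for k in keys)] == 1
    )
```

A nonzero count is a signal to generalize or suppress the matching quasi-identifiers before release.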
Practical risk-reduction techniques
- Generalize or bin quasi-identifiers (e.g., age bands, broader geographies, month instead of exact day).
- Suppress or perturb rare combinations; set minimum cell sizes for published aggregates.
- Truncate or jitter dates; consider differential privacy for high-utility releases.
- Scan and redact free text and images to remove hidden identifiers and metadata.
- Conduct adversarial testing and replicate “linkage” attempts before release.
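Two of the techniques above, generalizing quasi-identifiers and enforcing minimum cell sizes, can be sketched briefly. The band width and the threshold of 11 are assumptions for illustration; use the values your policy or expert specifies.

```python
from collections import Counter

def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'; ages over 89 become '90+'."""
    if age > 89:
        return "90+"
    low = (age // width) * width
    high = min(low + width - 1, 89)  # never let a band reach into the 90+ category
    return f"{low}-{high}"

def suppress_small_cells(values: list[str], min_cell: int = 11) -> dict:
    """Tabulate values, replacing any count below min_cell with None (suppressed)."""
    counts = Counter(values)
    return {value: (n if n >= min_cell else None) for value, n in counts.items()}
```

Suppressing small cells alone can still leak if row or column totals are published, because a hidden cell may be recovered by subtraction; complementary suppression of a second cell is a common remedy.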
For Expert Determination, document your attack models, thresholds, and residual risk, and ensure periodic re-evaluation when data, context, or public datasets change.
Compliance and Best Practices
Adopt a governance program that aligns data use with privacy obligations and business goals. Apply the minimum necessary standard, role-based access, encryption, and continuous monitoring in every sharing workflow.
Operational controls
- Data inventory and classification tied to the 18 identifiers and quasi-identifiers.
- Standard playbooks for Safe Harbor Method, Expert Determination Method, and Limited Data Set creation.
- Vendor due diligence, BAAs where appropriate, and DUA enforcement for each data share.
- Incident response, breach reporting pathways, and timely revocation of access.
- Retention limits and secure destruction for both datasets and code keys.
Documentation to maintain
- De-identification specifications, transformation logs, and QA results.
- Expert opinions and supporting analysis for risk determinations.
- DUAs, approvals, and audit trails of disclosures and downstream recipients.
Educate teams regularly; many failures stem from overlooked attachments, unredacted notes, or inconsistent application of rules across projects.
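Unredacted notes are easier to catch when a scan runs before every release. The sketch below redacts a few pattern-shaped identifiers with regular expressions; the patterns are illustrative assumptions, and names, dates, and street addresses are not caught, so free text still needs broader tooling or manual review.

```python
import re

# Illustrative patterns only; they catch pattern-shaped identifiers, not names or addresses.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def redact(text: str) -> str:
    """Replace each match with a [TYPE] placeholder; SSNs are checked before phones."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Keeping the placeholder type (rather than deleting the match) lets reviewers see what was removed during QA.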
Implementing Data Use Agreements
A Data Use Agreement defines who may use or receive a Limited Data Set, for what purposes, and under which safeguards. It operationalizes boundaries so you can share useful data while keeping obligations enforceable.
Key DUA terms to include
- Permitted uses and disclosures limited to research, public health, or health care operations.
- Identification of authorized users and recipient organizations.
- Safeguards, including access controls, encryption, and Coded Data Safeguards where codes are used.
- Prohibitions on re-identification and contacting individuals.
- Reporting and mitigation duties for any misuse or breach.
- Flow-down obligations for contractors and agents.
- Return or destruction of the LDS at project end and right to audit.
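Some DUA terms can be mirrored in access checks so systems enforce what the agreement says. The sketch below models only the machine-checkable terms (named users, permitted purposes, an end date); the class, field names, and purpose labels are hypothetical, and terms such as reporting duties, flow-down obligations, and the no re-identification clause remain in the signed agreement.

```python
from dataclasses import dataclass
from datetime import date

ALLOWED_PURPOSES = {"research", "public_health", "health_care_operations"}

@dataclass(frozen=True)
class DataUseAgreement:
    recipient_org: str
    authorized_users: frozenset[str]
    permitted_purposes: frozenset[str]
    expires_on: date

    def __post_init__(self) -> None:
        # An LDS may be used or disclosed only for these three purposes.
        extra = self.permitted_purposes - ALLOWED_PURPOSES
        if extra:
            raise ValueError(f"Purposes not allowed for an LDS: {sorted(extra)}")

    def permits(self, user: str, purpose: str, on: date) -> bool:
        """True only if the user is named, the purpose is covered, and the DUA is active."""
        return (
            user in self.authorized_users
            and purpose in self.permitted_purposes
            and on <= self.expires_on
        )
```

Checking the expiry date at access time also supports close-out, since access stops even if revocation is delayed.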
Implementation steps
- Define the use case and decide between de-identified data, Limited Data Set, or full PHI with BAA.
- Map required fields to the minimum necessary; design transformations and risk controls.
- Draft and execute the DUA; train users before access is granted.
- Provision secure environments, monitor usage, and review periodic attestations.
- Close out access, collect attestations, and destroy or return data on completion.
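The first step, choosing a pathway, can be supported by a simple triage helper that routes requests for review. This is a hypothetical sketch, not a legal determination: the field groupings are illustrative and in practice must cover all 18 identifiers, and "escalate" stands for considering Expert Determination or a full-PHI pathway such as a BAA.

```python
# Hypothetical field groupings for this sketch; real lists must cover all 18 identifiers.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "mrn"}
LDS_ONLY_FIELDS = {"city", "zip5", "admit_date", "birth_date", "death_date"}
LDS_PURPOSES = {"research", "public_health", "health_care_operations"}

def triage_pathway(needed_fields: set[str], purpose: str) -> str:
    """Suggest a starting pathway for a data request (a triage aid only)."""
    if needed_fields & DIRECT_IDENTIFIERS:
        return "escalate"  # direct identifiers rule out both de-identified data and an LDS
    if needed_fields & LDS_ONLY_FIELDS:
        if purpose in LDS_PURPOSES:
            return "limited_data_set_with_dua"
        return "escalate"  # an LDS is not available for this purpose
    return "de_identified"
```

Routing anything ambiguous to "escalate" keeps the helper conservative; a privacy officer still makes the final call.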
Conclusion
Protecting PHI during data sharing hinges on mastering HIPAA’s 18 identifiers, selecting the right de-identification pathway, and enforcing disciplined safeguards. By pairing sound technical methods with strong DUAs and governance, you reduce re-identification risk while retaining the data utility your programs require.
FAQs
What are HIPAA’s 18 identifiers?
They are specific data elements—such as names, smaller-than-state geography, detailed dates, contact numbers, account and record numbers, biometric and photographic data, IP addresses, URLs, and other unique codes—that can directly or indirectly identify a person. Removing or safeguarding them is central to compliant sharing.
How does the Safe Harbor method protect PHI?
The Safe Harbor Method protects PHI by removing all 18 identifiers and ensuring you have no actual knowledge that the remaining information could identify an individual. It is a clear, rule-based approach that simplifies releases but often reduces data granularity.
What is the difference between Limited Data Sets and de-identified data?
De-identified data (via Safe Harbor or Expert Determination) is not PHI because re-identification risk is minimal. A Limited Data Set is still PHI—some fields like city, state, ZIP, and dates may remain—so it can be used only for defined purposes and always requires a Data Use Agreement.
How can organizations reduce re-identification risks?
Apply minimization and transformations (generalization, suppression, perturbation), set minimum cell sizes for aggregates, manage coded data with strict key controls, and perform adversarial testing. For Expert Determination, document methods and maintain ongoing reviews as contexts and external datasets evolve.