What Counts as De-Identified Information Under HIPAA? Safe Harbor vs. Expert Determination
Safe Harbor Method Requirements
Under HIPAA, the Safe Harbor method de-identifies data by removing specific identifiers from Individually Identifiable Health Information so that the remaining data cannot reasonably identify a person. It is a rule-based checklist: if you remove every listed identifier and have no actual knowledge that the remaining data could identify someone, the data meets the de-identification standard.
The 18 identifiers that must be removed
- Names.
- All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes), except the initial three digits of a ZIP code if the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people according to current Census data; for units of 20,000 or fewer people, the three digits must be changed to 000.
- All elements of dates (except year) directly related to an individual, including birth, admission, discharge, and death dates; and all ages over 89 and all date elements (including year) indicative of such an age, except that these ages and elements may be aggregated into a single category of age 90 or older.
- Telephone numbers.
- Fax numbers.
- Email addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Account numbers.
- Certificate/license numbers.
- Vehicle identifiers and serial numbers, including license plate numbers.
- Device identifiers and serial numbers.
- Web URLs.
- Internet Protocol (IP) addresses.
- Biometric identifiers, including finger and voice prints.
- Full-face photographic images and any comparable images.
- Any other unique identifying number, characteristic, or code (except a permitted re-identification code maintained separately by the covered entity).
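The ZIP code, date, and age rules in the list above are the ones most often applied by transformation rather than outright deletion. A minimal Python sketch follows, assuming a hypothetical `SMALL_POPULATION_ZIP3` set; its values are placeholders, and a real list would have to be derived from current Census data. The function names are illustrative, not part of any standard library or HIPAA tool.

```python
from datetime import date

# Placeholder set of 3-digit ZIP prefixes whose combined area has 20,000 or
# fewer residents. Assumption for illustration only; build the real list
# from current Census data.
SMALL_POPULATION_ZIP3 = {"036", "059", "102"}


def safe_harbor_zip(zip_code: str) -> str:
    """Keep only the first three ZIP digits, or 000 for low-population areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in SMALL_POPULATION_ZIP3 else prefix


def safe_harbor_year(d: date) -> int:
    """Reduce a date to its year.

    Note: a year that reveals an age over 89 (for example, a birth year)
    needs the 90+ treatment instead; this sketch does not handle that case.
    """
    return d.year


def safe_harbor_age(age: int) -> str:
    """Report ages over 89 as a single '90+' category."""
    return "90+" if age > 89 else str(age)
```

For example, `safe_harbor_zip("94110")` keeps the prefix `941`, while any ZIP code starting with a prefix in the placeholder set is reported as `000`.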
Additional Safe Harbor conditions
- Remove identifiers of the individual and of relatives, employers, or household members.
- Ensure identifier removal is complete across structured fields, free-text notes, and embedded metadata.
- Do not treat the data as de-identified if you have actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the person.
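Free-text notes are where identifiers most often survive a field-level scrub. The sketch below shows one way to flag leftovers for human review with regular expressions. The patterns are assumptions chosen for illustration: they cover only a few identifier types in common US formats, will miss many variants, and cannot detect names at all, so a clean result is not evidence that the text is de-identified.

```python
import re

# Illustrative patterns only (assumed formats); not a complete identifier list.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://\S+"),
}


def find_residual_identifiers(text: str) -> dict:
    """Return {identifier_type: [matches]} for every pattern that hits."""
    findings = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings
```

A scan like this works best as a gate before release: any non-empty result sends the record back for manual review rather than being auto-corrected.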
Implementation tips
- Create a traceable workflow that confirms each identifier’s deletion before release.
- Document how ZIP code and age rules were applied, including aggregation for 90+ ages.
- Maintain a separate, secure mapping table only if you must allow re-identification internally.
Expert Determination Approach
The Expert Determination method relies on a qualified expert who applies statistical de-identification techniques to evaluate whether the risk of re-identification is very small. Unlike Safe Harbor, it adapts to your data and use case, often preserving more utility while managing risk through analysis and controls.
Who the “expert” is
An expert has appropriate knowledge and experience with generally accepted statistical and scientific principles for rendering information not individually identifiable. Typical backgrounds include biostatistics, quantitative privacy, or data science with demonstrated work in re-identification risk.
Process the expert follows
- Profile quasi-identifiers (for example, combinations of ZIP, birth date, and gender) that could indirectly identify a person.
- Estimate re-identification risk for plausible attack models (linkage, inference, and deductive disclosure).
- Apply transformations—such as generalization, suppression, perturbation, hashing with safeguards, and data swapping—to reduce risk.
- Re-calculate residual risk and iterate until it is very small for the intended context of use and data release environment.
- Create Risk Assessment Documentation describing methods, assumptions, results, and release conditions.
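The profiling and risk-estimation steps above can be sketched with equivalence classes: records that share the same quasi-identifier values are indistinguishable from one another on those fields, so a simple per-record risk estimate is one divided by the size of the record's class. This is only one possible measure, shown here as an assumption; HIPAA does not define a numeric threshold for "very small", and an expert would normally also account for the release context and external data.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Summarize risk from equivalence classes over the quasi-identifiers."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    # Each record's risk is 1 / (number of records sharing its key).
    per_record = [1 / class_sizes[k] for k in keys]
    return {
        "max_risk": max(per_record),
        "average_risk": sum(per_record) / len(per_record),
        "unique_records": sum(1 for k in keys if class_sizes[k] == 1),
    }


# Hypothetical sample: three records share a class, one is unique.
sample = [
    {"zip3": "941", "birth_year": 1980, "sex": "F"},
    {"zip3": "941", "birth_year": 1980, "sex": "F"},
    {"zip3": "941", "birth_year": 1980, "sex": "F"},
    {"zip3": "941", "birth_year": 1975, "sex": "M"},
]
summary = reidentification_risk(sample, ["zip3", "birth_year", "sex"])
```

Re-running the same function after each transformation gives the "re-calculate and iterate" loop a concrete number to track.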
Common techniques and controls
- k-anonymity, l-diversity, and t-closeness to mitigate linkage and inference risks.
- Bucketing dates, coarsening geography, and top-coding ages (for example, 90+).
- Access controls and contractual limits that constrain user behavior and data linkage.
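As a rough illustration of the first bullet above, the checks below test a dataset for k-anonymity (every combination of quasi-identifier values appears at least k times) and for the simplest "distinct values" form of l-diversity (every such group contains at least l different sensitive values). The function and field names are hypothetical, and t-closeness is not covered.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every group has at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group has at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())
```

A dataset can pass the k-anonymity check and still fail l-diversity, for instance when everyone in a "90+" group shares the same diagnosis, which is why the two are usually evaluated together.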
Criteria for De-Identification Under HIPAA
HIPAA recognizes two pathways to de-identification: the Safe Harbor checklist and the Expert Determination approach. If either pathway is satisfied and you lack actual knowledge that remaining information could identify a person, the resulting dataset is no longer PHI and can be used or disclosed outside HIPAA’s PHI rules.
Key criteria you must meet
- Either remove all of the identifiers that Safe Harbor lists or obtain a written determination from a qualified expert.
- Retain documentation: for Safe Harbor, evidence that each identifier was removed; for Expert Determination, the expert’s methods and conclusions.
- If you assign a re-identification code, keep the key separately and do not derive the code from the individual’s information.
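The last criterion can be sketched in code: a re-identification code should be random rather than computed from the record, so a hash of a medical record number or birth date would not qualify. The example below uses Python's standard `secrets` module; the function name and the sample record IDs are hypothetical, and the returned key table is assumed to be stored separately from the released data under its own access controls.

```python
import secrets


def assign_reidentification_codes(record_ids):
    """Return {record_id: random_code}; codes are not derived from the ids."""
    key_table = {}
    used = set()
    for rid in record_ids:
        code = secrets.token_hex(8)  # 16 random hexadecimal characters
        while code in used:          # regenerate on the rare collision
            code = secrets.token_hex(8)
        used.add(code)
        key_table[rid] = code
    return key_table


# Hypothetical internal record IDs; only the codes go into the released data.
key_table = assign_reidentification_codes(["MRN-001", "MRN-002", "MRN-003"])
```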
Individually Identifiable Health Information vs. de-identified data
Individually Identifiable Health Information links to a person or could reasonably identify the person. Once properly de-identified, the dataset no longer contains IIHI and falls outside HIPAA’s PHI requirements, though other laws or ethical standards may still apply.
Limited Data Set and Data Use Agreements
A Limited Data Set (LDS) is PHI with direct identifiers removed but that may retain certain details—such as city, state, ZIP code, and elements of dates. An LDS enables greater analytic utility for research, public health, and healthcare operations, subject to a Data Use Agreement.
What must be removed for an LDS
- Direct identifiers of the individual and of relatives, employers, or household members, including names, postal address information other than city, state, and ZIP code, telephone and fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle and device identifiers, URLs, IP addresses, biometric identifiers, and full-face photographs or comparable images.
What may remain in an LDS
- City, state, ZIP code, and dates such as admission, discharge, birth, and death.
- Other non-direct identifiers necessary for the approved purpose, consistent with minimum necessary principles.
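The split between what must go and what may stay can be sketched as a field filter. The field names below are hypothetical and would need to be mapped to your own schema; the example assumes one flat dictionary per record.

```python
# Hypothetical field names standing in for the LDS direct identifiers.
LDS_DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn", "mrn",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_id", "url", "ip_address", "biometric_id", "photo",
}


def to_limited_data_set(record: dict) -> dict:
    """Drop direct-identifier fields; keep city, state, ZIP, dates, and the rest."""
    return {
        field: value
        for field, value in record.items()
        if field not in LDS_DIRECT_IDENTIFIER_FIELDS
    }


patient = {
    "name": "Jane Doe",
    "street_address": "1 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip": "62701",
    "admission_date": "2023-04-02",
    "mrn": "A123",
    "diagnosis": "asthma",
}
lds_record = to_limited_data_set(patient)
```

A deny-list like this lets unknown fields through, so an allow-list of approved fields may be the safer design when the schema changes often. Either way, the output is still PHI and needs a Data Use Agreement before it is shared.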
Data Use Agreement essentials
- Permitted uses and disclosures (research, public health, operations) and identification of authorized recipients.
- Prohibitions on re-identification and contact with individuals.
- Safeguards, reporting obligations, and flow-down requirements to agents or subcontractors.
Managing Re-Identification Risk
Re-identification risk depends on the data itself, external data availability, and release context. Managing it requires technical, organizational, and contractual controls working together.
Risk drivers to consider
- Granularity of quasi-identifiers (fine-grained dates or geography increase risk).
- Uniqueness of records (rare conditions or outliers can be identifying).
- Linkability to public or commercial datasets.
Controls that reduce risk
- Data transformations: generalization, suppression, noise addition, and aggregation.
- Access and environmental controls: user vetting, secure enclaves, output checking, and usage monitoring.
- Contractual limits: purpose restrictions and explicit bans on linkage attempts, enforced through a Data Use Agreement.
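Two of the data transformations above, generalization and suppression, are often combined: coarsen the quasi-identifiers first, then drop the records that are still too rare. The sketch below assumes hypothetical `zip_code` and `admit_date` fields and a caller-chosen minimum class size `k`; it illustrates the idea rather than a complete de-identification pipeline.

```python
from collections import Counter
from datetime import date


def generalize(record: dict) -> dict:
    """Coarsen geography to a 3-digit ZIP prefix and dates to the year."""
    out = dict(record)
    out["zip_code"] = record["zip_code"][:3]
    out["admit_date"] = record["admit_date"].year
    return out


def suppress_small_classes(records, quasi_identifiers, k):
    """Drop records whose quasi-identifier combination occurs fewer than k times."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    return [r for r, key in zip(records, keys) if sizes[key] >= k]


raw = [
    {"zip_code": "94110", "admit_date": date(2023, 1, 5), "dx": "asthma"},
    {"zip_code": "94117", "admit_date": date(2023, 6, 9), "dx": "diabetes"},
    {"zip_code": "10001", "admit_date": date(2022, 2, 2), "dx": "asthma"},
]
coarse = [generalize(r) for r in raw]
released = suppress_small_classes(coarse, ["zip_code", "admit_date"], k=2)
```

In this sample the two San Francisco records collapse into one class and are kept, while the single New York record is suppressed. Generalizing before suppressing usually loses fewer records than suppressing alone.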
Operating model and documentation
- Maintain living Risk Assessment Documentation aligned to your release scenarios.
- Re-evaluate risk whenever data content, user population, or available external data changes.
- Adopt minimum necessary data design and document why each element is required.
Comparison of De-Identification Methods
When Safe Harbor is a good fit
- You need a clear, checklist-driven pathway with predictable compliance.
- Your use case tolerates the loss of detail from removing the 18 identifiers.
- You prefer not to rely on specialized expertise or modeling.
When Expert Determination is a better fit
- You need more data utility (for example, retaining some dates or finer geography) while keeping risk very small.
- You can support expert involvement, governance, and periodic reassessment.
- You have defined release contexts that enable layered technical and contractual controls.
Trade-offs at a glance
- Compliance certainty: Safe Harbor is prescriptive; Expert Determination is flexible but requires defensible analysis.
- Data utility: Expert Determination often preserves more utility than Safe Harbor. An LDS can also keep dates and geography, but it remains PHI and requires a Data Use Agreement.
- Effort and cost: Safe Harbor is faster to implement; Expert Determination requires expert time and ongoing governance.
Conclusion
HIPAA permits two valid routes to de-identification. Safe Harbor offers a straightforward checklist built on removing the 18 listed identifiers, while Expert Determination tailors statistical de-identification techniques to achieve a very small risk in context. Limited Data Sets, governed by a Data Use Agreement, provide a middle ground for specific purposes. Choose the pathway that balances compliance certainty, analytic value, and your capability to manage re-identification risk over time.
FAQs
What identifiers must be removed under the Safe Harbor method?
You must remove 18 identifiers of the individual and of relatives, employers, or household members: names; all sub-state geography (with the three-digit ZIP rule); all elements of dates except year, and all ages over 89 (which may be aggregated to 90+); telephone and fax numbers; email addresses; Social Security, medical record, and health plan beneficiary numbers; account and certificate/license numbers; vehicle and device identifiers; URLs and IP addresses; biometric identifiers; full-face photos and comparable images; and any other unique identifying number, characteristic, or code.
Who qualifies as an expert for Expert Determination?
An expert is someone with appropriate knowledge and experience applying statistical and scientific methods to de-identification—commonly a biostatistician or privacy/data scientist—who can document that the re-identification risk is very small for the intended data and context.
How is re-identification risk assessed?
The expert profiles quasi-identifiers, models plausible attack scenarios (such as linkage and inference), applies transformations and controls to reduce risk, and then quantifies residual risk. The expert issues Risk Assessment Documentation describing assumptions, methods, results, and conditions for data release and ongoing monitoring.
What is the purpose of a Limited Data Set?
A Limited Data Set removes direct identifiers but can retain city, state, ZIP code, and dates to improve utility for research, public health, or healthcare operations. It must be shared under a Data Use Agreement that defines permitted uses, safeguards, and prohibitions on re-identification or unauthorized disclosure.