HIPAA Privacy Rule De-Identification Checklist: 18 Identifiers, Exceptions, and Examples


Kevin Henry

HIPAA

May 02, 2024


Overview of the HIPAA Privacy Rule

The HIPAA Privacy Rule protects individually identifiable health information (IIHI) held by a covered entity or its business associate. Once data are properly de-identified, they are no longer protected health information (PHI) under HIPAA and can be used or disclosed without individual authorization.

HIPAA recognizes two pathways for de-identifying data: the Safe Harbor Method and the Expert Determination Method. Both aim to minimize re-identification risk while preserving analytic utility, but they use different standards and documentation.

As you plan data sharing or analytics, determine whether you act as a covered entity, business associate, or recipient. Your role determines which safeguards, agreements, and approvals you must implement to handle de-identified data or limited data sets responsibly.

Detailed List of the 18 Identifiers

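Under the Safe Harbor Method, you must remove the following 18 identifiers of the individual and of the individual's relatives, employers, and household members, and have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual: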

  1. Names.
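  2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and equivalent geocodes. The initial three digits of a ZIP code may be kept when, according to current Census Bureau data, the area formed by all ZIP codes sharing those digits contains more than 20,000 people; otherwise change them to 000.
  3. All elements of dates (except year) directly related to an individual, including birth, admission, discharge, and death dates. All ages over 89, and all date elements (including year) that indicate such an age, must also be removed, although they may be aggregated into a single “90 or older” category.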
  4. Telephone numbers.
  5. Fax numbers.
  6. Email addresses.
  7. Social Security numbers.
  8. Medical record numbers.
  9. Health plan beneficiary numbers.
  10. Account numbers.
  11. Certificate/license numbers.
  12. Vehicle identifiers and serial numbers, including license plates.
  13. Device identifiers and serial numbers.
  14. Web URLs.
  15. IP address numbers.
  16. Biometric identifiers, including finger and voice prints.
  17. Full-face photographs and comparable images.
  18. Any other unique identifying number, characteristic, or code (other than a permitted re-identification code maintained separately).

Exceptions to De-Identification Requirements

HIPAA’s Safe Harbor includes narrow allowances that are often misunderstood. Use these carefully and document decisions to manage re-identification risk.

  • Geography: You may retain the initial three digits of a ZIP code only when all ZIP codes sharing those digits represent a population greater than 20,000; otherwise, replace with 000. State-level geography may be retained.
  • Dates: You may keep only the year of dates tied to an individual. For people over 89, even the year must be removed when it would reveal the age; aggregate those ages into a single “90 or older” category.
  • Decedent information protection: PHI of a decedent remains protected for 50 years after death. After that, the information is no longer PHI, so HIPAA no longer requires de-identification in that context.
  • Limited data set alternative: When full de-identification is not feasible, a limited data set may be shared under a Data Use Agreement. It can retain certain dates and general geography (e.g., city, state, ZIP) but remains PHI and is not de-identified.
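
The geography and date allowances above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the function names are hypothetical, and the `LOW_POPULATION_ZIP3` set holds placeholder values that you would need to replace with prefixes derived from current Census Bureau data.

```python
from datetime import date

# Placeholder values for illustration only. Build the real set from current
# Census Bureau data: 3-digit prefixes covering 20,000 or fewer people.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def safe_harbor_zip(zip_code: str) -> str:
    """Return the 3-digit ZIP prefix, or '000' for low-population prefixes."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in LOW_POPULATION_ZIP3 else prefix


def safe_harbor_year(d: date) -> int:
    """Keep only the year of a date tied to an individual."""
    return d.year


def safe_harbor_age(age: int) -> str:
    """Aggregate ages over 89 into a single '90 or older' category."""
    return "90 or older" if age > 89 else str(age)


print(safe_harbor_zip("02139"))               # 021
print(safe_harbor_year(date(1976, 7, 12)))    # 1976
print(safe_harbor_age(93))                    # 90 or older
```

For anyone in the “90 or older” category, a retained birth year would still reveal the age, so a fuller version would suppress that year as well.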

Methods for De-Identifying Data

Safe Harbor Method

Remove the 18 identifiers and confirm you have no actual knowledge that the remaining data could identify an individual. This method is straightforward, auditable, and fast, but it can reduce data utility, especially for time- or location-sensitive analyses.
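
As a rough sketch of the removal step, the snippet below drops identifier fields from a record represented as a Python dictionary. The field names are hypothetical and must be mapped to your own schema. Removing columns does not by itself satisfy Safe Harbor: identifiers hidden in free text, other unique codes, ZIP codes and dates that need transformation rather than deletion, and the “no actual knowledge” condition all still need separate handling.

```python
# Hypothetical field names for illustration; map them to your own schema.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_serial", "url", "ip_address", "biometric_id",
    "photo",
}


def drop_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIER_FIELDS
    }


record = {
    "name": "Jane Doe",
    "mrn": "12345",
    "state": "MA",
    "diagnosis_code": "E11.9",
}
print(drop_direct_identifiers(record))
# {'state': 'MA', 'diagnosis_code': 'E11.9'}
```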

Expert Determination Method

A qualified expert applies accepted statistical or scientific principles to determine that the risk of re-identification is very small. The expert documents techniques, assumptions, and testing, and you retain this report to substantiate compliance.

Common expert techniques

  • Generalization and binning (e.g., age bands, coarser geographies).
  • Suppression and masking of high-risk values or small cell sizes.
  • Perturbation or noise infusion with utility-preserving constraints.
  • Outlier treatment to reduce linkage risk from rare combinations.
  • Quantified risk testing (e.g., k-anonymity, l-diversity, t-closeness, population uniqueness modeling).
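
To make one of these tests concrete, here is a minimal k-anonymity check in Python. It reports the size of the smallest group of records that share the same quasi-identifier values; the column names (`age_band`, `zip3`, `sex`) are assumptions for the example, and a real expert analysis would go well beyond this single metric.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


rows = [
    {"age_band": "40-44", "zip3": "021", "sex": "F"},
    {"age_band": "40-44", "zip3": "021", "sex": "F"},
    {"age_band": "45-49", "zip3": "021", "sex": "M"},
]

# The third row is unique on these columns, so the data set is only 1-anonymous.
print(k_anonymity(rows, ["age_band", "zip3", "sex"]))  # 1
```

A result of 1 means at least one record is unique on the chosen columns and is a candidate for further generalization or suppression.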

Workflow You Can Reuse

  1. Scope the use case and necessary features; prefer the Safe Harbor Method if sufficient.
  2. Profile re-identification risk; identify high-risk quasi-identifiers and rare categories.
  3. Apply transformation strategies; iterate to meet utility goals.
  4. Validate with an expert (if using Expert Determination) and capture formal documentation.
  5. Implement access controls, logging, and sharing terms; re-test after any change.

Examples

  • Safe Harbor example: Convert a birth date of 07/12/1976 to year 1976, replace 02139 with 021 if the 3-digit area exceeds 20,000 population (else 000), and remove street, phone, MRN, and URLs.
  • Expert Determination example: Generalize ages into 5-year bands, top-code at 90+, coarsen ZIP to county groups, suppress small cells, and validate that the modeled re-identification risk is below a pre-set threshold.
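
The generalization and suppression steps in the Expert Determination example might look like the sketch below. The helper names are hypothetical, and the minimum cell size of 5 is an assumed default for illustration; the appropriate threshold is for the expert to set.

```python
from collections import Counter


def age_band(age: int, width: int = 5, top_code: int = 90) -> str:
    """Generalize an age into a fixed-width band, top-coded at `top_code`."""
    if age >= top_code:
        return f"{top_code}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress_small_cells(values, min_count: int = 5, placeholder: str = "suppressed"):
    """Replace values that occur fewer than `min_count` times with a placeholder."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


print(age_band(47))  # 45-49
print(age_band(93))  # 90+
print(suppress_small_cells(["a", "a", "b"], min_count=2))
# ['a', 'a', 'suppressed']
```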

Use of Re-Identification Codes

HIPAA permits assigning a code so a covered entity or business associate can link de-identified records back to the source when needed (e.g., quality checks or clinical follow-up), provided strict conditions are met.

  • The code must not be derived from or related to information about the individual (avoid hashes of names, MRNs, or SSNs) and must not be reversible.
  • Store the key separately. Do not use or disclose the code for any other purpose, and do not disclose the re-identification mechanism.
  • Limit key access to a minimal set of custodians; log every use; rotate or retire codes per policy.

Good choices include randomly generated tokens or system-assigned IDs that have no mathematical relationship to individual attributes. Avoid deterministic transformations of personal identifiers.
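
One way to generate such tokens is Python's standard `secrets` module, as in the sketch below. The `assign_codes` function and the `mrn-…` record IDs are hypothetical; the returned key table is the re-identification mechanism, so it would be stored separately from the de-identified data and under restricted access.

```python
import secrets


def assign_codes(record_ids):
    """Map each source record ID to a random token unrelated to the ID."""
    key_table = {}
    used = set()
    for record_id in record_ids:
        token = secrets.token_hex(16)  # 32 hex characters from a secure RNG
        while token in used:  # collisions are extremely unlikely, but cheap to rule out
            token = secrets.token_hex(16)
        used.add(token)
        key_table[record_id] = token
    return key_table


# Keep this mapping apart from the de-identified data set.
key_table = assign_codes(["mrn-001", "mrn-002"])
print(len(key_table))  # 2
```

Because each token is drawn at random rather than computed from the record, nothing in the token can be translated back to the individual without the key table.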

De-Identification in Research Settings

De-identified data can typically be used for research without HIPAA authorization because it is no longer PHI. This makes it valuable for feasibility assessments, algorithm development, and benchmarking where individual identity is irrelevant.

When you need more granular detail, a limited data set under a Data Use Agreement allows retention of certain dates and geography. The agreement must define the permitted uses and recipients, require appropriate safeguards, and prohibit re-identification attempts, contacting individuals, and any use or disclosure beyond what it permits.

Institutional oversight remains essential. Maintain documentation (e.g., expert reports, DUAs), control access on a need-to-know basis, and periodically reassess re-identification risk as datasets, tools, or external data landscapes change.

Example

A multi-site outcomes study shares a limited data set containing city, state, five-digit ZIP codes, and procedure dates under a Data Use Agreement. Analysts build risk-adjusted models, while contractual and technical controls prevent re-identification and unauthorized disclosures.

Limitations and Risks of De-Identification

De-identification reduces but does not eliminate re-identification risk. Linkage attacks using external datasets, rare diagnoses, fine-grained timestamps, or small geography can compromise anonymity if controls are weak.

Utility–privacy trade-offs are real. Overzealous suppression can cripple analyses, while insufficient generalization can leave residual risk. Treat de-identification as part of a broader governance program, not a one-time transformation.

Mitigation Strategies

  • Adopt risk thresholds and test routinely, especially before new releases or merges.
  • Use layered controls: access governance, query auditing, minimum necessary data, and contractual prohibitions against re-identification.
  • Prefer privacy-by-design: plan generalization and aggregation early; avoid collecting unnecessary identifiers.
  • Monitor evolving external data sources that could change re-identification risk.

Conclusion

Effective de-identification under HIPAA aligns method, documentation, and governance. Use the Safe Harbor Method when feasible, the Expert Determination Method when you need flexibility, manage re-identification risk continuously, and rely on Data Use Agreements for limited data sets in research.

FAQs

What are the 18 HIPAA identifiers that must be removed for de-identification?

They include names; geographic subdivisions smaller than a state (with the 3-digit ZIP caveat); all elements of dates (except year) and ages over 89; phone and fax numbers; email; Social Security, medical record, health plan beneficiary, and account numbers; certificate/license numbers; vehicle and device identifiers; URLs; IP addresses; biometric identifiers; full-face images; and any other unique identifying number, characteristic, or code.

How does the Expert Determination method differ from Safe Harbor?

Safe Harbor uses a fixed checklist of 18 identifiers and a “no actual knowledge” standard. Expert Determination relies on a qualified expert to show, using accepted techniques and testing, that the residual re-identification risk is very small, with documented methods and results. It provides flexibility but requires expertise and formal documentation.

Can de-identified data be re-identified under HIPAA rules?

Yes, but only by the covered entity or business associate that assigned a compliant re-identification code, and only under strict controls. The code cannot be derived from personal attributes, the key must be kept separately, the code may not be used or disclosed for other purposes, and the re-identification mechanism may not be disclosed. Once re-identified, the information is PHI again.

What exceptions apply to geographic data in HIPAA de-identification?

You must remove all geographic subdivisions smaller than a state, except you may retain the first three digits of a ZIP code when the combined population for all ZIP codes with those digits exceeds 20,000; otherwise replace the digits with 000. State-level geography may be kept, but city, county, street address, and full ZIP codes must be removed under Safe Harbor.
