HIPAA De-Identification Explained: What You Must Remove and Why

Kevin Henry

May 03, 2024

When you share health data outside your organization, HIPAA de-identification is meant to ensure there is no reasonable basis to believe the data can identify an individual. Two compliant pathways exist, Safe Harbor and Expert Determination, and both aim to reduce the risk of re-identification while preserving analytical value. This guide explains what you must remove, how to treat geographic and date elements, and when specialized methods or a Limited Data Set with a Data Use Agreement make sense for your use case.

Safe Harbor Method Requirements

The Safe Harbor Method requires you to remove a specific list of identifiers from each record and to have no actual knowledge that the remaining information could identify an individual. It is straightforward, rules-based, and well-suited when you can tolerate coarse geography and limited temporal detail.

What Safe Harbor Demands

  • Remove the 18 enumerated identifiers from all records, including free text and embedded metadata.
  • Retain only state-level geography (or three-digit ZIP Codes meeting the population rule) and years for dates, with a special rule for individuals aged 90 and over.
  • Do not include codes derived from identifiers (for example, a hashed SSN); if you need re-linkage, use a random code and store the codebook separately.
  • Confirm you have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual.

Practical Steps to Implement

  • Inventory all fields, including notes, images, documents, and logs.
  • Strip direct contact fields; generalize geographic subdivisions; convert dates to year; aggregate ages 90+ to a single category.
  • Redact identifiers in free text using pattern and dictionary-based detection plus manual spot checks.
  • Validate outputs with sampling and attempt re-identification tests aligned to your data release context.
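
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a complete Safe Harbor implementation: the field names, the `DIRECT_IDENTIFIER_FIELDS` set, and the `DATE_FIELDS` mapping are hypothetical and would come from your own field inventory. Free text, images, and metadata need separate handling.

```python
from datetime import date

# Hypothetical field names; replace with the result of your own field inventory.
DIRECT_IDENTIFIER_FIELDS = frozenset(
    {"name", "street_address", "city", "county", "phone", "email", "ssn", "mrn"}
)

# Hypothetical mapping from full-date fields to their year-only replacements.
DATE_FIELDS = {
    "admission_date": "admission_year",
    "discharge_date": "discharge_year",
}


def safe_harbor_transform(record: dict) -> dict:
    """Return a copy of `record` with illustrative Safe Harbor treatments applied."""
    # Strip direct identifier fields.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Convert dates to year only. Values that are not `date` objects are
    # dropped rather than passed through, which is the conservative choice.
    for full_field, year_field in DATE_FIELDS.items():
        value = out.pop(full_field, None)
        if isinstance(value, date):
            out[year_field] = value.year

    # Aggregate ages 90 and older into a single category.
    age = out.get("age")
    if isinstance(age, int) and age >= 90:
        out["age"] = "90+"

    return out
```

Running each record through one function like this makes the output easier to sample and audit, because every released field has passed the same filter.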

Expert Determination Approach

The Expert Determination pathway relies on a qualified expert who applies statistical and scientific principles to conclude that the risk of re-identification is very small for the anticipated data use. It is preferred when you need more granular data than Safe Harbor allows.

What the Expert Does

  • Profiles the dataset and external data environments to understand linkage risks.
  • Applies transformations—generalization, suppression, noise addition, data swapping, or controlled date shifting—to meet a defined risk threshold.
  • Documents methods, assumptions, and residual risk for the specific release and audience.
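
One measurement an expert might use when profiling linkage risk is the size of the smallest group of records that share the same quasi-identifier values (often called k-anonymity). The sketch below is illustrative only: HIPAA does not prescribe this metric or any particular threshold, and the function names and the choice of quasi-identifiers are assumptions made for the example.

```python
from collections import Counter


def _group_key(record: dict, quasi_identifiers: list) -> tuple:
    """Build the tuple of quasi-identifier values that defines a record's group."""
    return tuple(record.get(q) for q in quasi_identifiers)


def smallest_group_size(records: list, quasi_identifiers: list) -> int:
    """Return k, the size of the smallest group sharing quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    return min(groups.values())


def records_below_k(records: list, quasi_identifiers: list, k: int) -> list:
    """Return the records whose group has fewer than k members."""
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if groups[_group_key(r, quasi_identifiers)] < k]
```

Records returned by `records_below_k` are candidates for further generalization or suppression before release.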

When to Choose Expert Determination

Use this approach when you need finer geography, more detailed timelines, or rare-condition data that Safe Harbor would over-suppress. Many organizations pair Expert Determination with governance measures such as Data Use Agreements to further reduce the practical risk of re-identification.

List of 18 Identifiers to Remove

  1. Names.
  2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP Code, and equivalent geocodes (except permitted three-digit ZIP Codes; see Geographic Data Handling).
  3. All elements of dates (except year) for dates directly related to an individual (for example, birth, admission, discharge, death) and all ages over 89, including any date elements (including year) indicative of such age; these ages and elements may be aggregated into a single category of age 90 or older.
  4. Telephone numbers.
  5. Fax numbers.
  6. Email addresses.
  7. Social Security numbers.
  8. Medical record numbers.
  9. Health plan beneficiary numbers.
  10. Account numbers.
  11. Certificate/license numbers.
  12. Vehicle identifiers and serial numbers, including license plate numbers.
  13. Device identifiers and serial numbers.
  14. Web URLs.
  15. IP address numbers.
  16. Biometric identifiers (for example, fingerprints, voiceprints).
  17. Full-face photographic images and any comparable images.
  18. Any other unique identifying number, characteristic, or code (except a non-derivable re-identification code maintained separately).

Geographic Data Handling

Under Safe Harbor, you must remove geographic subdivisions smaller than a state. That means no street addresses, cities, counties, precincts, or precise geocodes. Latitude/longitude and exact facility coordinates must be excluded.

Three-Digit ZIP Code Rule

  • You may include the first three digits of a ZIP Code only if the geographic unit formed by combining all ZIP Codes with those three digits contains more than 20,000 people, according to current publicly available Census Bureau data.
  • If the population threshold is not met, replace the three-digit ZIP with 000.
  • State, region, or country-level indicators are generally acceptable; avoid disclosing small geographic subdivisions that narrow to specific neighborhoods or facilities.
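
The three-digit rule can be sketched as a small function. The `zip3_population` lookup is an assumption: you would build it from current Census Bureau data, and the figures used in the usage example are made up for illustration.

```python
POPULATION_THRESHOLD = 20_000


def generalize_zip(zip_code: str, zip3_population: dict) -> str:
    """Return the three-digit ZIP prefix if it meets the population rule, else "000"."""
    prefix = zip_code.strip()[:3]
    # Malformed input is replaced rather than released.
    if len(prefix) < 3 or not prefix.isdigit():
        return "000"
    # Prefixes missing from the lookup are treated as failing the rule
    # (a conservative default chosen for this sketch).
    if zip3_population.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"
```

For example, with the hypothetical lookup `{"432": 1_000_000, "036": 15_000}`, `generalize_zip("43215-1234", ...)` returns `"432"` and `generalize_zip("03601", ...)` returns `"000"`.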

If You Need Finer Geography

Consider a Limited Data Set, which may include city, full ZIP Code, and full dates, but only under a Data Use Agreement and for permitted purposes (such as research, public health, or health care operations). A Limited Data Set is not de-identified data and remains subject to HIPAA safeguards.

Date Elements Removal

Safe Harbor requires removing all elements of dates directly related to an individual, except the year. This includes months, days, and exact times for events like encounters, procedures, specimen collection, and death.

Practical Treatments

  • Keep year only for events. Keep birth year as well, except for individuals aged 90+, whose birth year would reveal their age.
  • Report age in years (or coarse bands) up to 89; recode ages 90 and older to a single “90 or older” value.
  • Avoid exact timestamps; if timing is analytically essential, use relative intervals or coarse bins rather than real calendar dates under Safe Harbor.
  • When richer temporal detail is necessary, rely on Expert Determination or a Limited Data Set with a Data Use Agreement.
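
A minimal sketch of these date treatments follows. The function names are hypothetical, and the relative-interval helper only illustrates the approach described in the list above; whether a given interval scheme is acceptable for your release is a judgment this code does not make.

```python
from datetime import date


def age_in_years(birth_date: date, as_of: date) -> int:
    """Compute completed years of age on the `as_of` date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def safe_harbor_age(age: int):
    """Return the age in years up to 89, or a single category for 90 and older."""
    return "90 or older" if age >= 90 else age


def birth_year_or_none(birth_date: date, as_of: date):
    """Return the birth year, or None when it would indicate an age of 90 or more."""
    return None if age_in_years(birth_date, as_of) >= 90 else birth_date.year


def days_since_index(index_date: date, event_date: date) -> int:
    """Express an event as days elapsed since an index event, not a calendar date."""
    return (event_date - index_date).days
```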

Biometric and Photographic Identifiers

Biometric identifiers tie physical or behavioral measurements to a specific person. Under HIPAA de-identification, you must remove them, along with photographic images that could reveal identity.

Biometrics to Exclude

  • Fingerprints and voiceprints.
  • Device-generated identifiers and signatures that are biometric in nature (for example, certain face or gait “prints”), unless transformed so they cannot identify an individual.

Photographs and Comparable Images

  • Remove full-face photographs and images comparable in identifying power (for example, profile images showing identifiable facial features).
  • Strip embedded metadata from images and files, which can include timestamps, GPS coordinates, and device serial numbers.

Managing Re-Identification Risk

Even after removing identifiers, linkages to external data or unusual combinations of attributes can create residual risk. Manage that risk through technical, organizational, and contractual controls matched to your data’s sensitivity and audience.

Common Pitfalls

  • Identifiers left in free text, scanned forms, filenames, or metadata.
  • Highly unusual combinations (for example, rare diagnoses plus small geographic subdivisions) that enable singling-out.
  • Codes derived from identifiers (hashed SSNs, reversible tokens) that violate Safe Harbor’s code restrictions.
  • Release of small, known cohorts that are easily linkable to public information.
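
The first pitfall, identifiers left in free text, is often addressed with pattern-based redaction as a first pass. The sketch below is illustrative only: the three patterns are simplified assumptions that cover a few common US formats, they will not catch names or less regular identifiers, and pattern matching does not replace dictionary-based detection or manual spot checks.

```python
import re

# Simplified, illustrative patterns; real notes contain many more formats.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a bracketed label such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def find_leftovers(text: str) -> list:
    """Return the labels of any patterns still present, for validation sampling."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]
```

Running `find_leftovers` on released text gives a cheap check that the redaction step was actually applied, though an empty result does not prove the text is free of identifiers.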

Controls That Work

  • Generalize quasi-identifiers (age bands, broader regions), suppress rare values, and apply noise or swapping where appropriate.
  • Use a non-derivable random code for record linkage and store the codebook separately with strict access controls.
  • Limit fields to the minimum necessary, implement user access tiers, and require a Data Use Agreement that prohibits re-identification attempts and onward sharing.
  • Periodically reassess the risk of re-identification as external data ecosystems evolve.
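
The difference between a derived code and a non-derivable one can be shown in a few lines. A hash of an SSN or medical record number is computed from the identifier, while the codes below are drawn at random and have no mathematical relationship to it. This is a sketch under assumptions: the function name and the 16-character code length are illustrative, and the returned codebook would need to be stored separately from the released data with its own access controls.

```python
import secrets


def assign_random_codes(source_ids: list) -> dict:
    """Map each source identifier to a random code that is not derived from it."""
    codebook = {}
    used = set()
    for source_id in source_ids:
        if source_id in codebook:
            continue  # The same individual keeps the same code.
        code = secrets.token_hex(8)  # 16 hex characters from the OS random source.
        while code in used:  # Collisions are extremely unlikely; retry if one occurs.
            code = secrets.token_hex(8)
        used.add(code)
        codebook[source_id] = code
    return codebook
```

Only the random codes travel with the released records; re-linkage is possible only for someone who holds the codebook.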

Conclusion

To comply with HIPAA de-identification, choose the pathway that fits your needs: Safe Harbor for clear, rules-based removal of the 18 identifiers, or Expert Determination for tailored protection when more detail is required. Handle geography and dates carefully, exclude biometric and photographic identifiers, and reinforce safeguards with governance and Data Use Agreements where needed.

FAQs

What are the 18 identifiers that must be removed for HIPAA de-identification?

The 18 are: names; geographic subdivisions smaller than a state (including street, city, county, precinct, ZIP, and geocodes, with a limited three-digit ZIP exception); all elements of dates except year for dates directly related to an individual, plus ages over 89 (which may be aggregated into a single 90-or-older category); telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers (including plates); device identifiers and serial numbers; web URLs; IP addresses; biometric identifiers (for example, fingerprints, voiceprints); full-face photographic images and comparable images; and any other unique identifying number, characteristic, or code, other than a non-derivable re-identification code maintained separately.

How does the Safe Harbor method differ from Expert Determination?

Safe Harbor is rules-based: remove the 18 identifiers and confirm you have no actual knowledge that the remaining data could identify an individual. Expert Determination is risk-based: a qualified expert applies statistical and scientific principles and concludes that the data presents a very small re-identification risk for the intended use, often allowing more granular geography or timing than Safe Harbor.

What specific geographic data must be removed or modified?

Remove street address, city, county, precinct, full ZIP Code, and precise geocodes. You may include the first three digits of a ZIP Code only if the area formed by all ZIP Codes sharing that three-digit prefix contains more than 20,000 people; otherwise use 000. State-level (or broader) geography is acceptable under Safe Harbor. If you require city or full ZIP Code, use a Limited Data Set under a Data Use Agreement.

How is the risk of re-identification minimized after de-identification?

Mitigate risk by generalizing or suppressing quasi-identifiers, adding controlled noise or swapping where appropriate, limiting released fields, enforcing Data Use Agreements, and periodically reassessing exposure against new external datasets. For complex releases, use the Expert Determination pathway to quantify and bound the risk of re-identification.
