Decoding HIPAA: A Deep Dive into Identifiers for De-identification

Kevin Henry

HIPAA

January 13, 2024

Overview of HIPAA De-Identification

HIPAA de-identification is the process of transforming Protected Health Information (PHI) so that it can no longer reasonably identify an individual. Done correctly, the resulting dataset is not PHI and can be used or disclosed without patient authorization.

HIPAA recognizes two pathways: the Safe Harbor Standard and Expert Determination. Both aim to reduce re-identification risk to an acceptable threshold, but they take different routes: one removes a fixed list of identifiers, the other relies on statistical methods and expert judgment.

Why de-identification matters

De-identification unlocks data for analytics, quality improvement, AI development, public health, and research while honoring privacy. The key is controlling re-identification risk not just at the record level, but also across linkable datasets and small subpopulations.

PHI versus de-identified data

PHI includes any individually identifiable health information in any form. De-identified data has had identifiers removed or masked so that the information cannot reasonably be linked back to a person. The moment re-identification becomes reasonably possible, the data again functions as PHI.

The 18 HIPAA Identifiers

Under the Safe Harbor Standard, you must remove these 18 identifiers of the individual and of the individual's relatives, employers, and household members:

  1. Names.
  2. All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes). The first three digits of a ZIP code may be kept when the area formed by combining all ZIP codes that share those three digits has more than 20,000 people; otherwise replace them with 000.
  3. All elements of dates (except year) directly related to an individual, including birth, admission, discharge, and death dates; plus all ages over 89 and all date elements (including year) that indicate such an age, which may instead be aggregated into a single “90 or older” category.
  4. Telephone numbers.
  5. Fax numbers.
  6. Email addresses.
  7. Social Security numbers.
  8. Medical Record Numbers.
  9. Health Plan Beneficiary Numbers.
  10. Account numbers.
  11. Certificate/license numbers.
  12. Vehicle identifiers and serial numbers, including license plate numbers.
  13. Device identifiers and serial numbers.
  14. Web URLs.
  15. IP address numbers.
  16. Biometric Identifiers, including finger and voice prints.
  17. Full-face photographs and any comparable images.
  18. Any other unique identifying number, characteristic, or code (except as permitted for re-identification codes).

Safe Harbor Method Requirements

The Safe Harbor Standard requires removing all 18 identifiers from the dataset and having no actual knowledge that the remaining information could identify an individual, alone or in combination with other data.

Geography and dates

Remove every geographic unit smaller than a state. For ZIP codes, keep only the first three digits when the area formed by all ZIP codes sharing those digits has more than 20,000 people; otherwise replace them with 000. Keep years, but strip months, days, and exact timestamps tied to individuals. Aggregate ages over 89 into a single “90+” category, and drop any date element, including year, that would reveal such an age.
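
The geography, date, and age rules above can be sketched as three small helpers. This is a minimal illustration, not a complete Safe Harbor implementation: the function names are hypothetical, and the RESTRICTED_ZIP3 set holds placeholder values that would need to be replaced with prefixes derived from current Census population data.

```python
from datetime import date

# Placeholder values for illustration only. A real list of three-digit
# prefixes covering 20,000 or fewer people must come from Census data.
RESTRICTED_ZIP3 = {"036", "059", "102"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three ZIP digits, or return "000" for sparse or malformed prefixes."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit() or prefix in RESTRICTED_ZIP3:
        return "000"
    return prefix


def generalize_date(d: date) -> int:
    """Drop month and day; keep only the year."""
    return d.year


def generalize_age(age: int) -> str:
    """Report ages over 89 as a single top-coded category."""
    return "90+" if age > 89 else str(age)
```

A caller would apply these per field, for example generalize_zip("90210") for a ZIP column and generalize_age(93) for an age column. The sketch does not handle the related rule that a year revealing an age over 89 must also be removed.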

Common pitfalls

  • Free-text notes that contain residual names, addresses, or dates.
  • Small cells (rare diseases, unique procedures) that enable linkage attacks.
  • Embedded identifiers in filenames, image pixels, or metadata.
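
As a rough illustration of the free-text pitfall, the sketch below flags strings in a note that look like Social Security numbers, phone numbers, email addresses, or slash-formatted dates. The patterns and the function name are assumptions made for this example. Regular expressions like these miss names, addresses, and many other formats, so they can support a review but cannot replace one.

```python
import re

# Illustrative patterns only; they cover a few common US formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def find_residual_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every pattern hit in the note."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
```

Any note that returns a non-empty list would be routed to redaction or manual review before release.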

Operational guidance

  • Inventory data sources and map each field to the identifier list.
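  • Apply consistent rules for suppression and generalization. Date shifting preserves month and day detail, so it is generally treated as an Expert Determination technique rather than a way to meet Safe Harbor.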
  • Record decisions and perform quality checks to confirm removals are complete.
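
One way to make the field inventory checkable is to keep the mapping in code and flag any column that has no documented rule. The column names and rule labels below are hypothetical, chosen only to show the shape of such a mapping.

```python
# Hypothetical column names and rule labels, for illustration only.
FIELD_RULES = {
    "patient_name": "drop",       # identifier 1: names
    "mrn": "drop",                # identifier 8: medical record numbers
    "admit_date": "year_only",    # identifier 3: dates
    "zip": "zip3",                # identifier 2: geography
    "diagnosis_code": "keep",     # not on the identifier list
}


def audit_columns(columns, rules=FIELD_RULES):
    """Split columns into those with a documented rule and those without."""
    mapped = [c for c in columns if c in rules]
    unmapped = [c for c in columns if c not in rules]
    return mapped, unmapped
```

A release step could refuse to run while the unmapped list is non-empty, which forces a decision on every new field.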

Expert Determination Method Explained

Expert Determination allows you to retain more data utility by having a qualified expert conclude, using accepted principles, that the risk of re-identification is very small for anticipated recipients, data uses, and release conditions.

Who qualifies and what is “very small”

The expert should have relevant knowledge in statistics, computer science, or privacy engineering. “Very small” is context-specific and depends on plausible attacks, available auxiliary data, and safeguards such as access controls and contractual limits.

Techniques and safeguards

  • Risk modeling (e.g., k-anonymity, l-diversity, t-closeness) and outlier handling.
  • Pseudonymization with robust key management and perturbation of quasi-identifiers.
  • Administrative and technical controls that reduce re-identification risk.
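
To make the risk-modeling bullet concrete, here is a minimal sketch of a k-anonymity check: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The field names in the example are assumptions, and an actual Expert Determination would rest on a much fuller analysis than this single metric.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Hypothetical records: the third row is unique on (zip3, birth_year, sex).
records = [
    {"zip3": "902", "birth_year": 1980, "sex": "F", "dx": "asthma"},
    {"zip3": "902", "birth_year": 1980, "sex": "F", "dx": "diabetes"},
    {"zip3": "606", "birth_year": 1975, "sex": "M", "dx": "asthma"},
]
k = k_anonymity(records, ["zip3", "birth_year", "sex"])
```

Here k is 1 because one record stands alone, which signals that further generalization or suppression is needed before release.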

Documentation and lifecycle

The expert should deliver a written report describing methods, assumptions, data transformations, risk estimates, and conditions of use. Reassess risk when data, context, or external data availability changes.

Ready to simplify HIPAA compliance?

Join thousands of organizations that trust Accountable to manage their compliance needs.

Use of Re-identification Codes

HIPAA permits assigning a code to allow authorized re-linking, provided the code is not derived from or related to information about the individual (for example, it must not be a hash of a Social Security number) and the mapping mechanism is not disclosed.

Good practices

  • Generate random, non-derivable tokens and store the mapping in a segregated, access-controlled system.
  • Limit who can re-identify and for what purposes, and log all access.
  • Define retention and destruction schedules for codebooks.
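
The first practice can be sketched with Python's standard secrets module, which produces tokens from a cryptographically strong random source rather than from the source ID. The function name and the ID format are assumptions for illustration; where and how the mapping is stored is outside this sketch.

```python
import secrets


def assign_tokens(patient_ids):
    """Map each source ID to a random token that is not derived from the ID."""
    mapping = {}
    used = set()
    for pid in patient_ids:
        if pid in mapping:
            continue  # reuse the existing token for a repeated ID
        token = secrets.token_hex(16)
        while token in used:  # collisions are extremely unlikely, but cheap to rule out
            token = secrets.token_hex(16)
        used.add(token)
        mapping[pid] = token
    return mapping
```

The returned dictionary is the codebook, so it belongs in the segregated, access-controlled system and never alongside the released data.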

De-Identification in Research and Public Health

Once data are de-identified, they are no longer PHI and may be used or disclosed without patient authorization for research, analytics, or public health purposes. Still, you should manage re-identification risk, especially when data could be linked with external sources.

De-identified data vs. limited data sets

A limited data set may include certain dates and city/ZIP but remains PHI and requires a Data Use Agreement. Fully de-identified data under Safe Harbor or Expert Determination does not require a DUA, though contracts are often used to set recipient obligations.

Risk-aware sharing

  • Minimize linkage risks by removing rare attributes and generalizing small cells.
  • Impose recipient obligations that prohibit re-identification attempts and onward sharing.
  • Monitor for data misuse and update controls as contexts evolve.
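
Generalizing small cells can be sketched as replacing any value that appears fewer than a chosen number of times with a catch-all category. The function name, the default threshold of 11, and the "OTHER" label are assumptions for this example; the right threshold depends on the release context.

```python
from collections import Counter


def generalize_rare_values(records, field, min_count=11, replacement="OTHER"):
    """Return copies of the records with rare values of `field` replaced."""
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: r[field] if counts[r[field]] >= min_count else replacement}
        for r in records
    ]
```

The input records are left unchanged. Note that the catch-all group can itself end up small, so its size should be checked after the replacement.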

De-Identification of Medical Images

Medical images can carry identifiers both in metadata and in pixels. Full-face photographs and comparable images are explicit identifiers under Safe Harbor, and some scans, such as head CT or MRI, can be used to reconstruct a face, which creates a biometric risk.

Metadata scrubbing (DICOM)

  • Remove fields such as patient name, PatientID, birth date, Medical Record Numbers, Health Plan Beneficiary Numbers, accession numbers, and contact details.
  • Strip dates/times beyond year when tied to an individual (StudyDate, SeriesDate, AcquisitionDateTime) and remove device serial numbers and operator names where necessary.
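
The tag handling above can be sketched on a plain dictionary keyed by DICOM attribute keyword. This is a simplified stand-in for a real DICOM toolkit: the two tag sets are illustrative and incomplete, the function name is hypothetical, and the year-only strings it produces are not checked against DICOM value formats.

```python
# Illustrative, incomplete tag sets keyed by DICOM attribute keyword.
REMOVE_TAGS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "AccessionNumber", "DeviceSerialNumber", "OperatorsName",
}
YEAR_ONLY_TAGS = {"StudyDate", "SeriesDate", "AcquisitionDateTime"}


def scrub_header(header: dict) -> dict:
    """Return a copy of the header without identifying tags and with dates cut to the year."""
    cleaned = {}
    for keyword, value in header.items():
        if keyword in REMOVE_TAGS:
            continue  # drop the attribute entirely
        if keyword in YEAR_ONLY_TAGS:
            cleaned[keyword] = str(value)[:4]  # values start with YYYY
        else:
            cleaned[keyword] = value
    return cleaned
```

In practice a vetted de-identification profile would define the tag lists, and this kind of function would only show where those lists plug in.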

Pixel-level protections

  • Redact “burned-in” overlays that show names, dates, or MRNs.
  • Apply de-facing for head CT/MRI to prevent facial reconstruction while preserving clinical anatomy when possible.
  • Validate that no PHI is present after transformations.

Workflow tips

  • Automate DICOM tag removal using vetted profiles, then run human spot checks.
  • Maintain a secure, separate token map if re-identification is required for clinical follow-up.
  • Balance privacy with utility through careful parameter tuning and image QA.

Summary

Whether you choose the Safe Harbor Standard or Expert Determination, success depends on rigorous identifier removal, sound risk modeling, and governance that keeps re-identification risk low over time. Treat de-identification as an ongoing program, not a one-time task.

FAQs

What are the 18 HIPAA identifiers for de-identification?

They include: names; geographic details below state (with the three-digit ZIP rule); all elements of dates except year and ages over 89; phone and fax numbers; email addresses; Social Security numbers; Medical Record Numbers; Health Plan Beneficiary Numbers; account, certificate/license, vehicle, and device numbers; URLs and IP addresses; Biometric Identifiers; full-face photos and comparable images; and any other unique identifying number, characteristic, or code except permitted re-identification codes.

How does the Safe Harbor method ensure HIPAA compliance?

Safe Harbor requires removing all 18 identifiers from every record and ensuring you have no actual knowledge that the remaining data could identify someone, alone or combined with other data. Proper handling of geography, dates, small cells, and free text is essential to maintain compliance.

What is the Expert Determination method under HIPAA?

It is a pathway where a qualified expert applies accepted statistical and scientific techniques to determine that the risk of re-identification is very small for specific release conditions. The expert documents methods, assumptions, and safeguards, and organizations maintain those conditions over time.

Can de-identified data be used for research without patient authorization?

Yes. Data that are de-identified under Safe Harbor or Expert Determination are no longer PHI and may be used or disclosed for research without patient authorization. If instead you use a limited data set, a Data Use Agreement is required because it still contains certain identifiers.
