How to De-Identify PHI Under the HIPAA Privacy Rule: A Practical Guide

Kevin Henry

HIPAA

March 01, 2025

7 minutes read

Overview of HIPAA Privacy Rule De-Identification

Under the HIPAA Privacy Rule, Protected Health Information (PHI) is de-identified when it does not identify an individual and there is no reasonable basis to believe it could be used to identify one. De-identified information is not subject to the Privacy Rule, though contractual commitments and other laws may still apply. HIPAA recognizes two de-identification methods: Safe Harbor and Expert Determination.

Safe Harbor removes a fixed list of identifiers that are most likely to reveal identity. Expert Determination relies on an expert's statistical analysis to determine that the risk of re-identification is very small given the data and the release context. Choose the method that best fits your use case, timeline, available expertise, and the sensitivity of the data environment.

Safe Harbor Method Requirements

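The Safe Harbor method requires removing all 18 “Safe Harbor Identifiers” of the individual and of the individual’s relatives, employers, and household members, and having no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. The 18 identifiers are: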

  • Names.
  • All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code), except the initial three digits of a ZIP code if the geographic unit formed by combining all ZIP codes with those same three initial digits contains more than 20,000 people; otherwise change the three digits to 000.
  • All elements of dates (except year) directly related to an individual (e.g., birth, admission, discharge, death), plus all ages over 89 and all date elements (including year) indicative of such an age, which may be aggregated into a single category of “age 90 or older.”
  • Telephone numbers.
  • Fax numbers.
  • Email addresses.
  • Social Security numbers.
  • Medical record numbers.
  • Health plan beneficiary numbers.
  • Account numbers.
  • Certificate/license numbers.
  • Vehicle identifiers and serial numbers, including license plates.
  • Device identifiers and serial numbers.
  • Web URLs.
  • IP addresses.
  • Biometric identifiers (e.g., finger and voice prints).
  • Full-face photographs and comparable images.
  • Any other unique identifying number, characteristic, or code (except a re-identification code assigned by the originating entity that is not derived from information about the individual and whose assignment mechanism is not disclosed).
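
The ZIP code, date, and age rules in the list above are the ones most often implemented in code. The following is a minimal Python sketch of those three transformations, not a complete Safe Harbor implementation. The function names and the contents of `LOW_POPULATION_ZIP3` are illustrative assumptions; the real set of low-population prefixes has to be built from current Census Bureau data.

```python
from datetime import date

# Placeholder values for illustration only. The actual set of three-digit
# prefixes covering 20,000 or fewer people must come from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or return 000 for low-population prefixes."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in LOW_POPULATION_ZIP3 else prefix


def generalize_date(d: date) -> int:
    """Drop month and day, keeping only the year."""
    return d.year


def generalize_age(age: int) -> str:
    """Fold every age over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


if __name__ == "__main__":
    print(generalize_zip("90210"))            # first three digits kept
    print(generalize_zip("03601"))            # placeholder low-population prefix
    print(generalize_date(date(2024, 3, 15)))
    print(generalize_age(93))
```

This sketch does not cover one edge case: a birth year that reveals an age over 89 also has to be folded into the 90+ category, so `generalize_date` alone is not sufficient for birth dates of the oldest patients.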

Beyond direct identifiers, scan notes and free text for residual hints (e.g., rare events, job titles, or locations) that could enable linkage. Validate output with sampling, automated rules, and peer review, and document your rationale for each retained field so you can show how the dataset meets the Safe Harbor standard.
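
One of the automated rules mentioned above can be a simple pattern scan over free text. The sketch below is a hedged illustration using Python's standard `re` module. The patterns and the `find_residual_identifiers` name are assumptions made for this example, and they cover only a few identifier formats. Pattern matching will miss names, rare events, and job titles, so it supplements human review rather than replacing it.

```python
import re

# Illustrative patterns only; real clinical notes need much broader coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_residual_identifiers(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


if __name__ == "__main__":
    note = "Pt reachable at jdoe@example.com or 555-123-4567."
    for label, value in find_residual_identifiers(note):
        print(label, value)
```

Any hit should route the record to review or redaction before release.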

Expert Determination Method Process

Expert Determination relies on a person with appropriate knowledge of and experience with generally accepted statistical and scientific methods to determine that the risk of re-identification is very small, considering the data features and the release setting. This path is flexible and often preserves more utility than Safe Harbor while meeting the same legal endpoint.

Core steps

  • Scope and inventory: define purposes, data elements, quasi-identifiers, and intended recipients.
  • Threat modeling: evaluate plausible attackers, available auxiliary data, and incentives.
  • Re-Identification Risk Assessment: measure risks such as distinguishability, linkability, and replicability across the dataset and subgroups.
  • Transformations: apply generalization, suppression, top/bottom coding, binning, perturbation, microaggregation, or tokenization as needed.
  • Validation: test risk metrics post-transform (e.g., k-anonymity targets, attribute disclosure checks like l-diversity/t-closeness where appropriate).
  • Documentation: issue a signed report describing methods, assumptions, data versions, limits, and any conditions for use or sharing.
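
The validation step above mentions k-anonymity targets. As a rough illustration, the Python sketch below groups records by their quasi-identifier values and reports the groups smaller than a chosen k. The field names, the sample records, and the value of k are assumptions for this example. HIPAA does not prescribe a specific threshold, so the expert sets and justifies the target.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_class_size(records, quasi_identifiers):
    """Return the dataset's k: the size of its smallest equivalence class."""
    sizes = class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def violating_classes(records, quasi_identifiers, k):
    """Return the equivalence classes with fewer than k records."""
    sizes = class_sizes(records, quasi_identifiers)
    return {key: n for key, n in sizes.items() if n < k}


if __name__ == "__main__":
    sample = [
        {"zip3": "902", "birth_year": 1980, "sex": "F"},
        {"zip3": "902", "birth_year": 1980, "sex": "F"},
        {"zip3": "902", "birth_year": 1975, "sex": "M"},
    ]
    qi = ["zip3", "birth_year", "sex"]
    print(smallest_class_size(sample, qi))
    print(violating_classes(sample, qi, k=2))
```

Classes flagged by `violating_classes` are candidates for further generalization or suppression before the check is re-run.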

Experts should have relevant statistical and privacy expertise, act independently, and specify when reassessment is required (e.g., if data are combined with new sources or the access model changes). Maintain the Expert’s report with your release package.

Managing Re-Identification Risk

Risk is a function of the data and the environment. Minimize what you share, control who can access it, and reduce how easily records can be singled out or linked. Combine technical controls with governance to keep overall risk very small.

Technical strategies

  • Generalize or bin precise values (dates to month or quarter; ages to ranges; geography to larger areas).
  • Suppress or mask rare categories and outliers that make individuals unique.
  • Replace persistent identifiers with nonderivable tokens; avoid plain or unsalted hashes of direct identifiers, which can be reversed by hashing candidate values and comparing the results.
  • Use noise addition or differential privacy for statistics when sharing aggregates.
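
Two of the strategies above, nonderivable tokens and suppression of rare categories, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Tokenizer` class, the `suppress_rare` function, and the "OTHER" placeholder are hypothetical names introduced here. The tokens come from the standard library's `secrets` module, so they are random rather than computed from the identifier.

```python
import secrets
from collections import Counter


class Tokenizer:
    """Assigns each identifier a random token that is not derived from it.

    The internal mapping is the only way back to the original value, so it
    must stay with the originating entity and be stored securely.
    """

    def __init__(self):
        self._tokens = {}

    def token_for(self, identifier: str) -> str:
        # Reuse the existing token so the same person links across records.
        if identifier not in self._tokens:
            self._tokens[identifier] = secrets.token_hex(16)
        return self._tokens[identifier]


def suppress_rare(values, min_count, placeholder="OTHER"):
    """Replace values that occur fewer than min_count times with a placeholder."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]


if __name__ == "__main__":
    tokenizer = Tokenizer()
    print(tokenizer.token_for("MRN-001") == tokenizer.token_for("MRN-001"))
    print(suppress_rare(["flu", "flu", "rare-condition"], min_count=2))
```

The sketch uses random tokens instead of hashes because a hash of a medical record number can be recomputed by anyone who can guess the number's format.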

Governance strategies

  • Limit recipients and uses via contracts; prohibit any attempt at re-identification or linkage.
  • Choose a release model that fits risk: open release, registered access, or secure enclave.
  • Monitor for data recombination risks and re-run assessments when context changes.

Using De-Identified Health Information

Once data are de-identified under HIPAA, they are no longer PHI and may be used or disclosed without patient authorization for analytics, quality improvement, AI development, product design, or research. However, consider ethics, promises made in notices, state privacy laws, and sectoral rules when planning downstream uses.

Preserve utility by retaining meaningful clinical concepts while removing or generalizing identifiers. Keep provenance, versioning, and transformation logs so results can be replicated and datasets can be refreshed consistently.

Implementing Data Use Agreements

HIPAA requires a Data Use Agreement (DUA) for Limited Data Sets, which still contain some elements (e.g., dates, city, state, ZIP) and remain Protected Health Information (PHI). A DUA is not required for fully de-identified data, but many organizations adopt one to reinforce controls and expectations.

Required Data Use Agreement Provisions for Limited Data Sets

  • Permitted uses and disclosures and who may use/receive the data.
  • Recipient obligations to use only as permitted, apply safeguards, and report improper uses/disclosures.
  • Flow-down: ensure agents and subcontractors accept the same restrictions.
  • No attempts to re-identify or contact individuals.

Best-practice provisions for de-identified data

  • Purpose limitation, access controls, and audit rights.
  • Prohibition on linkage that would raise re-identification risk.
  • Breach notification timelines and data return/destruction terms.

De-Identification in Research and Medical Imaging

Research often needs granular detail while protecting privacy. For fully de-identified datasets, HIPAA requires neither patient authorization nor an Institutional Review Board (IRB) or Privacy Board waiver of authorization; for Limited Data Sets or PHI, IRB or Privacy Board review may be needed depending on the protocol and the Common Rule.

Medical imaging considerations

  • DICOM headers: remove or replace patient-related tags (e.g., names, IDs, accession numbers) per recognized de-identification profiles.
  • Pixel data: redact burned-in annotations; for faces or distinctive body parts, remove or blur full-face and comparable images to meet Safe Harbor.
  • Metadata utility: retain clinically meaningful fields (modality, study descriptors) by generalizing rather than deleting where possible.
  • Quality checks: run automated detectors plus human review to confirm no residual PHI in headers or pixels.
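
The header and metadata points above can be illustrated with an allow-list, where only explicitly approved tags survive. The Python sketch below models a header as a plain dictionary keyed by DICOM keyword and does not use a real DICOM library. The `KEEP_AS_IS` set, the `scrub_header` function, and the `StudyYear` output key are assumptions made for this example, not part of any recognized de-identification profile.

```python
# Hypothetical allow-list; a real project should follow a recognized profile.
KEEP_AS_IS = {"Modality", "StudyDescription"}


def scrub_header(header: dict) -> dict:
    """Return a new header holding only allow-listed tags plus the study year."""
    cleaned = {tag: value for tag, value in header.items() if tag in KEEP_AS_IS}
    # Generalize the study date (YYYYMMDD) to its year instead of deleting it.
    study_date = header.get("StudyDate", "")
    if len(study_date) >= 4:
        cleaned["StudyYear"] = study_date[:4]
    return cleaned


if __name__ == "__main__":
    original = {
        "PatientName": "DOE^JANE",
        "PatientID": "12345",
        "AccessionNumber": "A-987",
        "StudyDate": "20240315",
        "Modality": "CT",
        "StudyDescription": "CT CHEST",
    }
    print(scrub_header(original))
```

An allow-list drops any unexpected tag by default, which is safer than listing tags to remove. Free-text fields such as the study description can still carry identifiers, so they need the same review as clinical notes, and burned-in pixel annotations need separate handling.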

Conclusion

Effective HIPAA de-identification blends the right method (Safe Harbor or Expert Determination) with sound risk management and governance. By applying rigorous transformations, validating risk, and enforcing clear data use agreement provisions, you can share high-utility health data responsibly and compliantly.

FAQs

What is the Safe Harbor method for de-identifying PHI?

Safe Harbor requires removing 18 specified identifiers (such as names, detailed geography, contact numbers, and full-face photos) and ensuring you have no actual knowledge that the remaining data could identify someone. When complete, the dataset is considered de-identified under HIPAA.

How does the Expert Determination method reduce re-identification risk?

A qualified expert conducts a re-identification risk assessment, applies statistical and scientific techniques to reduce risk (e.g., generalization, suppression, perturbation), and documents the determination that the likelihood of re-identification is very small for the intended use and release context.

Can de-identified data be shared without patient authorization?

Yes. HIPAA allows the use and disclosure of de-identified data without patient authorization. Still, organizations often use contracts to set purpose limits, forbid re-identification, and require safeguards.

What identifiers must be removed to comply with HIPAA de-identification?

Under Safe Harbor, you must remove 18 categories of identifiers: names; geographic details smaller than a state (with limited ZIP code exceptions); all elements of dates except year, with ages over 89 grouped as 90+; contact numbers and email addresses; government and medical IDs; account and license numbers; vehicle, device, web, and IP identifiers; biometrics; full-face images; and any other unique identifying number, characteristic, or code, apart from a permitted re-identification code.
