De‑Identified PHI: What It Is, Methods, and HIPAA Compliance

Kevin Henry

HIPAA

July 26, 2025

Overview of De-Identified PHI

De-identified PHI is health information that does not identify an individual and for which there is no reasonable basis to believe it can be used to identify one. Once properly de-identified under the Privacy Rule's de-identification standard (45 CFR 164.514(a) and (b)), it is no longer PHI, and the HIPAA Privacy Rule does not regulate it.

De-identification is more than simple masking. It combines removal of identifiers with context-aware de-identification techniques, so the remaining data stays useful for analysis while re-identification risk stays low. Plan for governance, documentation, and ongoing monitoring, not just a one-time scrub.

What de-identification achieves

Effective de-identification lets you share, analyze, and innovate with fewer legal constraints while honoring patient privacy. It supports data collaboration for research, quality improvement, and product development without exposing individuals.

Safe Harbor Method Explained

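The Safe Harbor method removes a fixed list of 18 identifiers (of the individual and of the individual's relatives, employers, and household members) and requires that you have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. It is deterministic, transparent, and widely adopted for operational use cases.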

The 18 identifiers to remove

  • Names
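  • All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes); the initial 3 ZIP digits may be kept only if the unit formed by combining all ZIP codes with those digits has more than 20,000 people according to current Census data, otherwise they must be replaced with 000
  • All elements of dates (except year) directly related to an individual (e.g., birth, admission, discharge, and death dates), plus all ages over 89 and all date elements (including year) indicative of such ages; those ages may be grouped into a single category of 90 or older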
  • Telephone numbers
  • Fax numbers
  • Email addresses
  • Social Security numbers
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • Vehicle identifiers and serial numbers, including license plates
  • Device identifiers and serial numbers
  • Web URLs
  • IP addresses
  • Biometric identifiers (e.g., finger and voice prints)
  • Full-face photos and comparable images
  • Any other unique identifying number, characteristic, or code (other than a permitted re-identification code)
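
The geographic, date, and age rules are the ones most often implemented in code. The Python sketch below is a minimal illustration covering only those three identifiers, not all eighteen. It assumes Python 3.9+ and that the caller supplies the set of restricted 3-digit ZIP prefixes (areas with 20,000 or fewer people) from current Census data; the function names are hypothetical.

```python
from datetime import date
from typing import Optional


def generalize_zip(zip_code: str, restricted_zip3: set[str]) -> str:
    """Keep the first three ZIP digits, or return '000' when that 3-digit
    area has 20,000 or fewer people (the caller supplies that set)."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in restricted_zip3 else zip3


def generalize_date(d: date) -> int:
    """Reduce a date directly related to the individual to its year."""
    return d.year


def generalize_age(age: int) -> str:
    """Report every age over 89 as the single category '90+'."""
    return "90+" if age > 89 else str(age)


def generalize_birth_year(birth: date, age: int) -> Optional[int]:
    """Keep the birth year, except when it would reveal an age over 89."""
    return None if age > 89 else birth.year
```

For example, `generalize_zip("03601", {"036"})` returns `"000"` because `"036"` is in the caller's restricted set.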

How to apply Safe Harbor in practice

  • Inventory data fields and map them to the 18 identifiers.
  • Generalize or suppress dates to year, and geography to state or to a 3-digit ZIP that satisfies the 20,000-person rule.
  • Remove direct identifiers and test for residual uniqueness (e.g., rare combinations such as age 90+, rare diagnoses, or tiny geographies).
  • Document the process and confirm that you have no actual knowledge that the remaining information could identify an individual.
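
The inventory, removal, and uniqueness steps can be sketched in Python as below. This is a minimal illustration under stated assumptions: records are plain dictionaries, `FIELD_MAP` is a hypothetical inventory that maps each column to a Safe Harbor category (or `None` when the column is retained), and columns missing from the inventory are dropped so unreviewed fields fail closed.

```python
from collections import Counter

# Hypothetical inventory: column name -> Safe Harbor category, or None to keep.
FIELD_MAP = {
    "patient_name": "names",
    "mrn": "medical record numbers",
    "phone": "telephone numbers",
    "email": "email addresses",
    "age": None,        # assumed already generalized (90+ rule)
    "zip3": None,       # assumed already generalized (3-digit ZIP rule)
    "sex": None,
    "diagnosis": None,
}


def drop_identifiers(record: dict, field_map: dict) -> dict:
    """Keep only columns the inventory explicitly marks as retained;
    mapped identifiers and unmapped columns are both removed."""
    return {k: v for k, v in record.items() if field_map.get(k, "unmapped") is None}


def unique_combinations(records, quasi_ids):
    """Return quasi-identifier combinations that occur exactly once."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [combo for combo, n in counts.items() if n == 1]
```

Any combination returned by `unique_combinations` points to a record that may need further generalization or suppression before release.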

Strengths and limitations

Safe Harbor is simple and auditable, but it can reduce data utility (especially for time- and location-sensitive analyses) and may still leave quasi-identifiers that, in certain contexts, can enable linkage attacks. Pair it with Privacy Safeguards and usage controls to mitigate residual risk.

Expert Determination Method

The Expert Determination pathway relies on a qualified expert who applies generally accepted statistical and scientific methods, determines that the risk is very small that an anticipated recipient could identify an individual using the data alone or combined with other reasonably available information, and documents the methods and results that justify that conclusion. This path is flexible and can preserve more utility than Safe Harbor.

What the expert does

  • Conducts a Re-Identification Risk Assessment tailored to data content, environment, and adversary models.
  • Applies transformations such as suppression, generalization, sampling, perturbation, or swapping, and evaluates the result against privacy models such as k-anonymity, l-diversity, t-closeness, or differential privacy.
  • Documents methods, assumptions, thresholds, and validation tests and issues a formal opinion.
  • Re-evaluates controls when data, recipients, or external risk factors change.
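
As an illustration of how two of these privacy models can be measured, the sketch below computes k (the smallest equivalence-class size over the chosen quasi-identifiers) and distinct l-diversity (the fewest distinct sensitive values within any class). It is a minimal Python example, not a complete risk assessment; column names are hypothetical, and any acceptance threshold (for example, k of 5 or more) is a policy choice the expert must justify rather than a number HIPAA specifies.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_ids):
    """Group records that share identical quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].append(r)
    return groups


def k_anonymity(records, quasi_ids) -> int:
    """Smallest class size: each record matches at least k-1 others.
    Raises ValueError on an empty dataset."""
    return min(len(g) for g in equivalence_classes(records, quasi_ids).values())


def l_diversity(records, quasi_ids, sensitive) -> int:
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(
        len({r[sensitive] for r in g})
        for g in equivalence_classes(records, quasi_ids).values()
    )
```

A dataset can be 2-anonymous and still leak information: if every record in a class shares one diagnosis, l equals 1 and anyone who links a person to that class learns the diagnosis.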

When to choose Expert Determination

Use it when you need granular dates, detailed geography, rare-event analysis, or longitudinal linkages that Safe Harbor would strip away. It retains analytic value while controlling risk through rigorous, context-aware methodology.

Risks and Safeguards

Even after de-identification, risks persist. Linkage attacks can match quasi-identifiers (e.g., age, gender, 3-digit ZIP, event dates) with public or commercial datasets. Small cell sizes, outliers, unusual procedures, or device signatures can also increase identifiability.
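
A linkage attack amounts to a join on shared quasi-identifiers. The sketch below uses entirely synthetic records and assumes a hypothetical public list with a `name` column; it treats a released row as re-identified when exactly one external row matches its quasi-identifiers.

```python
def link(released, public, quasi_ids):
    """Join released rows to an external dataset on quasi-identifiers and
    return (name, released_row) pairs for rows with exactly one match."""
    index = {}
    for p in public:
        index.setdefault(tuple(p[q] for q in quasi_ids), []).append(p)

    matches = []
    for row in released:
        candidates = index.get(tuple(row[q] for q in quasi_ids), [])
        if len(candidates) == 1:  # a unique match pins the row to one person
            matches.append((candidates[0]["name"], row))
    return matches
```

In synthetic tests, a row whose quasi-identifiers match two or more public entries is not linked, which is why generalizing quasi-identifiers (and suppressing rare combinations) lowers this risk.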

Privacy Safeguards to reduce residual risk

  • Data minimization: keep only fields needed for the use case.
  • Aggregation and thresholds: suppress small cells; coarsen dates/locations.
  • Noise and privacy budgets for aggregate releases (e.g., differential privacy).
  • Access controls: encrypt at rest/in transit; limit recipients; implement purpose-based access.
  • Contractual controls: prohibit re-identification and onward sharing; require incident reporting and audits.
  • Monitoring and governance: periodic risk re-assessment, approval workflows, and defensible documentation.
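
Two of these safeguards, small-cell suppression and noise for aggregate counts, can be sketched in a few lines of Python. The suppression threshold of 11 is an assumed default, not a HIPAA rule. The noise function is the Laplace mechanism for a counting query (sensitivity 1), drawing Laplace noise as the difference of two exponential samples; a production differential-privacy release would also need privacy-budget accounting across queries and a vetted library, which this sketch omits.

```python
import random


def suppress_small_cells(counts, threshold=11):
    """Replace any count below the (assumed) threshold with None."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a count: add Laplace(0, 1/epsilon) noise.
    The difference of two Exponential(rate=epsilon) draws is Laplace(0, 1/epsilon)."""
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise
```

Noisy counts can be negative or fractional; rounding or clamping them afterward is post-processing and does not weaken the differential-privacy guarantee. Smaller `epsilon` values add more noise and give stronger privacy.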

HIPAA Compliance Requirements

De-identified data is not PHI, but HIPAA still governs the identifiable data you handle before and during de-identification. If a vendor de-identifies data for you, that vendor is a Business Associate and needs a BAA. After de-identification, continue to manage risk contractually and operationally.

Compliance checklist

  • Choose and document your pathway: Safe Harbor or Expert Determination.
  • Maintain evidence: data inventories, transformation logs, and expert opinions where applicable.
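  • Re-identification codes: if you assign a code so records can be re-linked, make sure it is not derived from or related to the individual's information and cannot be translated to identify them; do not use or disclose the code for any other purpose, and keep the re-identification key separate and confidential.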
  • Policies and training: define permissible uses, recipient vetting, and incident response for de-identified data.
  • Vendor management: sign BAAs with vendors that handle PHI, and after de-identification use data-sharing terms that prohibit re-identification and re-disclosure.
  • Consider other laws: state privacy statutes or sectoral rules may still apply to de-identified or pseudonymous data.
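
One way to satisfy the re-identification code condition is to issue random codes instead of hashing an identifier: a hash of a medical record number is derived from the individual's information and can often be reversed by hashing every plausible value. The minimal sketch below uses Python's `secrets` module to map internal record IDs to random 16-character codes; the function name is hypothetical, and the returned mapping is the re-identification key, which must be stored apart from the released data.

```python
import secrets


def assign_codes(record_ids):
    """Map each internal record ID to a random code.

    Codes come from a cryptographically secure generator, so they carry no
    information about the patient. The returned dict is the re-identification
    key: store it separately and never release it with the data."""
    key = {}
    used = set()
    for rid in record_ids:
        if rid in key:  # the same record keeps the same code
            continue
        code = secrets.token_hex(8)  # 16 hex characters
        while code in used:  # guard against the (unlikely) collision
            code = secrets.token_hex(8)
        used.add(code)
        key[rid] = code
    return key
```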

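A Limited Data Set is not de-identified and remains PHI. It may retain dates and some geographic data (such as city, state, and ZIP code), but it may be used or disclosed only for research, public health, or health care operations under a Data Use Agreement. Don't confuse it with de-identified data, which falls outside the Privacy Rule.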

Use Cases for De-Identified PHI

  • Clinical research and outcomes studies without patient authorization.
  • Quality measurement, benchmarking, and value-based care analytics.
  • AI/ML model development, validation, and drift monitoring.
  • Public health surveillance, forecasting, and capacity planning.
  • Product design, usability testing, and real-world performance analysis.
  • Vendor collaboration and data marketplace participation with contractual Privacy Safeguards.

Best Practices for Data De-Identification

  • Align technique to purpose: use Safe Harbor for routine sharing; use Expert Determination when you need higher utility with controlled risk.
  • Adopt a tiered release model: internal detailed data under stricter controls; external data further generalized or aggregated.
  • Standardize transformations: consistent generalization hierarchies for dates, locations, and codes; maintain a change log.
  • Test before release: simulate attacks, measure uniqueness, and document a Re-Identification Risk Assessment.
  • Strengthen governance: cross-functional reviews (privacy, security, clinical, data science) and periodic recertification.
  • Treat de-identified data as sensitive: enforce least privilege, audit access, and maintain retention and deletion schedules.
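
A standardized generalization hierarchy pairs naturally with a pre-release test. The sketch below defines a hypothetical four-level age hierarchy (exact age, 5-year band, 10-year band, fully suppressed) and returns the least generalized level at which every combination of age and the other quasi-identifiers appears at least k times. The levels, column names, and default quasi-identifiers are illustrative assumptions; below full suppression, ages over 89 always collapse to a single 90+ category.

```python
from collections import Counter
from typing import Optional


def age_level(age: int, level: int) -> str:
    """Hypothetical hierarchy: 0 = exact age, 1 = 5-year band,
    2 = 10-year band, 3 or higher = fully suppressed ('*')."""
    if level >= 3:
        return "*"
    if age > 89:
        return "90+"  # Safe Harbor's single 90-or-older category
    if level == 0:
        return str(age)
    width = 5 if level == 1 else 10
    low = age // width * width
    return f"{low}-{low + width - 1}"


def min_level_for_k(records, k, other_qids=("zip3", "sex"), max_level=3) -> Optional[int]:
    """Return the least generalized age level at which every combination of
    generalized age and the other quasi-identifiers occurs at least k times,
    or None if even full suppression of age is not enough."""
    for level in range(max_level + 1):
        counts = Counter(
            (age_level(r["age"], level),) + tuple(r[q] for q in other_qids)
            for r in records
        )
        if min(counts.values()) >= k:
            return level
    return None
```

Recording the chosen level in the change log makes the transformation repeatable for the next release.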

Conclusion

De-identified PHI enables valuable analytics and collaboration while honoring privacy. Choose the right pathway, apply rigorous controls, and document your decisions. With a sound Safe Harbor process or a well-documented expert determination, backed by robust privacy safeguards, you can unlock data utility and maintain HIPAA compliance.

FAQs

What is de-identified PHI under HIPAA?

It is health information that does not identify an individual and for which there is no reasonable basis to believe it can be used to identify one. Data de-identified under HIPAA's Safe Harbor or Expert Determination method is not PHI, so the Privacy Rule no longer applies to it.

How does the Safe Harbor Method work?

You remove the 18 specified identifiers of the individual and of their relatives, employers, and household members, which includes generalizing dates to year and geography to permitted levels, and you confirm that you have no actual knowledge that the remaining data could identify a person. You then document your de-identification process.

What qualifies an expert for Expert Determination?

An expert is someone with appropriate knowledge and experience applying accepted statistical or scientific principles to privacy risk. They perform Expert Statistical Analysis, produce a written Re-Identification Risk Assessment, and attest that re-identification risk is very small for the specific data and context.

Are there risks of re-identification after de-identification?

Yes. Residual risk can arise from quasi-identifiers, rare events, small cells, or linkage with external datasets. You mitigate this with aggregation, suppression, noise injection, contractual bans on re-identification, access controls, and periodic risk reviews.

How is HIPAA compliance maintained with de-identified data?

Document your chosen method, maintain evidence, and manage vendors via BAAs when PHI is processed pre-de-identification. Afterward, enforce Privacy Safeguards contractually, avoid re-identification, protect any linkage codes, and reassess risk when data or recipients change.
