HIPAA Privacy Rule: How to De-Identify PHI Using Two Methods

Kevin Henry

HIPAA

March 02, 2025

Overview of the HIPAA Privacy Rule

The HIPAA Privacy Rule's de-identification standard (45 CFR 164.514(a)-(b)) lets you remove direct and indirect identifiers from Protected Health Information (PHI) so the data is no longer regulated as PHI. Proper de-identification enables analysis, innovation, and sharing while keeping you aligned with U.S. privacy requirements.

Under HIPAA, you may de-identify PHI using one of two methods: Safe Harbor or Expert Determination. Safe Harbor relies on removing a fixed list of identifiers, while Expert Determination relies on a qualified expert who applies statistical and scientific principles to determine that the risk of re-identification by an anticipated recipient is very small.

De-identified data helps you support quality improvement, research, and product development without handling regulated PHI, provided you follow the rule precisely and keep documentation that demonstrates your compliance as a covered entity.

Safe Harbor Method Explained

What the method requires

Safe Harbor requires you to remove 18 categories of identifiers of the individual and of the individual's relatives, employers, and household members. You must also have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. Some elements, like the three-digit ZIP rule and the “90 or older” aggregation, have special handling.

How it works in practice

  • Strip the 18 identifiers across patients, relatives, employers, and household members.
  • Limit geography to state level; the first three ZIP digits may stay only where all ZIP codes sharing them cover more than 20,000 people, and become 000 otherwise.
  • Generalize dates to year only and treat ages over 89 as “90 or older.”
  • Keep any internal re-identification code separate, non-derivative, and undisclosed.
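
The geography, date, and age steps above can be sketched in a few lines of Python. This is a minimal illustration, not a complete Safe Harbor implementation: the field names (`zip`, `admit_date`, `age`, `diagnosis`) and the contents of `LOW_POPULATION_ZIP3` are assumptions made for the example, and the real list of sparse ZIP prefixes must be derived from current Census data.

```python
from datetime import date

# Hypothetical placeholder set of three-digit ZIP prefixes whose combined
# population is 20,000 or fewer. Build the real set from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def safe_harbor_zip(zip_code: str) -> str:
    """Keep the first three ZIP digits, or return "000" for sparse or malformed values."""
    zip3 = zip_code.strip()[:3]
    if len(zip3) < 3 or not zip3.isdigit() or zip3 in LOW_POPULATION_ZIP3:
        return "000"
    return zip3


def safe_harbor_year(d: date) -> int:
    """Reduce a date to its year."""
    return d.year


def safe_harbor_age(age: int) -> str:
    """Report ages over 89 as a single "90 or older" category."""
    return "90 or older" if age > 89 else str(age)


def generalize_record(record: dict) -> dict:
    """Apply the three generalizations to one record (hypothetical field names)."""
    return {
        "zip3": safe_harbor_zip(record["zip"]),
        "admit_year": safe_harbor_year(record["admit_date"]),
        "age": safe_harbor_age(record["age"]),
        "diagnosis": record["diagnosis"],
    }
```

Calling `generalize_record` on a record with ZIP 03601, an admission date in March 2024, and age 93 would yield `zip3` of `"000"`, `admit_year` of `2024`, and `age` of `"90 or older"`. The remaining identifier categories (names, contact details, record numbers, and so on) still have to be removed separately.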

Safe Harbor is straightforward, auditable, and fast. The trade-off is reduced data granularity, which can affect utility for detailed analytics.

Expert Determination Method Process

Core steps

  • Scope the data use: who will access it, how it will be shared, and the environment controls.
  • Conduct a Re-identification Risk Assessment that considers plausible threats, external data sources, and recipient capabilities.
  • Apply transformations (e.g., generalization, suppression, aggregation) and test residual risk.
  • Have the qualified expert document the methods and results that justify the conclusion that the risk is very small for the specific context.
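
The "apply transformations and test residual risk" step is often iterative. The sketch below shows one possible loop, assuming hypothetical `age` and `zip3` fields: it widens age bands until the smallest group of records sharing the same band and ZIP prefix reaches a chosen size `k`. HIPAA does not prescribe a numeric threshold, so `k` and the candidate widths are assumptions an expert would set for the release context.

```python
from collections import Counter


def age_band(age: int, width: int) -> str:
    """Map an age to a band label such as "30-34" for width 5."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def smallest_class(records, width):
    """Size of the smallest group sharing the same (age band, zip3) pair."""
    groups = Counter((age_band(r["age"], width), r["zip3"]) for r in records)
    return min(groups.values())


def choose_band_width(records, k, widths=(1, 5, 10, 20)):
    """Return the narrowest candidate width whose smallest group has at least k records."""
    for width in widths:
        if smallest_class(records, width) >= k:
            return width
    return None  # no candidate met the threshold; further suppression is needed
```

Trying the narrowest width first preserves as much detail as the threshold allows, which is the utility argument for Expert Determination. A real assessment would weigh many more variables and the recipient's capabilities, not group size alone.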

Techniques an expert may use

  • Statistical disclosure controls: k-anonymity, l-diversity, t-closeness.
  • Perturbation and noise infusion to mask exact numeric values while preserving aggregate statistics.
  • Bucketization of quasi-identifiers (e.g., age bands, coarse geographies).
  • Outlier handling for rare combinations (small-cell suppression).
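
Two of the techniques above are easy to illustrate. The following sketch, with hypothetical field names and thresholds, measures l-diversity (the smallest number of distinct sensitive values within any quasi-identifier group) and suppresses small cells in a table of published counts. It is meant to show the idea, not to stand in for an expert's analysis.

```python
from collections import defaultdict


def l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values within any quasi-identifier group."""
    values = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values[key].add(r[sensitive])
    return min(len(v) for v in values.values())


def suppress_small_cells(counts, threshold):
    """Replace counts below the threshold with None so they are not published."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}
```

A result of 1 from `l_diversity` means at least one group shares a single sensitive value, so knowing someone is in that group reveals the value even when the group is large. The suppression threshold is a policy choice for the release, not a number taken from the rule.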

Why choose Expert Determination

Expert Statistical Analysis can retain more data utility than Safe Harbor by tailoring protections to your release context. It is ideal when you need finer time, geography, or clinical detail that Safe Harbor would otherwise remove.

Comparison of De-Identification Methods

Safe Harbor

  • Strengths: clear checklist, quick implementation, easy to audit, consistent across datasets.
  • Considerations: significant loss of precision (dates to year, strict geography), less suitable for small-population or rare-condition studies.

Expert Determination

  • Strengths: higher data utility via context-aware controls; adaptable to different sharing models.
  • Considerations: requires qualified expertise, formal documentation, and periodic reassessment if contexts change.

How to choose

  • Use Safe Harbor for standardized, routine releases where coarse data suffices.
  • Use Expert Determination when you need more granular variables or when linkage risks demand a custom Re-identification Risk Assessment.

Compliance Requirements for Covered Entities

Governance and policy

  • Adopt written procedures describing your chosen method, approval workflows, and verification steps.
  • Train workforce members who prepare, review, and distribute de-identified data.

Documentation

  • Safe Harbor: maintain records showing removal of all identifiers and “no actual knowledge” determinations.
  • Expert Determination: retain the expert’s methodology, assumptions, transformations applied, risk metrics, and signed conclusion.

Controls and contracts

  • Limit access, log disclosures, and manage any re-identification codes securely.
  • Align agreements with recipients to restrict attempts at re-identification and onward sharing.
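
For the re-identification codes mentioned above, one way to keep a code non-derivative is to generate it randomly rather than computing it from the patient's data. The sketch below uses Python's standard `secrets` module; the function name and the identifier format are hypothetical.

```python
import secrets


def assign_reid_codes(record_ids):
    """Map each source identifier to a random code that is not derived from it."""
    crosswalk = {}
    used = set()
    for record_id in record_ids:
        code = secrets.token_hex(16)
        while code in used:  # collisions are vanishingly unlikely, but keep codes unique
            code = secrets.token_hex(16)
        used.add(code)
        crosswalk[record_id] = code
    return crosswalk
```

Only the code would travel with the de-identified record. The crosswalk itself stays inside the covered entity, under access controls, and is never disclosed. A hash of a medical record number would not meet the same bar, because it is derived from information about the individual.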

Ongoing compliance

  • Reassess risk when data, recipients, or environments change.
  • Coordinate with broader Data Privacy Regulations (state privacy laws, consumer protection) that may still apply even when data is de-identified under HIPAA.

Risks and Limitations of De-Identification

Residual risk drivers

  • Linkage with external datasets that contain overlapping attributes.
  • Small populations, rare diagnoses, or unusual treatment timelines that create unique patterns.
  • Data drift: new public data releases can alter risk over time.
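
Linkage risk can be estimated empirically when a plausible external dataset is available. The sketch below, under the assumption that both datasets share some quasi-identifier fields, reports the share of released records whose combination of those fields matches exactly one record in the external data. The function and field names are illustrative.

```python
from collections import Counter


def unique_link_rate(released, external, keys):
    """Share of released records whose key combination matches exactly one external record."""
    external_counts = Counter(tuple(e[k] for k in keys) for e in external)
    unique = sum(
        1 for r in released if external_counts[tuple(r[k] for k in keys)] == 1
    )
    return unique / len(released)
```

A high rate suggests the shared fields need further generalization. Because new public data can appear at any time, rerunning a check like this against updated external sources is one way to act on data drift.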

Risk-reduction practices

  • Minimize quasi-identifiers, broaden categories, and suppress small cells.
  • Use controlled-access environments and data use agreements that prohibit re-identification.
  • Monitor for re-identification attempts and update controls as threat models evolve.

Even after de-identification, you should treat risk as dynamic and manage it with technical, administrative, and contractual safeguards.

Practical Applications of De-Identified Data

  • Operational analytics: throughput, readmissions, utilization, and quality metrics without exposing PHI.
  • Clinical research and real-world evidence where coarse dates or age bands still support valid inference.
  • Population health and benchmarking across markets using generalized geography.
  • Product development and AI model training with curated, risk-assessed datasets.

Conclusion

HIPAA offers two compliant paths to de-identify PHI: Safe Harbor’s checklist and Expert Determination’s risk-based approach. Choose the method that balances privacy with utility for your use case, document your process thoroughly, and maintain ongoing controls to keep re-identification risk very small.

FAQs

What are the 18 identifiers removed in the Safe Harbor Method?

  1. Names.
  2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, and ZIP code; you may retain the initial three digits of a ZIP code only if the geographic unit formed by combining all ZIP codes with those three digits contains more than 20,000 people according to current Census Bureau data; otherwise, replace them with 000.
  3. All elements of dates (except year) directly related to an individual (e.g., birth, admission, discharge, death); all ages over 89, and all date elements (including year) that indicate such an age, must be removed or aggregated into a single category of “age 90 or older.”
  4. Telephone numbers.
  5. Fax numbers.
  6. Email addresses.
  7. Social Security numbers.
  8. Medical record numbers.
  9. Health plan beneficiary numbers.
  10. Account numbers.
  11. Certificate/license numbers.
  12. Vehicle identifiers and serial numbers, including license plates.
  13. Device identifiers and serial numbers.
  14. Web URLs.
  15. IP address numbers.
  16. Biometric identifiers, including finger and voice prints.
  17. Full-face photographs and comparable images.
  18. Any other unique identifying number, characteristic, or code (except a non-derivative, internal code used solely for re-identification by the covered entity).
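
Several of these categories (phone numbers, email addresses, Social Security numbers, URLs, and IP addresses) have recognizable formats, so a pattern scan can flag leftovers in free-text fields. The sketch below is a rough aid with deliberately simple, US-centric patterns chosen for illustration. It will miss many variants and cannot detect names or other unstructured identifiers, so it does not replace a full review.

```python
import re

# Illustrative patterns only; real data needs broader, tested expressions.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "url": re.compile(r"https?://\S+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_identifier_candidates(text):
    """Return the sorted category names whose pattern matches somewhere in the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

Running the function over each note or comment field before release gives a list of categories to inspect by hand. An empty result means only that none of these particular patterns matched.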

How does the Expert Determination Method reduce re-identification risk?

A qualified expert assesses plausible attack scenarios and applies statistical and technical controls—such as generalization, suppression, noise, and small-cell handling—until the residual risk is very small for the defined context. The expert documents methods, assumptions, and tests that support the conclusion and specifies conditions under which the data can be shared.

Can de-identified data be re-identified under HIPAA?

HIPAA does not require zero re-identification risk. Expert Determination requires that the risk be very small, while Safe Harbor requires removal of the 18 identifiers and no actual knowledge that the remaining data could identify someone. If new linkable data emerges or controls are weakened, re-identification may become more plausible. Strong governance, contracts prohibiting re-identification, and periodic reassessment help keep the risk very small over time.

What are the compliance obligations for covered entities using de-identified data?

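You must follow one of the two recognized de-identification methods and, if you assign a re-identification code, keep the code and its mechanism undisclosed. Under Expert Determination, the expert must document the methods and results of the analysis; keeping checklist evidence for Safe Harbor is prudent practice. HIPAA itself does not require a data use agreement for de-identified data, but training your workforce and binding recipients to restrictions against re-identification and onward disclosure are strongly advisable. Reevaluate risk when data, use, or environment changes and align practices with applicable Data Privacy Regulations beyond HIPAA.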
