De-Identifying PHI Correctly: HIPAA Privacy Rule Standards and Common Pitfalls

Kevin Henry

HIPAA

March 02, 2025

Expert Determination Method Overview

Under the HIPAA Privacy Rule, Expert Determination is the de-identification pathway to consider when removing a fixed list of identifiers would strip out detail your use case needs. A qualified expert, meaning someone with appropriate knowledge of and experience with generally accepted statistical and scientific methods for rendering information not individually identifiable, assesses the data and applies those methods to determine that the risk of re-identification by an anticipated recipient is very small.

In practice, you begin by scoping your release: who will access the data, for what purpose, and with what auxiliary data they might possess. The expert then models threats and selects controls—generalization, suppression, aggregation, perturbation, or synthetic data—consistent with data de-identification standards and the minimum necessary principle.

Risk is quantified and tested using accepted methods (for example, k-anonymity, l-diversity, or disclosure risk simulations), but HIPAA is method-agnostic. What matters is a defensible conclusion that re-identification risk is very small, backed by documented methods, assumptions, and results that you can explain and reproduce for privacy rule compliance.
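
As one illustration of the metrics mentioned above, the following minimal sketch computes k for a small table: the size of the smallest group of records that share the same quasi-identifier values. The field names (`age_band`, `sex`, `zip3`) and the sample rows are assumptions made for the example, and HIPAA does not prescribe any particular k threshold; that judgment belongs to the expert.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest equivalence class, or 0 for an empty table."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values()) if classes else 0

# Hypothetical, already-generalized records.
records = [
    {"age_band": "30-39", "sex": "F", "zip3": "021"},
    {"age_band": "30-39", "sex": "F", "zip3": "021"},
    {"age_band": "40-49", "sex": "M", "zip3": "021"},
    {"age_band": "40-49", "sex": "M", "zip3": "021"},
    {"age_band": "40-49", "sex": "M", "zip3": "100"},
]

k_full = k_anonymity(records, ["age_band", "sex", "zip3"])
k_coarse = k_anonymity(records, ["age_band", "sex"])
print(k_full, k_coarse)
```

Here the last record is unique on all three fields, so the full table has k = 1; dropping `zip3` from the release raises k to 2. A result like this is an input to the expert's conclusion, not the conclusion itself.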

When a re-identification code is needed, the expert ensures it is not derived from the underlying identifiers, is meaningless outside your environment, and that the mapping file is segregated and protected with strict access controls.
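
A minimal sketch of that idea, assuming your internal record keys look like the hypothetical `MRN-001` values below: each code comes from a cryptographically secure random generator, so nothing about the person can be computed from it. Where and how you store the mapping is outside the sketch.

```python
import secrets

def assign_reid_codes(record_ids):
    """Map each internal record ID to a random 32-character hex code.

    The code is drawn from the OS random source, not computed from the ID,
    so it carries no information about the individual.
    """
    mapping = {}
    used = set()
    for rid in record_ids:
        code = secrets.token_hex(16)
        while code in used:  # collisions are extremely unlikely; guard anyway
            code = secrets.token_hex(16)
        used.add(code)
        mapping[rid] = code
    return mapping

# Hypothetical internal identifiers. The mapping stays inside your environment;
# only the codes travel with the released data.
mapping = assign_reid_codes(["MRN-001", "MRN-002", "MRN-003"])
```

Because the codes are random, the mapping file is the only way back to the original records, which is why it needs to be segregated and tightly access-controlled.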

Safe Harbor Method Requirements

The Safe Harbor method relies on identifier removal and a “no actual knowledge” check. You must remove all 18 specified identifier categories from each record, then confirm that you have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual.

Safe Harbor is deterministic and well-suited to routine disclosure when analytic utility does not require fine-grained dates or locations. You should plan for edge cases—rare conditions, unusual occupations, or small geographies—because these can still create re-identification risk even after identifier removal, triggering the “actual knowledge” prohibition.

Use Safe Harbor when standardized outputs and rapid release cycles matter, and consider Expert Determination when you need higher data utility or nuanced protections tailored to your environment.

Identifiers Required for Removal

For Safe Harbor, remove the following identifiers about the individual or relatives, employers, or household members:

  • Names.
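  • All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes), except the initial three digits of a ZIP code when the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people according to current Census Bureau data; for units of 20,000 or fewer people, replace the three digits with 000.
  • All elements of dates (except year) directly related to an individual, including birth, admission, discharge, and death dates; and all ages over 89, together with any date elements (including year) indicative of such age, except that those ages may be aggregated into a single category of age 90 or older.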
  • Telephone numbers.
  • Fax numbers.
  • Email addresses.
  • Social Security numbers.
  • Medical record numbers.
  • Health plan beneficiary numbers.
  • Account numbers.
  • Certificate/license numbers.
  • Vehicle identifiers and serial numbers, including license plate numbers.
  • Device identifiers and serial numbers.
  • Web URLs.
  • IP address numbers.
  • Biometric identifiers, including finger and voice prints.
  • Full-face photographs and comparable images.
  • Any other unique identifying number, characteristic, or code (except a permitted re-identification code kept separately).
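
The ZIP code, date, and age rules in this list are the ones most often implemented incorrectly, so a minimal sketch may help. The record layout (`zip`, `birth_date`, `admit_date`) is an assumption, and the restricted-prefix set is a hypothetical placeholder: in practice you would build it from current Census Bureau population data.

```python
from datetime import date

def age_on(birth_date, as_of):
    """Age in completed years on the as_of date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def to_safe_harbor(record, restricted_zip3, as_of):
    """Apply the ZIP, date, and age rules to one record (other identifiers not handled)."""
    age = age_on(record["birth_date"], as_of)
    over_89 = age > 89
    zip3 = record["zip"][:3]
    return {
        # Keep three digits only when the combined area exceeds 20,000 people.
        "zip3": "000" if zip3 in restricted_zip3 else zip3,
        # Dates are reduced to the year.
        "admit_year": record["admit_date"].year,
        # Ages over 89 collapse into one category.
        "age": "90+" if over_89 else str(age),
        # A birth year would reveal an age over 89, so drop it for that group.
        "birth_year": None if over_89 else record["birth_date"].year,
    }

# Hypothetical placeholder, NOT a real list of low-population prefixes.
EXAMPLE_RESTRICTED_ZIP3 = {"999"}
as_of = date(2024, 3, 15)

older = to_safe_harbor(
    {"zip": "99901", "birth_date": date(1930, 5, 1), "admit_date": date(2024, 3, 15)},
    EXAMPLE_RESTRICTED_ZIP3, as_of)
younger = to_safe_harbor(
    {"zip": "02139", "birth_date": date(1980, 7, 4), "admit_date": date(2024, 3, 15)},
    EXAMPLE_RESTRICTED_ZIP3, as_of)
```

The first record comes out with `zip3` of `000`, age `90+`, and no birth year; the second keeps `021`, age `43`, and birth year 1980. This covers only three of the categories, so the remaining identifiers still need their own removal steps.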

Risks of Re-Identification

Re-identification risk arises when quasi-identifiers—such as age, gender, and coarse geography—link to external data (voter rolls, social media, or news). Small cell sizes, outliers, or rare diagnosis–procedure combinations can make individuals stand out even after identifier removal.

Temporal and location trails also matter. Detailed timestamps, repeated visits, or fine-grained location attributes enable linkage attacks. Rich modalities—images, genomic markers, device metadata, or narrative text—often leak identity through embedded details, EXIF tags, or distinctive clinical events.

To reduce re-identification risk, combine technical controls (generalization, suppression, micro-aggregation, noise infusion, rounding, top-coding, or synthetic data) with administrative measures (access controls, acceptable use terms, and monitoring). Defense in depth ensures an attacker must defeat multiple layers, not just statistical protections.
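
Three of the technical controls named above (generalization, top-coding, and suppression) can be sketched in a few lines. The ten-year band width and the minimum cell size of 2 are assumptions chosen for the example, not regulatory values.

```python
from collections import Counter

def age_band(age, width=10, top=90):
    """Generalize an exact age into a band, top-coding everything at or above `top`."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress_small_cells(counts, threshold):
    """Replace any count below the threshold with None (suppressed)."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Hypothetical ages from a small cohort.
ages = [34, 36, 38, 41, 47, 52, 91]
counts = Counter(age_band(a) for a in ages)
released = suppress_small_cells(counts, threshold=2)
print(released)
```

The two single-person cells (`50-59` and `90+`) are suppressed while the larger bands are released. If you also publish row or column totals, suppressed cells can sometimes be recovered by subtraction, so totals need the same scrutiny.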

Documentation and Compliance

Good documentation proves privacy rule compliance and supports audits. For Expert Determination, retain the expert’s qualifications, methodology, risk metrics, assumptions, transformations applied, and the final conclusion that risk is very small for your intended release and recipients.

For Safe Harbor, maintain the identifier removal checklist, transformation scripts, and the “no actual knowledge” evaluation. Record the dataset version, purpose, recipients, date of release, and any use limitations. If you use a re-identification code, document its design and protections separately.
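
One way to keep these release details consistent is to write them to a small machine-readable manifest alongside each dataset. The sketch below is only an illustration: the field names and sample values are assumptions, not a required HIPAA format.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ReleaseRecord:
    """Hypothetical manifest describing one de-identified release."""
    dataset_version: str
    method: str                      # "safe_harbor" or "expert_determination"
    purpose: str
    recipients: list
    release_date: str                # ISO 8601 date
    use_limitations: list = field(default_factory=list)
    no_actual_knowledge_reviewed: bool = False

record = ReleaseRecord(
    dataset_version="2025.03-r1",
    method="safe_harbor",
    purpose="Readmission-rate analysis",
    recipients=["Example Research Partner"],
    release_date="2025-03-02",
    use_limitations=["No linkage to external datasets"],
    no_actual_knowledge_reviewed=True,
)

# Sorted keys keep the output stable, which makes version-to-version diffs readable.
manifest = json.dumps(asdict(record), indent=2, sort_keys=True)
print(manifest)
```

Storing the manifest under version control next to the transformation scripts gives you a record of what was released, to whom, and under which conditions.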

Operationalize compliance with policies, role-based access, data retention schedules, and regular reviews. Keep required documentation for the applicable retention period (HIPAA-required documentation must generally be kept for six years from its creation or from the date it was last in effect, whichever is later) and align controls with your broader information security and data governance program.

Common Mistakes in De-Identification

Leaving PHI in free-text notes is the top error; narrative fields often contain names, locations, and exact dates. Another frequent mistake is releasing small-cell tables or rare-condition slices that enable easy singling out.
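
A simple pattern scan can serve as a first screening pass over narrative fields. The sketch below is deliberately limited: the four regular expressions are assumptions covering common US formats only, and pattern matching will not catch names, addresses, or dates written out in words, so it supplements rather than replaces dedicated text de-identification and manual review.

```python
import re

# Illustrative patterns only; real notes contain many more formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def flag_phi(text):
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits

# Hypothetical note text.
note = "Pt seen 03/14/2024, call 617-555-0142 or jdoe@example.com."
for label, value in flag_phi(note):
    print(label, value)
```

Any record with a hit should be held back for redaction or review before release; a record with no hits is not thereby proven clean.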

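Technical missteps include hashing direct identifiers without a secret salt or key and calling the result “de-identified,” reusing stable pseudonyms across datasets, or forgetting metadata in DICOM images, PDFs, and spreadsheets. An unkeyed hash can be reversed by hashing every candidate value and comparing the results, so it still functions as an identifier. Each of these can leak identity despite apparent identifier removal.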
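
To see why an unsalted hash of a direct identifier offers little protection, consider the following sketch. The Social Security number is a made-up value, and the search is narrowed to 10,000 candidates to keep the example quick; the full space of nine-digit values is at most one billion, which is still practical to enumerate.

```python
import hashlib

def unsalted_hash(identifier):
    """SHA-256 of the identifier with no salt or key (the flawed approach)."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()

def recover(target_hash, candidates):
    """Dictionary attack: hash each candidate until one matches the target."""
    for candidate in candidates:
        if unsalted_hash(candidate) == target_hash:
            return candidate
    return None

# Hypothetical SSN as it might appear, hashed, in a "de-identified" file.
released_value = unsalted_hash("123-45-6789")

# The attacker enumerates the identifier space; narrowed here for brevity.
candidates = (f"123-45-{n:04d}" for n in range(10000))
recovered = recover(released_value, candidates)
print(recovered)
```

The attacker needs no secret to run this, only knowledge of the identifier's format. A randomly generated code with a separately protected mapping avoids the problem because there is nothing to enumerate.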

Process gaps—no documented “actual knowledge” check, inconsistent application of rules across refreshes, or releasing multiple overlapping snapshots—compound risk and undermine your de-identification controls.

Best Practices for HIPAA Compliance

Start with data inventory and purpose limitation: collect and share only what you need. Choose the appropriate pathway—Safe Harbor for standardized identifier removal or Expert Determination for higher-utility releases that still keep re-identification risk very small.

Implement layered protections. Use statistical de-identification controls, lock down re-identification codes, and enforce access governance with least privilege, auditing, and user agreements. Test releases for residual risk before disclosure and re-test on each refresh.

Sustain the program through training, reproducible pipelines, and change management. Align your practices with data de-identification standards, maintain clear SOPs, and monitor for drift as new external data sources or analytical capabilities emerge.

Conclusion

De-identifying PHI correctly requires more than mechanical identifier removal. By choosing the right method, targeting re-identification risk, and documenting each step, you balance data utility with privacy rule compliance and avoid the common pitfalls that lead to preventable disclosures.

FAQs

What are the two methods for de-identifying PHI under HIPAA?

HIPAA offers two methods: Safe Harbor, which requires removing the specified identifiers and having no actual knowledge that the remaining information could identify an individual, and Expert Determination, where a qualified expert uses statistical and scientific techniques to conclude that the re-identification risk is very small.

What identifiers must be removed in the Safe Harbor method?

You must remove 18 categories: names; geographies smaller than a state (with ZIP code rules); all date elements except year, plus ages over 89 (which may be grouped as 90 or older); phone, fax, and email; SSN; medical record and health plan numbers; account and license numbers; vehicle and device IDs; URLs and IP addresses; biometric identifiers; full-face images; and any other unique identifying number, characteristic, or code.

How can re-identification risk be minimized?

Combine technical controls—generalization, suppression, top-coding, micro-aggregation, and noise—with administrative measures such as access controls, user agreements, monitoring, and release approvals. Test risk before and after transformations and re-evaluate whenever you refresh or link data.

Why is documentation important in the de-identification process?

Documentation demonstrates privacy rule compliance, enables reproducibility, and supports audits. It records your identifier removal steps or expert risk assessment, the rationale for chosen controls, release conditions, and version history, reducing operational and legal risk.
