How to De-Identify PHI Under the HIPAA Privacy Rule

Kevin Henry

HIPAA

May 02, 2024

7 minutes read

De-identifying protected health information (PHI) lets you share and analyze health data while staying within the HIPAA Privacy Rule. HIPAA provides two approved pathways: Safe Harbor and Expert Determination. This guide shows you how each works, which identifiers to remove, how to assess re-identification risk, and how to document compliance so the resulting data are both trustworthy and useful.

Safe Harbor Method Requirements

Under Safe Harbor, you must remove the specified identifiers from PHI and have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. This bright-line approach is straightforward to implement and audit, but it requires disciplined execution across structured fields, free text, and metadata.

Core criteria

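Remove all 18 HIPAA-specified identifiers, group ages over 89 into a single 90-or-older category, and follow the ZIP code and date rules precisely. You may assign a re-identification code for internal linkage, provided the code is not derived from information about the individual, cannot otherwise be translated to identify the individual, and the re-identification mechanism is stored separately and not disclosed.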

Practical steps

  • Inventory data elements and map each to HIPAA’s identifier categories.
  • Apply standardized suppression and generalization for dates, geography, and small cells.
  • Scrub free text and image metadata using automated detection plus human review.
  • Record checks and attest that you have no actual knowledge of identifiability.
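The inventory and generalization steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete Safe Harbor implementation: the field names, the `FIELD_ACTIONS` map, and the `RESTRICTED_ZIP3` set are hypothetical, and the real list of restricted three-digit ZIP prefixes must be built from current Census Bureau data.

```python
from typing import Callable, Dict, Optional

# Illustrative placeholder only (assumption): the real set of three-digit
# prefixes covering 20,000 or fewer people must come from current Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def zip_to_zip3(zip_code: str) -> str:
    """Keep the first three ZIP digits unless the prefix is restricted or malformed."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit() or prefix in RESTRICTED_ZIP3:
        return "000"
    return prefix


# Hypothetical field inventory: None means "drop the field",
# a callable means "generalize or retain the value".
FIELD_ACTIONS: Dict[str, Optional[Callable[[str], str]]] = {
    "name": None,
    "mrn": None,
    "email": None,
    "zip": zip_to_zip3,
    "diagnosis_code": lambda value: value,  # retained unchanged
}


def apply_inventory(record: Dict[str, str]) -> Dict[str, str]:
    """Apply the mapped action to every field; reject fields nobody has reviewed."""
    out: Dict[str, str] = {}
    for field, value in record.items():
        if field not in FIELD_ACTIONS:
            raise KeyError(f"unmapped field: {field}")
        action = FIELD_ACTIONS[field]
        if action is not None:
            out[field] = action(value)
    return out
```

Raising on unmapped fields is a deliberate choice in this sketch: a new column added upstream stops the pipeline instead of passing through unreviewed.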

Common pitfalls

Progress notes, device logs, and image files often contain hidden identifiers. Rare combinations (for example, an unusual diagnosis within a small area) can enable inference even without direct identifiers; widen categories or aggregate counts to reduce distinguishability.

Expert Determination Method Overview

Expert Determination relies on a qualified professional applying generally accepted statistical and scientific principles to conclude that the risk of re-identification by an anticipated recipient is very small, considering other reasonably available information. The expert documents the methodology, tests, assumptions, and results in a written determination.

Who qualifies as an expert

Select someone with proven experience in statistical de-identification, linkage analysis, and disclosure control across health datasets. HIPAA does not require a specific degree or certification, so be prepared to show how the expert's training and experience support the determination.

Typical techniques

Experts combine generalization and suppression with models such as k-anonymity, l-diversity, and t-closeness, plus perturbation, noise addition, or synthetic data generation. These tools balance data utility with a very small Re-Identification Risk.
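As a concrete illustration of one of these models, the sketch below computes k-anonymity for a small table: k is the size of the smallest group of records that share the same quasi-identifier values. The column names (`age_band`, `sex`, `zip3`) are assumptions made for the example, and a real expert analysis would go well beyond this single measure.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def equivalence_classes(
    rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]
) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest equivalence class (0 for an empty table)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(classes.values()) if classes else 0


def classes_below(
    rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int
) -> List[Tuple[str, ...]]:
    """List the combinations that fall short of the target k and need more generalization."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return [combo for combo, count in classes.items() if count < k]


# Hypothetical sample data for illustration.
rows = [
    {"age_band": "40-49", "sex": "F", "zip3": "902"},
    {"age_band": "40-49", "sex": "F", "zip3": "902"},
    {"age_band": "50-59", "sex": "M", "zip3": "100"},
]
quasi = ["age_band", "sex", "zip3"]
print(k_anonymity(rows, quasi))
print(classes_below(rows, quasi, k=2))
```

Here the third record is alone in its class, so the table is only 1-anonymous until that record is generalized further or suppressed.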

When to use it

Choose Expert Determination when Safe Harbor would remove critical detail—such as month-level dates, partial geography, device attributes, or event sequences needed for clinical quality, research, or product analytics.

Identifiers to Remove

For Safe Harbor de-identification, remove these identifiers of the individual and of the individual's relatives, employers, and household members wherever they appear, including free text, images, metadata, and logs:

  • Names.
  • Geographic subdivisions smaller than a state, including street address, city, county, precinct, and ZIP code (except the initial three digits of a ZIP code if the combined area has more than 20,000 people; otherwise replace with 000).
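  • All elements of dates (except year) for dates directly related to an individual, such as birth, admission, discharge, and death dates; and all ages over 89 together with any date elements (including year) that indicate such an age, although these may be aggregated into a single category of age 90 or older.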
  • Telephone numbers.
  • Fax numbers.
  • Email addresses.
  • Social Security numbers.
  • Medical record numbers.
  • Health plan beneficiary numbers.
  • Account numbers.
  • Certificate/license numbers.
  • Vehicle identifiers and serial numbers, including license plate numbers.
  • Device identifiers and serial numbers.
  • Web URLs.
  • IP address numbers.
  • Biometric identifiers, including finger and voice prints.
  • Full-face photographic images and any comparable images.
  • Any other unique identifying number, characteristic, or code (except an internal re-identification code that is not derived from PHI and is kept separately).
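Several of the identifiers above have recognizable formats, so a first automated pass over free text can be sketched with regular expressions. The patterns below are simplified assumptions for US-style phone numbers, Social Security numbers, email addresses, URLs, and IPv4 addresses. They will miss names, addresses, and unusual formats, so treat this as a screening aid that still needs the human review described earlier.

```python
import re
from typing import Dict, Tuple

# Order matters: URLs are replaced first so that an email-like or IP-like
# string embedded in a URL is not partially redacted.
PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)")),
    ("IP", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def scrub(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each match with a bracketed label and report how many were found."""
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS:
        text, found = pattern.subn(f"[{label}]", text)
        counts[label] = found
    return text, counts


# Hypothetical clinical note for illustration.
note = (
    "Call 555-123-4567 or email jane.doe@example.org, SSN 123-45-6789, "
    "portal https://example.org/p/42 from 10.0.0.5."
)
clean, counts = scrub(note)
print(clean)
# Call [PHONE] or email [EMAIL], SSN [SSN], portal [URL] from [IP].
```

The per-label counts can feed the audit trail, since they record what the automated pass found before a reviewer looks at the remainder.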

Notes on dates and ages

If date detail finer than the year is essential, consider Expert Determination, which may allow you to retain month or quarter while keeping risk very small. For very young or very old populations, broaden categories to avoid unique combinations that could single out individuals.
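The Safe Harbor date and age rules can be expressed as a small helper. This is a sketch with assumed field names (`birth_year`, `admission_year`) and an assumed reference date for computing age. It keeps only the year of each date and, for anyone aged 90 or older, drops the birth year as well, since that year would reveal the age.

```python
from datetime import date
from typing import Dict, Optional, Union


def age_on(birth: date, as_of: date) -> int:
    """Age in completed years on the reference date."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def generalize_age(age: int) -> str:
    """Top-code ages over 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)


def generalize_dates(
    birth: date, admission: date, as_of: date
) -> Dict[str, Optional[Union[str, int]]]:
    """Reduce dates to years; suppress the birth year when it would indicate age 90+."""
    age = age_on(birth, as_of)
    return {
        "age": generalize_age(age),
        "birth_year": None if age >= 90 else birth.year,
        "admission_year": admission.year,
    }


# Hypothetical patient for illustration.
print(generalize_dates(date(1930, 6, 15), date(2024, 1, 9), as_of=date(2024, 5, 2)))
```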

Risk Assessment for Re-Identification

Whether you use Safe Harbor or Expert Determination, evaluate how easily the dataset could be linked to individuals. Assess the data, plausible attack strategies, and the recipient’s environment to balance utility and protection.

Contextual factors

  • Distinguishability: Are records unique given quasi-identifiers like age, sex, and location?
  • Replicability: Are attributes stable over time, aiding record linkage?
  • Availability: Are similar datasets publicly or commercially available for matching?
  • Controls: Do contracts, access limits, and security measures reduce residual risk?

Quantitative checks

Use k-anonymity thresholds, small-cell and outlier analysis, population uniqueness modeling, and membership disclosure tests. Iterate with suppression and generalization until residual risk is very small and analytic goals are preserved.
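Two of these checks are easy to prototype: the share of records that are unique on their quasi-identifiers, and suppression of small cells in an aggregate table. The sketch below assumes hypothetical column names and a minimum cell size of 11. That threshold is a policy choice made for the example, not a HIPAA requirement, and the code does not handle complementary suppression, where a hidden cell could be recovered from row or column totals.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def sample_uniqueness(
    rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]
) -> float:
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    unique = sum(1 for count in counts.values() if count == 1)
    return unique / len(rows)


def suppress_small_cells(
    table: Dict[str, int], threshold: int = 11
) -> Dict[str, Optional[int]]:
    """Replace counts below the threshold with None so they are not published."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


# Hypothetical data for illustration.
records = [
    {"age_band": "40-49", "zip3": "902"},
    {"age_band": "40-49", "zip3": "902"},
    {"age_band": "50-59", "zip3": "100"},
    {"age_band": "60-69", "zip3": "606"},
]
print(sample_uniqueness(records, ["age_band", "zip3"]))
print(suppress_small_cells({"asthma": 25, "rare_condition": 3}))
```

Sample uniqueness only describes the dataset in hand; estimating population uniqueness requires a model of the wider population, which is part of what an expert would supply.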

Quality assurance

Automate scanning for identifiers, perform targeted human review, and verify that any linkable codes are not derived from Protected Health Information. Log each change to maintain a defensible audit trail.

Documentation and Compliance

Clear, durable records demonstrate Covered Entity Compliance and support audits.

For Safe Harbor

  • Field-level mapping to the 18 identifiers and the actions taken.
  • Evidence of free-text scrubbing, image/metadata cleansing, and small-cell handling.
  • Attestation that you have no actual knowledge of identifiability after removal.
  • Governance for re-identification codes, including separate storage and restricted access.

For Expert Determination

  • Signed expert report detailing methods, tests, assumptions, and the “very small risk” conclusion.
  • Documented controls and permitted uses the expert relied on, plus monitoring plans.
  • Versioning for datasets and re-reviews when data, context, or external data sources change.

Operational safeguards

  • Policies, training, and audits aligned with Data Privacy Standards.
  • Contractual terms prohibiting re-identification and onward sharing without approval.
  • Access controls, logging, retention, and incident response procedures.

Use of De-Identified Data

Properly de-identified data are not PHI under HIPAA and may be used or disclosed without patient authorization. You can support research, analytics, quality improvement, benchmarking, and product development while honoring ethical norms and contractual obligations.

Maintain utility with clear data dictionaries, derivation logic, and cohort definitions. If linkage is needed, use a non-derived re-identification code and protect the key in a segregated, access-controlled system.
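One way to produce a re-identification code that is not derived from PHI is to generate it randomly and keep the lookup table apart from the released data. The sketch below uses Python's standard `secrets` module. The function name, the use of medical record numbers as source keys, and the in-memory crosswalk are assumptions for illustration, and in practice the crosswalk would live in a separate, access-controlled store.

```python
import secrets
from typing import Dict, Iterable


def build_crosswalk(source_ids: Iterable[str]) -> Dict[str, str]:
    """Map each source identifier to a random code that carries no information about it."""
    crosswalk: Dict[str, str] = {}
    used = set()
    for source_id in source_ids:
        if source_id in crosswalk:
            continue  # one stable code per individual
        code = secrets.token_hex(16)
        while code in used:  # collisions are extremely unlikely, but cheap to rule out
            code = secrets.token_hex(16)
        used.add(code)
        crosswalk[source_id] = code
    return crosswalk


# Hypothetical medical record numbers for illustration.
crosswalk = build_crosswalk(["MRN-001", "MRN-002", "MRN-001"])
released_codes = list(crosswalk.values())  # only these travel with the dataset
```

A hash of the medical record number would not qualify here, because it is derived from the identifier and can be recomputed by anyone who knows or can guess the input.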

Differences Between De-Identification Methods

Scope and certainty

Safe Harbor is rules-based and easy to verify but can reduce granularity (for example, dates and geography). Expert Determination offers flexibility and greater utility, backed by an expert's statistical analysis and by governance controls that keep re-identification risk very small.

Cost, speed, and scalability

Safe Harbor scales for routine feeds with consistent schemas. Expert Determination requires upfront expertise and modeling but may preserve more analytical value, reducing repeated suppressions over time.

Choosing the right approach

Use Safe Harbor for standard reporting where coarse time and location suffice. Choose Expert Determination when you need specificity—seasonality, care pathways, or community-level interventions—without compromising privacy.

Conclusion

Effective de-identification under the HIPAA Privacy Rule means choosing the right pathway, executing controls rigorously, and documenting every step. By aligning with HIPAA Regulatory Guidance, applying Statistical De-Identification techniques, and continuously evaluating risk, you protect privacy while enabling meaningful insight from health data.

FAQs

What is the Safe Harbor method under HIPAA?

Safe Harbor requires removing 18 specific identifiers from a dataset and confirming you have no actual knowledge that the remaining information could identify an individual. It is deterministic and simpler to audit but may limit data utility by eliminating granular dates and geography.

How does the Expert Determination method work?

A qualified expert applies generally accepted statistical and scientific techniques to determine that the risk of re-identification is very small, given reasonably available external data and the controls in place. The expert issues written documentation describing methods, tests, and conclusions.

What types of identifiers must be removed for de-identification?

HIPAA lists 18 categories, including names; geography smaller than a state; elements of dates (except year) and ages over 89; telephone and fax numbers and email addresses; Social Security, medical record, health plan beneficiary, and account numbers; certificate or license numbers; vehicle and device identifiers; URLs and IP addresses; biometric identifiers; full-face images; and any other unique identifying number, characteristic, or code. The one exception is an internal re-identification code that is not derived from PHI and is kept separately.

How is re-identification risk assessed?

Assess both context and content: measure uniqueness and small cells, model linkage and membership risks, and consider external data availability and recipient safeguards. Iterate with suppression, generalization, and other techniques until residual risk is very small while maintaining analytical usefulness.
