How to De‑Identify PHI Under HIPAA: Process, Documentation, and Oversight

Kevin Henry

HIPAA

May 04, 2024

If you handle Protected Health Information (PHI), knowing how to de‑identify it under HIPAA is essential for safe analytics and sharing. This guide explains the two approved methods, the Safe Harbor Rule and the Expert Determination Standard, along with the documentation and oversight you need to keep re‑identification risk very small and stay audit‑ready.

HIPAA-Compliant De-Identification Methods

The two approved paths

HIPAA permits two de‑identification approaches: the Safe Harbor Rule, which removes specified identifiers, and the Expert Determination Standard, which relies on a qualified expert’s analysis that the risk of re‑identification is very small. You may use either method; both are valid when implemented correctly.

Choosing a method

Use Safe Harbor when your data can tolerate removing direct identifiers and detailed dates/geography without harming utility.
Use Expert Determination when you need to preserve more detail (for example, granular dates or geography) and can justify it through a documented Risk Assessment.
For iterative projects, start with Safe Harbor, then consider an Expert Determination to regain utility while maintaining a very small risk profile.

Core process

Inventory PHI sources and map data flows end‑to‑end.
Select de‑identification method and define transformation rules.
Implement controls, validate outputs, and capture De‑Identification Documentation.
Operate under governance with Compliance Monitoring and periodic re‑evaluation.

Safe Harbor Identifier Removal

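Under the Safe Harbor Rule, you must remove the 18 specified identifiers of the individual and of the individual’s relatives, employers, and household members, and you must have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual.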

The identifiers to remove (all must be addressed)

Names.
All geographic subdivisions smaller than a state (street address, city, county, precinct, and ZIP code); you may retain the first three ZIP digits only if the area formed by combining all ZIP codes that share those three digits contains more than 20,000 people according to current Census Bureau data; otherwise replace them with 000.
All elements of dates (except year) directly related to an individual (for example, birth, admission, discharge, death, appointment dates); ages over 89, and any date elements (including year) that indicate such an age, must be removed or aggregated into a single category of age 90 or older.
Telephone numbers.
Fax numbers.
Email addresses.
Social Security numbers.
Medical record numbers.
Health plan beneficiary numbers.
Account numbers.
Certificate and license numbers.
Vehicle identifiers and serial numbers, including license plates.
Device identifiers and serial numbers.
Web URLs.
IP address numbers.
Biometric identifiers (for example, fingerprints, voiceprints).
Full‑face photos and comparable images.
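Any other unique identifying number, characteristic, or code (except a re‑identification code that is not derived from information about the individual, is not otherwise capable of being translated to identify the individual, and whose mechanism is kept separate and undisclosed).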

Implementation tips

Structured fields: drop or generalize prohibited values; ensure date logic applies to all event timestamps and age.
Free text and images: use automated scanning plus manual review to redact identifiers and metadata (for example, DICOM headers, EXIF tags).
Quality assurance: sample records pre/post‑processing, verify population thresholds for three‑digit ZIPs, and document “no actual knowledge” checks.
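
The structured-field rules above can be sketched in a few lines of Python. This is a minimal illustration, not a complete Safe Harbor implementation: the field names (`zip`, `birth_date`, `admit_date`, `age`) and the contents of `RESTRICTED_ZIP3` are assumptions made for the example, and the real list of restricted prefixes has to be built from current Census Bureau data.

```python
from datetime import date

# Placeholder set of restricted three-digit ZIP prefixes (combined population
# of 20,000 or fewer). These values are illustrative; build the real list from
# current Census Bureau data.
RESTRICTED_ZIP3 = {"036", "059", "102"}


def generalize_zip(zip_code: str) -> str:
    """Return the three-digit prefix, or "000" if it is restricted or malformed."""
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit() or prefix in RESTRICTED_ZIP3:
        return "000"
    return prefix


def generalize_age(age: int) -> str:
    """Group every age over 89 into a single "90+" category."""
    return "90+" if age > 89 else str(age)


def safe_harbor_record(record: dict) -> dict:
    """Build a new record that keeps only generalized fields.

    Direct identifiers such as the name are dropped simply by not copying them.
    """
    age = record["age"]
    return {
        "zip3": generalize_zip(record["zip"]),
        # Keep only the year of each event date.
        "admit_year": record["admit_date"].year,
        # A birth year would reveal an age over 89, so suppress it in that case.
        "birth_year": None if age > 89 else record["birth_date"].year,
        "age_group": generalize_age(age),
    }


patient = {
    "name": "Jane Doe",
    "zip": "03601",
    "birth_date": date(1930, 5, 1),
    "admit_date": date(2024, 3, 15),
    "age": 93,
}
print(safe_harbor_record(patient))
```

Building a new record from an allow-list of generalized fields, rather than deleting known identifiers from the original, means an unexpected column is dropped by default instead of leaking through.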

Common pitfalls

Leaving identifiers inside notes, scanned forms, or filenames.
Overlooking small‑cell counts that enable singling out in narrow cohorts.
Retaining reversible “codes” derived from PHI rather than random, non‑derivable tokens.
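
To illustrate the last pitfall, here is a small sketch of random, non‑derivable tokens. The `TokenVault` class and the `MRN-...` identifiers are hypothetical names for this example. The point is that each token comes from a random source rather than from a hash or encoding of the identifier, so the token alone carries no information about the person.

```python
import secrets


class TokenVault:
    """Maps source identifiers to random tokens.

    The mapping itself is the re-identification key, so in practice it would
    live in a separate, access-controlled store (an assumption of this sketch).
    """

    def __init__(self) -> None:
        self._token_by_id: dict[str, str] = {}

    def token_for(self, identifier: str) -> str:
        """Return a stable random token for the identifier, creating one if needed."""
        if identifier not in self._token_by_id:
            # 16 random bytes rendered as 32 hex characters; nothing is
            # computed from the identifier itself.
            self._token_by_id[identifier] = secrets.token_hex(16)
        return self._token_by_id[identifier]


vault = TokenVault()
first = vault.token_for("MRN-001")
again = vault.token_for("MRN-001")
other = vault.token_for("MRN-002")
print(first == again, first == other)
```

The same identifier always maps to the same token within one vault, which preserves record linkage inside the dataset, while a different vault would assign an unrelated token to the same identifier.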

Expert Determination Analysis

Under the Expert Determination Standard, a qualified expert applies statistical and scientific principles and determines, with supporting analysis, that the risk of re‑identification is very small given the data and its release context.

What the expert evaluates

Quasi‑identifiers (for example, granular dates, geography, rare diagnoses) and linkability to reasonably available external data.
Release context: who will access the data, data use purpose, contractual controls, and technical environment.
Adversary models and attack feasibility, including singling out, linkage, and inference risks.

Risk reduction techniques

Generalization and suppression to achieve k‑anonymity, l‑diversity, or t‑closeness thresholds appropriate to the data and setting.
Noise addition, rounding, date shifting, sampling, and aggregation; or query‑based protections (for example, minimum cell sizes).
Non‑derivable tokenization with strict separation of any re‑identification keys.
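
As a rough illustration of the first technique, the sketch below measures k‑anonymity (the size of the smallest group of records sharing the same quasi‑identifier values) before and after generalizing birth year to decade. The field names and sample records are invented for the example, and an acceptable value of k is a judgment for the expert, not something this code decides.

```python
from collections import Counter


def k_anonymity(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def birth_year_to_decade(records: list[dict]) -> list[dict]:
    """Generalize each birth year to the start of its decade (1984 -> 1980)."""
    return [{**r, "birth_year": r["birth_year"] // 10 * 10} for r in records]


# Invented sample data for illustration only.
records = [
    {"zip3": "941", "birth_year": 1980, "sex": "F"},
    {"zip3": "941", "birth_year": 1984, "sex": "F"},
    {"zip3": "941", "birth_year": 1971, "sex": "M"},
    {"zip3": "941", "birth_year": 1975, "sex": "M"},
]
quasi = ["zip3", "birth_year", "sex"]

k_before = k_anonymity(records, quasi)
k_after = k_anonymity(birth_year_to_decade(records), quasi)
print(k_before, k_after)
```

With exact birth years every record is unique (k = 1); after generalizing to decades each record shares its quasi‑identifier values with one other record (k = 2).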

Deliverables and maintenance

A written opinion explaining methods, assumptions, Risk Assessment results, residual risk, and why it is “very small.”
Scope definition: datasets covered, permitted uses, controls required, and an effective date with re‑evaluation triggers.
Repeat assessments when data, context, or external data availability changes.

Documentation and Record-Keeping

De‑Identification Documentation essentials

Policies and procedures (SOPs) for Safe Harbor and Expert Determination workflows.
Data inventories, flow diagrams, and transformation rules.
Validation and quality checks, including samples and metrics.
Expert opinion reports, reviewer qualifications, and approval records.
Access logs, distribution lists, and Data Use Agreements or internal data‑sharing terms.

Retention and access

Retain required HIPAA documentation for at least six years from the date it was created or the date it was last in effect, whichever is later.
Store records securely with role‑based access; keep re‑identification keys physically and logically separate.

Change and incident records

Version control every rule change or model update, with rationale and approver sign‑off.
Document suspected or confirmed re‑identification events and corrective actions.

Oversight and Compliance Procedures

Governance

Designate a privacy lead and a data governance committee to approve methods and monitor operations.
Define roles for requesters, data engineers, reviewers, and approvers; enforce segregation of duties.

Compliance Monitoring

Run periodic audits, sampling, and red‑team tests to probe for re‑identification risks.
Track KPIs (for example, number of releases reviewed, policy exceptions, time‑to‑remediation).
Provide training and maintain a sanctions policy for violations.

Third‑party oversight

When vendors handle PHI prior to de‑identification, execute Business Associate Agreements and verify controls.
For de‑identified data recipients, use contracts that prohibit re‑identification and onward sharing without approval.

Re-Identification Safeguards

Administrative and contractual controls

Adopt written prohibitions on re‑identification and attempts to contact individuals.
Limit access to those with a defined need; log and review access regularly.
Include clear Data Re‑Identification Controls in agreements, with remedies and audit rights.

Technical controls

Use non‑derivable random tokens; never expose the re‑identification mechanism.
Separate and encrypt any linkage keys; restrict to a minimal number of custodians.
Apply suppression and minimum cell thresholds to prevent small‑cell disclosures.
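
A minimum cell threshold can be sketched as follows. The function name and the default threshold of 11 are assumptions for illustration; HIPAA does not prescribe a specific number, so the threshold should come from your own policy or expert determination.

```python
from collections import Counter
from typing import Optional


def suppressed_counts(values: list[str], min_cell_size: int = 11) -> dict[str, Optional[int]]:
    """Count occurrences per category, replacing small counts with None.

    A None value means "suppressed": the category exists, but its count is
    below the threshold and is not released.
    """
    counts = Counter(values)
    return {
        category: (n if n >= min_cell_size else None)
        for category, n in counts.items()
    }


# Invented diagnosis labels for illustration only.
diagnoses = ["asthma"] * 12 + ["rare_condition"] * 3
table = suppressed_counts(diagnoses)
print(table)
```

This sketch handles only primary suppression. If totals or marginal counts are published alongside the table, a suppressed cell may be recoverable by subtraction, so complementary suppression or rounding may also be needed.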

Use and Security of De-Identified Data

Security expectations

Even though de‑identified data is not PHI, protect it with strong security: encryption in transit and at rest, access controls, and monitoring.
Classify de‑identified datasets and manage their lifecycle from creation through disposition.

Use data‑sharing terms that restrict purpose, access, re‑distribution, and re‑identification attempts.
Re‑assess risk before linking de‑identified data with other datasets to avoid the “mosaic effect.”

Conclusion

Effective de‑identification hinges on choosing the right method, executing precise transformations, and backing everything with rigorous De‑Identification Documentation, oversight, and Compliance Monitoring. With sound controls and periodic re‑assessment, you can unlock data value while keeping the risk of re‑identification very small.

FAQs

What are the two HIPAA-approved de-identification methods?

HIPAA allows two methods: the Safe Harbor Rule, which removes specified identifiers and requires no actual knowledge of identifiability, and the Expert Determination Standard, where a qualified expert documents that the re‑identification risk is very small given the data and its context.

How is the Safe Harbor method implemented?

You remove all 18 designated identifiers, reduce dates to the year, generalize geography to the state or to a qualifying three‑digit ZIP code, address free text and metadata, validate results, and document that you have no actual knowledge the remaining information could identify an individual, alone or in combination with other information.

What documentation is required for Expert Determination?

You need a written expert opinion describing qualifications, methods, assumptions, Risk Assessment, applied transformations, residual risk rationale, covered datasets, allowed uses, required controls, effective date, and conditions that trigger re‑evaluation.

How are de-identification processes monitored for compliance?

Establish governance with defined roles, perform periodic audits and sampling, track KPIs, enforce contractual prohibitions on re‑identification, monitor access, train users, and document findings and remediation as part of ongoing Compliance Monitoring.
