What Is HIPAA De-Identification? Definition, 18 Identifiers, and Best Practices



Kevin Henry

HIPAA

May 01, 2024


Definition of HIPAA De-Identification

HIPAA de-identification is the process of transforming protected health information so that the risk of a person being identified is very small. Once data are de-identified under the HIPAA Privacy Rule, they are no longer considered PHI and fall outside HIPAA’s use and disclosure restrictions.

HIPAA recognizes two De-Identification Standards: the Safe Harbor Method, which removes a fixed list of identifiers, and the Expert Determination Method, which relies on a documented risk analysis. Both aim to minimize re-identification risk while preserving utility for analytics, research, and operations. Because risk depends on Data Use Context, your approach should match how and where the dataset will be used or shared.

Many people refer to PHI as Personal Health Information; formally, HIPAA uses Protected Health Information. Regardless of terminology, the goal is the same: apply defensible controls to prevent identity disclosure and downstream harms.

The 18 HIPAA Identifiers

Under the Safe Harbor framework, all of the following identifiers of the individual or of relatives, employers, or household members must be removed:

  • 1. Names.
  • 2. All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes), except the initial three digits of a ZIP code if the combined area has more than 20,000 people; otherwise, the three digits must be replaced with 000.
  • 3. All elements of dates (except year) directly related to an individual, including birth date, admission, discharge, and death; and all ages over 89 and all related date elements for such individuals, which must be aggregated into a single category of age 90 or older.
  • 4. Telephone numbers.
  • 5. Fax numbers.
  • 6. Email addresses.
  • 7. Social Security numbers.
  • 8. Medical record numbers.
  • 9. Health plan beneficiary numbers.
  • 10. Account numbers.
  • 11. Certificate/license numbers.
  • 12. Vehicle identifiers and serial numbers, including license plate numbers.
  • 13. Device identifiers and serial numbers.
  • 14. Web Universal Resource Locators (URLs).
  • 15. Internet Protocol (IP) address numbers.
  • 16. Biometric identifiers, including finger and voice prints.
  • 17. Full-face photographs and comparable images.
  • 18. Any other unique identifying number, characteristic, or code (except a re-identification code retained internally that is not derived from personal information and is not disclosed).
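
The ZIP code, date, and age rules in items 2 and 3 are the ones most often implemented in code. The following is a minimal sketch of those rules, not a complete Safe Harbor implementation. The function names are hypothetical, and `RESTRICTED_ZIP3` holds placeholder values only; a real pipeline would load the set of low-population three-digit prefixes from current Census data.

```python
from datetime import date

# Placeholder values (assumption): 3-digit ZIP prefixes whose combined area
# has 20,000 or fewer people. Replace with a list built from Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three ZIP digits, or return "000" for small areas."""
    zip3 = zip_code.strip()[:3]
    if len(zip3) < 3 or not zip3.isdigit() or zip3 in RESTRICTED_ZIP3:
        return "000"
    return zip3


def generalize_age(age: int) -> str:
    """Top-code ages over 89 into a single "90+" category."""
    return "90+" if age >= 90 else str(age)


def age_on(birth: date, as_of: date) -> int:
    """Age in whole years on the as_of date."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def generalize_birth_date(birth: date, as_of: date) -> str:
    """Return the birth year, or "90+" when the year would reveal an age over 89."""
    return "90+" if age_on(birth, as_of) >= 90 else str(birth.year)
```

For example, `generalize_zip("02139")` keeps `"021"`, while any prefix listed in `RESTRICTED_ZIP3` collapses to `"000"`.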

Safe Harbor Method

The Safe Harbor Method requires removing all 18 HIPAA identifiers and having no actual knowledge that the remaining information could identify an individual. It is rules-based and straightforward to audit, making it effective for standardized pipelines.

How to apply Safe Harbor

  • Inventory PHI elements across all fields, free text, images, and metadata (e.g., DICOM headers, audit logs).
  • Remove or transform the 18 identifiers: redact names; generalize location to state; convert dates to year; top-code ages at 90+; suppress URLs, IPs, and device/vehicle numbers.
  • Handle quasi-identifiers in context: rare occupations, small facilities, or unusual events can still enable linkage. Suppress or generalize them so that you have no actual knowledge that the remaining data could identify an individual.
  • Validate outputs: sample records, scan for residual identifiers, and test small-cell counts (e.g., avoid publishing cells with very low counts in public reports).
  • Document the process, controls, and testing so the de-identification can be reproduced and defended.
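
The validation step above can be partly automated. The following is a rough sketch of a residual-identifier scan over free text, assuming a handful of illustrative regular expressions for US-style formats. The pattern set and function names are hypothetical; the patterns will miss many real-world variants, so a scan like this supplements human review rather than replacing it.

```python
import re

# Illustrative patterns only (assumption): each covers one common format.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "url": re.compile(r"\bhttps?://\S+", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_residual_identifiers(text: str) -> dict[str, list[str]]:
    """Return every pattern label that matched, with the matched strings."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


def redact(text: str) -> str:
    """Replace each match with a bracketed label such as [PHONE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


note = "Call 555-123-4567 or email jane.doe@example.com"
print(find_residual_identifiers(note))
print(redact(note))
```

Running the scan on sampled output records, and failing the release when it returns any hits, gives the documentation step a concrete, repeatable test to cite.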

Strengths and limitations

  • Strengths: clear checklist, consistent outcomes, and quick to operationalize.
  • Limitations: may remove more detail than some analyses need, and does not formally quantify the residual re-identification risk.

Expert Determination Method

The Expert Determination Method relies on a qualified expert who applies generally accepted statistical or scientific principles to determine that the risk of re-identification is very small, given the Data Use Context. Unlike Safe Harbor, it can retain more data utility when justified by analysis.

Core expectations

  • Engage an expert with appropriate experience in de-identification, privacy risk modeling, and statistical disclosure control.
  • Define the context: who will access the data, technical and contractual safeguards, adversary capabilities, and potential external data sources.
  • Measure baseline re-identification risk using recognized techniques (e.g., k-anonymity, l-diversity, t-closeness, uniqueness, linkage simulations).
  • Apply tailored transformations (generalization, suppression, noise addition, date shifting, micro-aggregation, or differential privacy) and re-measure risk.
  • Conclude with a written determination that residual re-identification risk is very small, and retain detailed documentation of methods, assumptions, and results.
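
The measure, transform, and re-measure loop described above can be illustrated with k-anonymity, one of the techniques named in the list. This is a minimal sketch on made-up records, assuming the quasi-identifiers are age, sex, and state; the function names are hypothetical, and an actual expert determination would draw on far more than a single metric.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def band_age(record, width=10):
    """Generalize an exact age into a band such as "30-39"."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Made-up records for illustration only.
records = [
    {"age": 34, "sex": "F", "state": "MA"},
    {"age": 36, "sex": "F", "state": "MA"},
    {"age": 35, "sex": "F", "state": "MA"},
    {"age": 52, "sex": "M", "state": "NY"},
    {"age": 58, "sex": "M", "state": "NY"},
]
qi = ("age", "sex", "state")

baseline_k = k_anonymity(records, qi)  # every record is unique on exact age
banded_k = k_anonymity([band_age(r) for r in records], qi)
print(baseline_k, banded_k)
```

Here every record is unique before generalization (k = 1), and banding ages into decades raises the smallest group to two records. Whether a given k is acceptable is a judgment for the expert in light of the Data Use Context.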

When to prefer Expert Determination

  • High-value analytics need finer granularity (e.g., detailed dates or geography) within a controlled environment.
  • Public or semi-public release requires formal evidence that Re-Identification Risk remains very small despite data richness.
  • Complex datasets (notes, images, device streams) need nuanced, field-specific transformations.

Assessing Residual Re-Identification Risks

Residual Risk Assessment estimates the likelihood that an attacker could identify a person in the released dataset by matching quasi-identifiers to external data. Sound assessments consider both data properties and attack realism.

Key factors and methods

  • Uniqueness and linkage: evaluate how many records are unique on combinations (e.g., year, state, sex, rare condition) and simulate linkage to likely external files.
  • Data Use Context: internal use with access controls may tolerate finer detail than public release; document technical, administrative, and contractual safeguards.
  • Outliers and small cells: suppress or combine rare categories and assess the impact on inference and singling-out risks.
  • Free text and images: use automated detectors plus human review to remove embedded identifiers, OCR text, and imaging metadata.
  • Ongoing monitoring: periodically re-run risk analyses as new external datasets become available or as uses change.
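
A linkage simulation, as mentioned in the first factor above, can be sketched as a join between the released data and a stand-in for an external file. The example below assumes the attacker holds a named list with birth year, state, and sex; all records are invented, and `simulate_linkage` is a hypothetical helper rather than a standard tool.

```python
from collections import defaultdict


def simulate_linkage(released, external, keys):
    """Return indexes of released records that match exactly one external record."""
    index = defaultdict(list)
    for person in external:
        index[tuple(person[k] for k in keys)].append(person)
    return [
        i
        for i, rec in enumerate(released)
        if len(index.get(tuple(rec[k] for k in keys), [])) == 1
    ]


# Invented data for illustration only.
released = [
    {"birth_year": 1950, "state": "VT", "sex": "F", "diagnosis": "rare condition"},
    {"birth_year": 1985, "state": "VT", "sex": "M", "diagnosis": "asthma"},
]
external = [
    {"name": "A. Example", "birth_year": 1950, "state": "VT", "sex": "F"},
    {"name": "B. Example", "birth_year": 1985, "state": "VT", "sex": "M"},
    {"name": "C. Example", "birth_year": 1985, "state": "VT", "sex": "M"},
]

at_risk = simulate_linkage(released, external, ("birth_year", "state", "sex"))
risk_rate = len(at_risk) / len(released)
print(at_risk, risk_rate)
```

The first released record links to a single named person, so it would need further generalization or suppression; the second matches two candidates and is ambiguous under this particular attack model.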

Implementing Best Practices

Effective programs combine technical controls with governance. The goal is to lower Re-Identification Risk while preserving fitness for purpose.

Technical practices

  • Minimize and generalize: share only needed fields; bin ages, round measures, and shift dates consistently within subjects.
  • Suppress high-risk values: remove rare categories or replace with broader groupings; top- or bottom-code extremes.
  • Add calibrated noise where appropriate to protect aggregates while keeping analytic validity.
  • Automate PHI detection for structured and unstructured data, then layer human QA focused on edge cases.
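
Shifting dates consistently within subjects means every date for one person moves by the same offset, so intervals such as length of stay survive. One possible sketch derives the offset from a keyed hash of an internal subject ID. The key handling, the 365-day window, and the function names are assumptions for illustration. Shifted dates still carry day-level detail, so this technique is typically evaluated under Expert Determination rather than Safe Harbor.

```python
import hashlib
import hmac
from datetime import date, timedelta


def subject_offset(subject_id: str, secret_key: bytes, max_days: int = 365) -> int:
    """Derive a stable per-subject offset in [-max_days, max_days] from a keyed hash."""
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % (2 * max_days + 1) - max_days


def shift_date(d: date, subject_id: str, secret_key: bytes) -> date:
    """Shift a date by the subject's offset; the key must stay internal."""
    return d + timedelta(days=subject_offset(subject_id, secret_key))


key = b"example-key-kept-in-a-secrets-manager"  # placeholder, not a real key
admit = shift_date(date(2024, 1, 10), "subject-001", key)
discharge = shift_date(date(2024, 1, 15), "subject-001", key)
print((discharge - admit).days)  # the 5-day stay is preserved
```

Because the offset depends only on the subject ID and the key, re-running the pipeline yields the same shifted dates, which keeps longitudinal extracts comparable across releases.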

Governance and process

  • Maintain a data inventory and classification; define approval gates for releases and recipients.
  • Use Data Use Agreements to bind recipients to purpose limitations, reuse restrictions, and re-identification prohibitions.
  • Log access, enforce role-based permissions, and review audit trails for anomalous behavior.
  • Schedule periodic reviews of methods and outcomes; re-assess when the Data Use Context changes.

Regulatory Compliance Considerations

De-identified data are not PHI under HIPAA; however, contractual commitments and other privacy or consumer protection laws may still govern collection, sharing, and use. Align policies so that technical de-identification is matched by enforceable terms and operational controls.

Do not confuse de-identified data with a Limited Data Set. A Limited Data Set may retain dates and some geography (city, state, and ZIP code) but remains PHI and requires a Data Use Agreement. Under HIPAA, a covered entity may keep an internal re-identification code that is not derived from personal information and is never disclosed alongside the dataset.
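
One way to produce a re-identification code that is not derived from personal information is to generate it randomly and keep the mapping internal, instead of hashing an MRN or SSN. The class below is a hypothetical in-memory sketch of that idea; a production codebook would need durable, access-controlled storage, which is out of scope here.

```python
import secrets


class ReidentificationCodebook:
    """Maps internal record keys to random codes. Keep this mapping internal."""

    def __init__(self):
        self._code_by_key = {}
        self._key_by_code = {}

    def code_for(self, record_key: str) -> str:
        """Return the existing code for a record, or assign a new random one."""
        if record_key not in self._code_by_key:
            code = secrets.token_hex(16)  # random, so not derived from the key
            while code in self._key_by_code:  # guard against an unlikely collision
                code = secrets.token_hex(16)
            self._code_by_key[record_key] = code
            self._key_by_code[code] = record_key
        return self._code_by_key[record_key]

    def lookup(self, code: str) -> str:
        """Re-identify a code; only authorized internal users should call this."""
        return self._key_by_code[code]


codebook = ReidentificationCodebook()
code = codebook.code_for("MRN-0001")  # placeholder record key
```

Calling `code_for` twice with the same key returns the same code, so records for one person stay linkable inside the released dataset without exposing the underlying identifier.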

Conclusion

HIPAA de-identification reduces identity disclosure risk so data can be used responsibly. Choose the Safe Harbor Method for clear, rules-based removal of the 18 identifiers, or the Expert Determination Method when you need more detail under managed risk. In every case, anchor decisions in Residual Risk Assessment and the intended Data Use Context, and back them with strong governance.

FAQs

What is the Safe Harbor Method under HIPAA?

The Safe Harbor Method is a rules-based approach that removes all 18 HIPAA identifiers and requires that you have no actual knowledge the remaining data could identify someone. When these conditions are met, the dataset is considered de-identified under HIPAA.

How does expert determination reduce re-identification risk?

Expert determination employs a qualified expert to analyze the dataset and its Data Use Context, quantify re-identification risk, apply targeted transformations (such as generalization, suppression, or noise), and confirm in writing that the residual risk is very small. This allows retention of more useful detail while controlling risk.

What are the consequences of incomplete de-identification?

If identifiers or high-risk quasi-identifiers remain, you may inadvertently disclose PHI, triggering privacy incidents, regulatory exposure, contractual breach, and reputational harm. Incomplete de-identification can also enable linkage attacks that compromise individuals and undermine data-sharing programs.

How often should de-identification practices be reviewed?

Review methods at least annually, and sooner whenever the Data Use Context, external data landscape, or analytic needs change. Re-run risk assessments before each major release, update controls accordingly, and document decisions to demonstrate ongoing compliance.
