HIPAA De-Identification: Data Elements That Must Be Removed (18 Identifiers)

Kevin Henry

May 03, 2024

HIPAA De-Identification Process

HIPAA de-identification removes or obscures elements that could link a record to a person. Under the HIPAA Privacy Rule, your goal is to share data while protecting individuals’ Protected Health Information (PHI) and minimizing re-identification risk.

Step-by-step workflow

Profile your data: inventory fields, free text, images, and derived attributes that may reveal identity.
Choose a method: Safe Harbor for rule-based removal of the 18 listed identifiers, or Expert Determination for statistical de-identification tailored to context.
Transform: suppress, generalize, or perturb risky fields; redact free text; replace direct identifiers with tokens kept separately.
Validate: test for linkage risks, small-cell issues, and outliers; confirm Safe Harbor compliance if using that path.
Document: record decisions, transformations, and residual risk; maintain an audit trail for regulators and stakeholders.
Govern: control access, train users, and monitor for drift as new data arrives or data privacy regulations evolve.

Selecting the right method

Use Safe Harbor when you can remove all specified identifiers and still meet your use-case needs. Use Expert Determination when you need more data utility (for example, granular dates or geography) and can justify a “very small” re-identification risk with expert analysis and controls.

Documentation and governance

Maintain data maps, transformation logic, testing results, and retention rules. Clear governance helps you demonstrate compliance, replicate your approach, and adapt quickly as requirements change.

The 18 HIPAA Identifiers

To meet Safe Harbor, you must remove these 18 identifiers from PHI before sharing or publishing the data:

Names.
All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes). You may keep the initial three digits of a ZIP code only if the area formed by combining all ZIP codes that share those three digits contains more than 20,000 people according to current Census Bureau data; otherwise replace the three digits with 000.
All elements of dates (except year) directly related to an individual (e.g., birth, admission, discharge, death), plus all ages over 89 and all date elements, including year, that would reveal such an age; ages 90+ may be grouped as “90 or older.”
Telephone numbers.
Fax numbers.
Email addresses.
Social Security numbers.
Medical record numbers.
Health plan beneficiary numbers.
Account numbers.
Certificate or license numbers.
Vehicle identifiers and serial numbers, including license plate numbers.
Device identifiers and serial numbers.
Web URLs.
Internet Protocol (IP) address numbers.
Biometric identifiers, including finger and voice prints.
Full-face photographs and any comparable images.
Any other unique identifying number, characteristic, or code (except a re-identification code retained internally under permitted conditions).

Notes on dates and ZIP codes

Keep only the year for event dates; suppress or generalize day and month. For individuals aged 90 or older, drop any year (such as a birth year) that would reveal the age.
Aggregate ages above 89 to limit identifiability of the oldest individuals.
Apply the three-digit ZIP code rule strictly to avoid small-population disclosure.
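The three notes above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the function names are hypothetical, and RESTRICTED_ZIP3 holds placeholder values only. You would build the real set of low-population three-digit prefixes from current Census Bureau data.

```python
from datetime import date

# Placeholder prefixes for illustration only (assumption). Build the real set
# of 3-digit prefixes covering 20,000 or fewer people from current Census data.
RESTRICTED_ZIP3 = {"036", "059"}


def generalize_zip(zip_code, restricted=RESTRICTED_ZIP3):
    """Keep the first three ZIP digits, or return 000 when they are not allowed."""
    zip3 = zip_code.strip()[:3]
    if len(zip3) < 3 or not zip3.isdigit() or zip3 in restricted:
        return "000"
    return zip3


def generalize_age(age):
    """Top-code ages of 90 and above into a single category."""
    return "90 or older" if age >= 90 else str(age)


def age_on(birth, as_of):
    """Age in completed years on the reference date."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def generalize_birth_date(birth, as_of):
    """Return the birth year, or None when the year would reveal an age of 90+."""
    return None if age_on(birth, as_of) >= 90 else birth.year


def generalize_event_date(event):
    """Reduce an admission, discharge, or similar date to its year."""
    return event.year
```

For example, generalize_zip("90210") keeps "902", while any prefix listed in the restricted set comes back as "000". The reference date passed to generalize_birth_date is a design choice; the date of release is one reasonable option.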

Safe Harbor Method Requirements

Safe Harbor is a rule-based path that lets you share data once the 18 listed identifiers have been removed. It is straightforward, widely understood, and defensible when implemented carefully.

Core requirements

Remove all 18 HIPAA identifiers listed above from the dataset (including within free text and images).
Generalize dates to year only and handle ages 90+ as a single category.
Apply the three-digit ZIP code population threshold; replace disallowed ZIP codes with 000.
Ensure you do not have actual knowledge that remaining data could identify an individual alone or in combination.
If you assign a code for internal re-identification, keep the key separately, do not derive the code from removed identifiers or other information about the individual, and do not disclose the re-identification mechanism.

Implementation tips

Scan and redact identifiers embedded in notes, images, and metadata.
Use consistent tokenization so downstream users can analyze longitudinal patterns without access to identities.
Perform small-cell analysis and aggregation to prevent unique records in narrow cohorts.
Record Safe Harbor compliance checks, including proofs for ZIP code and age generalization choices.
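As a rough sketch of the consistent tokenization tip, the Python below assigns each patient a random token that is not derived from any identifier, so the same patient receives the same token across records. The Tokenizer class, field names, and sample records are hypothetical; in practice the mapping would live in a separately secured store rather than in memory.

```python
import secrets


class Tokenizer:
    """Assigns random tokens; the mapping itself is the re-identification key."""

    def __init__(self):
        self._key = {}      # identifier -> token; keep apart from released data
        self._used = set()  # tokens already issued

    def token_for(self, identifier):
        if identifier not in self._key:
            token = secrets.token_hex(8)  # random, not computed from the identifier
            while token in self._used:    # regenerate on the unlikely collision
                token = secrets.token_hex(8)
            self._key[identifier] = token
            self._used.add(token)
        return self._key[identifier]


# Hypothetical input records keyed by medical record number.
records = [
    {"mrn": "A100", "year": 2021},
    {"mrn": "B200", "year": 2021},
    {"mrn": "A100", "year": 2023},
]

tokenizer = Tokenizer()
released = [
    {"patient_token": tokenizer.token_for(r["mrn"]), "year": r["year"]}
    for r in records
]
```

Because the token is random rather than a hash of the medical record number, a recipient cannot recompute it from a known identifier, yet the first and third released rows still link to the same patient.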

Expert Determination Method Overview

Expert Determination relies on an expert’s documented finding, using statistical and scientific principles, that the likelihood of re-identification is very small. This statistical de-identification path preserves more utility when Safe Harbor removal would be too destructive.

What experts evaluate

Threat models: who could attempt re-identification, with what auxiliary data.
Data risk: uniqueness, outliers, sparsity, and linkability across releases.
Context controls: access restrictions, agreements, technical safeguards, and user training.
Residual risk: quantitative thresholds (e.g., k-anonymity levels) that meet organizational risk appetite.

Techniques commonly applied

Generalization and suppression to reach k-anonymity, l-diversity, or t-closeness.
Pseudonymization and salted or keyed hashing for internal linkage.
Noise addition, binning, and differential privacy-style perturbations where appropriate.
Free-text de-identification with NLP plus manual QA for edge cases.
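To make the first technique concrete, here is a minimal Python sketch that measures k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values) before and after generalizing birth year to a decade. The column names and sample rows are made up for illustration, and an expert would choose the quasi-identifiers and the acceptable k for the actual release context.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group sharing identical quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def to_decade(row):
    """Generalize birth year to the start of its decade (1984 -> 1980)."""
    return {**row, "birth_year": row["birth_year"] // 10 * 10}


QUASI_IDENTIFIERS = ["zip3", "birth_year"]

# Hypothetical records; every raw row is unique on its quasi-identifiers.
rows = [
    {"zip3": "902", "birth_year": 1981},
    {"zip3": "902", "birth_year": 1984},
    {"zip3": "902", "birth_year": 1986},
    {"zip3": "100", "birth_year": 1972},
    {"zip3": "100", "birth_year": 1979},
]

k_raw = k_anonymity(rows, QUASI_IDENTIFIERS)
binned = [to_decade(r) for r in rows]
k_binned = k_anonymity(binned, QUASI_IDENTIFIERS)
```

In this sample, binning to decades merges the unique rows into two groups, which raises k from 1 to 2. If k is still below the target, the usual next steps are coarser bins or suppression of the remaining small groups.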

Documentation to retain

Expert’s qualifications, methodology, metrics, and final determination statement.
Transformations applied, validation results, and any usage or access constraints.
Ongoing monitoring plan to reassess re-identification risk as environments change.

Limited Data Set Characteristics

A Limited Data Set (LDS) is still PHI, but certain fields may remain when used for research, public health, or health care operations under a Data Use Agreement. It is not the same as de-identification.

What may remain in an LDS

City, state, and full ZIP code (not limited to the three-digit Safe Harbor prefix).
All relevant dates (e.g., dates of birth, admission, discharge, death).
Other non-direct fields necessary for the approved purpose, provided direct identifiers are removed.

What must be removed from an LDS

Names and postal address information other than city, state, and ZIP code.
Contact information (telephone, fax, email), account numbers, and government IDs (e.g., SSN).
Medical record numbers, health plan beneficiary numbers, certificate/license numbers.
Vehicle and device identifiers, URLs, IP addresses, biometric identifiers, and full-face photos or comparable images.
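One way to apply the two lists above is to tag each column with the kind of information it holds and flag anything a Limited Data Set may not contain. The sketch below assumes a hypothetical tagging scheme; the tag names and the sample column inventory are illustrative and not part of any standard.

```python
# Hypothetical tags for information that may remain in a Limited Data Set.
LDS_ALLOWED = {"city", "state", "zip_code", "date"}

# Hypothetical tags for direct identifiers that must be removed.
LDS_PROHIBITED = {
    "name", "street_address", "telephone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "certificate_license_number", "vehicle_identifier",
    "device_identifier", "url", "ip_address", "biometric", "full_face_photo",
}


def lds_violations(column_tags):
    """Return the columns whose tag marks a direct identifier, sorted by name."""
    return sorted(col for col, tag in column_tags.items() if tag in LDS_PROHIBITED)


# Hypothetical column inventory for a dataset under review.
columns = {
    "pt_name": "name",
    "admit_dt": "date",
    "zip": "zip_code",
    "contact_email": "email",
    "dx_code": "clinical",
}

to_remove = lds_violations(columns)
```

Here the check flags contact_email and pt_name, while the admission date and ZIP code pass. A clean result only reflects the tags you supplied; the Data Use Agreement is still required.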

Data Use Agreement essentials

Permitted uses/disclosures and who may receive/use the data.
Prohibitions on re-identification and contacting individuals.
Safeguards, breach reporting, flow-down obligations, and data return/destruction terms.

Challenges in Data De-Identification

Even after removing obvious identifiers, re-identification risk can persist through rare combinations of quasi-identifiers, small subpopulations, or linkage with external datasets. You should test and mitigate these risks proactively.

Common pain points

High-dimensional EHR data creating unique patient “fingerprints.”
Geotemporal precision (fine-grained locations and timestamps) enabling linkage attacks.
Free-text notes, images, and metadata harboring latent identifiers.
Repeated releases that allow adversaries to triangulate identities over time (the “mosaic effect”).

Mitigation strategies

Data minimization: share only fields that serve the stated purpose.
Aggregation and binning to reduce uniqueness while preserving analytic value.
Noise injection or rounding for sensitive counts and time fields.
Rigorous QA of redactions and automated detectors for residual identifiers.
Contractual, technical, and training controls aligned with broader data privacy regulations.
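An automated detector for residual identifiers can start as a handful of regular expressions. The Python sketch below uses deliberately simple, US-centric patterns chosen for illustration. It will miss names, unusual formats, and context-dependent identifiers, so treat it as a QA aid alongside NLP tooling and manual review rather than a complete solution.

```python
import re

# Illustrative patterns only; real notes need broader coverage and manual QA.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_residual_identifiers(text):
    """Return (label, matched_text) pairs for every pattern hit."""
    return [
        (label, match.group())
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def redact(text):
    """Replace each hit with a bracketed placeholder such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


# Hypothetical free-text note.
note = "Pt called from 555-123-4567; email jane.doe@example.org. SSN 123-45-6789 on file."
hits = find_residual_identifiers(note)
clean = redact(note)
```

Running find_residual_identifiers on already-redacted output is a cheap regression check: an empty result does not prove the text is clean, but a non-empty one shows a redaction step failed.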

Ensuring HIPAA Compliance

Operationalize de-identification with a repeatable program. Establish roles, controls, and evidence so you can demonstrate compliance and maintain data utility over time.

Operational controls

Policies that define when to use Safe Harbor vs. Expert Determination and who approves each release.
Standard transformation playbooks for removing the 18 identifiers and handling quasi-identifiers.
Access controls, logging, and retention aligned with the HIPAA Privacy Rule and your security program.
Vendor governance with Business Associate Agreements and clear downstream obligations.

Testing and assurance

Pre-release risk assessments and small-cell checks; post-release monitoring for drift.
Periodic audits to verify Safe Harbor compliance or to refresh Expert Determinations as context changes.
Training for data stewards and recipients to prevent misuse and unintended re-identification.
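A pre-release small-cell check can be as simple as counting records per cell and withholding counts below a minimum size. In the sketch below, the function names, the sample rows, and the MIN_CELL value of 5 are assumptions for illustration; your policy or expert determination sets the real threshold.

```python
from collections import Counter


def count_cells(rows, dimensions):
    """Count records for each combination of the given dimensions."""
    return Counter(tuple(row[d] for d in dimensions) for row in rows)


def suppress_small_counts(counts, threshold):
    """Replace counts below the threshold with None so they are not published."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical de-identified rows awaiting release as an aggregate table.
rows = (
    [{"zip3": "902", "age_band": "40-49"}] * 6
    + [{"zip3": "902", "age_band": "90 or older"}] * 2
    + [{"zip3": "100", "age_band": "40-49"}] * 5
)

MIN_CELL = 5  # assumed policy threshold, not a HIPAA-mandated value
table = suppress_small_counts(count_cells(rows, ["zip3", "age_band"]), MIN_CELL)
suppressed = [cell for cell, n in table.items() if n is None]
```

This sketch does not handle complementary suppression: if row or column totals are published too, a suppressed cell may be recoverable by subtraction, so totals need the same review.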

Conclusion and key takeaways

Safe Harbor requires removing the 18 HIPAA identifiers and meeting its ZIP code population and date rules.
Expert Determination uses statistical de-identification to achieve a very small re-identification risk with documented safeguards.
Limited Data Sets allow certain dates and geography under a Data Use Agreement; they are still PHI.
Sustained compliance depends on sound governance, testing, and clear documentation.

FAQs

What are the 18 identifiers that must be removed for HIPAA de-identification?

The 18 identifiers are: names; all geography smaller than a state (with the three-digit ZIP code rule); all elements of dates (except year) and ages over 89; telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers (including license plates); device identifiers and serial numbers; URLs; IP addresses; biometric identifiers; full-face photos and comparable images; and any other unique identifying number, characteristic, or code (except a permitted internal re-identification code).

How does the Safe Harbor method ensure data privacy?

Safe Harbor protects privacy by removing every identifier on a fixed list of 18, generalizing dates to year, applying the three-digit ZIP code population rule, and requiring that you have no actual knowledge that the remaining data could identify someone. When these conditions are met and validated, the data qualifies as de-identified under Safe Harbor.

What is the difference between Safe Harbor and Expert Determination methods?

Safe Harbor is rule-based and prescriptive: remove the 18 identifiers and follow ZIP/date rules. Expert Determination is risk-based: a qualified expert applies statistical de-identification techniques and contextual controls to show the re-identification risk is very small, allowing more granular data when justified.

Can limited data sets include some identifiers under HIPAA?

Yes. A Limited Data Set may include city, state, ZIP code, and full dates (e.g., admission or discharge dates), but it must exclude direct identifiers like names, full street addresses, contact details, account and record numbers, biometric identifiers, and full-face images. Use requires a Data Use Agreement that prohibits re-identification and mandates safeguards.

