HIPAA Individual Identifiers: Direct vs. Indirect and How to De‑Identify Data


Kevin Henry

HIPAA

January 24, 2024


Direct Identifiers in HIPAA

Under the Health Insurance Portability and Accountability Act (HIPAA), direct identifiers are data elements that, by themselves, point to a specific individual or their household. When a direct identifier appears alongside health information held by a covered entity or business associate, the record is treated as protected health information (PHI).

What counts as a direct identifier

Direct identifiers include obvious contact details and government or system-assigned numbers that uniquely tag a person. They also include images and device or vehicle numbers that consistently single someone out, even without a name.

  • Examples: name, full street address, phone and fax numbers, email, Social Security number, medical record number, health plan beneficiary number, account and certificate/license numbers.
  • Other examples: vehicle and device identifiers/serials, full-face photographs or comparable images, URLs, IP addresses, and other unique codes that identify an individual.

You must remove direct identifiers before sharing data externally unless another HIPAA pathway applies. Removing them is a necessary step toward de-identified health information, but it is not sufficient on its own: the remaining indirect identifiers must also be managed.

Indirect Identifiers and Their Significance

Indirect identifiers, or quasi-identifiers, do not name someone outright but can identify them when combined. Each one seems harmless on its own; together they can form a narrow "fingerprint" that raises re-identification risk.

Common indirect identifiers

  • Dates related to care (admission, discharge, service), birth and death dates, and age.
  • Geographic detail below state level (city, county, three-digit ZIP regions) and facility location.
  • Demographics and context such as gender, rare diagnoses or procedures, occupation, employer, or language.

Why indirect identifiers matter

Linkage attacks join quasi-identifiers with outside sources, shrinking anonymity sets until few people match. Managing these fields through generalization, binning, or suppression is central to controlling re-identification risk.
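To make the linkage threat concrete, here is a minimal Python sketch that joins a released dataset to an outside list on shared quasi-identifiers. All records, the field names (birth_year, zip3, sex), and the link_records helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical released health records (names already removed).
released = [
    {"birth_year": 1950, "zip3": "021", "sex": "F", "diagnosis": "rare condition A"},
    {"birth_year": 1985, "zip3": "021", "sex": "M", "diagnosis": "common condition B"},
    {"birth_year": 1985, "zip3": "021", "sex": "M", "diagnosis": "common condition C"},
]

# Hypothetical outside source that carries names plus the same quasi-identifiers.
public = [
    {"name": "Person 1", "birth_year": 1950, "zip3": "021", "sex": "F"},
    {"name": "Person 2", "birth_year": 1985, "zip3": "021", "sex": "M"},
    {"name": "Person 3", "birth_year": 1985, "zip3": "021", "sex": "M"},
]

QUASI_IDENTIFIERS = ("birth_year", "zip3", "sex")


def link_records(released_rows, public_rows, keys):
    """Pair each released row with the public names that share its key values."""
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])
    return [(row, index.get(tuple(row[k] for k in keys), [])) for row in released_rows]


# A released row is exposed when exactly one public name matches it.
unique_matches = [
    (names[0], row["diagnosis"])
    for row, names in link_records(released, public, QUASI_IDENTIFIERS)
    if len(names) == 1
]
print(unique_matches)  # [('Person 1', 'rare condition A')]
```

In this toy data only the first record links to a single name, so its diagnosis is exposed. The other two records share an anonymity set of two, which is why generalizing or suppressing quasi-identifiers reduces risk.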

Safe Harbor De-Identification Method

The Safe Harbor Method requires two things: removing all 18 HIPAA identifiers, and having no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. When applied correctly, the result qualifies as de-identified health information under HIPAA.

How to implement Safe Harbor

  • Inventory data elements and map them to the 18 identifiers; delete or mask them.
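  • Remove all geographic subdivisions smaller than a state. You may keep the first three ZIP digits only when the area formed by all ZIP codes sharing those digits contains more than 20,000 people; otherwise set the three-digit ZIP to 000.
  • Reduce dates directly related to an individual (birth, admission, discharge, death) to the year.
  • Aggregate ages over 89, and any date elements (including year) that would reveal such an age, into a single 90-or-older category.
  • Verify you have no actual knowledge that the remaining information could identify anyone, either alone or in combination with other information.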
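The ZIP, date, and age rules above can be sketched in a few lines of Python. This is an illustrative sketch, not a complete Safe Harbor implementation: the function names and the ZIP3_POPULATION table are hypothetical, and real population counts must come from current Census Bureau data.

```python
# Hypothetical population counts per three-digit ZIP prefix (illustration only).
ZIP3_POPULATION = {"021": 1_200_000, "036": 15_000}


def safe_harbor_zip(zip_code: str, populations: dict) -> str:
    """Return the three-digit prefix, or '000' when its area has 20,000 or fewer people."""
    prefix = zip_code.strip()[:3]
    # Unknown prefixes are treated as small areas, which is the conservative choice.
    if populations.get(prefix, 0) > 20_000:
        return prefix
    return "000"


def safe_harbor_age(age: int) -> str:
    """Collapse every age over 89 into a single '90+' category."""
    return "90+" if age > 89 else str(age)


def safe_harbor_date(iso_date: str) -> str:
    """Keep only the year from a YYYY-MM-DD string.

    Note: for individuals over 89, the year itself may reveal the age and
    would also need to be aggregated; that case is not handled here.
    """
    return iso_date[:4]


print(safe_harbor_zip("02139", ZIP3_POPULATION))  # 021
print(safe_harbor_zip("03601", ZIP3_POPULATION))  # 000
print(safe_harbor_age(93))                        # 90+
print(safe_harbor_date("2023-05-17"))             # 2023
```

Helpers like these cover only three of the 18 identifier categories. The remaining categories still have to be deleted or masked, and the actual-knowledge check remains a human judgment.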

Strengths and limitations

  • Strengths: clear checklist, fast compliance, easy to operationalize.
  • Limitations: can remove useful detail (dates, granularity, small-area geography) and may underperform for niche analyses needing precision.

Expert Determination De-Identification Method

The Expert Determination Method engages a qualified expert who applies generally accepted statistical and scientific principles to determine that the risk of identifying an individual is very small. It supports richer data utility than Safe Harbor while keeping the privacy risk measurable and documented.

Typical expert determination process

  • Define scope and recipients, enumerate direct and indirect identifiers, and select a quantifiable risk threshold.
  • Assess external data availability and linkage threats; model record uniqueness and replicability.
  • Apply transformations (generalization, suppression, noise, date shifting, top/bottom coding, binning) and iterate until the risk threshold is met.
  • Layer safeguards beyond the data (access controls, Data Use Agreement terms, audit rights) that the expert can rely on in the risk calculus.
  • Produce a written determination describing methods, assumptions, residual risk, controls, and validity period; retain documentation.

This pathway preserves more analytic value, but it requires expertise, documentation, and periodic review as data ecosystems and risks evolve.
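One of the transformations listed above, date shifting, can be sketched as follows. This is one illustrative approach rather than a method prescribed by HIPAA or by any particular expert: each patient gets a stable offset derived from a keyed hash, so intervals within a patient's record are preserved while calendar dates move. The function names, the 180-day window, and the key handling are assumptions.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset_days(patient_key: str, secret: bytes, max_days: int = 180) -> int:
    """Derive a stable offset in [-max_days, max_days] from a keyed hash."""
    digest = hmac.new(secret, patient_key.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * max_days + 1
    return int.from_bytes(digest[:8], "big") % span - max_days


def shift_date(original: date, patient_key: str, secret: bytes) -> date:
    """Shift a date by the patient's offset; the same patient always gets the same shift."""
    return original + timedelta(days=patient_offset_days(patient_key, secret))


# Hypothetical secret; in practice it would live in a key management system.
secret = b"example-secret-do-not-reuse"
admit = shift_date(date(2023, 1, 10), "patient-001", secret)
discharge = shift_date(date(2023, 1, 15), "patient-001", secret)
print((discharge - admit).days)  # 5, the length of stay is preserved
```

Whether shifted dates are acceptable, and how wide the window should be, is part of what the expert evaluates against the chosen risk threshold.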

Limited Data Sets and Data Use Agreements

A Limited Data Set (LDS) is PHI stripped of direct identifiers but allowed to retain certain detail such as dates and limited geography. You may use and disclose an LDS for research, public health, or health care operations, subject to a signed Data Use Agreement.

What an LDS may include

  • City, state, and ZIP code (five-digit); not full street address.
  • All relevant dates (e.g., birth, death, admission, discharge, service dates).
  • Other clinical and operational fields that are not direct identifiers.

You must exclude from an LDS

  • Names and full postal address (other than city, state, ZIP code).
  • Telephone and fax numbers, email addresses, Social Security numbers.
  • Medical record and health plan beneficiary numbers, account and certificate/license numbers.
  • Vehicle and device identifiers/serial numbers, URLs, IP addresses.
  • Biometric identifiers, full-face photos and comparable images.
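The exclusion list above lends itself to a simple field filter. The sketch below is a minimal illustration, assuming records are flat dictionaries; the field names in LDS_EXCLUDED_FIELDS and both helper functions are hypothetical and would need to be mapped to your own schema.

```python
# Hypothetical column names standing in for the excluded identifier categories.
LDS_EXCLUDED_FIELDS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_serial", "url", "ip_address",
    "biometric_id", "photo",
}


def to_limited_data_set(record: dict) -> dict:
    """Return a copy of the record without any excluded field."""
    return {k: v for k, v in record.items() if k not in LDS_EXCLUDED_FIELDS}


def find_excluded_fields(record: dict) -> list:
    """List excluded fields still present, for validation before release."""
    return sorted(LDS_EXCLUDED_FIELDS.intersection(record))


record = {
    "name": "Example Patient",
    "ssn": "000-00-0000",
    "city": "Boston",
    "zip": "02139",
    "admit_date": "2023-01-10",
}
lds_record = to_limited_data_set(record)
print(lds_record)                        # city, zip, and admit_date remain
print(find_excluded_fields(lds_record))  # []
```

A name-based filter only catches identifiers stored in known columns. Identifiers embedded in free-text notes or miscoded fields need separate review.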

Data Use Agreement essentials

  • Specify permitted uses/disclosures, authorized recipients, and prohibition on re-identification or contact.
  • Require safeguards, breach/misuse reporting, and that agents/subrecipients follow the same restrictions.
  • Limit use to stated purposes. As a matter of good practice, many agreements also require return or destruction of data when the purpose ends, or continued protections if destruction is infeasible.

Limited Data Set compliance tips

  • Automate LDS extraction templates, validate exclusions, and log releases.
  • Align small-cell suppression and date-shifting policies with your DUA and recipients’ environments.
  • Review DUAs annually and update controls as your risk posture or external data landscape changes.

Managing Re-identification Risk

Risk management does not stop at de-identification; it continues across people, process, and technology. You should treat re-identification risk as a measurable, monitorable metric tied to controls and recipient obligations.

Technical and analytical controls

  • Pseudonymize identifiers, tokenize keys, encrypt at rest/in transit, and restrict row-level access.
  • Generalize and bin quasi-identifiers; apply top/bottom coding, date shifting, rounding, and outlier handling.
  • Adopt small-cell suppression and k-anonymity-style thresholds; test l-diversity and t-closeness where appropriate.
  • Score linkage risk by considering external data availability, uniqueness, and replicability; reassess before each new release.
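As a rough illustration of the k-anonymity and small-cell suppression controls above, the following Python sketch measures the smallest group of records sharing the same quasi-identifier values and drops groups below a threshold. The records, field names, helper functions, and the choice of k=3 are all assumptions made for the example, not recommended values.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group(records, quasi_identifiers):
    """Return the dataset's k: the size of its smallest group (0 if empty)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def suppress_small_cells(records, quasi_identifiers, k):
    """Drop every record whose group has fewer than k members."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] >= k]


# Hypothetical, already-generalized records.
records = [
    {"age_band": "30-39", "zip3": "021", "sex": "F"},
    {"age_band": "30-39", "zip3": "021", "sex": "F"},
    {"age_band": "30-39", "zip3": "021", "sex": "F"},
    {"age_band": "80-89", "zip3": "021", "sex": "M"},
]
QUASI_IDENTIFIERS = ["age_band", "zip3", "sex"]

print(smallest_group(records, QUASI_IDENTIFIERS))  # 1
kept = suppress_small_cells(records, QUASI_IDENTIFIERS, k=3)
print(len(kept), smallest_group(kept, QUASI_IDENTIFIERS))  # 3 3
```

Suppression trades records for safety; generalizing further (wider age bands, coarser geography) is the alternative when too many rows would be lost.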

Governance and operational safeguards

  • Gate data through approvals, minimum-necessary reviews, and documented privacy impact assessments.
  • Bind recipients with a Data Use Agreement; monitor compliance with audits and high-risk query alerts.
  • Train staff, manage vendors, and maintain incident response playbooks and retention schedules.

Conclusion

Direct identifiers must be removed to protect privacy; indirect identifiers must be managed to control linkage risk. Safe Harbor offers a prescriptive path, while the Expert Determination Method delivers flexibility with quantifiable safeguards. Limited Data Sets enable richer analysis under a Data Use Agreement, and sustained governance keeps re-identification risk acceptably low.

FAQs

What are the 18 HIPAA identifiers?

The 18 identifiers are:

  • (1) Names.
  • (2) All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code and equivalents), except the initial three ZIP digits when the combined area has more than 20,000 people (otherwise use 000).
  • (3) All elements of dates (except year) related to an individual, and all ages over 89 grouped as 90 or older.
  • (4) Telephone numbers.
  • (5) Fax numbers.
  • (6) Email addresses.
  • (7) Social Security numbers.
  • (8) Medical record numbers.
  • (9) Health plan beneficiary numbers.
  • (10) Account numbers.
  • (11) Certificate/license numbers.
  • (12) Vehicle identifiers and serial numbers, including license plates.
  • (13) Device identifiers and serial numbers.
  • (14) Web URLs.
  • (15) IP address numbers.
  • (16) Biometric identifiers, including finger and voice prints.
  • (17) Full-face photographic images and comparable images.
  • (18) Any other unique identifying number, characteristic, or code.

How does the Safe Harbor method reduce risk?

The Safe Harbor Method deletes all 18 identifiers and confirms there is no actual knowledge that remaining data could identify someone. By removing the strongest identifying signals and applying rules for ZIP codes and advanced ages, it lowers linkage potential and produces de-identified health information suitable for broad sharing.

What is a Limited Data Set under HIPAA?

A Limited Data Set is PHI that excludes direct identifiers but may retain dates and limited geography (city, state, ZIP). It can be disclosed for research, public health, or health care operations only under a Data Use Agreement that specifies purpose, recipients, safeguards, and strict no re-identification or contact provisions.

How is expert determination performed?

A qualified expert defines the data's context and recipients, selects a risk threshold, and models identification risk using accepted statistical methods. The expert then applies transformations and administrative controls until the risk is very small, documents methods and assumptions, and issues a written determination that supports ongoing oversight.
