HIPAA De-Identification Requirements: Which Data Elements Must Be Removed?

Kevin Henry

May 03, 2024


If you handle Protected Health Information (PHI), understanding HIPAA de-identification requirements helps you share useful data while controlling re-identification risk. Below, you’ll find the two de-identification methods HIPAA recognizes, the full list of identifiers to remove, how Limited Data Sets work, and practical steps to stay compliant.

HIPAA De-Identification Standards

Safe Harbor (remove the 18 HIPAA Identifiers)

Under Safe Harbor, you must remove the 18 HIPAA Identifiers of the individual and of the individual’s relatives, employers, or household members. In addition, the covered entity must have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify the individual. When both conditions are met, the data is considered de-identified and is no longer treated as PHI under HIPAA.

Expert Determination (risk-based approach)

An expert with appropriate knowledge of and experience with generally accepted statistical and scientific methods determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual. The expert documents the methods, assumptions, transformations, testing, and residual risk behind that conclusion, and you retain the documentation. This path can preserve more data utility than Safe Harbor by using techniques like generalization, suppression, and noise infusion.
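HIPAA does not prescribe a particular metric for Expert Determination. One common building block is k-anonymity: every combination of quasi-identifiers should be shared by at least k records. The Python sketch below computes k and flags rare combinations. The field names, sample records, and threshold are hypothetical, and a real expert analysis would also weigh the anticipated recipient and the external data reasonably available to them.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

Record = Mapping[str, object]


def class_sizes(records: Iterable[Record], quasi_ids: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_ids) for r in records)


def k_anonymity(records: Iterable[Record], quasi_ids: Sequence[str]) -> int:
    """Return k, the size of the smallest equivalence class."""
    sizes = class_sizes(records, quasi_ids)
    if not sizes:
        raise ValueError("dataset is empty")
    return min(sizes.values())


def rare_combinations(records: Iterable[Record], quasi_ids: Sequence[str],
                      k_threshold: int) -> dict:
    """Return quasi-identifier combinations held by fewer than k_threshold records."""
    return {combo: n for combo, n in class_sizes(records, quasi_ids).items()
            if n < k_threshold}


# Hypothetical sample: birth year, 3-digit ZIP, and sex act as quasi-identifiers.
records = [
    {"birth_year": 1980, "zip3": "021", "sex": "F", "dx": "asthma"},
    {"birth_year": 1980, "zip3": "021", "sex": "F", "dx": "diabetes"},
    {"birth_year": 1975, "zip3": "100", "sex": "M", "dx": "asthma"},
]
qi = ["birth_year", "zip3", "sex"]
print(k_anonymity(records, qi))           # 1: the 1975 record is unique
print(rare_combinations(records, qi, 2))  # {(1975, '100', 'M'): 1}
```

A result of k = 1 means at least one record is unique on those fields; generalizing birth year into ranges or suppressing the outlier would raise k.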

List of Required Identifiers to Remove

  1. Names.
  2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except the initial three digits of a ZIP code in the circumstances noted below.
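  3. All elements of dates (except year) directly related to an individual (for example, birth, admission, discharge, and death dates), and all ages over 89 and all elements of dates (including year) indicative of such ages, except that those ages may be aggregated into a single category of age 90 or older.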
  4. Telephone numbers.
  5. Fax numbers.
  6. Email addresses.
  7. Social Security numbers.
  8. Medical record numbers.
  9. Health plan beneficiary numbers.
  10. Account numbers.
  11. Certificate/license numbers.
  12. Vehicle identifiers and serial numbers, including license plate numbers.
  13. Device identifiers and serial numbers.
  14. Web URLs.
  15. IP address numbers.
  16. Biometric Identifiers, including finger and voice prints.
  17. Full-face photographic images and any comparable images.
  18. Any other Unique Identifying Codes, numbers, characteristics, or combinations that could identify an individual (except a re-identification code retained solely by the covered entity as permitted by HIPAA).
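For structured data, a first pass at the list above is to drop every column that maps to an identifier category and, if you need to link results back later, to replace the record key with a random code. The sketch below assumes a hypothetical schema with the column names shown; a real mapping must cover all 18 categories and any free-text fields. It uses Python’s `secrets` module so the code is not derived from anything about the patient, and it keeps the code-to-MRN crosswalk in a separate structure that is never released. Dates and ZIP codes still need generalization, so this step alone does not satisfy Safe Harbor.

```python
import secrets
from typing import Any

# Hypothetical column names mapped to Safe Harbor categories. This is a partial
# mapping for illustration only; a real one must cover all 18 categories.
IDENTIFIER_COLUMNS = {
    "name": "Names",
    "street_address": "Geographic subdivisions",
    "phone": "Telephone numbers",
    "email": "Email addresses",
    "ssn": "Social Security numbers",
    "mrn": "Medical record numbers",
    "ip_address": "IP address numbers",
}


def strip_identifiers(record: dict[str, Any],
                      crosswalk: dict[str, str]) -> dict[str, Any]:
    """Remove identifier columns and attach a random study ID.

    Assumes each record carries an "mrn" key. The study ID comes from
    secrets.token_hex, so it is not derived from the patient's attributes;
    the study ID -> MRN mapping goes into `crosswalk`, which the covered
    entity keeps and never discloses.
    """
    study_id = secrets.token_hex(8)
    while study_id in crosswalk:  # guard against a (very unlikely) collision
        study_id = secrets.token_hex(8)
    crosswalk[study_id] = record["mrn"]
    released = {k: v for k, v in record.items() if k not in IDENTIFIER_COLUMNS}
    released["study_id"] = study_id
    return released


crosswalk: dict[str, str] = {}
row = {"name": "Jane Doe", "mrn": "MRN-001", "phone": "555-0100", "dx": "asthma"}
released = strip_identifiers(row, crosswalk)
print(sorted(released))  # ['dx', 'study_id']
```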

Clarifications on dates and geography

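  • Dates: Only the year may remain under Safe Harbor, and even the year must be removed when it would reveal an age over 89 (for example, a birth year). All other date elements must be removed or generalized. Ages 90 and above must be grouped into a single “90 or older” category.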
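  • ZIP codes: The initial three digits may remain only if the geographic unit formed by combining all ZIP codes with those three digits contains more than 20,000 people, according to current publicly available Census data; otherwise, replace them with 000.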
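The date, age, and ZIP rules can be applied field by field. In the sketch below, the function names are hypothetical, and the set of restricted three-digit ZIP prefixes is passed in by the caller because it must be derived from current Census population data; the value used in the example is only a placeholder.

```python
from datetime import date


def generalize_zip(zip_code: str, restricted_zip3: set[str]) -> str:
    """Keep the first three ZIP digits unless that 3-digit area has 20,000
    or fewer people (supplied by the caller as restricted_zip3)."""
    zip3 = zip_code[:3]
    return "000" if zip3 in restricted_zip3 else zip3


def age_on(birth: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def generalize_birth(birth: date, ref: date) -> dict[str, object]:
    """Reduce a birth date to its year; for ages over 89, drop the year too
    (it would reveal the age) and report the single category '90+'."""
    age = age_on(birth, ref)
    if age > 89:
        return {"birth_year": None, "age": "90+"}
    return {"birth_year": birth.year, "age": age}


def generalize_event(d: date) -> int:
    """Admission, discharge, and similar dates keep only the year."""
    return d.year


restricted = {"036"}  # placeholder only; build the real set from current Census data
print(generalize_zip("03601", restricted))  # 000
print(generalize_zip("02139", restricted))  # 021
print(generalize_birth(date(1930, 6, 15), date(2024, 5, 3)))  # {'birth_year': None, 'age': '90+'}
print(generalize_birth(date(1980, 1, 1), date(2024, 5, 3)))   # {'birth_year': 1980, 'age': 44}
```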

Exceptions for Limited Data Sets

A Limited Data Set (LDS) is still PHI, not de-identified data. It may be used and disclosed only for research, public health, or health care operations—and only under a Data Use Agreement.

What an LDS may include

  • City, state, and full ZIP code (but not street address).
  • All elements of dates relevant to the individual (for example, admission, discharge, service, birth, and death dates).
  • Other clinically useful fields that are not direct identifiers.

What an LDS must still exclude

  • The 16 direct identifiers listed in the Privacy Rule: names; postal address information other than town or city, state, and ZIP code (so no street address); telephone and fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers, including license plates; device identifiers and serial numbers; URLs; IP addresses; Biometric Identifiers; and full-face photos and comparable images.

Because an LDS remains PHI, HIPAA’s safeguards and minimum necessary standard still apply, and you may disclose it only under a Data Use Agreement, never as if it were de-identified data.
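Preparing an LDS is mostly a column-selection problem. The sketch below uses an allow-list of approved columns, which also enforces data minimization, and refuses any column that corresponds to one of the 16 excluded direct identifiers. The column names are hypothetical; map them to your own schema.

```python
from typing import Any

# Hypothetical column names for the 16 direct identifiers an LDS must exclude.
LDS_EXCLUDED = frozenset({
    "name", "street_address", "phone", "fax", "email", "ssn", "mrn",
    "health_plan_id", "account_number", "license_number", "vehicle_id",
    "device_serial", "url", "ip_address", "biometric_template", "face_photo",
})


def to_limited_data_set(record: dict[str, Any],
                        allowed: set[str]) -> dict[str, Any]:
    """Return only the approved columns, refusing any direct identifier.

    City, state, ZIP code, and dates may be approved; street address may not.
    """
    forbidden = allowed & LDS_EXCLUDED
    if forbidden:
        raise ValueError(f"direct identifiers not allowed in an LDS: {sorted(forbidden)}")
    return {k: v for k, v in record.items() if k in allowed}


row = {"name": "Jane Doe", "mrn": "MRN-001", "city": "Boston",
       "zip": "02139", "admit_date": "2024-03-01", "dx": "asthma"}
print(to_limited_data_set(row, {"city", "zip", "admit_date", "dx"}))
# {'city': 'Boston', 'zip': '02139', 'admit_date': '2024-03-01', 'dx': 'asthma'}
```

An allow-list fails safe: a column added upstream is left out until someone approves it, whereas a deny-list would pass it through unnoticed.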

Data Use Agreements for Limited Data Sets

Before sharing an LDS, execute a Data Use Agreement that:

  • Specifies the permitted purposes (research, public health, or health care operations) and limits use and disclosure to those purposes.
  • Identifies who is authorized to receive and use the LDS.
  • Requires appropriate administrative, physical, and technical safeguards to prevent unauthorized use or disclosure.
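  • Requires the recipient to report to you any use or disclosure not permitted by the agreement of which it becomes aware. Many agreements also require the recipient to mitigate harm, a common contractual term rather than one the Privacy Rule mandates.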
  • Flows down the same restrictions to agents and subcontractors.
  • Prohibits re-identification and prohibits contacting individuals.
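  • Often requires return or destruction of the LDS at the end of the project, if feasible, or documented justification if not. The Privacy Rule does not mandate this term for Data Use Agreements, but it is a common contractual safeguard.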
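If you track agreements in an internal system, the core terms above can be captured as structured fields and checked before any LDS leaves your environment. The sketch below is a hypothetical internal checklist, not a substitute for legal review; the class, field, and function names are illustrative assumptions.

```python
from dataclasses import dataclass

PERMITTED_PURPOSES = {"research", "public health", "health care operations"}


@dataclass
class DataUseAgreement:
    """Hypothetical record of an executed DUA's key terms."""
    purposes: set[str]
    authorized_recipients: list[str]
    limits_use_to_purposes: bool = False
    requires_safeguards: bool = False
    requires_misuse_reporting: bool = False
    binds_agents: bool = False
    prohibits_reidentification_and_contact: bool = False


def dua_gaps(dua: DataUseAgreement) -> list[str]:
    """List missing or out-of-scope terms; an empty list means none were found."""
    gaps = []
    if not dua.purposes:
        gaps.append("no permitted purpose stated")
    extra = dua.purposes - PERMITTED_PURPOSES
    if extra:
        gaps.append(f"purposes outside LDS scope: {sorted(extra)}")
    if not dua.authorized_recipients:
        gaps.append("no authorized recipients named")
    checks = {
        "limits_use_to_purposes": "use not limited to stated purposes",
        "requires_safeguards": "no safeguards requirement",
        "requires_misuse_reporting": "no reporting of unauthorized use",
        "binds_agents": "restrictions not flowed down to agents",
        "prohibits_reidentification_and_contact": "re-identification or contact not prohibited",
    }
    gaps.extend(msg for attr, msg in checks.items() if not getattr(dua, attr))
    return gaps


dua = DataUseAgreement(
    purposes={"research", "marketing"},
    authorized_recipients=["Study team"],
    limits_use_to_purposes=True,
    requires_safeguards=True,
    requires_misuse_reporting=True,
    binds_agents=False,
    prohibits_reidentification_and_contact=True,
)
print(dua_gaps(dua))  # flags the 'marketing' purpose and the missing agent flow-down
```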

A Data Use Agreement differs from a Business Associate Agreement; a DUA governs a Limited Data Set shared for defined purposes, while a BAA is needed when a vendor handles PHI to perform functions for the covered entity.

Compliance and Enforcement

The HHS Office for Civil Rights enforces the HIPAA Privacy, Security, and Breach Notification Rules. If you disclose data that is not properly de-identified, or misuse an LDS, you may face investigations, resolution agreements, corrective action plans, and civil monetary penalties. Knowingly obtaining or disclosing individually identifiable health information in violation of HIPAA can also lead to criminal prosecution, and state attorneys general may bring civil actions.

Maintain thorough documentation: which de-identification method you used, the transformations applied, expert reports (if any), data release logs, and executed Data Use Agreements. Strong records demonstrate diligence and reduce enforcement risk.
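One lightweight way to keep this documentation consistent is to write a structured record for every release. The sketch below uses a hypothetical schema and refuses to build a record when the supporting document for the chosen method is missing: an expert report for Expert Determination, or a Data Use Agreement reference for a Limited Data Set.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

METHODS = {"safe_harbor", "expert_determination", "limited_data_set"}


@dataclass
class ReleaseRecord:
    """Hypothetical log entry for one data release."""
    dataset: str
    recipient: str
    release_date: str  # ISO 8601, e.g. "2024-05-03"
    method: str
    transformations: list[str]
    expert_report: Optional[str] = None  # reference to the expert's written analysis
    dua_reference: Optional[str] = None  # reference to the executed DUA

    def __post_init__(self) -> None:
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method}")
        if self.method == "expert_determination" and not self.expert_report:
            raise ValueError("Expert Determination release needs an expert report")
        if self.method == "limited_data_set" and not self.dua_reference:
            raise ValueError("Limited Data Set release needs an executed DUA")


record = ReleaseRecord(
    dataset="readmissions_2023",
    recipient="Hypothetical University research team",
    release_date="2024-05-03",
    method="safe_harbor",
    transformations=["dropped 18 identifier fields", "dates reduced to year",
                     "ZIP reduced to 3 digits"],
)
print(json.dumps(asdict(record), indent=2))
```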

Best Practices for De-Identification

  • Choose the right pathway: use Safe Harbor for clear rules or Expert Determination when you need more data utility and can defend a “very small risk.”
  • Minimize data: share only fields that support the stated purpose; avoid quasi-identifiers you do not need.
  • Apply proven techniques: generalization, suppression, sampling, date shifting, small-cell suppression, k-anonymity, l-diversity, t-closeness, and differential privacy where appropriate.
  • Manage Unique Identifying Codes: if you generate a re-identification code, ensure it is not derived from personal attributes, keep the crosswalk separate, and never disclose it to recipients.
  • Test for re-identification risk: attempt linkage against public datasets (for example, voter files or commercial data) and document results.
  • Governance and training: define standard operating procedures, access controls, review boards, and audit trails; train staff on the 18 HIPAA Identifiers and LDS rules.
  • Iterate and monitor: re-evaluate risk when data accumulates, when new external data emerges, or when you expand sharing.
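Two of the practices above, small-cell suppression and linkage testing, are easy to sketch. In the Python below, the suppression threshold of 11 is a hypothetical default (HIPAA sets no cell size), and the linkage test simply measures how many released records match exactly one record in an external file on shared quasi-identifiers. Function names, field names, and sample data are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence, Union


def suppress_small_cells(counts: Mapping[str, int],
                         threshold: int = 11) -> dict[str, Union[int, str]]:
    """Replace nonzero counts below the threshold with a '<threshold' marker.

    Zeros are left as-is here; whether to show them is a policy choice.
    """
    return {cell: (f"<{threshold}" if 0 < n < threshold else n)
            for cell, n in counts.items()}


def unique_match_rate(released: Sequence[Mapping[str, object]],
                      external: Iterable[Mapping[str, object]],
                      keys: Sequence[str]) -> float:
    """Share of released records that match exactly one external record on keys."""
    if not released:
        return 0.0
    external_counts = Counter(tuple(r[k] for k in keys) for r in external)
    unique = sum(1 for r in released
                 if external_counts.get(tuple(r[k] for k in keys)) == 1)
    return unique / len(released)


print(suppress_small_cells({"asthma": 42, "rare_dx": 3, "none": 0}))
# {'asthma': 42, 'rare_dx': '<11', 'none': 0}

released = [{"birth_year": 1950, "zip3": "021", "sex": "F"},
            {"birth_year": 1980, "zip3": "021", "sex": "M"}]
voter_file = [{"birth_year": 1950, "zip3": "021", "sex": "F", "name": "A"},
              {"birth_year": 1980, "zip3": "021", "sex": "M", "name": "B"},
              {"birth_year": 1980, "zip3": "021", "sex": "M", "name": "C"}]
print(unique_match_rate(released, voter_file, ["birth_year", "zip3", "sex"]))  # 0.5
```

Suppressed cells can sometimes be recovered from row or column totals, so complementary suppression of a second cell may also be needed.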

Implications for Research and Health Operations

Fully de-identified data enables broader sharing and innovation—analytics, AI model development, benchmarking—without HIPAA restrictions. However, its utility may be reduced by the removal or generalization of key fields.

A Limited Data Set preserves dates and local geography, supporting outcomes research, epidemiology, and quality improvement while remaining PHI. With a sound Data Use Agreement, you can achieve high utility and maintain compliance, provided you enforce safeguards and prohibit re-identification.

Conclusion

HIPAA de-identification requirements center on two pathways: remove the 18 identifiers under Safe Harbor or use Expert Determination to achieve a very small re-identification risk. Limited Data Sets offer a middle ground—more utility with a Data Use Agreement and strong controls. Document your choices, minimize data, and regularly test risk to keep data useful and compliant.

FAQs

What are the 18 identifiers that must be removed for HIPAA de-identification?

The 18 HIPAA Identifiers are: names; geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and geocodes—with the three-digit ZIP exception); all elements of dates (except year) and ages over 89; telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers (including license plates); device identifiers and serial numbers; Web URLs; IP addresses; Biometric Identifiers (finger and voice prints); full-face photos and comparable images; and any other Unique Identifying Codes, numbers, or characteristics that could identify a person.

How does a limited data set differ from fully de-identified data?

A Limited Data Set is still PHI and may include city, state, ZIP code, and full dates (for example, admission or birth dates). It must exclude direct identifiers (names, contact info, SSNs, MRNs, and similar). Sharing an LDS requires a Data Use Agreement and HIPAA safeguards. Fully de-identified data, by contrast, either removes the 18 identifiers (Safe Harbor) or meets Expert Determination’s “very small risk” threshold, and it is not PHI under HIPAA.

What types of data use agreements are required for limited data sets?

You need a Data Use Agreement that defines permitted purposes (research, public health, or health care operations), specifies authorized users, mandates safeguards, requires reporting of any misuse, flows down restrictions to agents, and prohibits re-identification and contacting individuals. Many agreements also provide for return or destruction of the data when the project ends, although the Privacy Rule does not require that term.

What are the penalties for failing to properly de-identify data under HIPAA?

Improper de-identification or misuse of a Limited Data Set can lead to HHS OCR investigations, corrective action plans, and tiered civil monetary penalties that scale with culpability and are adjusted annually for inflation. Knowing wrongful disclosures may trigger criminal penalties, which increase for offenses committed under false pretenses or with intent to sell or use the data for commercial advantage, personal gain, or malicious harm. State attorneys general can also bring civil actions. Strong documentation and controls reduce enforcement risk.
