What Is the HIPAA De-Identification Safe Harbor? Definition, 18 Identifiers, and Examples



Kevin Henry

HIPAA

March 27, 2024

7 minutes read

Overview of HIPAA Privacy Rule

The HIPAA Privacy Rule sets national standards for how covered entities and their business associates use and disclose protected health information (PHI). Its aim is to protect patient confidentiality while enabling health data to support care delivery, operations, public health, and research.

One way HIPAA enables responsible data use is through de-identification. When you transform PHI into de-identified health information, it is no longer subject to the Privacy Rule’s restrictions. HIPAA recognizes two paths to de-identification: a statistical “expert determination” and the rule-based “safe harbor” method described here.

Safe harbor makes health information non-identifiable by requiring that specific data elements be removed from PHI. When you apply it correctly, you can share or analyze data more freely without exposing patient identities.

Explanation of Safe Harbor Method

The safe harbor method is a prescriptive checklist: remove all listed identifiers for the individual and for relatives, employers, and household members, and do not have actual knowledge that remaining data could identify the person. If both conditions are met, HIPAA deems the dataset de-identified.

Unlike expert determination, safe harbor does not require a statistician’s sign-off. Its strength is clarity and repeatability: you apply the same rules each time. Its limitation is rigidity: you must remove every listed identifier even when a field seems low risk. The tradeoff is a consistent, defensible path to HIPAA compliance.

What remains after safe harbor?

After removal, you may retain the year of a date (but not the month or day), state-level geography, and the first three digits of a ZIP code where the population rule allows it. Ages over 89 may appear only as an aggregated category such as “age 90+,” which prevents singling out exceptionally old individuals.

Detailed List of 18 Identifiers

  1. Names.
  2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and equivalent geocodes, except the initial three digits of a ZIP code if the geographic unit formed by all ZIP codes with those three digits contains more than 20,000 people; otherwise, the three-digit ZIP must be replaced with 000.
  3. All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, and death date; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.
  4. Telephone numbers.
  5. Fax numbers.
  6. Email addresses.
  7. Social Security numbers.
  8. Medical record numbers.
  9. Health plan beneficiary numbers.
  10. Account numbers.
  11. Certificate/license numbers.
  12. Vehicle identifiers and serial numbers, including license plate numbers.
  13. Device identifiers and serial numbers.
  14. Web URLs.
  15. Internet Protocol (IP) address numbers.
  16. Biometric identifiers, including finger and voice prints.
  17. Full-face photographic images and any comparable images.
  18. Any other unique identifying number, characteristic, or code, except a re-identification code that is not derived from PHI and is kept separately and not disclosed with the dataset.
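
The three-digit ZIP rule in item 2 is the most mechanical of the 18 and is easy to get wrong at the boundary. The following is a minimal sketch of that rule, not a definitive implementation. The function name `generalize_zip`, the `zip3_population` lookup, and the population figures are all assumptions made for illustration; real counts would have to come from current Census data.

```python
from typing import Mapping

# Safe harbor keeps a prefix only when it covers MORE than 20,000 people.
POPULATION_THRESHOLD = 20_000


def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Return the three-digit ZIP prefix, or "000" when it must be masked.

    zip3_population maps three-digit prefixes to population counts.
    Unknown prefixes and malformed ZIP codes are masked, which is the
    conservative choice.
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        return "000"
    prefix = digits[:3]
    if zip3_population.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"


# Hypothetical population counts, for illustration only.
example_population = {"123": 250_000, "999": 12_000, "555": 20_000}

print(generalize_zip("12345", example_population))  # prefix is large enough
print(generalize_zip("99901", example_population))  # prefix is too small
print(generalize_zip("55512", example_population))  # exactly 20,000 is not "more than"
```

Treating an unknown prefix as too small means a missing lookup entry leads to over-masking rather than an accidental disclosure.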

Process of Removing Identifiers

A robust safe harbor workflow combines policy, automation, and verification. The goal is to remove every identifier from every data type without reducing the dataset’s utility more than necessary.


Step-by-step approach

  • Scope the use case: confirm safe harbor is the appropriate path versus a limited data set or expert determination.
  • Inventory data: map all tables, schemas, columns, and unstructured sources (notes, images, audio, PDFs) where identifiers may appear.
  • Define transformation rules: codify exact rules for each of the 18 categories (e.g., year-only for dates, three-digit ZIP logic, age 90+ binning).
  • Automate structured fields: suppress or generalize direct identifiers, and normalize quasi-identifiers (e.g., convert detailed dates to year, and granular locations to state).
  • Scrub free text: use pattern matching and natural language processing to detect and redact names, contact details, addresses, and facility names that imply location.
  • Handle images and signals: remove full-face photographs and comparable images; strip EXIF/DICOM metadata; redact device and serial numbers; exclude voice prints and other biometric identifiers.
  • Manage re-identification codes (optional): if you need longitudinal linkage, generate a random, non-derivable code; store the mapping separately and never release it with the data.
  • Quality assurance: sample records, perform human-in-the-loop review for high-risk sources, and verify there is no actual knowledge of identifiability.
  • Document and approve: record which fields were removed or transformed, the tools used, the versioned ruleset, and the release decision.
  • Maintain and monitor: re-run the process when schema changes, retrain detection models, and periodically audit outputs.

Common pitfalls to avoid

  • Leaving identifiers in comment fields, scanned documents, or image metadata.
  • Forgetting the three-digit ZIP population rule or the age 90+ aggregation.
  • Retaining a re-identification code in the same dataset or deriving it from PHI.
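
The re-identification code pitfall can be avoided by generating codes randomly rather than computing them from a field such as the MRN. The sketch below assumes a hypothetical helper, `assign_reid_codes`, and made-up record identifiers; it uses Python's standard `secrets` module so the code carries no information about the source value.

```python
import secrets


def assign_reid_codes(record_ids):
    """Map each source identifier to a random code that is not derived from it."""
    mapping = {}
    used = set()
    for record_id in record_ids:
        if record_id in mapping:
            continue  # the same patient keeps the same code for linkage
        code = secrets.token_hex(16)  # 32 hex characters of randomness
        while code in used:  # collisions are extremely unlikely; checked anyway
            code = secrets.token_hex(16)
        used.add(code)
        mapping[record_id] = code
    return mapping


# Hypothetical identifiers, for illustration only.
crosswalk = assign_reid_codes(["MRN-001", "MRN-002", "MRN-001"])
```

Only the codes would travel with the released data. The `crosswalk` mapping is the piece that has to be stored separately and never disclosed with the dataset.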

Examples of De-Identification

Example 1: Hospital discharge records

  • Name, MRN, account number, phone, and email fields are suppressed.
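  • DOB 1950-03-09 → retain year only: 1950. If the patient is older than 89 at the event date, drop the birth year as well and recode the age to “90+,” because a birth year that far back reveals the age.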
  • Admission 2025-04-15 and discharge 2025-04-18 → 2025 and 2025.
  • ZIP 12345 → 123 if the three-digit area exceeds 20,000 people; otherwise 000. State remains.
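
The transformations in this example can be sketched as a single function. This is an illustration under stated assumptions, not a reference implementation: the column names (`dob`, `admission_date`, and so on), the function `deidentify_discharge`, the state value, and the set of three-digit prefixes that exceed 20,000 people are all hypothetical. The output is built from an allowlist of fields, so name, MRN, account number, phone, and email are never copied.

```python
from datetime import date


def age_on(birth: date, event: date) -> int:
    """Age in completed years on the event date."""
    before_birthday = (event.month, event.day) < (birth.month, birth.day)
    return event.year - birth.year - before_birthday


def deidentify_discharge(record: dict, zip3_over_20k: set) -> dict:
    """Build a de-identified row from an allowlist of generalized fields."""
    birth = date.fromisoformat(record["dob"])
    admitted = date.fromisoformat(record["admission_date"])
    discharged = date.fromisoformat(record["discharge_date"])
    age = age_on(birth, admitted)
    over_89 = age > 89
    prefix = record["zip"][:3]
    return {
        # A birth year would reveal an age over 89, so it is dropped in that case.
        "birth_year": None if over_89 else birth.year,
        "age": "90+" if over_89 else age,
        "admission_year": admitted.year,
        "discharge_year": discharged.year,
        "zip3": prefix if prefix in zip3_over_20k else "000",
        "state": record["state"],
    }


# Hypothetical source row, for illustration only.
source = {
    "name": "Jane Example", "mrn": "0001", "phone": "555-0100",
    "dob": "1950-03-09", "admission_date": "2025-04-15",
    "discharge_date": "2025-04-18", "zip": "12345", "state": "NY",
}
result = deidentify_discharge(source, zip3_over_20k={"123"})
```

Building the output field by field, rather than deleting known identifier columns, means a newly added source column is excluded by default.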

Example 2: Clinical notes

“Patient John Davis (cell: (555) 867-5309) lives at 742 Evergreen Terrace, Springfield.” → “Patient [NAME REMOVED] (cell: [REMOVED]) lives at [ADDRESS REMOVED], [CITY REMOVED].” The city is removed because it is smaller than a state; a state name, had the note included one, could stay. Dates become years only (e.g., “seen on May 5, 2025” → “seen in 2025”).
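
A pattern-matching pass over a note like this one might look like the sketch below. It is deliberately simple and makes several assumptions: the regular expressions cover only US-style phone numbers, a short list of street suffixes, and “Month D, YYYY” dates; names and cities are supplied as known terms (for example, taken from the structured record) because a regex cannot find them reliably; and the placeholder strings are hypothetical. Real notes would also need NLP-based detection and human review.

```python
import re

PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")
ADDRESS_RE = re.compile(
    r"\b\d+\s+(?:[A-Z][a-z]+\s+)+"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Terrace|Drive|Dr|Lane|Ln|Boulevard|Blvd)\b"
)
MONTHS = (
    r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
    r"Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)"
)
# An optional leading "on" is absorbed so the result reads "in 2025".
DATE_RE = re.compile(r"(?:\bon\s+)?\b" + MONTHS + r"\.?\s+\d{1,2},?\s+(\d{4})\b")


def scrub_note(text: str, known_terms: dict) -> str:
    """Redact known terms, phone numbers, street addresses, and full dates."""
    for term, placeholder in known_terms.items():
        text = re.sub(re.escape(term), placeholder, text)
    text = PHONE_RE.sub("[REMOVED]", text)
    text = ADDRESS_RE.sub("[ADDRESS REMOVED]", text)
    text = DATE_RE.sub(r"in \1", text)
    return text


note = (
    "Patient John Davis (cell: (555) 867-5309) lives at "
    "742 Evergreen Terrace, Springfield. Seen on May 5, 2025."
)
terms = {"John Davis": "[NAME REMOVED]", "Springfield": "[CITY REMOVED]"}
scrubbed = scrub_note(note, terms)
```

Patterns like these tend to over-redact (any ten-digit number looks like a phone number), which is usually the safer failure mode for a release pipeline.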

Example 3: Medical imaging

  • Remove full-face photographs and comparable images from the release set.
  • Strip DICOM headers: PatientName, PatientID, DeviceSerialNumber, and date-time elements are removed or generalized to year.
  • Ensure screenshots or overlays do not reveal names, IDs, or small-area locations.
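
The header-stripping step can be sketched on a plain dictionary of header values. This is an assumption-laden illustration rather than a working DICOM tool: a real pipeline would read and write files with a DICOM library, and the choice of which attributes to keep (`KEEP_KEYS`) or reduce to a year (`YEAR_ONLY_KEYS`) is hypothetical. The sketch relies on DICOM dates being stored as YYYYMMDD.

```python
# Attributes allowed through unchanged (hypothetical selection).
KEEP_KEYS = {"Modality", "Rows", "Columns"}
# Date attributes reduced to the year; DICOM stores dates as YYYYMMDD.
YEAR_ONLY_KEYS = {"StudyDate"}


def scrub_header(header: dict) -> dict:
    """Return a copy containing only allowlisted attributes.

    Anything not listed, such as PatientName, PatientID, or
    DeviceSerialNumber, is dropped by default.
    """
    cleaned = {}
    for key, value in header.items():
        if key in KEEP_KEYS:
            cleaned[key] = value
        elif key in YEAR_ONLY_KEYS:
            cleaned[key] = str(value)[:4]
    return cleaned


# Hypothetical header values, for illustration only.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "0001",
    "DeviceSerialNumber": "SN-42",
    "InstitutionName": "Example Clinic",
    "StudyDate": "20250415",
    "Modality": "CT",
}
cleaned = scrub_header(header)
```

The result is a plain dictionary for analysis, not a valid DICOM header. An allowlist is used here because vendor-specific and private attributes can carry identifiers that a fixed removal list would miss.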

Importance of De-Identification in Healthcare

Safe harbor empowers you to share data for research, quality improvement, AI model development, and operational analytics while protecting patients. By enforcing data anonymization standards across common identifiers, you reduce the likelihood of identity disclosure.

For organizations, de-identification lowers regulatory exposure and breach risk, enabling faster data collaboration with payers, vendors, and academic partners. For patients, it builds trust that their information is used responsibly.

Because safe harbor is standardized, it scales across projects and teams. Clear rules make it easier to train staff, evaluate tools, and validate outputs consistently.

Compliance Requirements for Safe Harbor

  • Remove all 18 categories of identifiers for the individual and for relatives, employers, and household members.
  • Retain only the year for any dates tied to the patient; recode ages over 89 to a single “90+” category.
  • Apply the geographic rule correctly: nothing smaller than state, with the specific three‑digit ZIP and 20,000‑person threshold requirement.
  • Ensure you have no actual knowledge that the remaining data could identify an individual, alone or in combination with other data you hold.
  • Document your process: data inventory, transformation rules, tools used, QA results, and approval records.
  • Train workforce members who prepare releases and audit outputs periodically.
  • If you maintain a re-identification code, keep it separate and undisclosed; do not derive it from PHI.
  • When using vendors, ensure contracts cover how PHI is handled before release and how it is destroyed once identifiers have been removed.
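
Part of the QA and audit work behind these requirements can be automated with a residual-identifier scan over the release candidate. The sketch below is one possible check, not a complete one: the function name `find_residual_identifiers`, the four patterns, and the sample rows are assumptions, and a clean result does not by itself establish that you lack actual knowledge of identifiability.

```python
import re

# A few residual patterns worth flagging before release (not exhaustive).
RESIDUAL_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "full_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_residual_identifiers(rows):
    """Return (row index, column, pattern label) for every suspicious value."""
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            for label, pattern in RESIDUAL_PATTERNS.items():
                if pattern.search(str(value)):
                    findings.append((index, column, label))
    return findings


# Hypothetical release candidate, for illustration only.
rows = [
    {"admission_year": 2025, "note": "follow up in 2026"},
    {"admission_year": 2025, "note": "email jane@example.org on 2025-04-15"},
]
findings = find_residual_identifiers(rows)
```

Findings like these are best treated as a gate: a non-empty list blocks the release and sends the flagged rows to human review.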

Conclusion

The HIPAA de-identification safe harbor gives you a clear, repeatable pathway to convert PHI into de-identified health information. By removing the 18 identifiers, honoring the date and geography rules, and documenting your process, you produce data that is no longer individually identifiable, which supports innovation while protecting privacy.

FAQs

What is the purpose of HIPAA de-identification safe harbor?

Its purpose is to provide a straightforward, rule-based method to create de-identified health information by removing specified identifiers so you can use and share data while preserving patient privacy.

Which identifiers must be removed under the safe harbor method?

You must remove the 18 identifier categories listed above: names; geographic subdivisions smaller than a state; all date elements except year (with ages over 89 aggregated to “90+”); telephone numbers, fax numbers, and email addresses; Social Security, medical record, health plan beneficiary, and account numbers; certificate/license numbers; vehicle and device identifiers; URLs and IP addresses; biometric identifiers and full-face images; and any other unique identifying number, characteristic, or code.

How does safe harbor protect patient privacy?

By eliminating direct and indirect identifiers and generalizing sensitive fields like dates and locations, safe harbor reduces the chance that a record can be linked back to a person, aligning data release with HIPAA compliance guidelines.

Can de-identified data be re-identified under HIPAA?

Technically, a covered entity may retain a separate, non-derivable code to re-link records if needed, but that code cannot be disclosed with the dataset. Although safe harbor greatly lowers risk, no method guarantees zero re-identification, so strong governance and controls remain essential.
