Guide to HIPAA De-Identification: Requirements for Safe Harbor and Expert Determination
When you remove identifiers from protected health information to meet HIPAA de-identification standards, you reduce the risk that data can be tied back to a person while preserving analytic value. This guide explains the Safe Harbor method, the Expert Determination method, and the supporting practices that help you maintain Privacy Rule compliance.
Use it to plan controls, conduct expert analysis, and document decisions that withstand audits. You will see how statistical risk analysis, governance, and data use agreements work together to keep residual re-identification risk very small.
Safe Harbor Method Identifiers
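Under Safe Harbor, you must remove the identifiers below, whether they belong to the individual or to the individual's relatives, employers, or household members. You must also have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. Each of the 18 identifiers must be removed before release, or, for ZIP codes, dates, and ages, generalized as specified.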
The 18 identifiers
- Names
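- All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and equivalent geocodes (the initial three digits of a ZIP code may be kept if, according to current publicly available Census data, the area formed by combining all ZIP codes with those three digits contains more than 20,000 people; otherwise replace them with 000)
- All elements of dates (except year) directly related to an individual, including birth, admission, discharge, and death dates; and all ages over 89 and all elements of dates (including year) indicative of such age, which may be aggregated into a single category of age 90 or older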
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plates
- Device identifiers and serial numbers
- Web URLs
- IP address numbers
- Biometric identifiers, including finger and voice prints
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code (except a nondisclosable re-identification code maintained separately)
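The ZIP code, date, and age rules above are mechanical enough to script. The following is a minimal Python sketch under stated assumptions: `ZIP3_POPULATION` is a hypothetical lookup filled with placeholder numbers (a real pipeline would load current Census Bureau counts), and the function names are illustrative, not taken from any library.

```python
from datetime import date

# Hypothetical ZIP3 -> population lookup with placeholder numbers.
# A real pipeline would load current U.S. Census Bureau counts instead.
ZIP3_POPULATION = {"123": 50_000, "456": 8_000}

def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits only if that ZIP3 area has more than 20,000 people."""
    zip3 = zip_code[:3]
    # Areas missing from the lookup are treated as small, the conservative choice.
    return zip3 if ZIP3_POPULATION.get(zip3, 0) > 20_000 else "000"

def generalize_date(d: date) -> int:
    """Retain only the year of a date directly related to the individual."""
    return d.year

def generalize_age(age: int) -> str:
    """Aggregate all ages over 89 into a single 90+ category."""
    return "90+" if age > 89 else str(age)

def generalize_birth_year(birth: date, age: int) -> str:
    """Withhold the birth year too when it would reveal an age over 89."""
    return "90+" if age > 89 else str(birth.year)

print(generalize_zip("12345"), generalize_zip("45601"), generalize_age(93))
# prints: 123 000 90+
```

Treating unknown ZIP3 areas as small errs toward removal, which is the safer default when the population lookup is incomplete.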
Practical considerations
- Review free-text fields that may contain identifiers; apply redaction or natural language processing before release.
- Reduce dates to the year, top-code ages above 89 as 90+ (withholding any birth year that would reveal such an age), and collapse rare values where practical.
- Remove image metadata and scrub DICOM headers; exclude full-face photos entirely.
- Confirm and document the “no actual knowledge” determination before sharing.
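For free text, pattern-based redaction is a common first pass. This sketch uses a few illustrative regular expressions; the `PATTERNS` table and `redact` function are hypothetical. It will miss names, addresses, and unusual formats, so treat it as a supplement to NLP-based or manual review, not a replacement.

```python
import re

# Illustrative patterns only. Real free text needs far broader coverage
# (names, addresses, many date formats) and usually NLP-based review.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def redact(text: str) -> str:
    """Replace each pattern match with a bracketed label; SSNs are handled before phones."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Call 555-123-4567 or email jane.doe@example.org; SSN 123-45-6789, seen 3/14/2021."))
# prints: Call [PHONE] or email [EMAIL]; SSN [SSN], seen [DATE].
```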
Expert Determination Process
The Expert Determination method relies on a qualified expert who applies generally accepted statistical and scientific principles to conclude that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual. A defensible process typically includes the steps below.
1) Define the use case and environment
Specify purpose, recipients, sharing channel, retention, and controls (for example, access limits and data use agreements). Identify the level of detail users actually need so that transformations preserve it.
2) Inventory direct and quasi-identifiers
Map variables that could identify someone alone or in combination: dates, geography, demographics, rare diagnoses, encounters, and device or provider attributes that increase linkability.
3) Conduct statistical risk analysis
Measure re-identification risk using accepted techniques, such as uniqueness analysis, equivalence class sizes (for k-anonymity style reasoning), outlier detection, and simulated linkage against plausible external data. Evaluate replicability, distinguishability, and availability of attributes in the data environment.
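Uniqueness and equivalence class sizes can be computed directly. A minimal sketch, assuming records are dicts and the expert has already chosen which columns count as quasi-identifiers; `max_risk` (1/k, sometimes called prosecutor risk) and `avg_risk` are common metrics in the de-identification literature, but which metric and threshold apply remains the expert's judgment.

```python
from collections import Counter

def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def risk_summary(records, quasi_identifiers):
    sizes = list(equivalence_classes(records, quasi_identifiers).values())
    n = len(records)
    k = min(sizes)
    return {
        "k": k,                                             # smallest class size
        "unique_fraction": sum(s == 1 for s in sizes) / n,  # share of records that are unique
        "max_risk": 1 / k,                                  # worst-case match probability for one record
        "avg_risk": len(sizes) / n,                         # mean of 1/class size over all records
    }

records = [
    {"zip3": "123", "birth_year": 1980, "sex": "F"},
    {"zip3": "123", "birth_year": 1980, "sex": "F"},
    {"zip3": "123", "birth_year": 1975, "sex": "M"},
    {"zip3": "456", "birth_year": 1990, "sex": "F"},
]
print(risk_summary(records, ["zip3", "birth_year", "sex"]))
# prints: {'k': 1, 'unique_fraction': 0.5, 'max_risk': 1.0, 'avg_risk': 0.75}
```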
4) Apply privacy transformations
Iteratively generalize, bin, top-/bottom-code, suppress, or perturb values; consider noise addition, data swapping, or synthesis where needed. Recompute risk after each iteration to balance privacy and utility.
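The iterate-and-recompute loop might look like the sketch below. It assumes a hypothetical `birth_year` column, widens it into progressively coarser bands, and suppresses records left in classes smaller than `target_k`. The `max_suppressed` fraction and the band widths are illustrative parameters, not recommended values.

```python
from collections import Counter

def band_year(year: int, width: int) -> str:
    """Generalize a year into a band, e.g. 1983 with width 5 -> '1980-1984'."""
    if width == 1:
        return str(year)
    start = year - year % width
    return f"{start}-{start + width - 1}"

def class_key(record, qis):
    return tuple(record[q] for q in qis)

def generalize_and_suppress(records, qis, target_k, max_suppressed=0.05, widths=(1, 5, 10, 20)):
    """Try progressively wider birth-year bands; at each step, recompute class sizes,
    suppress records in classes smaller than target_k, and accept the first step
    whose suppression stays within the allowed fraction."""
    for width in widths:
        candidate = [dict(r, birth_year=band_year(r["birth_year"], width)) for r in records]
        sizes = Counter(class_key(r, qis) for r in candidate)
        kept = [r for r in candidate if sizes[class_key(r, qis)] >= target_k]
        suppressed = len(candidate) - len(kept)
        if suppressed <= max_suppressed * len(candidate):
            return kept, width, suppressed
    raise ValueError("No tested band width meets the target; revisit other variables.")

records = [
    {"zip3": "123", "birth_year": 1981}, {"zip3": "123", "birth_year": 1983},
    {"zip3": "123", "birth_year": 1987}, {"zip3": "123", "birth_year": 1988},
    {"zip3": "456", "birth_year": 1950},
]
kept, width, suppressed = generalize_and_suppress(records, ["zip3", "birth_year"], target_k=2, max_suppressed=0.25)
print(width, suppressed)  # prints: 5 1
```

Trying the narrowest band first favors utility: the loop stops at the first generalization whose required suppression stays within budget, rather than jumping straight to the coarsest bands.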
5) Assess contextual safeguards
Incorporate administrative, technical, and physical controls—access governance, training, contractual restrictions, and audit. Strong safeguards reduce overall risk without over-sanitizing the data.
6) Conclude and attest
When residual risk is very small for the defined context, the expert issues a written determination that documents methodology, results, and assumptions, along with conditions for ongoing use.
7) Monitor and refresh
Reassess when data, users, external linkage sources, or controls change. Periodic reviews keep the determination aligned with evolving risks.
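Some reassessment triggers can be checked automatically by comparing the conditions recorded in the determination with the current release. This is a sketch assuming a hypothetical `ReleaseContext` snapshot and a 365-day renewal cadence chosen only for illustration; changes in external linkage sources still need human review.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ReleaseContext:
    """Hypothetical snapshot of the conditions a determination relied on."""
    columns: frozenset
    recipients: frozenset
    controls: frozenset

def reassessment_reasons(approved, current, determined_on, today, max_age_days=365):
    """List the changes that should prompt the expert to revisit the determination."""
    reasons = []
    if current.columns - approved.columns:
        reasons.append("new columns")
    if current.recipients - approved.recipients:
        reasons.append("new recipients")
    if approved.controls - current.controls:
        reasons.append("controls removed")
    if (today - determined_on).days > max_age_days:
        reasons.append(f"determination older than {max_age_days} days")
    return reasons

approved = ReleaseContext(frozenset({"zip3", "birth_year"}), frozenset({"Team A"}), frozenset({"enclave", "dua"}))
current = ReleaseContext(frozenset({"zip3", "birth_year", "diagnosis"}), frozenset({"Team A"}), frozenset({"dua"}))
print(reassessment_reasons(approved, current, date(2024, 1, 1), date(2024, 6, 1)))
# prints: ['new columns', 'controls removed']
```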
Risk Assessment Criteria
Whether you use Safe Harbor or Expert Determination, evaluate risk systematically. The criteria below help you structure decisions and justify them.
- Identifiability: strength of direct identifiers, quasi-identifiers (ZIP, year, sex), and rare combinations that create uniqueness.
- Linkability: availability and quality of external datasets that an intruder could use for record linkage.
- Replicability and stability: how consistently attributes (e.g., chronic conditions, long-term residence) persist over time.
- Data distribution: sparsity, outliers, small cells, and the presence of rare events or procedures.
- Release context: public vs. restricted access, authentication, vetting, and oversight mechanisms.
- Safeguards: strength of administrative, technical, and physical controls governed by data use agreements.
- Utility requirements: the minimum precision needed to meet analytic goals, since any finer detail raises re-identification risk without analytic benefit.
- Ongoing risk: frequency of refreshes, accumulation of longitudinal data, and potential composition effects from multiple releases.
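Composition effects are easy to underestimate. In this hypothetical example, each release alone has a minimum equivalence class size of 2, but because both carry the same persistent study ID (an assumption made for illustration), joining them makes every record unique. The helper names are illustrative.

```python
from collections import Counter

def min_class_size(rows, cols):
    """Smallest number of rows sharing one combination of values in cols."""
    return min(Counter(tuple(r[c] for c in cols) for r in rows).values())

def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key column."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Two hypothetical releases about the same four people, both carrying "pid".
release_a = [
    {"pid": "p1", "zip3": "123", "sex": "F"},
    {"pid": "p2", "zip3": "123", "sex": "F"},
    {"pid": "p3", "zip3": "456", "sex": "M"},
    {"pid": "p4", "zip3": "456", "sex": "M"},
]
release_b = [
    {"pid": "p1", "birth_decade": "1980s"},
    {"pid": "p2", "birth_decade": "1990s"},
    {"pid": "p3", "birth_decade": "1980s"},
    {"pid": "p4", "birth_decade": "1990s"},
]

print(min_class_size(release_a, ["zip3", "sex"]))   # prints: 2
print(min_class_size(release_b, ["birth_decade"]))  # prints: 2
combined = join_on(release_a, release_b, "pid")
print(min_class_size(combined, ["zip3", "sex", "birth_decade"]))  # prints: 1
```

Linkage across releases does not always need a shared ID; overlapping quasi-identifiers can play the same role, which is why each new release should be assessed together with earlier ones.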
Controls that reduce residual risk
- Contractual limits on re-identification attempts and redistribution, with clear penalties and audit rights.
- Secure enclaves, row-level access rules, and time-bound retention to limit exposure.
- Release only the variables necessary for the stated purpose; avoid high-risk free text.
Documentation Requirements
Strong documentation demonstrates Privacy Rule compliance and makes your de-identification repeatable. Keep records proportionate to the sensitivity of the data and the breadth of sharing.
Safe Harbor documentation
- Checklist confirming each of the 18 identifiers was removed or generalized as required.
- Evidence of the “no actual knowledge” assessment and sign-off by the data steward.
- Procedures for scrubbing free text, images, and metadata.
- Change log and versioned scripts used to transform the dataset.
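A versioned release record can tie the Safe Harbor checklist, the steward's sign-off, and the exact transformation script together. This sketch assumes hypothetical field and checklist names; hashing the script text with SHA-256 lets an auditor confirm which version produced a given release.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical checklist keys, one per Safe Harbor identifier category.
SAFE_HARBOR_IDENTIFIERS = [
    "names", "geographic_subdivisions", "dates_and_ages", "telephone", "fax",
    "email", "ssn", "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "certificate_license_number", "vehicle_identifiers",
    "device_identifiers", "urls", "ip_addresses", "biometrics",
    "full_face_photos", "other_unique_identifiers",
]

def build_manifest(script_text, checklist, steward, version):
    """Return a release record; refuse to build one if any checklist item is unconfirmed."""
    missing = [item for item in SAFE_HARBOR_IDENTIFIERS if not checklist.get(item)]
    if missing:
        raise ValueError(f"Checklist incomplete: {missing}")
    return {
        "version": version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # The hash pins the exact transformation script used for this release.
        "script_sha256": hashlib.sha256(script_text.encode("utf-8")).hexdigest(),
        "checklist": dict(checklist),
        "no_actual_knowledge_signoff": steward,
    }

manifest = build_manifest(
    script_text="df = df.drop(columns=['name'])\n",
    checklist={item: True for item in SAFE_HARBOR_IDENTIFIERS},
    steward="Data Steward (example)",
    version="1.0.0",
)
print(manifest["script_sha256"][:12])
```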
Expert Determination documentation
- Scope: data elements, cohorts, time span, and intended use environment.
- Methodology: statistical techniques, risk metrics, assumptions, and validation tests.
- Transformations: generalization, suppression, perturbation rules, and utility impact.
- Results: risk estimates, sensitivity analyses, and basis for the “very small” conclusion.
- Attestation: expert’s signed report, effective date, conditions, and renewal cadence.
- Governance: data use agreements, user training, and monitoring/audit plan.
Re-identification code handling
- Store any internal re-identification code separately; do not disclose the algorithm or mapping to recipients.
- Ensure the code is not derived from or related to information about the individual, so it cannot be translated back to identify them, and that it does not appear as a unique characteristic in the release.
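One way to satisfy these conditions is to generate codes randomly rather than computing them from identifiers; a hash of a medical record number, for example, could be recomputed by anyone holding candidate MRNs. A minimal sketch using Python's standard `secrets` module; `assign_reid_codes` is a hypothetical helper, and the returned mapping is what must be stored apart from the release.

```python
import secrets

def assign_reid_codes(record_ids):
    """Map each internal ID to a random 16-hex-character code. The code has no
    mathematical relationship to the ID, so it cannot be recomputed from it."""
    mapping, used = {}, set()
    for rid in dict.fromkeys(record_ids):  # dedupe while keeping order
        code = secrets.token_hex(8)
        while code in used:  # regenerate on the (very unlikely) collision
            code = secrets.token_hex(8)
        used.add(code)
        mapping[rid] = code
    return mapping

# Keep `mapping` in a separate, access-controlled store; release only the codes.
mapping = assign_reid_codes(["MRN001", "MRN002", "MRN003"])
release_rows = [{"reid_code": mapping[mrn]} for mrn in ["MRN001", "MRN002"]]
```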
Roles and Qualifications
Clear roles keep your program accountable and efficient. Define responsibilities before data moves.
Key roles
- Data steward: owns the dataset, verifies transformations, and approves release.
- Privacy/compliance lead: interprets de-identification standards and oversees Privacy Rule compliance.
- Security lead: implements access controls, logging, and secure transfer.
- Legal counsel: structures data use agreements and reviews risk acceptance.
- Analytic users: articulate utility needs and adhere to permitted uses.
- Independent expert (when using Expert Determination): performs the expert analysis and attests to residual risk.
Qualifications for the expert
- Demonstrated experience applying statistical and scientific principles to de-identification and re-identification risk.
- Depth in statistics, biostatistics, computer science, or a related quantitative field, with healthcare domain familiarity.
- Ability to design and validate risk models, explain assumptions, and defend conclusions.
- Knowledge of HIPAA, de-identification standards, and practical controls reflected in data use agreements.
- Professional independence and a documented track record of similar determinations.
Conclusion
Safe Harbor removes fixed identifiers for straightforward releases; Expert Determination uses expert analysis and contextual safeguards to reach a “very small” risk for more complex data. With sound documentation, strong controls, and clearly defined roles, you can share useful data while honoring HIPAA de-identification standards.
FAQs
What are the 18 identifiers removed in the Safe Harbor method?
They are: names; geographic subdivisions smaller than a state (street address, city, county, precinct, and ZIP code; only the first three digits of a ZIP code may be kept when the combined area exceeds 20,000 people, otherwise use 000); all elements of dates except year, plus all ages over 89 and any date elements (including year) that reveal such ages, which may be aggregated as 90 or older; telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers (including license plates); device identifiers and serial numbers; web URLs; IP addresses; biometric identifiers (finger and voice prints); full-face photographs and comparable images; and any other unique identifying number, characteristic, or code other than a permitted re-identification code.
How does the Expert Determination method assess risk?
An expert evaluates how identifiable the data are in context by analyzing quasi-identifiers, uniqueness, linkability to external sources, and stability of attributes. The expert then applies transformations (generalization, suppression, perturbation, or synthesis) and considers safeguards such as access controls and data use agreements. If the resulting residual risk of re-identification is very small for the defined use and environment, the expert issues a written determination with methods, results, and conditions.
What qualifications are needed for an expert?
The expert should have deep quantitative training and practical experience in de-identification and re-identification risk assessment, familiarity with HIPAA and de-identification standards, and the ability to design, validate, and explain statistical risk analysis. Independence, a record of relevant projects, and knowledge of contractual and technical safeguards reflected in data use agreements are also important.