Guide to HIPAA Identifiers Removal for De-Identification
Overview of HIPAA Privacy Rule
The HIPAA Privacy Rule allows you to share and use health information responsibly by applying De-Identification Standards to Protected Health Information (PHI). Properly de-identified data is no longer PHI, enabling analytics, quality improvement, and research while maintaining HIPAA Privacy Rule Compliance.
Under the rule, you can achieve de-identification using either the Safe Harbor Method or the Expert Determination approach. Both pathways aim to reduce Re-Identification Risk to a level that makes it impractical for recipients to link data back to an individual.
Safe Harbor Method Identifiers
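The Safe Harbor Method requires two things: remove the specified identifiers of the individual and of the individual's relatives, employers, or household members, and ensure you have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. Below are the identifiers to remove before disclosure.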
The 18 identifiers you must remove
- Names.
- All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes), except the initial three digits of a ZIP code when, according to current Census Bureau data, the area formed by combining all ZIP codes that share those three digits has more than 20,000 people; otherwise use 000.
- All elements of dates (except year) directly related to an individual (for example, birth, admission, discharge, death), plus all ages over 89 and all date elements (including year) that indicate such an age; these ages and elements may be aggregated into a single category of “90 or older.”
- Telephone numbers.
- Fax numbers.
- Email addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Account numbers.
- Certificate/license numbers.
- Vehicle identifiers and serial numbers, including license plates.
- Device identifiers and serial numbers.
- Web URLs.
- IP address numbers.
- Biometric identifiers (for example, finger or voice prints).
- Full-face photographic images and any comparable images.
- Any other unique identifying number, characteristic, or code.
Implementation notes
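- For item 18, you may assign an internal code that allows re-identification, but the code must not be derived from or related to information about the individual, and you must not disclose the code, the algorithm, or any key used to generate or reverse it.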
- Confirm downstream datasets cannot be triangulated with external sources to reveal identity; “no actual knowledge” is an explicit Safe Harbor requirement.
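One way to satisfy the first note is to issue codes from a random source rather than hashing or encoding a patient attribute. The sketch below is illustrative only: the function name `assign_study_codes` and the sample record IDs are hypothetical, and the crosswalk it builds is assumed to stay inside your organization.

```python
import secrets

def assign_study_codes(record_ids):
    """Map each internal record ID to a random code unrelated to the person."""
    mapping = {}
    used = set()
    for rid in record_ids:
        code = secrets.token_hex(8)  # 16 hex characters from the OS random source
        while code in used:          # regenerate on the (unlikely) collision
            code = secrets.token_hex(8)
        used.add(code)
        mapping[rid] = code
    return mapping

# Hypothetical internal IDs; the crosswalk is kept private, only codes are released.
crosswalk = assign_study_codes(["MRN-001", "MRN-002", "MRN-003"])
released_codes = sorted(crosswalk.values())
```

Because the codes come from `secrets` and not from a name, date, or record number, nothing in a released code can be reversed without the privately held crosswalk.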
Expert Determination Method
Expert Determination relies on a qualified expert who uses accepted statistical and scientific methods to conclude that the likelihood of re-identification is very small. This route offers flexibility when you need finer detail than Safe Harbor allows, provided risk controls are robust and documented.
What it requires
- A defined use case and data environment (who will access the data, where it will reside, and for how long).
- Formal risk analysis covering external data availability, record linkage risk, and attack scenarios.
- Written determination explaining methods, assumptions, residual risk, and applicable Data Privacy Safeguards.
Common techniques
- Generalization and aggregation (coarsening dates, ages, and locations; top-coding ages).
- Suppression and masking (removing rare values, truncating IDs, binning outliers).
- Perturbation (adding bounded noise, date shifting, micro-aggregation) with utility-risk tradeoffs documented.
- Governance controls (access limits, audit, contractual terms) to keep contextual risk low.
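As a rough illustration of suppression, the sketch below replaces values that occur fewer than a chosen number of times with a placeholder. The threshold of 5, the placeholder label, and the sample diagnoses are assumptions for the example; in practice the expert sets and justifies these parameters.

```python
from collections import Counter

def suppress_rare(values, min_count=5, placeholder="OTHER"):
    """Replace values seen fewer than min_count times with a placeholder."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else placeholder for v in values]

# Made-up column: one common value and one rare value.
diagnoses = ["influenza"] * 6 + ["rare_condition_x"] * 2
masked = suppress_rare(diagnoses)
```

Note that the placeholder group can itself end up small, so a real workflow would re-check counts after masking.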
Geographic Data De-Identification
Location data can be highly identifying. Apply the most restrictive rules that meet your needs while maintaining analytical value and low Re-Identification Risk.
Safe Harbor rules
- Remove all geographic subdivisions smaller than a state: street address, city, county, precinct, and full ZIP code.
- You may keep only the first three digits of a ZIP code if the combined three‑digit area has more than 20,000 people; if 20,000 or fewer, replace with 000.
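The three-digit ZIP rule can be sketched as a small lookup. The population table here is made up for illustration and is not Census data; `safe_harbor_zip` and `zip3_population` are hypothetical names, and the function assumes ZIP codes are passed as strings so leading zeros survive.

```python
def safe_harbor_zip(zip_code, zip3_population):
    """Return the 3-digit ZIP prefix if its combined area exceeds 20,000 people, else '000'."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 3:
        return "000"  # malformed input: fall back to the restricted value
    zip3 = digits[:3]
    # Unknown prefixes are treated as population 0, the conservative choice.
    population = zip3_population.get(zip3, 0)
    return zip3 if population > 20000 else "000"

# Illustrative populations only (assumed values, not real Census figures).
example_populations = {"100": 1_500_000, "036": 12_000}

kept = safe_harbor_zip("10027", example_populations)
restricted = safe_harbor_zip("03601-1234", example_populations)
unknown = safe_harbor_zip("99999", example_populations)
```

A production version would load the population figures from current Census Bureau data instead of a hard-coded dictionary.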
Expert Determination options
- Use coarser geographies (for example, state, multi-state region) or privacy-preserving spatial jittering with parameters justified by the expert.
- Where necessary, allow limited sub-state geography with additional safeguards (tight access controls, aggregation thresholds, or cell-size suppression).
Practical tips
- Avoid publishing very small geographic cells; combine sparse areas or report at higher levels so that no individual stands out within a cell.
- Check that maps or heat tiles cannot be back-solved to exact coordinates.
Date and Age Data Handling
Temporal data often enables identity linkage. Treat dates and ages carefully to align with the chosen de-identification pathway.
Safe Harbor rules
- Keep only the year for dates tied to an individual (for example, birth, admission, discharge, death).
- Top-code ages: do not report any age over 89 exactly; such ages may be aggregated into a single “90 or older” category.
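These two rules can be sketched in a few lines. The helper names are hypothetical. The birth-year helper also reflects that Safe Harbor treats date elements, including year, as removable when they indicate an age over 89, so it withholds the year in that case.

```python
from datetime import date

def top_code_age(age):
    """Report ages over 89 as a single '90 or older' category."""
    return "90 or older" if age > 89 else str(age)

def age_on(birth_date, as_of):
    """Whole years between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)

def safe_harbor_birth_year(birth_date, as_of):
    """Keep only the birth year, and withhold it when it would reveal an age over 89."""
    if age_on(birth_date, as_of) > 89:
        return None
    return birth_date.year

reference_day = date(2024, 1, 1)  # assumed reporting date for the example
younger = safe_harbor_birth_year(date(1980, 5, 1), reference_day)
oldest = safe_harbor_birth_year(date(1930, 5, 1), reference_day)
```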
Expert Determination options
- Date shifting with undisclosed, bounded offsets that preserve intervals but break exact linkage.
- Month-level or quarter-level reporting instead of exact days, when justified by risk and utility.
- Age banding (for example, 0–4, 5–9, …, 85–89, 90+) and suppression of sparse age-date combinations.
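A minimal sketch of date shifting and age banding follows, under stated assumptions: each patient gets one secret offset so intervals between that patient's events are preserved, the 180-day bound and 5-year band width are arbitrary example values, and all names are hypothetical. An expert would choose and justify the actual parameters.

```python
import random
from datetime import date, timedelta

def make_offsets(patient_ids, max_days=180, rng=None):
    """Draw one undisclosed, bounded day offset per patient."""
    rng = rng or random.SystemRandom()
    return {pid: timedelta(days=rng.randint(-max_days, max_days))
            for pid in patient_ids}

def shift_date(d, patient_id, offsets):
    """Apply the patient's offset; all of that patient's dates move together."""
    return d + offsets[patient_id]

def age_band(age, width=5):
    """Band ages into fixed-width groups, with everyone 90 and over in '90+'."""
    if age >= 90:
        return "90+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

offsets = make_offsets(["patient-1"])
admitted = shift_date(date(2023, 3, 1), "patient-1", offsets)
discharged = shift_date(date(2023, 3, 9), "patient-1", offsets)
length_of_stay = (discharged - admitted).days  # unchanged by the shift
```

The offsets table plays the same role as a re-identification key, so it should be stored and protected accordingly.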
Operational practices
- Standardize time zones and truncation rules so derived fields don’t accidentally reintroduce fine-grained dates.
- Apply small-cell suppression to rare combinations of age, date, and geography.
Biometric and Visual Identifiers
Biometric and image data can uniquely identify a person even when traditional identifiers are removed. Treat these elements as high risk.
Safe Harbor removals
- Exclude biometric identifiers (for example, fingerprints, voiceprints).
- Exclude full-face photographs and comparable images that could enable recognition.
Expert Determination approaches
- De-face or crop images, blur identifiable features, and remove EXIF metadata.
- For audio, avoid storing raw recordings where possible, or share only content transcripts after screening them for names and other identifiers; do not retain voiceprints.
- Document that residual identification risk from remaining media is very small.
Risk Assessment and Safeguards
Strong technical and organizational controls are essential, especially for Expert Determination, but they also help you maintain Safe Harbor’s “no actual knowledge” condition.
Assessing re-identification risk
- Measure distinguishability (how unique a record is) and replicability (stability of quasi-identifiers over time).
- Consider data linkage scenarios using public and commercial datasets.
- Stress-test rare combinations (for example, extreme ages, uncommon procedures in small geographies).
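Distinguishability can be approximated by counting how many records share each combination of quasi-identifiers, a k-anonymity-style measure. HIPAA does not prescribe this particular metric; the sketch below is one simple option, with hypothetical field names and made-up records.

```python
from collections import Counter

def class_sizes(records, quasi_identifiers):
    """For each record, count the records sharing its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]

def risk_summary(records, quasi_identifiers):
    """Smallest group size and share of records that are unique on the quasi-identifiers."""
    sizes = class_sizes(records, quasi_identifiers)
    return {
        "min_class_size": min(sizes),
        "share_unique": sum(1 for s in sizes if s == 1) / len(sizes),
    }

# Made-up records: three look alike, one is an extreme-age outlier.
records = [
    {"age_band": "40-44", "zip3": "100", "sex": "F"},
    {"age_band": "40-44", "zip3": "100", "sex": "F"},
    {"age_band": "40-44", "zip3": "100", "sex": "F"},
    {"age_band": "90+", "zip3": "000", "sex": "M"},
]
summary = risk_summary(records, ["age_band", "zip3", "sex"])
```

Here the single outlier record is unique on its own, which is the kind of rare combination the stress test is meant to surface.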
Data Privacy Safeguards
- Data minimization: include only fields necessary for the purpose.
- Access controls: role-based permissions, least privilege, and encryption at rest/in transit.
- Contractual terms: data use agreements, re-identification prohibitions, and audit rights.
- Monitoring: logging, anomaly detection, and periodic revalidation of risk as context changes.
Conclusion
To de-identify health data responsibly, use the Safe Harbor Method when its rules meet your needs; otherwise, apply Expert Determination with rigorous analysis and documented controls. Handle geography, dates, ages, and biometric or visual elements conservatively, and pair technical steps with strong governance to keep Re-Identification Risk very small while enabling compliant data use.
FAQs
What are the 18 HIPAA identifiers that must be removed?
- Names.
- Geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP; only 3-digit ZIPs allowed when population > 20,000; otherwise 000).
- All elements of dates (except year) tied to an individual and all ages over 89 (report as 90+).
- Telephone numbers.
- Fax numbers.
- Email addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Account numbers.
- Certificate/license numbers.
- Vehicle identifiers and serial numbers, including license plates.
- Device identifiers and serial numbers.
- Web URLs.
- IP address numbers.
- Biometric identifiers (for example, finger or voice prints).
- Full-face photographic images and comparable images.
- Any other unique identifying number, characteristic, or code.
How does the Safe Harbor method ensure de-identification?
Safe Harbor ensures de-identification by removing the 18 specified identifiers and confirming you have no actual knowledge that the remaining data could identify someone. When both conditions are met, the dataset is considered de-identified under HIPAA’s De-Identification Standards.
What is the role of the Expert Determination method?
Expert Determination engages a qualified expert to apply accepted scientific methods and conclude that the chance of re-identification is very small. The expert may use generalization, suppression, or perturbation and must document assumptions, techniques, and safeguards supporting the conclusion.
Can geographic data smaller than a state be included without removal?
Under Safe Harbor, no—sub-state geography must be removed, with the limited exception of 3-digit ZIPs when the combined area exceeds 20,000 people. Under Expert Determination, some sub-state geography may be allowed if the expert shows residual risk is very small and appropriate safeguards are in place.
How are ages over 89 treated for de-identification?
Safe Harbor requires that any age over 89 not be shown exactly; instead, such ages may be aggregated into a single top-coded group, “90 or older.” Under Expert Determination, you may also top-code or band ages, provided the expert’s analysis supports a very small re-identification risk.