Safe Harbor Checklist: HIPAA Data Elements to Remove for De-Identification
Under the HIPAA Safe Harbor Rule, you can create de-identified health information by removing 18 specified identifiers and ensuring you have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify someone. Use this checklist to remove the required elements while controlling re-identification risk.
Removal of 18 HIPAA Identifiers
To comply with Safe Harbor, remove these identifiers of the individual and of the individual's relatives, employers, and household members wherever they appear: structured fields, free text, images, audio, video, and metadata.
- Names.
- All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes; see ZIP exception below).
- All elements of dates (except year) directly related to an individual (for example, birth, admission, discharge, death). For ages over 89, also remove every date element, including the year, that indicates the age; such ages may be aggregated into a single category of “90 or older.”
- Telephone numbers.
- Fax numbers.
- Email addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Account numbers.
- Certificate/license numbers.
- Vehicle identifiers and serial numbers, including license plate numbers.
- Device identifiers and serial numbers.
- Web URLs.
- IP address numbers.
- Biometric identifiers (for example, finger and voice prints).
- Full-face photographs and any comparable images.
- Any other unique identifying number, characteristic, or code, except a permitted re-identification code that is not derived from information about the individual.
Before sharing, verify that no unique identifying codes remain that could link back to a person. If you keep an internal re-identification code, store the key separately and never disclose it or the method used to generate it.
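As a rough illustration of scrubbing pattern-based identifiers from free text, the sketch below replaces a few of the listed identifiers (email addresses, URLs, Social Security numbers, IP addresses, and US-style telephone numbers) with placeholders. The `PATTERNS` table and the `scrub_text` function are hypothetical names, and the regular expressions are deliberately simple assumptions: they will miss many real-world formats, and they do nothing for names, addresses, or record numbers, which usually need dictionaries, NLP tooling, and human review.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive rule set).
# Order matters: SSNs are replaced before phone numbers so the phone
# pattern never sees the SSN digits.
PATTERNS = [
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[URL]", re.compile(r"\bhttps?://\S+", re.IGNORECASE)),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[IP]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("[PHONE]", re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b")),
]


def scrub_text(text: str) -> str:
    """Replace each matched identifier with a fixed placeholder."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


note = "Call 555-123-4567 or email jane.doe@example.com, SSN 123-45-6789."
print(scrub_text(note))
# prints: Call [PHONE] or email [EMAIL], SSN [SSN].
```

Treat output like this as a first pass: a reviewer still needs to read the scrubbed text, because a pattern-based pass cannot confirm that nothing identifying remains.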
Geographic Data Restrictions
Geographic subdivisions smaller than a state are identifiers and must be removed. This includes street address, city, county, precinct, ZIP code, and equivalent geocodes (for example, census blocks or precise coordinates).
You may disclose the state, and in some contexts broad multi-state regions, but avoid releasing small-area geography that, combined with clinical details, could raise re-identification risk.
- Strip location clues from free text, headers, footers, image overlays, and filenames.
- If location is essential, use state-level only or broad regions rather than specific places.
- For multi-site data, prefer facility type and region over named facility addresses.
Date Element Redaction
For de-identified releases, retain only the year for dates directly related to the individual. Remove month, day, and any associated time stamps in fields or metadata that could reveal full dates.
- Keep “2019,” not “06/14/2019” or “June 14, 2019 08:42.”
- Apply redaction to birth, admission, discharge, death, service, collection, and appointment dates.
- For anyone older than 89, remove all date elements that indicate that age, including the year (for example, a birth year); report the age as “90 or older.”
- Durations and intervals (for example, “length of stay 4 days”) are acceptable when they cannot be reverse-engineered to exact dates.
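The date rules above can be sketched in a few lines. This is a minimal example under stated assumptions: the function names (`parse_date`, `year_only`, `birth_year_or_none`) are hypothetical, the accepted input formats in `DATE_FORMATS` are an arbitrary sample, and the reference date used to compute age is something you would choose for your own release.

```python
from datetime import date, datetime
from typing import Optional

# Assumed input formats; extend this tuple for the formats in your data.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y %H:%M", "%B %d, %Y")


def parse_date(value: str) -> date:
    """Parse a date string using the first matching assumed format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def year_only(value: str) -> str:
    """Keep the year; drop month, day, and any time of day."""
    return str(parse_date(value).year)


def birth_year_or_none(value: str, as_of: date) -> Optional[str]:
    """Return the birth year, or None when it would indicate an age over 89."""
    born = parse_date(value)
    had_birthday = (as_of.month, as_of.day) >= (born.month, born.day)
    age = as_of.year - born.year - (0 if had_birthday else 1)
    return None if age > 89 else str(born.year)


print(year_only("06/14/2019"))           # prints: 2019
print(year_only("June 14, 2019 08:42"))  # prints: 2019
print(birth_year_or_none("05/01/1930", date(2024, 1, 1)))  # prints: None
```

Returning `None` for the oldest group forces the caller to handle suppression explicitly instead of accidentally publishing a birth year that reveals an age over 89.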
Handling of Age Data
Age can be shared in years up to 89. Beyond that, Safe Harbor requires age aggregation to protect privacy.
- Report whole-year ages for individuals 0–89. Avoid granular units (days, weeks, months) when they could enable triangulation with other fields.
- Top-code ages: use a single category, “90 or older,” for anyone aged 90+; do not list 90, 91, etc.
- For small cohorts or rare conditions, consider broader age bands to further reduce re-identification risk.
- Ensure accompanying dates do not indirectly reveal a more precise age than permitted.
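Top-coding and optional banding might look like the following sketch. The function names `top_code_age` and `age_band` and the default band width of 10 years are illustrative assumptions; only the “90 or older” category comes from the rule itself.

```python
TOP_CATEGORY = "90 or older"


def top_code_age(age_years: int) -> str:
    """Report whole-year ages through 89; collapse 90 and above."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    return TOP_CATEGORY if age_years >= 90 else str(age_years)


def age_band(age_years: int, width: int = 10) -> str:
    """Broader bands for small cohorts; the band width is an assumption."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years >= 90:
        return TOP_CATEGORY
    low = (age_years // width) * width
    high = min(low + width - 1, 89)  # never let a band cross into 90+
    return f"{low}-{high}"


print(top_code_age(89))  # prints: 89
print(top_code_age(93))  # prints: 90 or older
print(age_band(47))      # prints: 40-49
```

Capping each band at 89 keeps the top category separate, so a wide band such as 80-99 can never blur the line between reportable ages and the aggregated group.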
Prohibition of Re-identification
Safe Harbor has two prongs: remove the identifiers, and have no actual knowledge that the remaining information could identify a person. Beyond those two requirements, you should also avoid practices that invite re-identification.
- Do not attempt linkage to other data sources; prohibit recipients from re-identifying or contacting individuals.
- If you use internal linkage keys, ensure they are not derived from PHI, store the key separately, and never disclose it.
- Apply disclosure controls—minimum cell sizes, suppression of outliers, and review of free text—to keep re-identification risk low.
- Document your de-identification process to demonstrate adherence to the HIPAA Safe Harbor Rule.
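One of the disclosure controls above, minimum cell sizes, can be sketched as follows. Safe Harbor does not set a numeric threshold, so `min_cell_size` is a required parameter that you choose under your own policy; the function name `suppress_small_cells` and the record layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def suppress_small_cells(
    records: List[Dict[str, str]],
    keys: Sequence[str],
    *,
    min_cell_size: int,
) -> List[Dict[str, str]]:
    """Drop records whose combination of `keys` occurs fewer than
    `min_cell_size` times in the dataset."""
    def cell(record: Dict[str, str]) -> tuple:
        return tuple(record[k] for k in keys)

    counts = Counter(cell(r) for r in records)
    return [r for r in records if counts[cell(r)] >= min_cell_size]


rows = [
    {"state": "TX", "age": "80-89"},
    {"state": "TX", "age": "80-89"},
    {"state": "TX", "age": "80-89"},
    {"state": "VT", "age": "90 or older"},  # a cell of one
]
kept = suppress_small_cells(rows, ["state", "age"], min_cell_size=2)
print(len(kept))  # prints: 3
```

Dropping rows is only one option; depending on the release, you might instead generalize the offending fields (for example, widen the age band) so the records can stay in the dataset.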
Exceptions for ZIP Codes
ZIP codes are identifiers, but Safe Harbor permits a limited exception for three-digit ZIPs when populations are sufficiently large.
- You may include only the first three digits if, according to current publicly available Census Bureau data, the combined area of all ZIP codes sharing those digits contains more than 20,000 people.
- If that area has 20,000 or fewer people, replace the three digits with 000.
- Do not pair three-digit ZIPs with other granular location details (for example, city or county) that would narrow geography.
- Favor state-level or broader regions when three-digit ZIPs are borderline or unnecessary.
Treat the three-digit ZIP exception as a narrow allowance, not an invitation to add more precision.
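The three-digit rule can be expressed as a small lookup, sketched below. The function name `generalize_zip` and the `zip3_population` mapping are hypothetical, and the population figures in the usage example are made up for illustration; real figures must come from current Census Bureau data. Treating an unknown prefix as too small is a conservative choice in this sketch, not something the rule prescribes.

```python
from typing import Mapping

POPULATION_THRESHOLD = 20_000


def generalize_zip(zip_code: str, zip3_population: Mapping[str, int]) -> str:
    """Return the first three ZIP digits, or "000" when the combined
    population for that prefix is 20,000 or fewer (or unknown)."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        raise ValueError(f"not a 5-digit or ZIP+4 code: {zip_code!r}")
    zip3 = digits[:3]
    # Unknown prefixes default to 0, which falls back to "000".
    population = zip3_population.get(zip3, 0)
    return zip3 if population > POPULATION_THRESHOLD else "000"


# Made-up population figures, for illustration only.
populations = {"021": 500_000, "059": 15_000}
print(generalize_zip("02139-4307", populations))  # prints: 021
print(generalize_zip("05901", populations))       # prints: 000
```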
Biometric and Image Data Removal
Remove biometric identifiers and images that enable recognition. These categories are explicitly disallowed in de-identified outputs under Safe Harbor.
- Delete full-face photographs and comparable images (for example, profile images or distinctive marks that reveal identity).
- Remove biometric identifiers such as fingerprints, palm prints, voice prints, iris or retinal scans, and facial geometry templates.
- Strip device identifiers and serial numbers from imaging files and metadata.
- Crop or mask images only if you are confident no identity can be inferred after processing; otherwise, exclude the file entirely.
Summary
Use this Safe Harbor checklist to create de-identified health information: remove all 18 identifiers, restrict geographic subdivisions, redact dates to the year, apply age aggregation (90+), prohibit re-identification, and apply the ZIP three-digit exception carefully. Pair technical scrubbing with policy controls to maintain a low re-identification risk.
FAQs
What are the 18 HIPAA identifiers that must be removed?
The 18 identifiers are: names; all geographic subdivisions smaller than a state; all elements of dates (except year) related to the individual, with ages over 89 aggregated to “90 or older”; telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers (including license plates); device identifiers and serial numbers; web URLs; IP addresses; biometric identifiers (such as finger and voice prints); full-face photographs and comparable images; and any other unique identifying number, characteristic, or code not permitted as a re-identification code.
How should geographic data be treated under HIPAA Safe Harbor?
Remove all geographic subdivisions smaller than a state: street address, city, county, precinct, ZIP code, and equivalent geocodes. You may share state-level location, and you may disclose the first three digits of a ZIP code when the combined area of all ZIP codes sharing those digits contains more than 20,000 people; otherwise use 000.
What is the rule for age data in de-identified information?
Report ages in whole years up to 89. For anyone older than 89, do not share specific ages or dates that reveal age; instead, aggregate to a single category labeled “90 or older.” When risk is still high, use broader age bands.
How is re-identification risk assessed under HIPAA?
After removing the 18 identifiers, you must have no actual knowledge that remaining fields could identify someone alone or in combination. Assess risk by checking small cell sizes, rare diagnoses or procedures, granular geography or time fields, and by enforcing policies that prohibit linkage or re-identification attempts.