What Is De-Identified Information Under HIPAA? Definition, Safe Harbor vs. Expert Determination, and Examples
Definition of De-Identified Information
Under the HIPAA Privacy Rule, de-identified information is data that does not identify an individual and for which there is no reasonable basis to believe it can be used to identify an individual. Once de-identified, the data is no longer Protected Health Information (PHI) and falls outside HIPAA’s use and disclosure requirements.
HIPAA recognizes two pathways to achieve de-identification: the Safe Harbor method (identifier removal) and the Expert Determination method (statistical de-identification). Both approaches aim to reduce the risk of re-identification to an acceptable level, but they differ in process and flexibility.
Safe Harbor Method Requirements
The Safe Harbor method requires removing specific direct identifiers of the individual and of relatives, employers, or household members, and having no actual knowledge that the remaining information could identify the person. The following 18 categories must be removed:
- Names
- All geographic subdivisions smaller than a state (street address, city, county, precinct, ZIP code, and equivalent geocodes), except the initial three digits of a ZIP code if the area formed by combining all ZIP codes that share those three digits contains more than 20,000 people; otherwise replace the three digits with 000
- All elements of dates (except year) directly related to an individual (for example, birth, admission, discharge, death), plus all ages over 89 and any date elements (including year) that reveal such an age, which may instead be aggregated into a single category of “age 90 or older”
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers, including license plates
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers (for example, finger and voice prints)
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
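You may assign a re-identification code so that you can later re-link records, provided the code is not derived from or related to information about the individual, cannot otherwise be translated to identify the individual, and is not used or disclosed for any other purpose. You must also keep the re-identification mechanism (for example, the code-to-record mapping) undisclosed. After removal, you must still ensure you have no actual knowledge that the remaining data could identify someone.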
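As a minimal sketch of a re-identification code that is not derived from information about the individual, the example below draws random tokens with Python's standard-library `secrets` module. The function name `assign_reid_codes` and the sample record IDs are hypothetical, chosen only for illustration; how the resulting crosswalk is stored and protected is left to your own controls.

```python
import secrets

def assign_reid_codes(record_ids):
    """Map each internal record ID to a random code.

    The code comes from a random generator, not from a hash or any other
    function of the patient's data, so it cannot be recomputed from the
    record itself.
    """
    crosswalk = {}
    used = set()
    for record_id in record_ids:
        code = secrets.token_hex(16)  # 32 hex characters of randomness
        while code in used:  # collisions are extremely unlikely; retry if one occurs
            code = secrets.token_hex(16)
        used.add(code)
        crosswalk[record_id] = code
    return crosswalk

# Hypothetical internal IDs. The crosswalk stays with the data holder;
# only the random codes travel with the de-identified records.
crosswalk = assign_reid_codes(["MRN-001", "MRN-002", "MRN-003"])
```

A hash of a medical record number or Social Security number would not fit this pattern, because it is derived from information about the individual.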
Expert Determination Process
The Expert Determination method uses statistical de-identification. A qualified expert applies generally accepted statistical and scientific principles to determine and document that the risk of re-identification is very small, considering both the data and the context in which it will be shared or used.
Typical steps
- Scope and context: Define data elements, intended uses, recipients, and reasonable external data sources.
- Risk analysis: Identify quasi-identifiers (for example, year of birth, 3-digit ZIP, rare diagnoses) and assess linkage risks.
- Transformations: Apply techniques such as generalization, suppression, binning, top/bottom coding, perturbation/noise, hashing tokens, or record swapping.
- Validation: Measure residual re-identification risk under assumed attacker models and data recipient controls.
- Documentation: Record methods, assumptions, tests, residual risk, and conditions of use; issue a written determination.
- Governance: Specify monitoring, re-review triggers (for example, data refreshes), and documentation retention practices.
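The transformation step above can be illustrated with a short Python sketch covering three of the named techniques: binning (generalization), top coding, and suppression of small cells. The function names, the 10-year band width, and the minimum cell size of 5 are assumptions made for this example; an expert would choose parameters based on the actual data and sharing context.

```python
from collections import Counter

def bin_age(age, width=10, top_code=90):
    """Generalize an age into a band; ages at or above top_code share one band."""
    if age >= top_code:
        return f"{top_code}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress_small_cells(rows, keys, min_count=5):
    """Drop rows whose combination of key values occurs fewer than min_count times."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] >= min_count]

# Illustrative records: five patients in their thirties and one aged 92.
rows = [{"age": a, "zip3": "021"} for a in (31, 34, 36, 38, 39, 92)]
for row in rows:
    row["age_band"] = bin_age(row.pop("age"))

# The single "90+" record forms a cell of size 1 and is suppressed.
kept = suppress_small_cells(rows, ["age_band", "zip3"], min_count=5)
print(len(kept))  # 5
```

Suppression trades completeness for lower uniqueness, so the threshold is a utility decision as much as a privacy one.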
Examples of Removed Identifiers
Below are practical examples of how values change when applying Safe Harbor identifier removal while retaining data utility where permissible:
- Name: “John A. Smith” → removed
- Street address: “742 Evergreen Terrace” → removed; City/ZIP → removed; 3-digit ZIP “021” retained only if population threshold is met, otherwise “000”
- Date of birth: “07/14/1976” → “1976” (year only)
- Admission/discharge dates: “03/02/2025–03/09/2025” → admission year “2025”, discharge year “2025” (years only)
- Age 92 → “Age 90 or older”
- Phone/fax/email: “(555) 555-1212” / “555-1213” / “j.smith@example.com” → removed
- Identifiers: SSN, MRN, health plan number, bank/account numbers, license/cert numbers → removed
- Vehicle/device IDs: “VIN 1HGCM82633A…” / device serial “SN-48392” → removed
- Online identifiers: URL “exampleclinic.com/patient/123”, IP “203.0.113.7” → removed
- Biometrics: fingerprint or voice print → removed
- Images: full-face photo or comparable image → removed
- Other unique codes: any unique number/characteristic that could identify someone → removed (except a non-derivable internal re-identification code retained solely by the data holder)
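A few of the transformations listed above can be sketched in Python. This is an illustration under stated assumptions, not a complete Safe Harbor implementation: the function names are hypothetical, the full ZIP code in the sample record is invented for the demo, and `LOW_POPULATION_ZIP3` is a placeholder that you would populate from current Census data on three-digit ZIP areas with 20,000 or fewer people.

```python
from datetime import date

# Placeholder only: "999" is a made-up entry, not a real low-population prefix.
LOW_POPULATION_ZIP3 = {"999"}

def safe_harbor_zip(zip_code, low_population_zip3):
    """Keep the first three ZIP digits, or return "000" for low-population areas."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in low_population_zip3 else zip3

def safe_harbor_age(age):
    """Aggregate ages over 89 into a single category."""
    return "90 or older" if age >= 90 else str(age)

def safe_harbor_birth_year(dob, age):
    """Keep only the birth year, withholding it when it would reveal an age over 89."""
    return None if age >= 90 else str(dob.year)

record = {"zip": "02139", "dob": date(1976, 7, 14), "age": 48,
          "admitted": date(2025, 3, 2)}

deidentified = {
    "zip3": safe_harbor_zip(record["zip"], LOW_POPULATION_ZIP3),
    "birth_year": safe_harbor_birth_year(record["dob"], record["age"]),
    "age": safe_harbor_age(record["age"]),
    "admission_year": str(record["admitted"].year),  # year only
}
print(deidentified)
# {'zip3': '021', 'birth_year': '1976', 'age': '48', 'admission_year': '2025'}
```

Fields such as names, phone numbers, and record numbers are simply dropped, so they need no transformation function.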
Re-Identification Risk Assessment
Risk assessment focuses on whether the data could identify a person when combined with reasonably available information. Key drivers include uniqueness (rare combinations like age, region, and condition), small cell sizes, outliers, and the presence of stable quasi-identifiers.
Experts often evaluate k-anonymity, l-diversity, or t-closeness style measures to gauge indistinguishability across records. They also weigh contextual controls (contractual limits, access controls, aggregation rules, and audit capabilities) that reduce re-identification risk in practice.
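To make the first two measures concrete, here is a minimal Python sketch that computes k (the size of the smallest group of records sharing the same quasi-identifier values) and l (the smallest number of distinct sensitive values within any such group). The function names, column names, and sample rows are assumptions for illustration; real assessments use larger data and richer attacker models.

```python
from collections import defaultdict

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    sizes = defaultdict(int)
    for row in rows:
        sizes[tuple(row[q] for q in quasi_identifiers)] += 1
    return min(sizes.values(), default=0)

def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the fewest distinct sensitive values found in any quasi-identifier group."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[q] for q in quasi_identifiers)].add(row[sensitive])
    return min((len(v) for v in values.values()), default=0)

rows = [
    {"birth_year": 1976, "zip3": "021", "dx": "asthma"},
    {"birth_year": 1976, "zip3": "021", "dx": "diabetes"},
    {"birth_year": 1976, "zip3": "021", "dx": "asthma"},
    {"birth_year": 1980, "zip3": "100", "dx": "asthma"},
    {"birth_year": 1980, "zip3": "100", "dx": "asthma"},
]

qi = ["birth_year", "zip3"]
print(k_anonymity(rows, qi))        # 2
print(l_diversity(rows, qi, "dx"))  # 1
```

Here every record shares its quasi-identifiers with at least one other (k = 2), yet the second group has only one diagnosis (l = 1), so membership in that group alone reveals the condition.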
Risk is dynamic. New external datasets or broader access can increase linkage potential, so periodic review and, if needed, additional transformation are prudent when data is refreshed or circumstances change.
Documentation for Expert Determination
Maintain clear, reproducible documentation showing how statistical de-identification was achieved and why the residual risk is very small. Strong documentation supports compliance, defensibility, and consistent reuse.
What to include
- Dataset description, provenance, versions, and intended uses
- Inventory of identifiers and quasi-identifiers evaluated
- Transformation methods applied and parameters (for example, generalization hierarchies, suppression thresholds, noise calibration)
- Risk models, metrics, attacker assumptions, and validation results
- Contextual controls (access restrictions, Data Use Agreements, auditing) relied upon
- Residual risk statement, limitations, and conditions for use or sharing
- Expert’s credentials, date of determination, and sign-off
- Re-review triggers and change-management procedures
Documentation retention: keep required documentation for at least six years from creation or last effective date, and longer if your policies or agreements require. Retain any re-identification code mapping separately with strict access controls.
Uses and Compliance Considerations
De-identified data is not PHI and is not subject to HIPAA use/disclosure rules. You may use or disclose it for analytics, research, product development, or operations without individual authorization, provided you do not attempt re-identification or contact individuals.
Limited Data Set versus de-identified data: a Limited Data Set still contains some identifiers (for example, dates, city/state/ZIP, and other non-direct identifiers) and remains PHI. It requires a Data Use Agreement and may be used or disclosed only for research, public health, or health care operations. De-identified data, by contrast, has gone through Safe Harbor identifier removal or Expert Determination and is no longer subject to the Privacy Rule’s use and disclosure requirements.
Good governance matters. Combine technical measures with policy controls to minimize re-identification risk. If data is re-identified or linked back to a person, it becomes PHI again and HIPAA obligations reattach immediately.
Bottom line: choose the Safe Harbor method for clear-cut identifier removal, or use Expert Determination when you need to retain more utility under documented, context-aware risk controls. Align with the HIPAA Privacy Rule, retain your documentation for the required period, and review re-identification risk whenever the data or its context changes.
FAQs
What distinguishes de-identified information from protected health information?
Protected Health Information (PHI) is individually identifiable health information subject to HIPAA. De-identified information has been processed so it no longer identifies an individual and cannot reasonably be used to do so; therefore it is not PHI and is generally outside HIPAA’s use and disclosure restrictions.
How does the Safe Harbor method ensure de-identification?
Safe Harbor requires removing 18 specific identifiers (for the individual and related persons) and confirming you have no actual knowledge that the remaining data could identify someone. When applied correctly, this identifier removal achieves de-identification under the HIPAA Privacy Rule.
What qualifications are required for an expert determination?
A qualified expert is someone with appropriate knowledge and experience applying generally accepted statistical and scientific methods to de-identify data and assess re-identification risk. The Privacy Rule does not require a specific degree or certification. Typical qualifications include advanced training in statistics, data privacy, epidemiology, or related fields, plus documented experience conducting de-identification risk assessments and producing defensible reports.
Can de-identified information be used for research without authorization?
Yes. Because de-identified information is not PHI, HIPAA does not require individual authorization for its research use or disclosure. You must still honor any contractual limits, avoid re-identification, and comply with other applicable laws and institutional policies.