HIPAA PII Identifiers: Mapping General PII to PHI’s 18 Identifiers and De‑Identification Steps
Overview of HIPAA PII and PHI Identifiers
HIPAA focuses on Protected Health Information (PHI): individually identifiable health information held or transmitted by a covered entity or its business associate. “PII” is a broader privacy term, but most fields you would treat as PII become PHI under the HIPAA Privacy Rule when they are tied to an individual’s health, care, or payment for care.
How general PII maps to PHI’s identifiers
- Names → Identifier 1 (direct identifiers).
- Addresses, cities, ZIP codes → Identifier 2 (geographic subdivisions smaller than a state).
- Dates like birth, admission, discharge, death → Identifier 3 (all elements of dates except year; ages over 89 handled specially).
- Phone numbers, fax numbers, email addresses → Identifiers 4–6 (contact details).
- SSN, license numbers → Identifiers 7 and 11 (government IDs).
- Medical record and health plan numbers → Identifiers 8 and 9 (health system IDs).
- Financial account numbers → Identifier 10.
- Vehicle and device serials → Identifiers 12 and 13.
- URLs and IP addresses → Identifiers 14 and 15 (digital identifiers).
- Biometric and full-face images → Identifiers 16 and 17.
- Any other unique code that can identify a person → Identifier 18.
This mapping lets you quickly inventory data fields and decide whether a dataset is PHI, guiding your De-Identification Standards and downstream controls.
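As a rough illustration of the inventory step, the mapping above can be expressed as a lookup from column names to identifier numbers. The column names and the `inventory` helper below are hypothetical; a real schema would need its own mapping, and anything left unmapped still needs manual review.

```python
# Hypothetical column names mapped to the HIPAA identifier numbers listed above.
# This is an illustrative assumption, not an official or exhaustive mapping.
FIELD_TO_IDENTIFIER = {
    "patient_name": 1,
    "street_address": 2,
    "zip_code": 2,
    "date_of_birth": 3,
    "admission_date": 3,
    "phone": 4,
    "fax": 5,
    "email": 6,
    "ssn": 7,
    "mrn": 8,
    "health_plan_id": 9,
    "account_number": 10,
    "drivers_license": 11,
    "license_plate": 12,
    "device_serial": 13,
    "portal_url": 14,
    "ip_address": 15,
}


def inventory(columns):
    """Split columns into known identifiers and columns needing manual review."""
    flagged = {c: FIELD_TO_IDENTIFIER[c] for c in columns if c in FIELD_TO_IDENTIFIER}
    unmapped = [c for c in columns if c not in FIELD_TO_IDENTIFIER]
    return flagged, unmapped


flagged, unmapped = inventory(["patient_name", "zip_code", "diagnosis_code"])
print(flagged)   # columns that match a listed identifier
print(unmapped)  # columns to review by hand (could still fall under Identifier 18)
```

An unmapped column is not automatically safe: free-text fields and unique codes often fall under Identifier 18 and should be reviewed individually.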
Detailed Explanation of the 18 HIPAA Identifiers
1. Names of the individual or relatives, employers, or household members.
2. Geographic subdivisions smaller than a state: street address, city, county, precinct, ZIP code, and similar geocodes. The first three ZIP digits may be retained only when the combined area has more than 20,000 people; otherwise replace with 000.
3. All elements of dates (except year) tied to an individual: birth, admission, discharge, death, and similar. Ages over 89 and any elements (including year) that reveal such age must be grouped as “90 or older.”
4. Telephone numbers.
5. Fax numbers.
6. Email addresses.
7. Social Security numbers.
8. Medical record numbers.
9. Health plan beneficiary numbers.
10. Account numbers (e.g., financial, patient portal accounts).
11. Certificate/license numbers (professional, driver’s license, etc.).
12. Vehicle identifiers and serial numbers, including license plates.
13. Device identifiers and serial numbers (e.g., implant or equipment serials).
14. Web URLs.
15. IP address numbers.
16. Biometric identifiers, including finger and voice prints.
17. Full-face photographic images and comparable images.
18. Any other unique identifying number, characteristic, or code that can identify a person (excluding a properly managed, non-derivable re-identification code kept separately).
Safe Harbor De-Identification Method
What the Safe Harbor Method requires
- Remove all 18 identifiers from the dataset.
- Generalize dates to year only; convert ages over 89 to “90 or older.”
- Redact geographic data smaller than a state; keep the first three ZIP digits only when the combined area for that prefix has more than 20,000 people, and replace them with 000 otherwise.
- Ensure you have no actual knowledge that the remaining data could identify an individual alone or in combination.
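The date, age, and ZIP rules above lend themselves to small, testable functions. The following is a minimal sketch, not a complete Safe Harbor implementation: the function names are made up for illustration, and `LOW_POPULATION_ZIP3` is a placeholder set that you would need to populate from current Census data.

```python
from datetime import date

# Placeholder set of three-digit ZIP prefixes whose combined population is
# 20,000 or fewer. The values are illustrative assumptions; build the real
# list from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or return 000 for low-population prefixes."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in LOW_POPULATION_ZIP3 else zip3


def generalize_date(d: date) -> int:
    """Reduce a date to its year only."""
    return d.year


def generalize_age(age: int) -> str:
    """Group ages over 89 into a single category."""
    return "90 or older" if age > 89 else str(age)


def generalize_birth_date(birth: date, as_of: date):
    """Return the birth year, unless it would reveal an age over 89."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    age = as_of.year - birth.year - (0 if had_birthday else 1)
    return "90 or older" if age > 89 else birth.year


print(generalize_zip("94110"))
print(generalize_age(93))
print(generalize_birth_date(date(1930, 1, 1), date(2024, 6, 1)))
```

Note that the birth year itself is withheld for anyone over 89, because a year alone would reveal the age.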
Practical steps you can follow
- Inventory fields and map each to the identifiers list.
- Apply systematic removal or generalization (e.g., suppress names; keep only year; drop house number; mask account numbers).
- Scrub free text with named-entity recognition (NER) and pattern matching to catch identifiers embedded in notes and comments.
- Conduct a reasonableness check for quasi-identifiers (small cells, rare events) and document the “no actual knowledge” determination.
- Optionally assign a non-derivable re-identification code; store the mapping separately with strict access controls.
- Record your process and approvals as part of your De-Identification Standards.
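The pattern-matching part of the free-text step might look like the sketch below. The regular expressions are simplified assumptions that cover common US formats only; they will miss variants, and they do not detect names, which typically require NER and human spot checks.

```python
import re

# Simplified, illustrative patterns. These are assumptions about common
# formats, not a complete or validated rule set.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scrub(text: str) -> str:
    """Replace each pattern match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Call 555-123-4567 or email jane.doe@example.com; SSN 123-45-6789."
print(scrub(note))
# Call [PHONE] or email [EMAIL]; SSN [SSN].
```

Replacing matches with labels rather than deleting them keeps the sentence readable and makes it easy to audit what the pipeline removed.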
Safe Harbor is fast and repeatable but may reduce data utility, especially where dates and locations are analytically important.
Expert Determination De-Identification Method
What the Expert Determination Method entails
A qualified expert applies generally accepted statistical and scientific methods (a Statistical Risk Assessment) to conclude that the re-identification risk is “very small,” given the specific data, recipients, and release environment, and documents the methods and results. The HIPAA Privacy Rule does not mandate a numeric threshold, so the expert justifies methods and risk metrics appropriate to your use case.
Typical workflow
- Define context: purpose, audience, access controls, and potential external data sources available to an attacker.
- Profile the dataset (direct and quasi-identifiers, rare combinations, outliers).
- Select risk models and Data Anonymization Techniques (k-anonymity, l-diversity, t-closeness, generalization, suppression, noise addition, differential privacy, tokenization).
- Transform data iteratively to reduce risk while preserving utility; validate using holdout tests and linkage simulations.
- Produce a written determination with assumptions, controls, residual risk, and any conditions for data use.
- Monitor drift and periodically reassess if the dataset or context changes.
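One of the risk models named in the workflow, k-anonymity, can be measured with a few lines of code. This is a minimal sketch with hypothetical field names and toy records; an actual Expert Determination would combine several metrics and account for the release context, and k-anonymity on its own does not establish that risk is “very small.”

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group (0 for an empty dataset)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def small_cells(records, quasi_identifiers, k):
    """Return the combinations that appear fewer than k times."""
    sizes = group_sizes(records, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


# Toy records with hypothetical quasi-identifier fields.
records = [
    {"zip3": "941", "birth_year": 1980, "sex": "F"},
    {"zip3": "941", "birth_year": 1980, "sex": "F"},
    {"zip3": "941", "birth_year": 1975, "sex": "M"},
]
quasi = ["zip3", "birth_year", "sex"]

print(k_anonymity(records, quasi))     # 1
print(small_cells(records, quasi, 2))  # {('941', 1975, 'M'): 1}
```

Combinations flagged by `small_cells` are candidates for further generalization or suppression in the next iteration of the workflow.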
Expert Determination can retain finer-grained dates or locations under controlled release, often yielding higher analytic value than the Safe Harbor Method.
Comparison of De-Identification Techniques
When Safe Harbor shines
- Standardized public releases where uniform rules are essential.
- Small programs without resources for a statistical review.
- Low tolerance for subjective judgments; easy to audit for compliance.
When Expert Determination is preferable
- Research and analytics requiring detailed dates, geography, or longitudinal linkage.
- High-dimensional data (EHRs, device telemetry) where Safe Harbor would overly suppress.
- Controlled environments with contracts, access controls, and monitoring.
Trade-offs to consider
- Compliance certainty vs. data utility.
- Speed and cost vs. tailored Statistical Risk Assessment.
- Public release vs. governed sharing with enforceable restrictions.
Compliance Requirements for PHI Handling
- Apply the Minimum Necessary standard for use, disclosure, and requests.
- Implement administrative, physical, and technical safeguards (access controls, MFA, encryption in transit and at rest, auditing, facility security).
- Execute Business Associate Agreements when vendors handle PHI.
- Maintain policies for risk analysis, workforce training, sanctions, incident response, and breach notification.
- Use Data Use Agreements and governance for limited data sets and de-identified data to prevent re-identification attempts.
- Manage re-identification codes separately; restrict and log any re-linking operations.
- Define retention and secure disposal for PHI and derivative datasets.
Best Practices for Data Privacy and Security
Build privacy into your data lifecycle
- Maintain a live data inventory and map fields to the 18 identifiers.
- Automate detection of identifiers in structured and unstructured data.
- Adopt tiered releases (public de-identified, controlled de-identified, limited data set) aligned to risk.
Strengthen technical controls
- Encrypt data in transit and at rest; enforce strong key management.
- Use tokenization or pseudonymization; never store the token-to-identity mapping or the keys alongside the tokenized data.
- Apply cell suppression, generalization, rounding, and noise where appropriate.
- Harden analytics platforms with RBAC/ABAC, least privilege, and immutable logging.
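To illustrate the tokenization control, the sketch below issues random tokens that are not derived from the original identifier and keeps the mapping in a separate object. `TokenVault` is a hypothetical in-memory class used only for illustration; a production system would persist the mapping in a separately secured store with access controls and logging.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping identifiers to random tokens."""

    def __init__(self):
        self._token_by_id = {}
        self._id_by_token = {}

    def tokenize(self, identifier: str) -> str:
        """Return a stable random token for the identifier, creating one if needed."""
        if identifier not in self._token_by_id:
            token = secrets.token_hex(16)  # random, so not derivable from the input
            self._token_by_id[identifier] = token
            self._id_by_token[token] = identifier
        return self._token_by_id[identifier]

    def reidentify(self, token: str) -> str:
        """Look up the original identifier; restrict and log this in practice."""
        return self._id_by_token[token]


vault = TokenVault()
token = vault.tokenize("MRN-001")
print(token == vault.tokenize("MRN-001"))  # True: the same input reuses its token
print(vault.reidentify(token))             # MRN-001
```

Random tokens are used here instead of a hash of the identifier because a hashed value is derived from the original data and can sometimes be reversed by guessing inputs.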
Operationalize responsible sharing
- Combine contractual controls (DUAs) with monitoring to deter re-identification.
- Limit free-text; if needed, run redaction pipelines and perform human spot checks.
- Continuously reassess risk as external data and threats evolve.
In practice, you will map PII to the 18 PHI identifiers, choose the Safe Harbor Method or Expert Determination Method based on your goals, and implement layered safeguards. This balanced approach protects privacy while preserving data utility for legitimate use.
FAQs
What are the 18 HIPAA identifiers?
They include: names; geographic subdivisions smaller than a state (with the three-digit ZIP rule); all elements of dates (except year) tied to an individual plus ages over 89 grouped as “90+”; telephone, fax, and email; SSN; medical record and health plan numbers; account numbers; certificate/license numbers; vehicle and device identifiers; URLs and IP addresses; biometric identifiers; full-face photos and comparable images; and any other unique identifying number, characteristic, or code.
How does the Safe Harbor Method protect PHI?
It mandates removing all 18 identifiers and ensuring you have no actual knowledge that remaining data could identify someone. By standardizing what must be removed (e.g., names, detailed addresses, specific dates), the Safe Harbor Method offers a clear, auditable path to de-identification.
What is the Expert Determination Method for de-identification?
A qualified expert evaluates the dataset and its sharing context, applies Statistical Risk Assessment and Data Anonymization Techniques, and documents that the likelihood of re-identification is very small. The method is flexible, allowing more useful data when appropriate controls are in place.
When should each de-identification method be used?
Use Safe Harbor for routine, standardized releases where simplicity and speed matter. Choose Expert Determination when you need higher data utility (e.g., detailed dates or geography) or when sharing in controlled environments where tailored safeguards and a formal risk analysis are feasible.