PHI Definition Under HIPAA: Legal Criteria, Exclusions, and De‑Identification Explained
Definition of PHI Under HIPAA
Protected Health Information (PHI) is a subset of Individually Identifiable Health Information that a Covered Entity or its Business Associate creates, receives, maintains, or transmits. It identifies a person—or there is a reasonable basis to believe it could—and it relates to health status, health care, or payment for health care.
PHI can exist in any medium: electronic records, paper files, images, audio, or spoken word. If the information can reasonably identify an individual and it is handled by a Covered Entity or Business Associate in connection with care, coverage, or payment, it is PHI.
Key elements
- Individually Identifiable Health Information (identity known or reasonably inferable).
- Created or received by a Covered Entity or Business Associate.
- Relates to health condition, health care, or payment for care.
- Maintained or transmitted in any form or medium.
Common examples
- A claim file with name, medical record number, diagnoses, and dates of service.
- Provider notes, lab results, X‑ray images with embedded identifiers, or pharmacy dispensing logs.
- Call recordings and appointment schedules that include patient names and phone numbers.
Legal Criteria for PHI
Legally, information is PHI when it meets all of the following: it is Individually Identifiable Health Information; it is created or received by a Covered Entity (health plans, most health care providers, health care clearinghouses) or a Business Associate acting for them; it pertains to an individual’s past, present, or future health, care, or payment; and it is maintained or transmitted in any form.
The “reasonable basis” test matters. Even without explicit identifiers, a data set can be PHI if, combined with other available information, it could reasonably identify a person in that context.
Covered Entities and Business Associates
A Covered Entity includes health plans, health care clearinghouses, and health care providers who conduct standard electronic transactions. A Business Associate is a vendor or partner—such as a cloud host, billing service, or analytics firm—that handles PHI on behalf of a Covered Entity.
Individually Identifiable Health Information
Individually Identifiable Health Information (IIHI) links health details to an identifiable person (directly or indirectly). Names and medical record numbers are obvious; combinations like rare conditions plus small‑area geography may also identify someone when context makes re‑identification reasonably possible.
Exclusions from PHI
Some information that involves health is not PHI because of how the law draws boundaries. These exclusions prevent HIPAA from overlapping with other regimes or non‑health‑care contexts.
- De‑identified information: Data that meets HIPAA’s de‑identification standards is not PHI.
- Education records and student treatment records covered by FERPA: Health information in school files is typically governed by FERPA, not HIPAA.
- Employment records held by an employer (including a Covered Entity in its role as employer): Medical certifications and workplace accommodations, including Family and Medical Leave Act documentation, are employment records—therefore not PHI—though other laws still protect them.
- Information about a person deceased for more than 50 years.
- Health information held by entities that are not Covered Entities or Business Associates and are not acting on their behalf (for example, many direct‑to‑consumer fitness apps in a purely consumer context).
Note: A Limited Data Set (an option for research, public health, and health care operations that retains certain dates and geography) is still PHI and requires a data use agreement; it is not an exclusion.
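The criteria and exclusions above amount to a conjunction of tests followed by a set of carve-outs. The sketch below shows that logic in Python. The `Record` class, its field names, and `is_phi` are hypothetical names invented for this illustration, and each boolean stands in for a judgment a person still has to make about the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # All field names are hypothetical, chosen for this illustration.
    identifiable: bool          # identifies a person, or reasonably could
    relates_to_health: bool     # health condition, health care, or payment
    held_by_ce_or_ba: bool      # Covered Entity or Business Associate
    deidentified: bool = False
    ferpa_record: bool = False
    employer_employment_record: bool = False
    years_since_death: Optional[int] = None  # None if living or unknown


def is_phi(r: Record) -> bool:
    """Return True only if every criterion holds and no exclusion applies."""
    meets_criteria = (
        r.identifiable and r.relates_to_health and r.held_by_ce_or_ba
    )
    excluded = (
        r.deidentified
        or r.ferpa_record
        or r.employer_employment_record
        or (r.years_since_death is not None and r.years_since_death > 50)
    )
    return meets_criteria and not excluded


claim_file = Record(identifiable=True, relates_to_health=True, held_by_ce_or_ba=True)
fitness_app = Record(identifiable=True, relates_to_health=True, held_by_ce_or_ba=False)
print(is_phi(claim_file), is_phi(fitness_app))
```

A Limited Data Set would still evaluate as PHI here, because none of the exclusion flags applies to it.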
De-Identification of PHI
Once PHI is properly de‑identified, it is no longer PHI and HIPAA’s privacy rules no longer apply to the resulting data. HIPAA recognizes two pathways: the Safe Harbor Method and the Statistical De‑Identification (Expert Determination) method.
Under Safe Harbor, you remove 18 specified identifiers and have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. Under Statistical De‑Identification, a qualified expert applies accepted principles and determines the re‑identification risk is very small, documenting the methods and results.
De‑identification differs from creating a Limited Data Set. A Limited Data Set still contains identifiers like certain dates and geographic details and remains PHI subject to a data use agreement; fully de‑identified data does not.
Methods of De-Identification
Safe Harbor Method (remove specific direct identifiers)
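Remove all of the following identifiers of the individual and of the individual's relatives, employers, and household members, and ensure you have no actual knowledge that the remaining information could identify the individual: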
- Names.
- All geographic subdivisions smaller than a state, including street address, city, county, precinct, and ZIP Code. The initial three digits of a ZIP Code may be kept if, according to current Census Bureau data, all ZIP Codes sharing those three digits together contain more than 20,000 people; otherwise replace them with 000.
- All elements of dates (except year) directly related to an individual (e.g., birth date, admission, discharge, death). Ages over 89, and any date elements (including year) that reveal such an age, must be removed or aggregated into a single “age 90 or older” category.
- Telephone numbers.
- Fax numbers.
- Email addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Account numbers.
- Certificate/license numbers.
- Vehicle identifiers and serial numbers, including license plates.
- Device identifiers and serial numbers.
- Web URLs.
- IP addresses.
- Biometric identifiers (e.g., finger and voice prints).
- Full‑face photographs and comparable images.
- Any other unique identifying number, characteristic, or code (except a re‑identification code permitted by HIPAA that is not derived from the removed identifiers).
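Three of the Safe Harbor rules above are transformations rather than plain deletions: ZIP Code truncation, date reduction to year, and age aggregation. The following is a minimal sketch of those three rules only. The function names are hypothetical, and the `LOW_POPULATION_ZIP3` set holds illustrative values; a real list would have to be derived from current Census Bureau data.

```python
from datetime import date

# Assumption: illustrative values only. The real set of 3-digit prefixes
# covering 20,000 or fewer people must come from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


def safe_harbor_zip(zip_code: str) -> str:
    """Keep the first three digits unless the prefix is low-population."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in LOW_POPULATION_ZIP3 else prefix


def safe_harbor_year(d: date) -> int:
    """Drop month and day and keep only the year.

    Not handled here: a year that itself reveals an age over 89
    (for example a birth year) also needs to be aggregated.
    """
    return d.year


def safe_harbor_age(age: int) -> str:
    """Collapse every age over 89 into a single category."""
    return "90 or older" if age > 89 else str(age)


print(safe_harbor_zip("90210"), safe_harbor_year(date(2021, 3, 14)), safe_harbor_age(93))
```

The remaining identifiers in the list are removed outright, so they need no transformation logic beyond dropping the field.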
Statistical De-Identification (Expert Determination)
A qualified expert with appropriate knowledge applies accepted statistical and scientific principles to conclude that the risk of re‑identification is very small, given the data, recipients, and context. The expert must document the methods, assumptions, and results.
Typical techniques
- Generalization and binning (e.g., age bands, broader geography).
- Suppression or masking of rare combinations and outliers.
- Noise addition, data swapping, and rounding.
- k‑anonymity, l‑diversity, t‑closeness, or differential privacy approaches where appropriate.
- Risk assessment against external data sources and anticipated data linkages.
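As one concrete instance of the techniques above, the sketch below combines generalization (age bands) with a k‑anonymity check: every combination of quasi-identifier values must be shared by at least k records. The function names and the sample rows are hypothetical. Passing such a check is one input an expert might consider, not a determination in itself.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def smallest_group(rows, quasi_identifiers):
    """Size of the rarest combination of quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values(), default=0)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination covers at least k rows."""
    return smallest_group(rows, quasi_identifiers) >= k


# Hypothetical sample data for illustration.
rows = [
    {"age": 34, "zip3": "902", "dx": "asthma"},
    {"age": 37, "zip3": "902", "dx": "diabetes"},
    {"age": 52, "zip3": "902", "dx": "asthma"},
    {"age": 58, "zip3": "902", "dx": "flu"},
]
generalized = [{**r, "age": age_band(r["age"])} for r in rows]

print(is_k_anonymous(rows, ["age", "zip3"], k=2))         # exact ages
print(is_k_anonymous(generalized, ["age", "zip3"], k=2))  # banded ages
```

With exact ages every record is unique on (age, zip3); after banding, each band holds two records, so the generalized table meets k = 2.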
Limited Data Set (for contrast)
A Limited Data Set removes most direct identifiers but may retain dates, city, state, ZIP Code, and ages. It is still PHI and requires a Data Use Agreement that specifies permitted uses, requires safeguards, and prohibits the recipient from re‑identifying the data or contacting the individuals.
Practical tips and common pitfalls
- Scrub free‑text fields for hidden identifiers; they often contain names, initials, or locations.
- Inspect image metadata (e.g., DICOM tags) and document properties for embedded identifiers.
- Under Safe Harbor, ensure small‑area ZIP rules and the “age 90+” aggregation are applied consistently.
- For Expert Determination, tailor risk analysis to the release environment (public vs. restricted).
- If creating a re‑identification code, ensure it is not derived from removed identifiers and store the key separately.
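The last tip can be illustrated with a short sketch: each source identifier gets a random code that carries no information about the identifier itself, and the lookup table (the key) is kept apart from the released data. The function names and the `mrn` field are hypothetical. By contrast, a code computed from the medical record number, such as a hash of it, would be derived from a removed identifier.

```python
import secrets


def assign_codes(record_ids):
    """Map each source identifier to a random code unrelated to its value."""
    key = {}
    for rid in record_ids:
        if rid not in key:
            key[rid] = secrets.token_hex(8)  # 16 random hex characters
    return key


def apply_codes(rows, key, id_field="mrn"):
    """Return copies of the rows with the identifier replaced by its code."""
    coded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["code"] = key[row[id_field]]
        coded.append(new_row)
    return coded


# Hypothetical sample data for illustration.
rows = [
    {"mrn": "A1", "dx": "asthma"},
    {"mrn": "A1", "dx": "flu"},
    {"mrn": "B2", "dx": "diabetes"},
]
key = assign_codes(r["mrn"] for r in rows)  # store this separately
coded = apply_codes(rows, key)
print(coded)
```

Records for the same person stay linkable through the shared code, while only the holder of `key` can map a code back to the original identifier.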
Conclusion
In practice, PHI hinges on identifiability, context, and who holds the data. Understand the legal criteria, recognize exclusions such as FERPA records and employer-held employment records (including Family and Medical Leave Act documentation), and choose the appropriate de‑identification pathway, Safe Harbor or Statistical De‑Identification, based on your data and risk tolerance.
FAQs
What information qualifies as PHI under HIPAA?
PHI is Individually Identifiable Health Information that a Covered Entity or Business Associate creates, receives, maintains, or transmits, relating to an individual’s health status, health care, or payment for care. It includes obvious identifiers (names, medical record numbers) and any data that could reasonably identify a person in context.
What are the main exclusions from PHI?
De‑identified data, education and student treatment records covered by FERPA, employment records (including Family and Medical Leave Act documentation) held by an employer, certain information about individuals deceased more than 50 years, and health information held solely by non‑covered entities not acting for a Covered Entity are not PHI.
How is PHI de-identified according to HIPAA?
HIPAA allows two options: remove specified identifiers under the Safe Harbor Method and ensure you have no actual knowledge of residual identifiability, or obtain an expert’s Statistical De‑Identification determination that the re‑identification risk is very small, with documented methods and results.
What methods are approved for de-identifying PHI?
The approved methods are the Safe Harbor Method (removal of the 18 specified identifiers, with “no actual knowledge” of residual identifiability) and Statistical De‑Identification (Expert Determination), which uses accepted scientific techniques such as generalization, suppression, and controlled noise to reduce re‑identification risk to a very small level.