Databricks HIPAA Compliance: Requirements, BAA, and Best Practices for Protecting PHI

Kevin Henry

May 04, 2025

HIPAA Overview and Applicability

Who is covered and when Databricks applies

HIPAA applies to covered entities and their business associates whenever Protected Health Information (PHI) is created, received, maintained, or transmitted. If you use Databricks to ingest, transform, analyze, or serve data that includes PHI, your Databricks workspace becomes part of your HIPAA-regulated environment and must be configured accordingly.

Typical use cases include claims analytics, clinical quality measurement, patient engagement, and machine learning on de-identified datasets with selective re-identification under strict controls. Databricks supplies platform capabilities, not a compliance program, so you must pair those capabilities with your organizational HIPAA program to satisfy the HIPAA Security Rule.

Key HIPAA Security Rule concepts

The HIPAA Security Rule requires Administrative Safeguards (risk analysis, policies, workforce training), Physical Safeguards (facility and device protections handled largely by the cloud provider), and Technical Safeguards (access controls, encryption, integrity, audit controls). On Databricks, you address these by enforcing identity and access management, enabling encryption, centralizing audit logs, and hardening workspaces with the Compliance Security Profile.

Databricks and PHI scope

Only deploy PHI to HIPAA-eligible workspaces and services covered by your Business Associate Agreement (BAA). Use dedicated workspaces for regulated workloads, segregate development and production, and ensure downstream systems that consume outputs are also HIPAA-ready. Treat notebooks, job outputs, logs, and temporary storage as potentially sensitive unless explicitly de-identified.

Business Associate Agreement (BAA) Requirements

When a BAA is required

A BAA is required any time Databricks, as a service provider, may create, receive, maintain, or transmit PHI on your behalf. If you are a covered entity or a business associate handling PHI, you must have a signed BAA with Databricks before placing PHI in the platform.

What the BAA should cover

Your BAA should define permitted uses and disclosures, mandate safeguards consistent with the HIPAA Security Rule, set breach reporting timelines, flow down obligations to subcontractors, and specify the return or destruction of PHI upon termination. It should also state which features, regions, and services are in scope for HIPAA use.

Service scope and operational guardrails

BAAs typically limit use to HIPAA-eligible services and may exclude certain preview or experimental features. Confirm that your workspaces are flagged for HIPAA use, that the Compliance Security Profile is enabled, and that you follow customer responsibilities for access controls, encryption, logging, and network isolation described in your agreement.

Enabling HIPAA Compliance Controls on Databricks

Prerequisites

  • A signed Business Associate Agreement (BAA) with Databricks covering your intended regions and services.
  • Dedicated HIPAA workspaces (separate from non-regulated projects) and an approved architecture for private networking.
  • Customer-Managed Encryption Keys (CMK) available in your cloud KMS for workspace and storage encryption.

Account and workspace configuration

  • Enable the Compliance Security Profile to apply hardened defaults, restrict risky features, enforce secure cluster policies, and improve auditability.
  • Integrate identity via SSO and SCIM to centralize authentication and provisioning; disable local users and require MFA through your IdP.
  • Adopt least-privilege RBAC with Unity Catalog, using groups, catalogs, schemas, and tables to model access boundaries aligned to data sensitivity.
  • Define cluster policies that prevent insecure settings (e.g., unrestricted libraries, wide-open networks, or long-running interactive clusters handling PHI).
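
The cluster-policy bullet above can be illustrated with a small rule checker. This is a plain-Python sketch of the idea, not Databricks' policy engine: the rule types (`fixed`, `range`, `forbidden`) and the attribute names are assumptions modeled loosely on cluster policy definitions, so verify them against the current documentation before reusing them.

```python
from typing import Any

# Hypothetical policy. Attribute names and rule types are illustrative
# assumptions, not a verified Databricks schema.
PHI_CLUSTER_POLICY: dict[str, dict[str, Any]] = {
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 60},
    "enable_public_ip": {"type": "fixed", "value": False},
    "init_scripts": {"type": "forbidden"},
}


def policy_violations(cluster: dict[str, Any],
                      policy: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable violations of `policy` found in `cluster`."""
    violations = []
    for attr, rule in policy.items():
        present = attr in cluster
        value = cluster.get(attr)
        if rule["type"] == "forbidden":
            # The attribute may not appear at all.
            if present:
                violations.append(f"{attr}: must not be set")
        elif rule["type"] == "fixed":
            # The attribute must be present and equal to the pinned value.
            if value != rule["value"]:
                violations.append(f"{attr}: must equal {rule['value']!r}")
        elif rule["type"] == "range":
            # The attribute must be present and inside the inclusive range.
            low, high = rule["minValue"], rule["maxValue"]
            if not present or not (low <= value <= high):
                violations.append(f"{attr}: must be between {low} and {high}")
    return violations
```

Running a check like this in CI against proposed cluster definitions is one way to catch insecure settings before they reach a workspace that handles PHI.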

Data governance and access control

  • Implement fine-grained permissions, dynamic views for row/column filters, and masking for direct identifiers.
  • Use separate storage locations for PHI and non-PHI; apply table- and storage-level authorization consistently.
  • Prohibit PHI in notebook titles, comments, or tags; use parameterization to avoid embedding identifiers in code.
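
The row-filter and column-mask logic described above can be sketched in plain Python to make the access decision explicit. On the platform this logic would live in a view or a table-level policy, not in application code. The group name, column names, and the `region` field are hypothetical.

```python
# Hypothetical group and column names, for illustration only.
MASKED_COLUMNS = {"ssn", "mrn"}
UNMASK_GROUP = "phi_full_access"


def project_row(row: dict, user_groups: set[str]) -> dict:
    """Return a copy of `row` with direct identifiers masked unless the
    caller belongs to the group allowed to see them."""
    if UNMASK_GROUP in user_groups:
        return dict(row)
    return {k: ("****" if k in MASKED_COLUMNS else v) for k, v in row.items()}


def filter_rows(rows: list[dict], allowed_regions: set[str]) -> list[dict]:
    """Row filter: keep only rows whose region the caller may see."""
    return [r for r in rows if r.get("region") in allowed_regions]
```

Keeping the mask and the filter as two separate steps mirrors how column-level and row-level controls are usually reviewed independently.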

Network isolation and egress control

  • Deploy private networking with VPC/VNet injection and private endpoints so that traffic between the control plane, the compute plane, and your data stores stays off the public internet.
  • Restrict egress with allowlists, NAT rules, or firewall policies; disable outbound internet where not required.
  • Isolate data pipelines from user internet access; prefer job clusters and service principals for machine-to-machine workloads.
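
An egress allowlist reduces to a membership test against approved network ranges. The sketch below uses Python's standard `ipaddress` module. The CIDR ranges are placeholders, and in practice the rule is enforced by firewalls or security groups rather than application code.

```python
import ipaddress

# Hypothetical allowlist; the ranges are placeholders, not recommendations.
EGRESS_ALLOWLIST = [
    ipaddress.ip_network(cidr) for cidr in ("10.20.0.0/16", "203.0.113.0/24")
]


def egress_allowed(destination_ip: str) -> bool:
    """True when `destination_ip` falls inside an approved network range."""
    addr = ipaddress.ip_address(destination_ip)
    return any(addr in network for network in EGRESS_ALLOWLIST)
```

A check of this shape is also handy for testing proposed firewall rule changes against a list of destinations your pipelines are known to need.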

Audit logging and evidence collection

  • Enable audit logs at the account level and forward them to a secure, immutable location for retention and monitoring.
  • Collect system, job, query, and access logs; monitor changes to permissions, clusters, secrets, and network settings.
  • Automate evidence generation for periodic audits (e.g., access reviews, key rotation, and policy compliance checks).

Shared Responsibility Model for HIPAA Compliance

Databricks responsibilities

Databricks is responsible for the security of the managed platform services, including control plane protections, patching of managed components, service availability, and platform-level encryption in supported regions. Under the Compliance Security Profile, Databricks enforces hardened defaults and disables features that could undermine safeguards.

Your responsibilities

You are responsible for classifying data, executing a risk analysis, enabling Technical Safeguards (access control, encryption, audit), implementing Administrative Safeguards (policies, training), configuring network isolation, managing keys, and validating that only HIPAA-eligible services are used. You must also secure code, notebooks, libraries, and downstream integrations and maintain incident response and breach notification processes.

Best Practices for Protecting PHI in Databricks Workspaces

Identity, authorization, and segregation

  • Use SSO with SCIM, enforce MFA, and adopt least privilege with group-based RBAC in Unity Catalog.
  • Segregate environments: dedicated HIPAA workspaces; separate dev/test from production; restrict interactive clusters for PHI.
  • Implement break-glass access with strict approvals and short, audited elevations.

Compute, clusters, and notebooks

  • Prefer ephemeral job clusters for PHI pipelines; enforce auto-termination and minimal local storage.
  • Lock down cluster policies: disallow public IPs, require approved images, restrict init scripts and library sources.
  • Avoid PHI in notebook text, comments, or display outputs; purge cached results and control export features.

Data minimization and de-identification

  • Apply the minimum necessary standard; tokenize or hash direct identifiers, and store linkage keys separately.
  • Use dynamic data masking and column-level controls to restrict access to direct identifiers and sensitive attributes.
  • When feasible, work with de-identified datasets and re-identify only in tightly controlled, audited steps.
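
Tokenizing a direct identifier can be sketched with a keyed hash. This example uses HMAC-SHA256 from the standard library rather than a plain hash, because unkeyed hashes of low-entropy identifiers are easy to guess by brute force. The normalization rule and the key handling are assumptions: the key should come from a secret store, and tokenization by itself does not establish that a dataset is de-identified under HIPAA.

```python
import hashlib
import hmac


def tokenize(identifier: str, key: bytes) -> str:
    """Deterministic keyed token for a direct identifier (HMAC-SHA256).

    The same identifier and key always yield the same token, so joins
    across tables still work without exposing the raw value.
    """
    # Assumed normalization: trim whitespace and upper-case before hashing.
    normalized = identifier.strip().upper().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()
```

Because re-identification requires either the key or a separately stored linkage table, access to both should be restricted and audited.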

Key management and secrets

  • Use Customer-Managed Encryption Keys for workspace, managed services data, and object storage; rotate keys on a defined cadence.
  • Store credentials in secrets; disable plaintext credentials in notebooks; audit secret access regularly.
  • Limit service principal permissions; prefer short-lived tokens and scoped credentials.

Data lifecycle and retention

  • Define retention schedules for raw, curated, and derived PHI; apply lifecycle policies to storage locations.
  • Automate secure deletion or archival; ensure backup/restore processes preserve encryption and access controls.
  • Validate that lineage and tags for PHI propagate across pipelines to prevent accidental exposure.
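
A retention schedule can be expressed as data plus one comparison. The tiers match the raw, curated, and derived categories above. The day counts are hypothetical placeholders: real periods come from your retention policy and legal requirements.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data tier, in days.
RETENTION_DAYS = {"raw": 90, "curated": 730, "derived": 365}


def expired(tier: str, created: date, today: date) -> bool:
    """True when an object in `tier` has outlived its retention period."""
    return today > created + timedelta(days=RETENTION_DAYS[tier])
```

A scheduled job could call a function like this over an object inventory and hand the expired entries to a deletion or archival step.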

Encryption and Security Controls

Encryption at rest

Enable server-side encryption for all storage that may contain PHI and use Customer-Managed Encryption Keys to control cryptographic material, support key rotation, and satisfy separation-of-duties requirements. Ensure that both metastore/managed services data and external object storage are covered.

Encryption in transit

Require TLS for all connections to control and data planes. For internal services, use private endpoints and disallow protocol downgrade and insecure ciphers. Where client libraries make it configurable, verify certificates (pinning them if your policy calls for it) and require modern protocol versions.
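
On the client side, enforcing a TLS floor in Python takes a few lines with the standard `ssl` module. This sketch assumes TLS 1.2 as the minimum, so raise it if your policy requires TLS 1.3.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and hostnames
    and refuses protocol versions older than TLS 1.2."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The returned context can be passed to standard-library clients that accept one, for example `urllib.request.urlopen(url, context=ctx)`.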

Secrets and credential hygiene

Use secret scopes to store database credentials, service keys, and tokens. Rotate credentials, restrict read access to least privilege, and prevent secrets from appearing in logs or query history via redaction and coding patterns.
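
One coding pattern for keeping secrets out of logs is to pass every log line through a redaction step before it is written. The sketch below is illustrative only: the pattern covers a few common `key=value` shapes and would need extending for the credential formats you actually use.

```python
import re

# Illustrative patterns only; extend for your own credential formats.
_SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|token|secret|api[_-]?key)\b(\s*[=:]\s*)(\S+)"),
]


def redact(line: str) -> str:
    """Replace the value part of anything that looks like a credential."""
    for pattern in _SECRET_PATTERNS:
        # Keep the key name and separator, drop the value.
        line = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", line)
    return line
```

Redaction is a backstop, not a substitute for reading credentials from a secret store at runtime.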

Network security and isolation

Adopt private networking, deny public IPs for compute, and control outbound egress with allowlists. Use firewall rules, security groups, and route tables to prevent exfiltration and to limit access to only approved data stores and services.

Logging, integrity, and tamper resistance

Centralize audit logs in a write-once or tamper-evident store. Capture admin, data access, permission changes, cluster events, and notebook activity. Implement integrity checks and alerts for anomalous access patterns and configuration drift.
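
A tamper-evident log can be approximated with a hash chain, in which each entry's hash covers the record and the previous entry's hash. This is a minimal sketch of that idea using SHA-256, not a description of any built-in platform feature. Write-once object storage remains the stronger control.

```python
import hashlib
import json

GENESIS = "0" * 64  # Starting value for the first link in the chain.


def _link(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_records(records: list[dict]) -> list[dict]:
    """Attach to each record a hash covering it and all earlier records."""
    prev, chained = GENESIS, []
    for record in records:
        prev = _link(record, prev)
        chained.append({"record": record, "hash": prev})
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for entry in chained:
        if _link(entry["record"], prev) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Storing the latest hash in a separate system makes it harder for someone with write access to the log to rewrite the whole chain unnoticed.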

Managing Compliance Risk and Monitoring

Risk analysis and controls mapping

Perform a formal risk analysis that inventories PHI data flows, identifies threats, and maps mitigations to Administrative Safeguards and Technical Safeguards. Document control ownership, validation steps, and acceptance of residual risk.

Continuous monitoring and detection

Stream audit logs to your SIEM; create alerts for high-risk events such as permission changes, failed logins, unrestricted clusters, or data exfiltration attempts. Periodically test access controls, review entitlements, and verify cluster policy compliance.
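
The alerting rules described above can be prototyped over parsed audit events before they are ported to a SIEM. The event shape (`actionName` and `user` keys) and the action names are hypothetical placeholders, so map them to the fields your audit logs actually contain.

```python
from collections import Counter

# Hypothetical action names; replace with values from your real log schema.
HIGH_RISK_ACTIONS = {"changePermissions", "deleteSecret", "updateNetworkSettings"}
FAILED_LOGIN_ACTION = "loginFailed"


def alerts(events: list[dict], failed_login_threshold: int = 5) -> list[str]:
    """Return alert messages for high-risk actions and repeated failed logins."""
    found = [
        f"high-risk action {e['actionName']} by {e['user']}"
        for e in events
        if e["actionName"] in HIGH_RISK_ACTIONS
    ]
    # Count failed logins per user and flag anyone at or over the threshold.
    failures = Counter(
        e["user"] for e in events if e["actionName"] == FAILED_LOGIN_ACTION
    )
    found += [
        f"{count} failed logins for {user}"
        for user, count in sorted(failures.items())
        if count >= failed_login_threshold
    ]
    return found
```

The threshold is a tuning parameter: start conservative, review the resulting alert volume, and adjust.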

Evidence, training, and audits

Automate evidence collection for keys, policies, and access reviews. Train data engineers and scientists on PHI handling, masking, and secure coding. Schedule internal audits to confirm adherence to your policies and your BAA obligations.

Incident response and resilience

Define triage and escalation playbooks, run tabletop exercises, and test backup/restore for critical datasets. Ensure breach notification procedures and contacts are current, and that business continuity plans address regulated workloads.

Conclusion

Databricks HIPAA compliance hinges on the right contract (BAA), the right controls (Compliance Security Profile, CMK, RBAC, private networking), and disciplined operations (monitoring, key rotation, access reviews). By aligning platform capabilities to the HIPAA Security Rule and following best practices, you can protect PHI while enabling secure analytics and machine learning.

FAQs

What is required to enable HIPAA compliance controls in Databricks?

You need a signed BAA, dedicated HIPAA workspaces, the Compliance Security Profile enabled, SSO/SCIM for identity, Customer-Managed Encryption Keys for at-rest encryption, private networking with controlled egress, and centralized audit logs. You must also enforce cluster policies and Unity Catalog RBAC for least-privilege access.

How does the shared responsibility model affect HIPAA compliance?

Databricks secures the managed platform and enforces hardened defaults, while you configure and operate controls for your data, identities, networks, keys, and workloads. Compliance depends on how you implement Administrative and Technical Safeguards and on using only HIPAA-eligible services under your BAA.

What best practices ensure PHI protection on Databricks?

Use least-privilege RBAC with Unity Catalog, segregate HIPAA workspaces, prefer ephemeral job clusters, prevent PHI in notebooks and logs, tokenize or mask identifiers, manage secrets centrally, apply Customer-Managed Encryption Keys, restrict egress via private networking, and monitor audit logs with policy-driven alerts.

Is a Business Associate Agreement mandatory with Databricks for handling PHI?

Yes. If Databricks will create, receive, maintain, or transmit PHI on your behalf, a Business Associate Agreement is mandatory. Do not store or process PHI in Databricks until your BAA is executed and your workspaces and controls are configured for HIPAA use.
