HIPAA Compliance in the Cloud
Christopher Crosbie & Jonathan Fritz
CHRISTOPHER CROSBIE MPH, MS
HEALTHCARE AND LIFE SCIENCE SOLUTION ARCHITECT MANAGER
ccrosbie@amazon.com
An Expansive Ecosystem
An industry- and world-spanning ecosystem
Cloud Computing: Rx for Healthcare
~83% of Healthcare organizations are using cloud services
and use is expected to grow in the future.
The most frequent uses today include hosting clinical
applications and/or data and the most common model seen
is SaaS.
Nearly all of the healthcare organizations presently using
cloud services plan to expand use of cloud services in the
future.*
* 2014 HIMSS Analytics Cloud Survey.
Collaborative Medical Research on AWS
[Architecture diagram: a management application on Amazon EC2; AWS Direct Connect linking the research center's data center; an S3 bucket with objects and a vault governed by lifecycle policies; metadata stored in DynamoDB and exposed via Amazon CloudSearch; RDS for data-permission management; an Internet gateway serving external researchers; and analytics processing on multiple clusters.]
Amazon EMR – Hadoop in the Cloud
• Managed platform
• Launch a cluster in minutes
• Leverage the elasticity of the cloud
• Baked in security features
• Pay by the hour and save with Spot
• Flexibility to customize
HIPAA controls for Hadoop are relevant no matter
which distribution or cloud vendor you choose
Why is HIPAA compliance such
a hot topic with Hadoop?
Because it’s important, and it’s hard
HIPAA 101
• It’s HIPAA, not HIPPA
• HIPAA stands for the Health Insurance Portability and Accountability Act.
• HIPAA regulation, terms you should know
• Privacy Rule
• Protected Health Information (PHI)
• Security Rule
• Breach Notification Rule
• Enforcement Rule
• HHS Office for Civil Rights (OCR) conducts audits
• The Office of the National Coordinator for Health Information Technology (ONC)
• Omnibus Rule (2013)
A data storage company that has access to protected health information (whether digital or hard copy) qualifies as a business
associate, even if the entity does not view the information or only does so on a random or infrequent basis. Thus, document
storage companies maintaining protected health information on behalf of covered entities are considered business associates,
regardless of whether they actually view the information they hold. To help clarify this point, WE HAVE MODIFIED THE DEFINITION
OF “BUSINESS ASSOCIATE” to generally provide that a business associate includes a person who “creates, receives, MAINTAINS,
OR TRANSMITS” protected health information on behalf of a covered entity.
Who is a Business Associate?
• A third party that creates, receives, maintains, or transmits protected health information (PHI) on behalf of a health care provider, clearinghouse, or health plan (a covered entity)
• e.g., your cloud provider
https://aws.amazon.com/compliance/hipaa-compliance/
https://cloud.google.com/security/compliance
https://www.microsoft.com/en-us/TrustCenter/Compliance/HIPAA
Meeting BAA Requirements Example
AWS HIPAA Configuration Requirements
Customers must encrypt ePHI in transit and at rest
Customers must use EC2 Dedicated Instances for instances processing, storing, or transmitting
ePHI
Customers must record and retain activity related to use of and access to ePHI
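To make the Dedicated Instances requirement concrete, here is a minimal boto3 sketch; the region, CIDR ranges, and names are assumptions for illustration, not AWS-mandated values. Creating the VPC with dedicated tenancy means that EC2 instances launched into its subnets, including EMR cluster nodes that touch ePHI, run as Dedicated Instances.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Create a dedicated-tenancy VPC so that instances launched into it
# (including EMR cluster nodes handling ePHI) run as Dedicated Instances.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16", InstanceTenancy="dedicated")
vpc_id = vpc["Vpc"]["VpcId"]

# A private subnet for the EMR cluster; no public IPs are mapped by default.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
print("Launch EMR into subnet:", subnet["Subnet"]["SubnetId"])
```

Tenancy can also be requested per instance, but setting it at the VPC level keeps every node launched into the VPC on dedicated hardware by default.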
Why can this be hard to meet with Hadoop?
• Secure infrastructure: relies on the traditional data-center model
• Data protection: data at rest (HDFS TDE) and data in transit (fragmented across tools)
• Access controls: authentication via MIT Kerberos (!); authorization is inconsistent across projects
• Monitoring: multiple options (Ganglia, YARN logs, Ambari)
HIPAA shouldn’t mean giving up on ease of use or introducing complexity
Hadoop in the cloud…
• Hadoop (and its security model) was designed for processing on a dedicated, long-lived cluster with multi-user tenancy.
VS
• In the cloud, resources are ephemeral, and the best utilization comes from a service- and usage-based model.
Security Fundamentals
• Private subnets in VPC
• EC2 security groups
• Identity and Access Management (IAM) policies
• Bucket policies
• Access control lists (ACLs)
• Query string authentication
Encryption
• SSL endpoints
• Server-side encryption (SSE-S3)
• Server-side encryption with provided keys (SSE-C, SSE-KMS)
• Client-side encryption
Compliance
• S3 bucket access logs
• Lifecycle management policies
• Access control lists (ACLs)
• Versioning & MFA deletes
• Certifications – HIPAA, PCI, SOC 1/2/3, etc.
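As one concrete example of the fundamentals above, this boto3 sketch (the bucket name is hypothetical) attaches a bucket policy that denies any request not made over TLS and turns on versioning; MFA delete and access logging would be layered on in the same way.

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "example-phi-bucket"  # hypothetical bucket name

# Deny any request to the bucket that does not arrive over TLS.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::" + bucket, "arn:aws:s3:::" + bucket + "/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Versioning (with MFA delete enabled separately) rounds out the
# compliance controls listed above.
s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Enabled"})
```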
Data Encryption
[The sections that follow cover three layers: Amazon S3, the local file system, and HDFS.]
Data Encryption At-Rest – Amazon S3 and EMRFS
Server-side encryption
- S3-managed keys (SSE-S3), AWS Key Management Service keys (SSE-KMS), or customer-provided keys (SSE-C)
- S3 client with extra metadata
Client-side encryption
- Customer-managed keys or AWS Key Management Service
- Use a custom Encryption Materials Provider with the S3 encryption client
S3 uses AES-256 with envelope encryption. EMRFS makes S3 encryption transparent for applications on your cluster.
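A short boto3 sketch of server-side encryption with KMS-managed keys (SSE-KMS); the bucket, object key, and KMS key alias are placeholders. EMRFS can be configured so that objects the cluster writes receive the same treatment without application changes.

```python
import boto3

s3 = boto3.client("s3")

# Server-side encryption with a KMS key (SSE-KMS); bucket, object key,
# and key alias are hypothetical.
s3.put_object(
    Bucket="example-phi-bucket",
    Key="clinical/notes-2016-06-01.csv",
    Body=b"patient_id,observation\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-phi-key",
)

# Verify the object metadata reports the encryption that was applied.
head = s3.head_object(Bucket="example-phi-bucket",
                      Key="clinical/notes-2016-06-01.csv")
print(head["ServerSideEncryption"])  # "aws:kms"
```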
Data Encryption At-Rest – On Cluster
Local FS
- Need to encrypt scratch directories
- LUKS using a random key or an AWS Key Management Service key
HDFS
- Need to encrypt intermediates or data stored in HDFS
- HDFS transparent data encryption (HDFS-6134)
- Use Hadoop KMS or Ranger KMS
Data at Rest – HDFS TDE
• HDFS encryption zones, each with an encryption zone key (EZK)
• Each file gets a unique data encryption key (DEK), which is stored encrypted (EDEK)
• End-to-end protection (at rest and in transit) when data is written to an encryption zone
• Uses Hadoop KMS with the Java Cryptography Extension KeyStore (JCEKS)
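A minimal sketch of the usual HDFS TDE setup steps, wrapped in Python only to keep the examples in one language; the key name and zone path are hypothetical.

```python
import subprocess

def run(cmd):
    """Run a Hadoop CLI command on the cluster and echo it for auditing."""
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Create an encryption zone key (EZK) in the Hadoop KMS.
run(["hadoop", "key", "create", "phi-zone-key"])

# 2. Create the directory and mark it as an encryption zone; files written
#    under it get a per-file DEK, stored encrypted (EDEK) by the NameNode.
run(["hdfs", "dfs", "-mkdir", "-p", "/secure/phi"])
run(["hdfs", "crypto", "-createZone", "-keyName", "phi-zone-key",
     "-path", "/secure/phi"])

# 3. Confirm the zone exists.
run(["hdfs", "crypto", "-listZones"])
```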
Data Encryption In-Flight
MapReduce shuffle (shuffle service)
- Encrypted shuffle using SSL
Spark shuffle (BlockTransferService)
- SASL encryption (DIGEST-MD5)
- SSL for Akka and HTTP (for broadcast and fileServer)
HDFS data transfer
- Use HDFS TDE (encrypts client-side)
- Or encrypt RPC (hadoop.rpc.protection) and data transfer (dfs.encrypt.data.transfer)
Web UIs and clients
- HTTPS (if supported)
- Use SSH tunnels and port forwarding
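One way to express the in-flight settings above is an EMR configuration list supplied at cluster launch. The classifications and property names below are the standard Hadoop/Spark ones; the values are illustrative, and this is a sketch rather than a complete hardening recipe.

```python
# Illustrative EMR "configurations" covering the in-flight settings above;
# property names are the standard Hadoop/Spark ones, values are examples.
in_flight_encryption = [
    {"Classification": "core-site",
     "Properties": {"hadoop.rpc.protection": "privacy"}},
    {"Classification": "hdfs-site",
     "Properties": {"dfs.encrypt.data.transfer": "true"}},
    {"Classification": "mapred-site",
     "Properties": {"mapreduce.shuffle.ssl.enabled": "true"}},
    {"Classification": "spark-defaults",
     "Properties": {"spark.authenticate": "true",
                    "spark.authenticate.enableSaslEncryption": "true"}},
]
# Pass this list as the Configurations parameter of boto3's
# emr.run_job_flow(...) when launching the cluster.
```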
Access Control
Different permissions in a cloud environment
• Who can launch a cluster?
• What other cloud services can a cluster access?
• What permissions do multiple users on a cluster have?
• How can permissions be stateless when clusters can be transient?
Control access and segregate duties everywhere
• You get to control who can do what in your AWS environment, when, and from where
• Limit permissions using IAM users and account federation with IAM roles
• Fine-grained control of your AWS cloud with multi-factor authentication
• Integrate with your existing Active Directory using federation and single sign-on
[Diagram: the AWS account owner delegates network management, security management, server management, and storage management to separate roles.]
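As a sketch of the kind of IAM control described above (the policy name and scope are illustrative, not a prescribed configuration), the following allows launching EMR clusters only when the caller authenticated with MFA:

```python
import json
import boto3

iam = boto3.client("iam")

# Allow launching EMR clusters only when the caller authenticated with MFA;
# the policy name and the broad Resource scope are illustrative.
policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["elasticmapreduce:RunJobFlow",
                   "elasticmapreduce:AddJobFlowSteps"],
        "Resource": "*",
        "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
    }],
}
iam.create_policy(PolicyName="EmrLaunchWithMfa",
                  PolicyDocument=json.dumps(policy_doc))
```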
VPC private subnets to isolate the network
• Use Amazon S3 Endpoints for connectivity
to S3
• Use Managed NAT for connectivity to other
services or the Internet
• Control the traffic using Security Groups
• ElasticMapReduce-Master-Private
• ElasticMapReduce-Slave-Private
• ElasticMapReduce-ServiceAccess
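A boto3 sketch of the S3 endpoint piece; the VPC ID, route table ID, and region are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Gateway endpoint for S3 so EMR nodes in private subnets can reach S3
# without traversing the Internet; VPC and route table IDs are placeholders.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
```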
IAM roles limit service and cluster permissions
[Diagram: the service role governing the cluster's access to cloud resources.]
Kerberos for general on-cluster authentication
Automated scripts in Apache Bigtop to enable Kerberos and create trust with
AWS Directory Service or Active Directory (AWS Big Data Blog post coming soon).
LDAP authentication for secure entry points
https://blogs.aws.amazon.com/bigdata/post/Tx3J2RL8V6N72G7/Using-LDAP-via-AWS-Directory-Service-to-Access-and-Administer-Your-Hadoop-Enviro
- Direct integration with HiveServer2, Presto, Hue, Zeppelin (coming soon), Phoenix, and other tools
- Easier to set up than Kerberos, but more limited
Fine-Grained Access Controls / Authorization
HiveServer2
- SQL standards-based authorization on Hive tables and views
HBase
- Cell-level access control
Ranger / Sentry + RecordService
- Plug-ins for a variety of Hadoop ecosystem projects
- Column-level control for Hive tables
- Ranger bootstrap action for EMR available (AWS Big Data Blog post coming soon!)
Third-party solutions for access control and data masking
- BlueTalon, DataGuise, and more!
Monitoring and Auditing
Monitoring and auditing
Interaction with the AWS environment
- AWS CloudTrail records API calls and saves logs in your S3 buckets, no matter how those API calls were made
Access to objects in S3
- EMR can log user-defined information in S3 audit logs to track which application accessed each object
Hadoop ecosystem audit logging
- Access to logs generated by each application
- Ranger and Sentry also generate audit logs from activity
Ganglia and Amazon CloudWatch for general monitoring
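For example, a boto3 sketch (assuming CloudTrail is already enabled) that pulls recent cluster-launch events from CloudTrail, regardless of whether they came from the console, CLI, or SDK:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

# Pull recent CloudTrail events for EMR cluster launches.
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName",
                       "AttributeValue": "RunJobFlow"}],
    MaxResults=10,
)
for e in events["Events"]:
    print(e["EventTime"], e.get("Username", ""), e["EventName"])
```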
Conclusions
Security is critical
AWS has tools to make it easier
You can move fast and stay safe
Get started in minutes with EMR 4.7
Spark 1.6.1, Hadoop 2.7.2, Hive 1.0, Presto 0.147, HBase 1.2.1, Tez 0.8.3, Phoenix 4.7.0, Oozie 4.2.0, Zeppelin
0.5.6, Pig 0.14.0, Hue 3.7.1, Mahout 0.12.0, Sqoop 1.4.6, HCatalog 1.0.0, ZooKeeper 3.4.8
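A minimal boto3 sketch of launching an EMR 4.7 cluster into a private subnet; the instance types, counts, names, and subnet ID are placeholders, and a HIPAA deployment would add the encryption, Kerberos/LDAP, and logging settings discussed earlier.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

response = emr.run_job_flow(
    Name="phi-analytics",                       # hypothetical cluster name
    ReleaseLabel="emr-4.7.0",
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "Ec2SubnetId": "subnet-0123456789abcdef0",  # private subnet placeholder
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)
print("Cluster:", response["JobFlowId"])
```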
Jon Fritz - jonfritz@amazon.com
Senior Product Manager
aws.amazon.com/emr


Editor's Notes

  • #4 Some other companies to highlight besides the obvious: IMS Health; Medidata (clinical trials management); DNANexus and Seven Bridges Genomics (Genomics pipelines)
  • #5 Healthcare organizations are using cloud services and use is expected to grow in the future. Eighty-three (83) percent of respondents use cloud services in some capacity at their organization, with the most frequent use being to host clinical applications and/or data. Most healthcare organizations use a SaaS model to support their cloud services. Nearly all of the healthcare organizations presently using cloud services plan to expand use of cloud services in the future. 2014 HIMSS Analytics Cloud Survey.
  • #6 More focus on security and control.
  • #7 More focus on security and control.
  • #9 By unpacking the question of why security comes first, we'll see how you can attain greater security on AWS, taking a "five whys" approach. This is the first one. We bring it up because it's job zero for AWS: we don't have a business if we can't secure our services. We also know it's going to come up eventually; the CISO will find out and we want to get ahead of it, prevent anti-patterns from emerging, and introduce new ways to improve security as you transition workloads to AWS. Most of all, we bring it up to earn the trust of our customers: your data on AWS represents your business and your livelihood, the trust of your customers, and the privacy and security of real people. For many enterprises, even those several months or a couple of years into their move to the cloud, it's still new territory; if you've operated data centers and traditional networks for 40 years, some of this is going to feel new, and customers want to know that they can maintain a sense of control, so we help guide them as they chart new territory. Lastly, enterprise security has always been plain hard and our customers want to get ahead of it. When did you ever wrap up an audit with regulators, or even internal audit, and say, "Wow, that was sure easy"? It's a cost center, so getting prioritization requires a Herculean leadership effort; the threat landscape and sophistication of attacks are always changing, and very fast; and the regulatory environment changes, especially as a business expands globally and enters new markets with different data and privacy requirements. So enterprise security comes first… because it's so hard? [NEXT SLIDE]
  • #10 OCR is moving to Phase II audits; ONC makes the rules.
  • #23 With AWS Identity and Access Management tools you get to define which of your users get to do what, in the same way as you define role-based access controls within your environment today. You can use hardware-token-based or software/mobile-based multi-factor authentication to add an extra level of assurance for your more sensitive applications. This can all integrate with your on-premises environment by integrating with your existing corporate directory and implementing federation and single sign-on, so that this becomes a seamless experience for your customers.
  • #25 With AWS Identity and Access Management tools you get to define which of your users get to do what, in the same way as you define role-based access controls within your environment today. You can use hardware-token-based or software/mobile-based multi-factor authentication to add an extra level of assurance for your more sensitive applications. This can all integrate with your on-premises environment by integrating with your existing corporate directory and implementing federation and single sign-on, so that this becomes a seamless experience for your customers.
  • #30 We also have a number of tools for monitoring activity in the environment. CloudTrail is our service that logs all API calls, including console activities and command-line instructions. It logs exactly who did what, when, and from where. That means you have full visibility into any accesses, changes, or activity within your AWS environment. You can save these logs into your S3 buckets, and the only cost to you is the cost of that storage. A growing number of AWS services are CloudTrail-enabled, including EC2, EBS, VPC, IAM, and Redshift. This means that you can easily aggregate logs and track activity. If you already have a SIEM or log-management solution, a growing number of them support collecting CloudTrail logs, including Splunk, AlertLogic, and SumoLogic.