Hardening Hadoop for Healthcare with Project Rhino


Big Data analytics is estimated to save over $450B in healthcare costs, and healthcare payers and providers are adopting big data platforms. Hadoop and cloud computing have emerged as two of the most promising technologies for implementing big data at scale for production healthcare workloads, using Hadoop as a service. Common concerns in the healthcare industry include privacy, data security, and regulatory compliance with HIPAA and HITECH. Intel is contributing to a common security framework for Apache Hadoop in the form of Project Rhino, which enables enterprises to run big data analytics workloads without compromising performance or security. Join this session to learn how your enterprise can use the security capabilities of the Intel Data Platform running on AWS to analyze healthcare data while maintaining the technical safeguards that help you remain in compliance.

  • I am a little disappointed the following was never mentioned in yesterday's talk: http://venturebeat.com/2014/03/26/intel-cloudera-hadoop/
Transcript

  • 1. © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Secure Hadoop as a Service Vin Sharma, Intel March 26, 2014
  • 2. Who needs Hadoop security?
  • 3. Big Data Analytics in Health and Life Sciences Now: Disparate streams of data Next: Integrated computing and data Genomics Clinical Claims & transactions Meds & labs Patient experience Personal data Better decisions and outcomes at reduced cost Clinical Analysis Genomic Analysis From population- to person-based treatment
  • 4. Cost Savings via Big Data Analytics Provider Patient Payer Producer Regulator Personalized medicine Data-driven adherence Proven Pathways of care Co-ordinated across providers Shift volume to right setting Reducing ER (re)admit rates Provider / performance transparency & payment innovation Accelerated Approval Accelerated Discovery $180B $100B $100B $70B
  • 5. Compliance Requirements • HIPAA – Privacy Rule – Security Rule • Administrative Safeguards • Physical Safeguards • Technical Safeguards • Others… Provider Patient Payer Producer Regulator
  • 6. Technical Safeguards Access Control A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI). Audit Controls A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI. Integrity Controls A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed. Transmission Security A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.
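The integrity control on slide 6 asks for an electronic measure confirming that e-PHI has not been improperly altered. One common way to do this, shown here as a minimal sketch and not as anything the talk prescribes, is to store a keyed hash (HMAC-SHA256) next to each record and recompute it on read. The class name, the record layout, and the hard-coded demo key are assumptions made for illustration only.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;

// Illustrative sketch only: not part of Hadoop or Project Rhino.
public class IntegrityCheckSketch {

    // Computes an HMAC-SHA256 tag over the record bytes.
    static byte[] tag(byte[] key, byte[] record) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(record);
    }

    // Recomputes the tag and compares it with the stored one.
    // MessageDigest.isEqual is used to avoid leaking timing information.
    static boolean isUnaltered(byte[] key, byte[] record, byte[] storedTag)
            throws GeneralSecurityException {
        return MessageDigest.isEqual(tag(key, record), storedTag);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a fixed demo key. A real deployment would fetch the key
        // from a key management system and never embed it in code.
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        byte[] record = "patient=123;lab=HbA1c;value=6.1".getBytes(StandardCharsets.UTF_8);
        byte[] storedTag = tag(key, record);

        byte[] tampered = "patient=123;lab=HbA1c;value=5.1".getBytes(StandardCharsets.UTF_8);
        System.out.println("original unaltered: " + isUnaltered(key, record, storedTag));
        System.out.println("tampered unaltered: " + isUnaltered(key, tampered, storedTag));
    }
}
```

A keyed hash is used in place of a plain digest so that someone who can modify the record cannot simply recompute a matching tag without the key.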
  • 7. Hadoop Security Challenges
  • 8. Hadoop Security Challenges HiveQL Sqoop Flume Zookeeper Pig YARN (MRv2) HDFS 2.0 R connectors Giraph HCatalog Hive HBase Coprocessors HBase Mahout Oozie Components of a typical Hadoop stack
  • 9. Hadoop Security Challenges Components sharing an authentication framework HiveQL Sqoop Flume Zookeeper Pig YARN (MRv2) HDFS 2.0 R connectors Giraph HCatalog Metadata Hive HBase Coprocessors HBase Mahout Oozie Data flow
  • 10. Hadoop Security Challenges Components capable of access control HiveQL Sqoop Flume Zookeeper Pig YARN (MRv2) HDFS 2.0 R connectors Giraph HCatalog Hive HBase Coprocessors HBase Mahout Oozie
  • 11. Hadoop Security Challenges Components capable of admission control HiveQL Sqoop Flume Zookeeper Pig YARN (MRv2) HDFS 2.0 R connectors Giraph HCatalog Hive HBase Coprocessors HBase Mahout Oozie
  • 12. Hadoop Security Challenges Components capable of (transparent) encryption HiveQL Sqoop Flume Zookeeper Pig HDFS 2.0 R connectors Giraph HCatalog Hive HBase Coprocessors HBase Mahout Oozie YARN (MRv2)
  • 13. Hadoop Security Challenges Components sharing a common policy engine HiveQL Sqoop Flume Zookeeper Pig HDFS 2.0 R connectors Giraph HCatalog Hive HBase Coprocessors HBase Mahout Oozie YARN (MRv2)
  • 14. Hadoop Security Challenges Components sharing a common audit log format HiveQL Sqoop Flume Zookeeper Pig HDFS 2.0 R connectors Giraph HCatalog Metadata Hive HBase Coprocessors HBase Mahout Data mining Oozie YARN (MRv2)
  • 15. Hardening Hadoop from within
  • 16. Project Rhino Encryption and Key Management Role Based Access Control Common Authorization Consistent Auditing
  • 17. Deliver defense in depth Firewall Gateway Authn AuthZ Encryption Audit & Alerts Isolation
  • 18. Protect Hadoop APIs • Enforces consistent security policies across all Hadoop services • Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs • Common Criteria EAL4+, HSM, FIPS 140-2 certified • Deploys as software, virtual appliance, or hardware appliance • Available on AWS Marketplace HCatalog Stargate WebHDFS
  • 19. Provide role-based access control AuthZ • File, table, and cell-level access control in HBase • JIRA HBASE-6222: Add per-KeyValue security _acl_tabl
  • 20. Provide encryption for data at rest MapReduce RecordReader Map Combiner Partitioner Local Merge & Sort Reduce RecordWriter Decrypt Encrypt Derivative Encrypt Derivative Decrypt HDFS • Extends compression codec into crypto codec • Provides an abstract API for general use
  • 21. Provide encryption for data at rest HBase • Transparent table/CF encryption HBASE-7544
  • 22. Pig & Hive Encryption • Pig Encryption Capabilities – Support of text file and Avro* file format – Intermediate job output file protection – Pluggable key retrieving and key resolving – Protection of key distribution in cluster • Hive Encryption Capabilities – Support of RC file and Avro file format – Intermediate and final output data encryption – Encryption is transparent to end user without changing existing SQL
  • 23. Crypto Codec Framework • Extends compression codec • Establishes a common abstraction at the API level that can be shared by all crypto codec implementations CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf); CryptoContext cryptoContext = new CryptoContext(); ... cryptoCodec.setCryptoContext(cryptoContext); CompressionInputStream input = cryptoCodec.createInputStream(inputStream); ... • Provides a foundation for other components in Hadoop* such as MapReduce or HBase* to support encryption features
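The idea on slide 23, a codec that wraps a raw stream and is configured through a context object, can be illustrated with the JDK alone. The sketch below is not the Project Rhino `CryptoCodec` API: `StreamCryptoSketch`, its nested `Context` class, and the method names are hypothetical stand-ins, and AES in CTR mode is an assumed choice. It only shows the stream-wrapping pattern that a compression-style codec interface makes possible.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Illustrative sketch only: a JDK-based analogue of a crypto codec, not Rhino's API.
public class StreamCryptoSketch {

    // Hypothetical counterpart of the slide's CryptoContext: carries key and IV.
    static final class Context {
        final byte[] key;
        final byte[] iv;
        Context(byte[] key, byte[] iv) {
            this.key = key.clone();
            this.iv = iv.clone();
        }
    }

    private static Cipher cipher(int mode, Context ctx) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(ctx.key, "AES"), new IvParameterSpec(ctx.iv));
        return c;
    }

    // Wraps a raw sink so that everything written to it is encrypted.
    static OutputStream createOutputStream(OutputStream raw, Context ctx)
            throws GeneralSecurityException {
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, ctx));
    }

    // Wraps a raw source so that everything read from it is decrypted.
    static InputStream createInputStream(InputStream raw, Context ctx)
            throws GeneralSecurityException {
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, ctx));
    }

    static byte[] encrypt(byte[] plain, Context ctx)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = createOutputStream(sink, ctx)) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] cipherText, Context ctx)
            throws IOException, GeneralSecurityException {
        try (InputStream in = createInputStream(new ByteArrayInputStream(cipherText), ctx)) {
            return in.readAllBytes(); // requires Java 9 or later
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: all-zero key and IV keep the demo deterministic.
        // Real use needs a random key and a unique IV per stream.
        Context ctx = new Context(new byte[16], new byte[16]);
        byte[] plain = "e-PHI record".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = decrypt(encrypt(plain, ctx), ctx);
        System.out.println("round trip ok: " + Arrays.equals(plain, roundTrip));
    }
}
```

Because callers only see `InputStream` and `OutputStream`, a reader or writer that already accepts a compression codec can pick up encryption without changes to its own logic, which is the point the slide makes about sharing one abstraction.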
  • 24. Key Distribution • Enabling crypto codec in a MapReduce job • Enabling different key storage or management systems • Allowing different stages and files to use different keys • API to integrate with external key management system
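Slide 24 lists two requirements that fit naturally behind a small interface: pluggable key storage, and distinct keys for distinct stages or files. The following is a minimal sketch under stated assumptions. `KeyProvider`, `InMemoryKeyProvider`, and the `jobId/stage` naming scheme are hypothetical and are not the Project Rhino or Hadoop key API; an in-memory map stands in for an external key management system.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: names and naming scheme are assumptions.
public class KeyProviderSketch {

    // Hypothetical plug-in point: any key store can sit behind this interface.
    interface KeyProvider {
        byte[] getKey(String keyName);
    }

    // Stand-in for an external key manager; keys live in memory for the demo.
    static final class InMemoryKeyProvider implements KeyProvider {
        private final Map<String, byte[]> keys = new ConcurrentHashMap<>();
        private final SecureRandom random = new SecureRandom();

        @Override
        public byte[] getKey(String keyName) {
            // Create a 128-bit key the first time a name is requested.
            byte[] key = keys.computeIfAbsent(keyName, name -> {
                byte[] fresh = new byte[16];
                random.nextBytes(fresh);
                return fresh;
            });
            return key.clone(); // callers never get the stored array itself
        }
    }

    // Assumed convention: one key name per job stage, e.g. "job42/map-output".
    static String keyNameFor(String jobId, String stage) {
        return jobId + "/" + stage;
    }

    public static void main(String[] args) {
        KeyProvider provider = new InMemoryKeyProvider();
        byte[] inputKey = provider.getKey(keyNameFor("job42", "input"));
        byte[] shuffleKey = provider.getKey(keyNameFor("job42", "map-output"));
        System.out.println("stages use different keys: " + !Arrays.equals(inputKey, shuffleKey));
    }
}
```

Keeping key lookup behind an interface means a job only names the key it needs, and the choice of key store can change without touching the codec or the job code.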
  • 25. Crypto Software Optimization Multi-Buffer Crypt • Process multiple independent data buffers in parallel • Improves cryptographic performance by 2-9X
  • 26. Intel® Data Protection Technology AES-NI • Processor assistance for performing AES encryption • Makes enabled encryption software faster and stronger Secure Key (DRNG) • Processor-based true random number generator • More secure, standards compliant, high performance Internet Data in Motion Secure transactions used pervasively in ecommerce, banking, etc. Data in Process Most enterprise and cloud applications offer encryption options to secure information and protect confidentiality Data at Rest Full disk encryption software protects data while saving to disk AES-NI - Advanced Encryption Standard New Instructions Secure Key - previously known as Intel Digital Random Number Generator (DRNG)
  • 27. Intel® AES-NI Accelerated Encryption 18.2x/19.8x Non Intel® AES-NI With Intel® AES-NI Intel® AES-NI Multi-Buffer 5.3x/19.8x Encryption Decryption Encryption Decryption AES-NI - Advanced Encryption Standard New Instructions 20X Faster Crypto Relative speed of crypto functions Higher is better Based on Intel tests
  • 28. Cloud Platform for secure Hadoop Intel® Xeon® Processors • E7 Family • E5 Family • E3 Family Amazon • EC2 Reserved Instances • EC2 Dedicated Instances
  • 29. 20 more at aws.amazon.com/ec2/instance-types Amazon EC2 Instances with AES-NI
  • 30. Resources
  • 31. For more information • intel.com/bigdata • intel.com/healthcare/bigdata • github.com/intel-hadoop/project-rhino/ • aws.amazon.com/compliance/ • aws.amazon.com/ec2/instance-types/
  • 32. Secure Hadoop as a Service Vin Sharma, Intel March 26, 2014 Thank you!