AWS Government, Education, &
Nonprofits Symposium
Canberra, Australia | May 20, 2014
Secure Hadoop as a Service
AWS Public Sector Symposium 2014 Canberra | Secure Hadoop as a Service

Intel is contributing to a common security framework for Apache Hadoop, in the form of Project Rhino, which enables Hadoop to run workloads without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze data while ensuring technical safeguards that help you remain in compliance.


  1. AWS Government, Education, & Nonprofits Symposium
     Canberra, Australia | May 20, 2014
     Secure Hadoop as a Service
     Peter Kerney, Senior Solutions Architect, Intel
  2. Who needs Hadoop security?
  3. Big Data Analytics in Health and Life Sciences
     Now: Disparate streams of data
     •  Genomics
     •  Clinical
     •  Claims & transactions
     •  Meds & labs
     •  Patient experience
     •  Personal data
     Next: Integrated computing and data
     •  Clinical analysis
     •  Genomic analysis
     •  Better decisions and outcomes at reduced cost
     •  From population-based to person-based treatment
  4. Cost Savings via Big Data Analytics
     Provider, Patient, Payer, Producer, Regulator
     •  Accelerated approval: $70B
     •  Accelerated discovery: $100B
     •  Provider/performance transparency & payment innovation: $100B
     •  Personalized medicine, data-driven adherence: $180B
     •  Proven pathways of care: coordinated across providers, shifting volume to the right setting, reducing ER (re)admit rates
  5. Compliance Requirements
     •  HIPAA
        –  Privacy Rule
        –  Security Rule
           •  Administrative Safeguards
           •  Physical Safeguards
           •  Technical Safeguards
     •  Others…
     Provider, Patient, Payer, Producer, Regulator
  6. Technical Safeguards
     •  Access Control: A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI).
     •  Audit Controls: A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI.
     •  Integrity Controls: A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed.
     •  Transmission Security: A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.
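One common electronic measure for the integrity control above is a keyed message authentication code stored next to each record. The Java sketch below is illustrative only and is not something HIPAA itself prescribes: it uses HMAC-SHA256 from the standard `javax.crypto` API, and the class name `IntegrityCheck`, the key, and the record contents are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class IntegrityCheck {
    // Computes an HMAC-SHA256 tag over a record; the tag is stored alongside the record.
    static byte[] tag(byte[] key, byte[] record) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(record);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recomputes the tag and compares it in constant time; false means the record changed.
    static boolean verify(byte[] key, byte[] record, byte[] storedTag) {
        return MessageDigest.isEqual(tag(key, record), storedTag);
    }

    public static void main(String[] args) {
        // Hypothetical key and record for illustration; real keys belong in a key manager.
        byte[] key = "example-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        byte[] record = "record-001|field=value-A".getBytes(StandardCharsets.UTF_8);
        byte[] storedTag = tag(key, record);

        byte[] altered = "record-001|field=value-B".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(key, record, storedTag));  // true
        System.out.println(verify(key, altered, storedTag)); // false
    }
}
```

Without the key, an attacker who alters a record cannot produce a matching tag, which is what distinguishes an HMAC from a plain checksum here.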
  7. Hadoop Security Challenges
  8. Hadoop Security Challenges
     Components of a typical Hadoop stack: HiveQL, Sqoop, Flume, ZooKeeper, Pig, YARN (MRv2), HDFS 2.0, R connectors, Giraph, HCatalog, Hive, HBase Coprocessors, HBase, Mahout, Oozie
  9. Hadoop Security Challenges
     Components sharing an authentication framework
     (Diagram: the stack from slide 8, with Metadata and Data flow labels)
  10. Hadoop Security Challenges
     Components capable of access control
     (Diagram: the stack from slide 8)
  11. Hadoop Security Challenges
     Components capable of admission control
     (Diagram: the stack from slide 8)
  12. Hadoop Security Challenges
     Components capable of (transparent) encryption
     (Diagram: the stack from slide 8)
  13. Hadoop Security Challenges
     Components sharing a common policy engine
     (Diagram: the stack from slide 8)
  14. Hadoop Security Challenges
     Components sharing a common audit log format
     (Diagram: the stack from slide 8, with Metadata and Data mining labels)
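A shared audit format means one parser can read events from every component. As a minimal sketch, assuming a hypothetical schema whose field names (`allowed`, `ugi`, `ip`, `cmd`, `src`) borrow the tab-separated key=value style of the HDFS audit log plus an added `component` field, the hypothetical `AuditEvent` class below emits and parses such lines. It is not Project Rhino's audit format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AuditEvent {
    // Hypothetical common schema: every component emits the same fields in the same order.
    private final Map<String, String> fields = new LinkedHashMap<>();

    AuditEvent(String component, boolean allowed, String user, String ip, String cmd, String resource) {
        fields.put("component", component);
        fields.put("allowed", Boolean.toString(allowed));
        fields.put("ugi", user);
        fields.put("ip", ip);
        fields.put("cmd", cmd);
        fields.put("src", resource);
    }

    // Renders tab-separated key=value pairs; assumes values contain no tab characters.
    String format() {
        StringJoiner line = new StringJoiner("\t");
        fields.forEach((k, v) -> line.add(k + "=" + v));
        return line.toString();
    }

    // One parser handles events from every component because the format is shared.
    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String pair : line.split("\t")) {
            int eq = pair.indexOf('=');
            out.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String hdfs = new AuditEvent("hdfs", true, "alice", "10.0.0.5", "open", "/data/claims").format();
        String hbase = new AuditEvent("hbase", false, "bob", "10.0.0.7", "get", "patients/lab/row42").format();
        System.out.println(hdfs);
        System.out.println(parse(hbase).get("allowed")); // false
    }
}
```

Keeping field order fixed (via `LinkedHashMap`) also makes lines from different services easy to compare side by side.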
  15. Hardening Hadoop from within
  16. Project Rhino
     •  Encryption and Key Management
     •  Role-Based Access Control
     •  Common Authorization
     •  Consistent Auditing
     Deliver defense in depth: Firewall, Gateway, AuthN, AuthZ, Encryption, Audit & Alerts, Isolation
  17. Protect Hadoop APIs
     •  Enforces consistent security policies across all Hadoop services
     •  Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
     •  Common Criteria EAL4+, HSM, and FIPS 140-2 certified
     •  Deploys as software, a virtual appliance, or a hardware appliance
     •  Available on AWS Marketplace
     (Diagram labels: HCatalog, Stargate, WebHDFS)
  18. Provide role-based access control (AuthZ)
     •  File-, table-, and cell-level access control in HBase
     •  JIRA HBASE-6222: Add per-KeyValue security
     (Diagram label: _acl_ table)
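The idea of role-based access control scoped to files, tables, and cells can be sketched in a few lines. This is a hypothetical in-memory model (class `RoleBasedAccess`, with resources written as '/'-separated paths such as `patients/lab/row42`), not HBase's `AccessController` or the HBASE-6222 implementation: users hold roles, roles hold permissions on a scope, and a grant on a table or column family also covers the cells beneath it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RoleBasedAccess {
    enum Perm { READ, WRITE }

    // role -> (resource path -> permissions granted on that path and everything beneath it)
    private final Map<String, Map<String, EnumSet<Perm>>> roleGrants = new HashMap<>();
    // user -> roles held by that user
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    void grant(String role, String resource, Perm... perms) {
        roleGrants.computeIfAbsent(role, r -> new HashMap<>())
                  .computeIfAbsent(resource, r -> EnumSet.noneOf(Perm.class))
                  .addAll(Arrays.asList(perms));
    }

    void assign(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    // Resources are '/'-separated paths: "table", "table/family", "table/family/row".
    // A grant on a path covers that path and every path nested under it.
    boolean isAllowed(String user, String resource, Perm perm) {
        for (String role : userRoles.getOrDefault(user, Collections.emptySet())) {
            Map<String, EnumSet<Perm>> grants = roleGrants.getOrDefault(role, Collections.emptyMap());
            for (Map.Entry<String, EnumSet<Perm>> g : grants.entrySet()) {
                String scope = g.getKey();
                boolean covers = resource.equals(scope) || resource.startsWith(scope + "/");
                if (covers && g.getValue().contains(perm)) {
                    return true;
                }
            }
        }
        return false; // default deny
    }

    public static void main(String[] args) {
        RoleBasedAccess acl = new RoleBasedAccess();
        acl.grant("analyst", "patients/lab", Perm.READ);           // column-family level
        acl.grant("clinician", "patients", Perm.READ, Perm.WRITE); // table level
        acl.assign("alice", "analyst");
        acl.assign("carol", "clinician");

        System.out.println(acl.isAllowed("alice", "patients/lab/row42", Perm.READ));           // true
        System.out.println(acl.isAllowed("alice", "patients/lab/row42", Perm.WRITE));          // false
        System.out.println(acl.isAllowed("alice", "patients/demographics/row42", Perm.READ));  // false
        System.out.println(acl.isAllowed("carol", "patients/demographics/row42", Perm.WRITE)); // true
    }
}
```

Matching on `scope + "/"` rather than a bare prefix keeps a grant on `patients/lab` from leaking to a sibling such as `patients/labx`.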
  19. Provide encryption for data at rest
     (Diagram: the MapReduce data path RecordReader → Map → Combiner → Partitioner → Local Merge & Sort → Reduce → RecordWriter, reading from and writing to HDFS, annotated with Decrypt, Encrypt, Derivative Encrypt, and Derivative Decrypt steps)
     •  Extends the compression codec into a crypto codec
     •  Provides an abstract API for general use
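Because the crypto codec extends the compression codec, both layers can sit on the same stream. A minimal sketch using only the JDK (GZIP plus AES in CTR mode; the class name `CompressThenEncrypt` and the sample data are hypothetical) shows the ordering that matters: compress before encrypting on write, and decrypt before decompressing on read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class CompressThenEncrypt {
    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] newIv() {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv); // never reuse a (key, IV) pair in CTR mode
        return iv;
    }

    private static Cipher aesCtr(int mode, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, key, new IvParameterSpec(iv));
        return c;
    }

    // Write path: plaintext -> GZIP -> AES/CTR -> storage.
    static byte[] write(byte[] plain, SecretKey key, byte[] iv) {
        try {
            ByteArrayOutputStream storage = new ByteArrayOutputStream();
            try (OutputStream out = new GZIPOutputStream(
                    new CipherOutputStream(storage, aesCtr(Cipher.ENCRYPT_MODE, key, iv)))) {
                out.write(plain);
            }
            return storage.toByteArray();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read path reverses the layers: storage -> AES/CTR decrypt -> GZIP inflate -> plaintext.
    static byte[] read(byte[] stored, SecretKey key, byte[] iv) {
        try (InputStream in = new GZIPInputStream(new CipherInputStream(
                new ByteArrayInputStream(stored), aesCtr(Cipher.DECRYPT_MODE, key, iv)))) {
            return in.readAllBytes();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] iv = newIv();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("claim_id,provider_id,amount\n"); // hypothetical, highly repetitive sample
        }
        byte[] plain = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] stored = write(plain, key, iv);
        System.out.println("plaintext bytes: " + plain.length + ", stored bytes: " + stored.length);
        System.out.println(Arrays.equals(plain, read(stored, key, iv))); // true
    }
}
```

Reversing the order (encrypting first) would leave GZIP nothing to compress, since ciphertext is effectively random.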
  20. Provide encryption for data at rest: HBase
     •  Transparent table/column-family (CF) encryption (HBASE-7544)
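Transparent column-family encryption in HBase is usually described as a two-tier key scheme: a cluster master key wraps per-column-family data keys, so only wrapped keys are persisted. The sketch below illustrates that general idea with the JDK's RFC 3394 `AESWrap` cipher; it does not use HBase's classes, and `EnvelopeKeys` and its methods are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EnvelopeKeys {
    static SecretKey newAesKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps a per-column-family data key under the master key (RFC 3394 AES key wrap).
    // In this sketch only the wrapped bytes are meant to be persisted.
    static byte[] wrap(SecretKey masterKey, SecretKey dataKey) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, masterKey);
            return c.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the data key; the RFC 3394 integrity check rejects a wrong master key.
    static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey masterKey = newAesKey(); // would be held by a key provider, never stored with data
        SecretKey cfDataKey = newAesKey(); // hypothetical key for one column family
        byte[] stored = wrap(masterKey, cfDataKey);
        System.out.println("wrapped key bytes: " + stored.length); // 24 for a 128-bit key
        SecretKey recovered = unwrap(masterKey, stored);
        System.out.println(Arrays.equals(cfDataKey.getEncoded(), recovered.getEncoded())); // true
    }
}
```

A benefit of this split is that rotating the master key only requires rewrapping the small data keys, not re-encrypting stored data.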
  21. Pig & Hive Encryption
     •  Pig Encryption Capabilities
        –  Support for text and Avro* file formats
        –  Intermediate job output file protection
        –  Pluggable key retrieval and key resolution
        –  Protection of key distribution in the cluster
     •  Hive Encryption Capabilities
        –  Support for RCFile and Avro file formats
        –  Intermediate and final output data encryption
        –  Encryption is transparent to end users; existing SQL runs unchanged
  22. Crypto Codec Framework
     •  Extends the compression codec
     •  Establishes a common API-level abstraction that all crypto codec implementations can share:

          // Instantiate the configured codec class through Hadoop's ReflectionUtils
          CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
          CryptoContext cryptoContext = new CryptoContext();
          ...
          // Attach the crypto context (e.g. the key) to the codec
          cryptoCodec.setCryptoContext(cryptoContext);
          // Wrap the raw stream; reads through it return decrypted data
          CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
          ...

     •  Provides a foundation for other Hadoop* components, such as MapReduce or HBase*, to support encryption features
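The Rhino `CryptoCodec` and `CryptoContext` classes above ship with Project Rhino, not the JDK, so the following Java sketch uses hypothetical stand-ins (`SimpleCryptoCodec` and its nested `Context`) that follow the same shape: set a context, then wrap raw streams. It assumes AES in CTR mode from `javax.crypto` and a 16-byte random IV written as a stream header; neither detail comes from the slide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class SimpleCryptoCodec {
    static final int IV_BYTES = 16;

    // Hypothetical stand-in for a crypto context: carries the key for the stream.
    static final class Context {
        final SecretKey key;
        Context(SecretKey key) { this.key = key; }
    }

    private final SecureRandom random = new SecureRandom();
    private Context context;

    void setCryptoContext(Context context) { this.context = context; }

    // Writes a fresh random IV as a header, then encrypts everything after it with AES/CTR.
    OutputStream createOutputStream(OutputStream raw) throws IOException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        raw.write(iv);
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE, iv));
    }

    // Reads the IV header back, then decrypts the remaining bytes.
    InputStream createInputStream(InputStream raw) throws IOException {
        byte[] iv = new byte[IV_BYTES];
        new DataInputStream(raw).readFully(iv); // throws EOFException on a truncated header
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE, iv));
    }

    private Cipher cipher(int mode, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, context.key, new IvParameterSpec(iv));
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey newKey() {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            return kg.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo helpers: encrypt or decrypt a whole byte array through the codec's streams.
    static byte[] seal(SimpleCryptoCodec codec, byte[] plain) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = codec.createOutputStream(sink)) {
            out.write(plain);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    static byte[] open(SimpleCryptoCodec codec, byte[] sealed) {
        try (InputStream in = codec.createInputStream(new ByteArrayInputStream(sealed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleCryptoCodec codec = new SimpleCryptoCodec();
        codec.setCryptoContext(new Context(newKey()));
        byte[] plain = "hello, encrypted block".getBytes(StandardCharsets.UTF_8);
        byte[] sealed = seal(codec, plain);
        System.out.println(sealed.length == IV_BYTES + plain.length); // true: CTR adds no padding
        System.out.println(new String(open(codec, sealed), StandardCharsets.UTF_8)); // hello, encrypted block
    }
}
```

CTR mode was chosen here because it adds no padding, so ciphertext length tracks plaintext length, which keeps offsets predictable for stream-oriented consumers.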
  23. Key Distribution
     •  Enables the crypto codec in a MapReduce job
     •  Enables different key storage or key management systems
     •  Allows different stages and files to use different keys
     •  Provides an API to integrate with external key management systems
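One way to let different stages and files use different keys without distributing each key separately is to derive them from a single master key. The sketch below swaps in HKDF-Expand (RFC 5869) as that derivation; it is an assumption for illustration, not Rhino's key distribution API, and the class `StageKeys` and labels such as `job-0001/map-output` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class StageKeys {
    // HKDF-Expand (RFC 5869) limited to one output block: T(1) = HMAC(PRK, info || 0x01),
    // truncated to 16 bytes for AES-128. The master key stands in for the PRK, so it must
    // already be uniformly random (the HKDF-Extract step is skipped).
    static SecretKey derive(byte[] masterKey, String info) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            mac.update(info.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0x01);
            return new SecretKeySpec(Arrays.copyOf(mac.doFinal(), 16), "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] master = new byte[32];
        new SecureRandom().nextBytes(master); // in practice fetched from an external key manager

        // Hypothetical labels: one per job stage or output file.
        SecretKey mapOut = derive(master, "job-0001/map-output");
        SecretKey part0 = derive(master, "job-0001/output/part-00000");

        System.out.println(Arrays.equals(mapOut.getEncoded(), part0.getEncoded())); // false
        System.out.println(Arrays.equals(mapOut.getEncoded(),
                derive(master, "job-0001/map-output").getEncoded())); // true: deterministic
    }
}
```

Because derivation is deterministic, a task only needs the master key and the label to recompute the key for its stage, so per-file keys never have to be shipped individually.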
  24. Crypto Software Optimization
     Multi-Buffer Crypt
     •  Processes multiple independent data buffers in parallel
     •  Improves cryptographic performance by up to 2-9x
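Intel's multi-buffer technique interleaves independent buffers inside a single core to keep the AES hardware busy, which plain Java does not expose. The sketch below demonstrates only the property it relies on, that independent buffers need no coordination, using thread-level parallelism instead (a swapped-in technique); `ParallelBuffers` is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ParallelBuffers {
    // Encrypts one buffer with its own Cipher (Cipher instances are not thread-safe).
    static byte[] encryptOne(byte[] buf, SecretKey key, byte[] iv) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            return c.doFinal(buf);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Buffers share no state, so each one can be processed on a separate worker thread.
    static List<byte[]> encryptAll(List<byte[]> bufs, SecretKey key, List<byte[]> ivs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (int i = 0; i < bufs.size(); i++) {
                byte[] buf = bufs.get(i);
                byte[] iv = ivs.get(i);
                pending.add(pool.submit(() -> encryptOne(buf, key, iv)));
            }
            List<byte[]> out = new ArrayList<>();
            for (Future<byte[]> f : pending) {
                out.add(f.get()); // results keep the input order
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SecureRandom rnd = new SecureRandom();
        byte[] keyBytes = new byte[16];
        rnd.nextBytes(keyBytes);
        SecretKey key = new SecretKeySpec(keyBytes, "AES");

        List<byte[]> bufs = new ArrayList<>();
        List<byte[]> ivs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            bufs.add(new byte[1 << 20]); // eight independent 1 MiB buffers
            byte[] iv = new byte[16];
            rnd.nextBytes(iv);           // a distinct random IV per buffer
            ivs.add(iv);
        }
        List<byte[]> out = encryptAll(bufs, key, ivs, Runtime.getRuntime().availableProcessors());
        System.out.println(out.size()); // 8
    }
}
```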
  25. Intel® Data Protection Technology
     AES-NI
     •  Processor assistance for performing AES encryption
     •  Makes enabled encryption software faster and stronger
     Secure Key (DRNG)
     •  Processor-based true random number generator
     •  More secure, standards-compliant, high-performance
     •  Internet data in motion: secure transactions used pervasively in e-commerce, banking, etc.
     •  Data in process: most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
     •  Data at rest: full-disk encryption software protects data while saving to disk
     AES-NI: Advanced Encryption Standard New Instructions
     Secure Key: previously known as Intel Digital Random Number Generator (DRNG)
  26. Intel® AES-NI Accelerated Encryption: 20X Faster Crypto
     (Chart: relative speed of crypto functions, higher is better, based on Intel tests. Encryption and decryption are compared without Intel® AES-NI, with Intel® AES-NI, and with Intel® AES-NI multi-buffer; bar labels read 18.2x/19.8x and 5.3x/19.8x.)
     AES-NI: Advanced Encryption Standard New Instructions
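To get a rough feel for AES throughput on a given machine, a minimal Java timing loop like the following can help. It is a simplified micro-benchmark (no JMH, a single warm-up pass, hypothetical class `AesThroughput`), so its numbers are not comparable with the Intel figures above. On HotSpot JVMs, the JDK's AES code uses AES-NI intrinsics on CPUs that support them; running once normally and once with the HotSpot flag `-XX:-UseAES` should give an approximate with/without comparison.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class AesThroughput {
    // Approximate AES-128/CTR encryption throughput in MiB/s over repeated update() calls.
    static double measureMiBps(int bufferBytes, int iterations) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            SecretKey key = kg.generateKey();
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));

            byte[] in = new byte[bufferBytes];
            byte[] out = new byte[bufferBytes]; // CTR output is the same length as its input
            for (int i = 0; i < iterations; i++) {
                c.update(in, 0, in.length, out, 0); // warm-up so the JIT compiles the hot path
            }
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                c.update(in, 0, in.length, out, 0);
            }
            long elapsedNanos = Math.max(1, System.nanoTime() - start);
            double mib = (double) bufferBytes * iterations / (1024 * 1024);
            return mib / (elapsedNanos / 1e9);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("AES/CTR: %.1f MiB/s%n", measureMiBps(1 << 20, 200));
    }
}
```

Results depend heavily on CPU, JVM version, and warm-up; treat a single run as indicative only.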
  27. Cloud Platform for Secure Hadoop
     Intel® Xeon® Processors
     •  E7 Family
     •  E5 Family
     •  E3 Family
     Amazon
     •  EC2 Reserved Instances
     •  EC2 Dedicated Instances
  28. Amazon EC2 Instances with AES-NI
     20 more at aws.amazon.com/ec2/instance-types
  29. Resources
  30. For more information
     •  intel.com/bigdata
     •  intel.com/healthcare/bigdata
     •  github.com/intel-hadoop/project-rhino/
     •  aws.amazon.com/compliance/
     •  aws.amazon.com/ec2/instance-types/
  31. THANK YOU
     Please give us your feedback by filling out the Feedback Forms.
