Secure Hadoop as a Service 
Peter Kerney, Solutions Architect, Intel 

Secure Hadoop as a Service - Session Sponsored by Intel


AWS Summit 2014 Melbourne - Breakout 2

Intel is contributing to a common security framework for Apache Hadoop, in the form of Project Rhino, which enables Hadoop to run workloads without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze data while ensuring technical safeguards that help you remain in compliance.

Presenter: Peter Kerney, Senior Solution Architect, Intel

Published in: Technology


  1. Secure Hadoop as a Service
     Peter Kerney, Solutions Architect, Intel
     © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Who needs Hadoop security?
  3. Big Data Analytics in Health and Life Sciences
     Now: Disparate streams of data
     • Genomics
     • Clinical
     • Claims & transactions
     • Meds & labs
     • Patient experience
     • Personal data
     Next: Integrated computing and data
     • Better decisions and outcomes at reduced cost
     • Clinical Analysis
     • Genomic Analysis
     • From population- to person-based treatment
  4. Cost Savings via Big Data Analytics
     [Diagram; labels listed in the order they appear on the slide]
     Provider: Proven pathways of care; Co-ordinated across providers; Shift volume to right setting; Reducing ER (re)admit rates
     Patient; Payer; Accelerated Approval; Regulator; Producer; $70B; Accelerated Discovery; $100B; Provider / performance transparency & payment innovation; $180B; Personalized medicine; Data-driven adherence; $100B
  5. Compliance Requirements
     • HIPAA
       – Privacy Rule
       – Security Rule
         • Administrative Safeguards
         • Physical Safeguards
         • Technical Safeguards
     • Others…
     [Diagram labels: Provider, Patient, Payer, Regulator, Producer]
  6. Technical Safeguards
     • Access Control: A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI).
     • Audit Controls: A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI.
     • Integrity Controls: A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed.
     • Transmission Security: A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.
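One way to picture the "electronic measures" named under Integrity Controls is a keyed hash stored next to each record and rechecked on read. The following is a minimal sketch under that assumption, not something shown in the talk: the class `IntegrityTag`, its methods, and the sample record are hypothetical, and only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: tags a record with HMAC-SHA-256 and verifies it later. */
final class IntegrityTag {
    private IntegrityTag() {}

    /** Computes an HMAC-SHA-256 tag over the record bytes. */
    static byte[] tag(byte[] key, byte[] record) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(record);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA-256 unavailable", e);
        }
    }

    /** Returns true only if the record still matches the stored tag. */
    static boolean verify(byte[] key, byte[] record, byte[] storedTag) {
        // MessageDigest.isEqual compares the two arrays in constant time.
        return MessageDigest.isEqual(tag(key, record), storedTag);
    }

    public static void main(String[] args) {
        // Demo key and records are made up for illustration only.
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        byte[] record = "patient=123;result=negative".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "patient=123;result=positive".getBytes(StandardCharsets.UTF_8);
        byte[] stored = tag(key, record);
        System.out.println("unchanged record verifies: " + verify(key, record, stored));
        System.out.println("tampered record verifies: " + verify(key, tampered, stored));
    }
}
```

A keyed tag rather than a plain hash is used here so that someone who can alter the record cannot simply recompute the check value without the key.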
  7. Hadoop Security Challenges
  8. Hadoop Security Challenges: Components of a typical Hadoop stack
     HiveQL, Sqoop, Flume, Zookeeper, Pig, Oozie, YARN (MRv2), HDFS 2.0, Giraph, HCatalog, R connectors, Hive, HBase Coprocessors, HBase, Mahout
  9. Hadoop Security Challenges: Components sharing an authentication framework
     HiveQL, Sqoop, Flume, Zookeeper, Pig, Oozie, Data flow, YARN (MRv2), Hive, HDFS 2.0, Giraph, HCatalog, R connectors, Metadata, HBase Coprocessors, HBase, Mahout
  10. Hadoop Security Challenges: Components capable of access control
      HiveQL, Sqoop, Flume, Zookeeper, Pig, Oozie, YARN (MRv2), Hive, HDFS 2.0, R connectors, Giraph, HCatalog, HBase Coprocessors, HBase, Mahout
  11. Hadoop Security Challenges: Components capable of admission control
      HiveQL, Sqoop, Flume, Zookeeper, Pig, Oozie, YARN (MRv2), HDFS 2.0, Giraph, HCatalog, R connectors, Hive, HBase Coprocessors, HBase, Mahout
  12. Hadoop Security Challenges: Components capable of (transparent) encryption
      HiveQL, Sqoop, Flume, Zookeeper, Pig, HDFS 2.0, Giraph, HCatalog, R connectors, Hive, HBase Coprocessors, HBase, Oozie, Mahout, YARN (MRv2)
  13. Hadoop Security Challenges: Components sharing a common policy engine
      HiveQL, Sqoop, Flume, Zookeeper, Pig, HDFS 2.0, Giraph, HCatalog, R connectors, Hive, HBase Coprocessors, HBase, Oozie, Mahout, YARN (MRv2)
  14. Hadoop Security Challenges: Components sharing a common audit log format
      HiveQL, Sqoop, Flume, Zookeeper, Pig, Hive, HDFS 2.0, Giraph, HCatalog, R connectors, Metadata, HBase Coprocessors, HBase, Oozie, Mahout, Data mining, YARN (MRv2)
  15. Hardening Hadoop from within
  16. Deliver defense in depth
      [Diagram labels: Project Rhino; Firewall; Encryption and Key Management; Gateway; Role Based Access Control; AuthN; Common Authorization; AuthZ; Consistent Auditing; Isolation; Encryption; Audit & Alerts]
  17. Protect Hadoop APIs
      [Diagram labels: HCatalog, Stargate, WebHDFS]
      • Enforces consistent security policies across all Hadoop services
      • Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
      • Common Criteria EAL4+, HSM, FIPS 140-2 certified
      • Deploys as software, virtual appliance, or hardware appliance
      • Available on AWS Marketplace
  18. Provide role-based access control
      [Diagram labels: AuthZ, _acl_table]
      • File, table, and cell-level access control in HBase
      • JIRA HBASE-6222: Add per-KeyValue security
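To illustrate how cell-level grants can sit on top of table-level grants, here is a minimal in-memory sketch in plain Java. It does not use the HBase API or reproduce HBASE-6222. The class `CellAcl`, its methods, and the "most specific grant wins" rule are assumptions made for this example.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory ACL: a cell-level grant overrides the table-level grant. */
final class CellAcl {
    enum Action { READ, WRITE }

    // Key is "role|table|cell"; "*" stands for a table-wide grant.
    private final Map<String, Set<Action>> grants = new HashMap<>();

    private static String key(String role, String table, String cell) {
        return role + "|" + table + "|" + (cell == null ? "*" : cell);
    }

    void grantTable(String role, String table, Set<Action> actions) {
        grants.put(key(role, table, null), actions);
    }

    /** An empty action set acts as an explicit deny for that cell. */
    void grantCell(String role, String table, String cell, Set<Action> actions) {
        grants.put(key(role, table, cell), actions);
    }

    boolean isAllowed(String role, String table, String cell, Action action) {
        Set<Action> cellGrant = grants.get(key(role, table, cell));
        if (cellGrant != null) {
            return cellGrant.contains(action); // most specific scope wins
        }
        Set<Action> tableGrant = grants.get(key(role, table, null));
        return tableGrant != null && tableGrant.contains(action);
    }

    public static void main(String[] args) {
        CellAcl acl = new CellAcl();
        acl.grantTable("analyst", "patients", EnumSet.of(Action.READ));
        acl.grantCell("analyst", "patients", "row1/info:ssn", EnumSet.noneOf(Action.class));
        System.out.println("read name: " + acl.isAllowed("analyst", "patients", "row1/info:name", Action.READ));
        System.out.println("read ssn:  " + acl.isAllowed("analyst", "patients", "row1/info:ssn", Action.READ));
    }
}
```

Roles with no grant at either level are denied by default, which keeps the sketch fail-closed.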
  19. Provide encryption for data at rest
      [Diagram labels: MapReduce; RecordReader; Map; Combiner; Partitioner; Local Merge & Sort; Reduce; RecordWriter; Encrypt; Decrypt; Derivative Decrypt; Derivative Encrypt; HDFS]
      • Extends compression codec into crypto codec
      • Provides an abstract API for general use
  20. Provide encryption for data at rest
      HBase
      • Transparent table/CF encryption (HBASE-7544)
  21. Pig & Hive Encryption
      • Pig Encryption Capabilities
        – Support for text and Avro* file formats
        – Intermediate job output file protection
        – Pluggable key retrieval and key resolution
        – Protection of key distribution in cluster
      • Hive Encryption Capabilities
        – Support for RC and Avro file formats
        – Intermediate and final output data encryption
        – Encryption is transparent to the end user; existing SQL does not need to change
  22. Crypto Codec Framework
      • Extends compression codec
      • Establishes a common API-level abstraction that can be shared by all crypto codec implementations

          CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
          CryptoContext cryptoContext = new CryptoContext();
          ...
          cryptoCodec.setCryptoContext(cryptoContext);
          CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
          ...

      • Provides a foundation for other components in Hadoop* such as MapReduce or HBase* to support encryption features
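The codec idea on this slide, a wrapper that turns a raw stream into an encrypting or decrypting stream in the same way a compression codec wraps a stream, can be sketched with the JDK's own `javax.crypto` stream classes. This is a minimal sketch and not the Project Rhino `CryptoCodec` API: the class `AesStreamCodec`, its method names, and the choice of AES in CTR mode are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical codec: wraps streams with AES/CTR, mirroring a compression codec's shape. */
final class AesStreamCodec {
    private final SecretKeySpec key;
    private final IvParameterSpec iv; // must be unique per file for a given key

    AesStreamCodec(byte[] key16, byte[] iv16) {
        this.key = new SecretKeySpec(key16, "AES");
        this.iv = new IvParameterSpec(iv16);
    }

    private Cipher cipher(int mode) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, key, iv);
            return c;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR unavailable or bad key/IV", e);
        }
    }

    /** Everything written to the returned stream reaches 'raw' encrypted. */
    OutputStream createOutputStream(OutputStream raw) {
        return new CipherOutputStream(raw, cipher(Cipher.ENCRYPT_MODE));
    }

    /** Everything read from the returned stream is decrypted from 'raw'. */
    InputStream createInputStream(InputStream raw) {
        return new CipherInputStream(raw, cipher(Cipher.DECRYPT_MODE));
    }

    byte[] encrypt(byte[] plain) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = createOutputStream(sink)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    byte[] decrypt(byte[] encrypted) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = createInputStream(new ByteArrayInputStream(encrypted))) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                sink.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        random.nextBytes(key);
        random.nextBytes(iv);
        AesStreamCodec codec = new AesStreamCodec(key, iv);
        byte[] plain = "record-1,record-2".getBytes(StandardCharsets.UTF_8);
        byte[] restored = codec.decrypt(codec.encrypt(plain));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Because callers only see `InputStream` and `OutputStream`, code written against a stream-wrapping codec does not need to know whether the wrapper compresses or encrypts, which is the property the slide attributes to extending the compression codec.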
  23. Key Distribution
      • Enabling crypto codec in a MapReduce job
      • Enabling different key storage or management systems
      • Allowing different stages and files to use different keys
      • API to integrate with external key management systems
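The bullets above describe a pluggable key source and separate keys per stage or file. A minimal sketch of that shape follows. All names here (`KeyDistributionDemo`, `KeyProvider`, `InMemoryKeyProvider`, `StageKeys`, and the `job/stage` naming scheme) are hypothetical and stand in for whatever key store or external key management system a deployment would actually use.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of pluggable key lookup with one key per job stage. */
final class KeyDistributionDemo {
    /** Abstraction point: swap in a key store or external key manager here. */
    interface KeyProvider {
        SecretKey keyFor(String keyName);
    }

    /** Stand-in provider that creates and remembers random AES-128 keys in memory. */
    static final class InMemoryKeyProvider implements KeyProvider {
        private final Map<String, SecretKey> keys = new ConcurrentHashMap<>();
        private final SecureRandom random = new SecureRandom();

        @Override
        public SecretKey keyFor(String keyName) {
            return keys.computeIfAbsent(keyName, name -> {
                byte[] raw = new byte[16];
                random.nextBytes(raw);
                return new SecretKeySpec(raw, "AES");
            });
        }
    }

    /** Resolves a distinct key name for each stage of a job. */
    static final class StageKeys {
        private final KeyProvider provider;
        private final String jobId;

        StageKeys(KeyProvider provider, String jobId) {
            this.provider = provider;
            this.jobId = jobId;
        }

        SecretKey forStage(String stage) {
            return provider.keyFor(jobId + "/" + stage);
        }
    }

    public static void main(String[] args) {
        StageKeys keys = new StageKeys(new InMemoryKeyProvider(), "job-42");
        byte[] input = keys.forStage("input").getEncoded();
        byte[] shuffle = keys.forStage("intermediate").getEncoded();
        System.out.println("stages share a key: " + Arrays.equals(input, shuffle));
    }
}
```

Keeping the lookup behind one small interface is what allows the key source to change without touching the job code that asks for keys.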
  24. Crypto Software Optimization
      Multi-Buffer Crypt
      • Processes multiple independent data buffers in parallel
      • Improves cryptographic performance by up to 2-9x
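The property that multi-buffer processing relies on is that the buffers are independent, so none has to wait for another. The sketch below shows only that property, using thread-level parallelism from Java's parallel streams. It is an analogy and not Intel's multi-buffer technique, which the slide presents as a software optimization and whose internals are not described here. The class `ParallelBufferCrypto` and the per-buffer IV scheme are assumptions for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: independent buffers are encrypted concurrently, one IV per buffer. */
final class ParallelBufferCrypto {
    private ParallelBufferCrypto() {}

    /** Runs AES/CTR over one buffer; the buffer index seeds the IV so no IV repeats. */
    static byte[] apply(int mode, byte[] key, int index, byte[] data) {
        try {
            // First 8 bytes: buffer index. Last 8 bytes: block counter starting at 0.
            byte[] iv = ByteBuffer.allocate(16).putLong(index).array();
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR failed", e);
        }
    }

    private static List<byte[]> applyAll(int mode, byte[] key, List<byte[]> buffers) {
        // No buffer depends on another, so the work can be spread across threads.
        // The ordered range keeps results in the same order as the input list.
        return IntStream.range(0, buffers.size())
                .parallel()
                .mapToObj(i -> apply(mode, key, i, buffers.get(i)))
                .collect(Collectors.toList());
    }

    static List<byte[]> encryptAll(byte[] key, List<byte[]> buffers) {
        return applyAll(Cipher.ENCRYPT_MODE, key, buffers);
    }

    static List<byte[]> decryptAll(byte[] key, List<byte[]> buffers) {
        return applyAll(Cipher.DECRYPT_MODE, key, buffers);
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key, for illustration only
        List<byte[]> buffers = Arrays.asList(
                "block-a".getBytes(StandardCharsets.UTF_8),
                "block-b".getBytes(StandardCharsets.UTF_8));
        List<byte[]> restored = decryptAll(key, encryptAll(key, buffers));
        for (byte[] b : restored) {
            System.out.println(new String(b, StandardCharsets.UTF_8));
        }
    }
}
```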
  25. Intel® Data Protection Technology
      AES-NI (Advanced Encryption Standard New Instructions)
      • Processor assistance for performing AES encryption
      • Makes enabled encryption software faster and stronger
      Secure Key (previously known as Intel Digital Random Number Generator, DRNG)
      • Processor-based true random number generator
      • More secure, standards compliant, high performance
      Data in Motion: Secure transactions used pervasively in ecommerce, banking, etc.
      Data at Rest: Full disk encryption software protects data while saving to disk
      Data in Process: Most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
      [Diagram label: Internet]
  26. Intel® AES-NI Accelerated Encryption
      [Chart: relative speed of crypto functions, encryption and decryption, higher is better. Series: Non Intel® AES-NI, With Intel® AES-NI, Intel® AES-NI Multi-Buffer. Data labels: 18.2x/19.8x and 5.3x/19.8x. Callout: 20X Faster Crypto. Based on Intel tests.]
      AES-NI - Advanced Encryption Standard New Instructions
  27. Cloud Platform for secure Hadoop
      Intel® Xeon® Processors
      • E7 Family
      • E5 Family
      • E3 Family
      Amazon
      • EC2 Reserved Instances
      • EC2 Dedicated Instances
  28. Amazon EC2 Instances with AES-NI
      20 more at aws.amazon.com/ec2/instance-types
  29. Resources
  30. For more information
      • intel.com/bigdata
      • intel.com/healthcare/bigdata
      • github.com/intel-hadoop/project-rhino/
      • aws.amazon.com/compliance/
      • aws.amazon.com/ec2/instance-types/
  31. Thank you.
