© Cloudera, Inc. All rights reserved.
Charles Lamb
HDFS Transparent Encryption
SFHUG
Overview
• Developed as open source (HDFS-6134)
• Data read from and written to certain directories is transparently encrypted
• No changes to user code
• Encryption/decryption always done by client
• HDFS never handles unencrypted data or unencrypted keys
• Helps applications be regulation-compliant (HIPAA, PCI DSS, FISMA, etc.)
Background
• Encryption can happen at any of several levels:
• Application: most secure and flexible, but hardest to do
• Adding encryption to legacy applications may be difficult
• Database: most DBMSs have this, but may incur performance penalties
• Secondary indices generally cannot be encrypted
• Filesystem: high performance, transparent, but may not be flexible enough
• Multi-tenancy vs per-user encryption policies
• Disk: high performance but only really protects against physical theft
• HDFS encryption is somewhere between Filesystem and Database level
Design Goals
• Performance and scalability
• Transparent to applications, including legacy apps
• End-to-end
• Data should be encrypted on the network and ‘at-rest’
• Compartmentalization
• Key management independent of HDFS management
• Includes preventing HDFS admins and root users from accessing sensitive data
• Compatibility with HDFS access methods: WebHDFS, HttpFS, FUSE, NFS, hftp, har, etc.
Architectural Concepts
• Key Management Server
• Encryption Zones
• Keys
Key Management Server
Key Management Server (KMS)
• KMS sits between client and key server
• E.g. Cloudera Navigator Key Trustee
• Provides a unified API and scalability
• REST API
• Does not actually store keys (backend does that), but does cache them
• ACLs on per-key basis
Encryption Zones
• An HDFS directory in which the contents (including subdirs) are encrypted on write and decrypted on read
• An EZ begins life as an empty directory
• Renames in/out of an EZ are prohibited
• Encryption is transparent to application with no code changes
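The rename rule can be illustrated with a small path check. This is only a sketch under stated assumptions: `zone_of` and `rename_allowed` are hypothetical helper names, zones are assumed not to nest, and the real check is performed inside HDFS rather than by client code like this.

```python
from pathlib import PurePosixPath


def zone_of(path, zones):
    """Return the encryption zone root that contains path, or None."""
    p = PurePosixPath(path)
    for zone in zones:
        z = PurePosixPath(zone)
        if p == z or z in p.parents:
            return zone
    return None


def rename_allowed(src, dst, zones):
    """Allow a rename only if it neither enters nor leaves a zone.

    Both paths must lie in the same zone, or both outside every zone.
    """
    return zone_of(src, zones) == zone_of(dst, zones)
```

With `zones = ["/secure"]`, a rename from `/secure/a` to `/secure/sub/a` passes the check, while a rename from `/secure/a` to `/tmp/a` (or the reverse) does not.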
Keys
• Every Encryption Zone has a key (“EZ Key”)
• Every file in an Encryption Zone has a unique key (“Data Encryption Key” or “DEK”)
• The HDFS NameNode stores the name of the EZ Key in an Xattr of the EZ Dir
• The actual EZ Key is stored in the Key Server
• The NameNode stores the DEK in an Xattr of the file, but only in encrypted form
• Encrypted Data Encryption Key, or “EDEK”
• The NameNode never touches decrypted data or decrypted keys
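The relationship between EZ keys, DEKs, and EDEKs can be sketched as a toy model. Everything here is illustrative: `ToyKeyServer`, `client_write`, and `client_read` are hypothetical names, the cipher is a SHA-256 keystream XOR standing in for the real one (it is not AES and not secure), and the EZ key name is kept next to the file's EDEK for brevity, whereas the slides place it on the zone directory.

```python
import hashlib
import secrets


def toy_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher (NOT AES): XOR data with a SHA-256 keystream."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


class ToyKeyServer:
    """Hypothetical stand-in for the KMS plus its backing key server."""

    def __init__(self):
        self._ez_keys = {}  # EZ key name -> key bytes; never handed out

    def create_key(self, name: str) -> None:
        self._ez_keys[name] = secrets.token_bytes(32)

    def generate_edek(self, ez_key_name: str):
        """Create a fresh DEK and return it only in encrypted form."""
        iv = secrets.token_bytes(16)
        dek = secrets.token_bytes(32)
        return iv, toy_xor(self._ez_keys[ez_key_name], iv, dek)

    def decrypt_edek(self, ez_key_name: str, iv: bytes, edek: bytes) -> bytes:
        return toy_xor(self._ez_keys[ez_key_name], iv, edek)


# What the NameNode and DataNodes would hold in this sketch.
namenode_xattrs = {}  # file path -> (EZ key name, IV, EDEK)
datanode_bytes = {}   # file path -> ciphertext


def client_write(kms, ez_key_name, path, plaintext):
    iv, edek = kms.generate_edek(ez_key_name)
    namenode_xattrs[path] = (ez_key_name, iv, edek)
    dek = kms.decrypt_edek(ez_key_name, iv, edek)  # DEK exists only client-side
    datanode_bytes[path] = toy_xor(dek, iv, plaintext)


def client_read(kms, path):
    ez_key_name, iv, edek = namenode_xattrs[path]
    dek = kms.decrypt_edek(ez_key_name, iv, edek)
    return toy_xor(dek, iv, datanode_bytes[path])
```

In this model the two dictionaries that represent HDFS hold only EDEKs and ciphertext. Recovering a file requires a call to the key server, which is where key permissions would be enforced.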
EZ Keys, Data Encryption Keys, and Encrypted Data Encryption Keys
Key Handling
Design
• End-to-end encryption
• Encryption occurs on the client and decrypted data is never touched by HDFS
• Protects against network sniffing, evil HDFS admins, and hard drive theft
• HDFS never touches key material (DEKs or EZ keys)
• Compromising an HDFS daemon is not a viable attack vector
• HDFS handles encrypted keys (EDEKs), but never their decrypted form (DEKs)
• Key permissions are handled by the KMS ACLs
• Each file is encrypted with a unique DEK
HDFS Encryption Configuration
• hadoop key create <keyname>
• hdfs dfs -mkdir <path>
• hdfs crypto -createZone -keyName <keyname> -path <path>
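The three setup steps can be assembled into argument lists for scripting. This is a minimal sketch, assuming plain ASCII hyphens on the flags; `encryption_zone_commands` is a hypothetical helper that only builds the commands listed above and does not run them.

```python
import shlex


def encryption_zone_commands(key_name, path):
    """Build the argv lists for creating a key, a directory, and a zone."""
    return [
        ["hadoop", "key", "create", key_name],
        ["hdfs", "dfs", "-mkdir", path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
    ]


def as_shell_lines(commands):
    """Render argv lists as shell-quoted command lines for review."""
    return [shlex.join(argv) for argv in commands]
```

Each list could be passed to `subprocess.run` on a host with the Hadoop client installed. The order matters because the zone is created on an existing, empty directory with an existing key.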
KMS Per-User ACL Configuration
• White lists (check for inclusion) and black lists (check for exclusion)
• etc/hadoop/kms-acls.xml
• hadoop.kms.acl.CREATE
• hadoop.kms.blacklist.CREATE
• … DELETE, ROLLOVER, GET, GET_KEYS, GET_METADATA, GENERATE_EEK, DECRYPT_EEK
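How a white list and a black list combine can be sketched as a single predicate. The data layout is an assumption made for illustration: `user_allowed` is a hypothetical function, ACLs are modeled as plain sets of user names keyed by operation, `"*"` is treated as "every user", and an operation with no white list entry is denied in this sketch.

```python
def user_allowed(user, operation, acls, blacklists):
    """Return True if user passes the white list and is not black-listed.

    acls and blacklists map an operation name (e.g. "CREATE") to a set
    of user names. A black list entry always wins over a white list entry.
    """
    allowed = acls.get(operation, set())
    denied = blacklists.get(operation, set())
    in_white_list = "*" in allowed or user in allowed
    return in_white_list and user not in denied
```

For example, with a white list of `{"*"}` and a black list of `{"hdfs"}` for DECRYPT_EEK, every user except `hdfs` may decrypt EDEKs, which is one way to keep an HDFS admin away from the data keys.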
KMS Per-Key ACL Configuration
• etc/hadoop/kms-acls.xml
• key.acl.<keyname>.<operation>
• MANAGEMENT – createKey, deleteKey, rolloverNewVersion
• GENERATE_EEK – generateEncryptedKey, warmUpEncryptedKeys
• DECRYPT_EEK – decryptEncryptedKey
• READ – getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata, getCurrentKey
• ALL – all of the above
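The grouping of key operations can be written down as a lookup table. This is a sketch only: `category_for` and `key_call_allowed` are hypothetical helpers, and the per-key grants are modeled as sets of user names rather than as entries in kms-acls.xml.

```python
KEY_OP_CATEGORIES = {
    "MANAGEMENT": {"createKey", "deleteKey", "rolloverNewVersion"},
    "GENERATE_EEK": {"generateEncryptedKey", "warmUpEncryptedKeys"},
    "DECRYPT_EEK": {"decryptEncryptedKey"},
    "READ": {"getKeyVersion", "getKeyVersions", "getMetadata",
             "getKeysMetadata", "getCurrentKey"},
}


def category_for(call):
    """Return the operation category that governs a given key call."""
    for category, calls in KEY_OP_CATEGORIES.items():
        if call in calls:
            return category
    raise KeyError(call)


def key_call_allowed(user, key_name, call, key_acls):
    """Check a per-key grant; key_acls maps key name -> category -> users.

    A grant under "ALL" covers every category.
    """
    grants = key_acls.get(key_name, {})
    category = category_for(call)
    return user in grants.get(category, set()) or user in grants.get("ALL", set())
```

Splitting GENERATE_EEK from DECRYPT_EEK allows one principal to create EDEKs for a key while a different set of users is allowed to turn them back into DEKs.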
Performance
• AES-CTR with 128- or 256-bit keys (256-bit requires the unlimited strength JCE policy files)
• AES-NI hardware acceleration supported
• Negligible overhead on writes and 7.5% impact on reads for datasets larger than memory
DistCp
• Encryption Zone to Encryption Zone
• use -update -skipcrccheck
• Admins use special /.reserved/raw path prefix
• /.reserved/raw is only available to the HDFS superuser and provides the encrypted contents
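The two DistCp approaches can be sketched as small helpers. These are illustrative assumptions rather than a complete recipe: `distcp_between_zones` and `raw_path` are hypothetical names, the flags are the ones listed above written with ASCII hyphens, and nothing is executed.

```python
RAW_PREFIX = "/.reserved/raw"


def distcp_between_zones(src, dst):
    """Build a DistCp argv for a zone-to-zone copy through normal paths.

    The data is decrypted on read and re-encrypted on write, so the
    stored bytes (and their checksums) differ between source and target.
    """
    return ["hadoop", "distcp", "-update", "-skipcrccheck", src, dst]


def raw_path(path):
    """Map an absolute HDFS path to its /.reserved/raw equivalent."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return RAW_PREFIX + path
```

`raw_path("/zone/data")` gives `/.reserved/raw/zone/data`, the form an admin would use to read the encrypted bytes without decrypting them.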
Exceptions
• Hive: may not be able to run a query that combines data from more than one encryption zone
HDFS Encryption - Summary
• Good performance (4-10% hit)
• No mods to existing applications
• Prevents attacks at the filesystem and below
• OS and filesystem only see encrypted bytes
• Data is encrypted all the way to the client
• Secure ‘at rest’ and in transit
• Key management is independent of HDFS
• Key admin != HDFS admin
• Can prevent HDFS admin from accessing secure data
Questions

Overview of HDFS Transparent Encryption
