Hadoop Meetup Jan 2019 - Hadoop Encryption

A presentation by Wei-Chiu Chuang of Cloudera regarding the state of Hadoop encryption, with a particular eye towards the Key Management Service (KMS).

This is taken from the Apache Hadoop Contributors Meetup on January 30, hosted by LinkedIn in Mountain View.

Published in: Technology

  1. © Cloudera, Inc. All rights reserved. Hadoop Encryption. Wei-Chiu Chuang, Cloudera
  2. Why Encryption
     • Information leaks affect 10s to 100s of millions of people
         • Personally identifiable information (PII)
         • Credit cards, SSNs, account logins
     • Encryption would have prevented some of these leaks
     • Encryption is a regulatory requirement for many business sectors
         • Finance (PCI DSS)
         • Government (Data Protection Directive)
         • Healthcare (DPD, HIPAA)
  3. Related Technologies
     Data In-Motion Encryption
     • SSL/TLS
     • Hadoop Data Transfer Encryption
     • Hadoop RPC Encryption
     Data At-Rest Encryption
     • At Linux volume level
     • Transparent Encryption at HDFS level
     • HBase Column Family level
     • Parquet Column level Encryption Xinli
  4. HDFS Transparent Encryption: In a Nutshell
     [Diagram: HDFS namespace tree (/, /data, /tmp, /data/1, /data/f2), labeled with: Encryption zone, Encryption zone key, Data Encryption Key (per file)]
  5. HDFS Transparent Encryption: In a Nutshell
     [Diagram: Client, KMS (Key Management Server), NN (NameNode), DN (DataNode); label: "in EZ?"]
  6. Features
     • Minor performance impact on HDFS reads and writes
         • OpenSSL and AES-NI acceleration
         • 7.5% for reads, ~0% for writes
     • Key ACLs
     • Warm-up/Caching (*)
     • Key rollover
  7. Dev History
     • First released in Hadoop 2.6.0 / CDH5.3 in December 2014
     • Many, many bug fixes and enhancements
         • Functional bugs, failure handling bugs, scale bugs
     • Stable after Hadoop 2.8 / CDH5.11-ish
  8. Lessons Learned
     • Scale-out is not easy to deploy
     • Security
     • Endurance, scale tests are essential
     • Too little emphasis on KMS as a performance bottleneck
     • FileSystem#getDelegationToken() API/integration
     • High throughput REST API layer is hard
  9. Status Quo
     Among Cloudera’s customers (pre-merger):
     • 14% Data Transfer Encryption
     • 16% Data at Rest Encryption
     • 19% RPC Encryption
     • 44% Kerberized
     Largest at-rest encryption cluster: ~1,000 nodes, > 50PB
  10. Troubleshooting
     Performance anomaly
     • Openssl-devel lib
     • Entropy
         • rng-tools
     • Secure Random
         • hadoop.security.secure.random.impl = org.apache.hadoop.crypto.random.OpensslSecureRandom
     Proxy user configuration
  11. Bad Practices
     • No KMS HA
     • KMS enabled, RPC encryption not enabled
     • KMS enabled, but no Kerberos
     • KMS w/o SSL
     • Data transfer encryption is enabled, but using an unoptimized crypto algorithm
         • 3DES, RC4, AES-NI
  12. Challenges
     KMS Low Throughput
     • NN can sustain > 100 thousand RPC ops/second
         • namespace ops, block reports
     • KMS: at most a few thousand RPC ops/second
         • create, append, read, reencrypt
     • 3-4 KMS servers not uncommon
     • Jetty
     • SSL Handshake
     • Impala/Parquet with wide tables (> 100 columns)
  13. Future
     • Pluggable KMS ACL Framework (HADOOP-14951)
     • WebHDFS At Rest Encryption Support (HDFS-12355)
     • NFS Gateway At Rest Encryption Support (HDFS-13521)
     • Performance Improvements (HADOOP-15743, HADOOP-15811)
     • KMS Benchmark Tool (HADOOP-15967)
     • KMS over Hadoop RPC?
  14. Current KMS Transport Layer
     [Diagram labels: KMSClient, Jetty, http client, Name Node, http client, REST API/HTTP]
  15. KMS over Hadoop RPC
     Benefit of KMS over Hadoop RPC:
     • Proven performance
     • Code reuse
     [Diagram labels: KMSClient, Name Node, Hadoop RPC]
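
The key hierarchy on slides 4 and 5 (an encryption zone key held by the KMS, plus one data encryption key per file) can be illustrated with a small sketch. Everything below is hypothetical and for illustration only: `ToyKMS`, `write_file`, `read_file` and `keystream_xor` are made-up names, and the SHA-256 keystream is an insecure stand-in for a real cipher such as AES-CTR. The sketch only shows the general envelope pattern, assuming the per-file key is stored next to the file in encrypted form and the zone key never leaves the KMS.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy counter-mode cipher built from SHA-256. NOT secure; stands in for AES-CTR."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical key server: holds encryption zone keys and never returns them."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_encrypted_key(self, zone_key_name: str):
        """Create a fresh per-file key and return it only in encrypted form."""
        dek = os.urandom(32)
        nonce = os.urandom(16)
        return nonce, keystream_xor(self._zone_keys[zone_key_name], nonce, dek)

    def decrypt_encrypted_key(self, zone_key_name: str, nonce: bytes, edek: bytes) -> bytes:
        """Unwrap an encrypted per-file key for an authorized caller."""
        return keystream_xor(self._zone_keys[zone_key_name], nonce, edek)


def write_file(kms: ToyKMS, zone_key_name: str, plaintext: bytes):
    # Obtain a new encrypted per-file key, then unwrap it to encrypt the data.
    edek_nonce, edek = kms.generate_encrypted_key(zone_key_name)
    dek = kms.decrypt_encrypted_key(zone_key_name, edek_nonce, edek)
    file_nonce = os.urandom(16)
    metadata = {
        "zone_key": zone_key_name,
        "edek_nonce": edek_nonce,
        "edek": edek,  # only the encrypted key is persisted with the file
        "file_nonce": file_nonce,
    }
    return metadata, keystream_xor(dek, file_nonce, plaintext)


def read_file(kms: ToyKMS, metadata: dict, ciphertext: bytes) -> bytes:
    # Unwrap the per-file key through the KMS, then decrypt locally.
    dek = kms.decrypt_encrypted_key(
        metadata["zone_key"], metadata["edek_nonce"], metadata["edek"]
    )
    return keystream_xor(dek, metadata["file_nonce"], ciphertext)
```

In this sketch the storage layer only ever sees `metadata` and ciphertext, so reading a file requires one successful key-unwrap call to the key server. That is one way to see why slide 12 treats KMS throughput as a bottleneck for create, append and read operations.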

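
Slide 6 lists warm-up and caching as a feature without further detail. One plausible reading, stated here as an assumption and not as a description of the Hadoop code, is that encrypted per-file keys are fetched from the KMS in batches ahead of time, so that most file creations do not need their own KMS round trip. The class name `EdekCache`, the `fetch_batch` callback and the `size` and `low_watermark` parameters are all hypothetical, and the refill here is synchronous for simplicity.

```python
from collections import deque


class EdekCache:
    """Hypothetical prefetch queue of encrypted keys with a low-watermark refill."""

    def __init__(self, fetch_batch, size=8, low_watermark=0.25):
        self._fetch_batch = fetch_batch  # callable(n) -> list of n keys, one round trip
        self._size = size
        self._low = int(size * low_watermark)
        self._queue = deque()
        self.round_trips = 0  # counts calls to the key server

    def _refill(self):
        # Top the queue back up to its full size in a single batched call.
        needed = self._size - len(self._queue)
        self._queue.extend(self._fetch_batch(needed))
        self.round_trips += 1

    def warm_up(self):
        """Fill the queue before the first request arrives."""
        self._refill()

    def take(self):
        """Hand out one cached key, refilling when the queue runs low."""
        if not self._queue:
            self._refill()  # cold path: nothing cached yet
        key = self._queue.popleft()
        if len(self._queue) <= self._low:
            self._refill()
        return key
```

With `size=8` and `low_watermark=0.25`, twenty calls to `take()` after a warm-up cost four batched round trips in this sketch instead of twenty individual ones.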