:
HDFS and Security
Daryn Sharp
:
Security checklist
• Strong authentication with Kerberos.

• RPC and REST protocols.

• RPC supports SASL for authentication, privacy, integrity. 

• REST supports end to end TLS.

• Proxy user support for privileged services.

• Encryption at rest.
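The three SASL protection levels named above (authentication, integrity, privacy) correspond to the standard SASL quality-of-protection strings. A minimal sketch of that mapping, assuming a hypothetical `Protection` enum and helper name; only the `javax.security.sasl.Sasl` property constants are standard Java:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

/** Hypothetical sketch: map a protection level to SASL negotiation properties. */
class SaslQopSketch {
    enum Protection {
        AUTHENTICATION("auth"),   // authentication only
        INTEGRITY("auth-int"),    // authentication plus integrity checks
        PRIVACY("auth-conf");     // authentication, integrity, and encryption

        final String saslQop;

        Protection(String saslQop) {
            this.saslQop = saslQop;
        }
    }

    /** Builds the property map that would be handed to a SASL client or server. */
    static Map<String, String> saslProps(Protection level) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, level.saslQop);
        props.put(Sasl.SERVER_AUTH, "true"); // ask the server to authenticate itself too
        return props;
    }

    public static void main(String[] args) {
        System.out.println(saslProps(Protection.PRIVACY));
    }
}
```

Choosing `PRIVACY` is what turns on wire encryption, which is also the level whose performance cost is discussed later in the deck.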
:
What needs work…
• RPC privacy performance.

• SPNEGO doesn’t use mutual auth.

• TLS performance is poor.

• TLS mutual authentication needs to be
easier to use.

• KMS client reliability.

• KMS server scalability.

• AuthenticatedURL & AuthFilter, and
token variants, are brittle.

• Proxy user parity between REST and
RPC.

• RPC call queue throughput under
varying workloads.

• RPC replay cache.

• Reduce server-side recursive operations.

• Block report processing.

• Decommissioning overhead.

• Block placement policy and topology
overhead.

• Garbage generation.

• List goes on…

:
RPC
• Throughput approaches 400k ops/sec for read-intensive loads.

• Generally sub-millisecond latency.

• Fair call queue is crucial to managing excessive load.

• Excellent for heavy read loads but degrades under heavy write loads.

• QoS for privacy imposes significant performance penalties.

• Kerberos uses strong ciphers. Tokens use a weak cipher (RC4).

• Plan to extend QoS with TLS via Netty’s native engine and support TLS
mutual auth (finally implement CERTIFICATE auth type).
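The idea behind a fair call queue can be sketched as weighted round-robin draining of several priority sub-queues, so that heavy callers placed in a low-priority queue cannot starve everyone else. This is a simplified illustration, not Hadoop's implementation: the class name, the weights, and the caller-supplied priority are assumptions (in a real server a scheduler would assign the priority from each caller's recent usage).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: weighted round-robin over priority sub-queues. */
class FairQueueSketch<T> {
    private final List<Queue<T>> queues = new ArrayList<>();
    private final int[] weights;   // calls served per turn for each priority level
    private int current = 0;       // sub-queue whose turn it is
    private int servedInTurn = 0;  // calls already served in the current turn

    FairQueueSketch(int... weights) {
        this.weights = weights.clone();
        for (int i = 0; i < weights.length; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    /** Enqueues a call at the given priority level (0 is the highest). */
    void put(int priority, T call) {
        queues.get(priority).add(call);
    }

    /** Returns the next call, or null when every sub-queue is empty. */
    T poll() {
        // One pass over all sub-queues, plus one extra visit so the starting
        // queue can be retried with a fresh budget.
        for (int tries = 0; tries <= queues.size(); tries++) {
            Queue<T> q = queues.get(current);
            if (!q.isEmpty() && servedInTurn < weights[current]) {
                servedInTurn++;
                return q.poll();
            }
            // Budget used up or queue empty: hand the turn to the next level.
            current = (current + 1) % queues.size();
            servedInTurn = 0;
        }
        return null;
    }
}
```

With weights `(2, 1)`, two calls are taken from level 0 for every one call from level 1 while both levels have work, and a level with no work simply gives up its turn.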
:
Encryption Zones
KMS
• KMS throughput scales to only a few thousand ops/sec vs the NN’s hundreds of
thousands of opens/sec.

• Depending on the Java version, connections are not reused.

• ECDH algorithm performance is terrible. Use AES+SHA.

• SSL session cache is expensive. Reduce size and timeout.

• Uses /dev/random (blocks for entropy) instead of /dev/urandom. Avoids use of
synchronized blinding randoms.

• RSACore “Montgomery” math is expensive. Enable intrinsics.

• Probably more…

• Jackson is too expensive for the simple REST API.
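The advice to reduce the SSL session cache size and timeout can be applied through the standard `javax.net.ssl.SSLSessionContext` API. A minimal sketch, assuming the server owns its `SSLContext`; the class name, helper names, and the values 1000 and 300 are illustrative, not recommendations from the talk:

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

/** Hypothetical sketch: bound the server-side TLS session cache. */
class SslSessionCacheSketch {
    /** Applies a size limit and a timeout (in seconds) to the server session cache. */
    static SSLSessionContext tune(SSLContext ctx, int maxSessions, int timeoutSeconds) {
        SSLSessionContext sessions = ctx.getServerSessionContext();
        sessions.setSessionCacheSize(maxSessions);
        sessions.setSessionTimeout(timeoutSeconds);
        return sessions;
    }

    /** Convenience wrapper that builds a default TLS context and tunes it. */
    static SSLSessionContext tunedDefault(int maxSessions, int timeoutSeconds) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, null, null); // default key managers, trust managers, and randomness
            return tune(ctx, maxSessions, timeoutSeconds);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available", e);
        }
    }

    public static void main(String[] args) {
        SSLSessionContext s = tunedDefault(1000, 300); // illustrative values only
        System.out.println(s.getSessionCacheSize() + " sessions, "
                + s.getSessionTimeout() + "s timeout");
    }
}
```

In a real server the same two setters would be called on the context that backs the HTTPS listener rather than on a freshly created one.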
:
Encryption Zones
Namenode
• FEInfo memory consumption must be decreased.

• Every FEInfo duplicates the key name and key version.

• Key version redundantly includes the key name…

• Need a key re-encryptor that doesn’t use a recursive descent or
impose limits like no renames during re-encrypt of EDEKs.

• Better support for non-superusers to copy encrypted data w/o
decrypting. 

• Need to avoid blocking RPC handlers during EDEK cache miss.
Throw a retriable exception instead.
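The last point, failing fast on an EDEK cache miss instead of blocking a handler, might look roughly like the sketch below. Everything here is hypothetical: the class, the method names, and the local `RetriableException` stand-in are assumptions for illustration, and the background thread that refills the cache from the KMS is not shown.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: a non-blocking EDEK cache that signals "retry later" on a miss. */
class EdekCacheSketch {
    /** Local stand-in for a retry-later signal sent back to the client. */
    static class RetriableException extends RuntimeException {
        RetriableException(String message) {
            super(message);
        }
    }

    // Pre-generated EDEKs per encryption key name.
    private final Map<String, Queue<byte[]>> cache = new ConcurrentHashMap<>();

    /** Called by a background filler (not shown) after it fetches EDEKs from the KMS. */
    void offer(String keyName, byte[] edek) {
        cache.computeIfAbsent(keyName, k -> new ConcurrentLinkedQueue<>()).add(edek);
    }

    /** Never waits on the KMS: a miss is reported so the handler is freed immediately. */
    byte[] take(String keyName) {
        Queue<byte[]> q = cache.get(keyName);
        byte[] edek = (q == null) ? null : q.poll();
        if (edek == null) {
            throw new RetriableException("EDEK cache empty for key " + keyName + ", retry later");
        }
        return edek;
    }
}
```

The design choice is that the client absorbs the wait by retrying, so a slow KMS ties up client time rather than a scarce server-side handler thread.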
:
HDFS Improvements
• Avoid heavyweight niche features. Use external plugins.

• Reduce overhead of block reports. Improve lock usage.

• Reduce decommissioning overhead.

• Reduce block placement and topology overhead.

• Recursive operations should be client-side.

• Reduce excessive garbage for better GC performance.
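Moving recursion to the client means the client walks the tree itself, issuing one small listing call per directory instead of asking the server to do one long recursive operation. A minimal sketch of that traversal, where the `Lister` interface is a hypothetical stand-in for a single non-recursive listing call and not a real HDFS API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: iterative client-side tree walk using one listing call per directory. */
class ClientSideWalkSketch {
    /** Stand-in for a single non-recursive listing request. */
    interface Lister {
        List<String> list(String path); // children of path; empty for files
    }

    /** Visits every path under root using an explicit stack instead of server-side recursion. */
    static List<String> walk(Lister fs, String root) {
        List<String> visited = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String path = pending.pop();
            visited.add(path);
            // Each call here is short, so server locks are held only briefly.
            for (String child : fs.list(path)) {
                pending.push(child);
            }
        }
        return visited;
    }
}
```

The explicit stack also keeps the client from overflowing its own call stack on very deep directory trees.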

HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
