2. Security checklist
• Strong authentication with Kerberos.
• RPC and REST protocols.
• RPC supports SASL for authentication, privacy, integrity.
• REST supports end to end TLS.
• Proxy user support for privileged services.
• Encryption at rest.
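
The three SASL guarantees listed above (authentication, integrity, privacy) correspond to the standard SASL quality-of-protection tokens `auth`, `auth-int`, and `auth-conf`. A minimal sketch of how a server could build its negotiation properties using only the JDK's `javax.security.sasl` constants; the class and enum names here are hypothetical and are not taken from the slides:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

// Hypothetical helper: maps protection levels to SASL QOP tokens.
class SaslQopSketch {
    enum Protection {
        AUTHENTICATION("auth"),   // authentication only
        INTEGRITY("auth-int"),    // authentication + integrity checks
        PRIVACY("auth-conf");     // authentication + integrity + encryption

        final String saslQop;
        Protection(String saslQop) { this.saslQop = saslQop; }
    }

    /** Builds the property map handed to a SASL client or server factory. */
    static Map<String, String> saslProperties(Protection... accepted) {
        StringBuilder qop = new StringBuilder();
        for (Protection p : accepted) {
            if (qop.length() > 0) {
                qop.append(',');
            }
            qop.append(p.saslQop);
        }
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, qop.toString());   // acceptable levels, in preference order
        props.put(Sasl.SERVER_AUTH, "true");   // ask the server to authenticate itself too
        return props;
    }
}
```

Listing several levels lets the two sides negotiate; listing only `PRIVACY` forces encryption on the wire.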
3. What needs work…
• RPC privacy performance.
• SPNEGO doesn’t use mutual auth.
• TLS performance is poor.
• TLS mutual authentication needs to be
easier to use.
• KMS client reliability.
• KMS server scalability.
• AuthenticatedURL & AuthFilter, and
token variants, are brittle.
• Proxy user parity between REST and
RPC.
• RPC call queue throughput under
varying workloads.
• RPC replay cache.
• Reduce server-side recursive operations.
• Block report processing.
• Decommissioning overhead.
• Block placement policy and topology
overhead.
• Garbage generation.
• List goes on…
4. RPC
• Throughput approaches 400k ops/sec for read intensive loads.
• Generally sub-millisecond latency.
• The fair call queue is crucial to managing excessive load.
• Excellent for heavy read loads but degrades under heavy write loads.
• QoS for privacy imposes significant performance penalties.
• Kerberos uses strong ciphers. Tokens use a weak cipher (RC4).
• Plan to extend QoS with TLS via Netty’s native engine and support TLS
mutual auth (finally implement the CERTIFICATE auth type).
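
To illustrate why a fair call queue helps under excessive load, here is a simplified sketch of weighted round-robin draining across priority levels. It is an illustration of the general technique only, not the actual Hadoop implementation: the class name, the caller-supplied priority level, and the weights are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, single-threaded illustration of a multi-level fair queue.
class FairQueueSketch {
    private final List<Queue<String>> levels = new ArrayList<>();
    private final int[] weights;     // calls served per turn at each level
    private int currentLevel = 0;
    private int servedAtLevel = 0;

    FairQueueSketch(int... weights) {
        this.weights = weights.clone();
        for (int i = 0; i < weights.length; i++) {
            levels.add(new ArrayDeque<>());
        }
    }

    /** Level 0 is the highest priority; heavy users would be pushed to higher levels. */
    void put(int level, String call) {
        levels.get(level).add(call);
    }

    /** Returns the next call, or null when every level is empty. */
    String take() {
        // Scan at most one full cycle of levels looking for a non-empty queue.
        for (int scanned = 0; scanned < levels.size(); scanned++) {
            String call = levels.get(currentLevel).poll();
            if (call != null) {
                if (++servedAtLevel >= weights[currentLevel]) {
                    advance();   // this level has used up its turn
                }
                return call;
            }
            advance();           // empty level: give the turn away immediately
        }
        return null;
    }

    private void advance() {
        currentLevel = (currentLevel + 1) % levels.size();
        servedAtLevel = 0;
    }
}
```

With weights 2 and 1, calls `a1..a4` at level 0 and `b1, b2` at level 1 drain as `a1, a2, b1, a3, a4, b2`, so low-priority callers are slowed but never starved.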
5. Encryption Zones: KMS
• KMS throughput scales to only a few thousand ops/sec vs. the NN’s hundreds of
thousands of opens/sec.
• Depending on the Java version, connections are not reused.
• ECDH algorithm performance is terrible. Use AES+SHA.
• SSL session cache is expensive. Reduce size and timeout.
• Uses /dev/random (blocks for entropy) instead of /dev/urandom. Avoids use of
synchronized blinding randoms.
• RSACore “Montgomery” math is expensive. Enable intrinsics.
• Probably more…
• Jackson is too expensive for the simple REST api.
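
Two of the tunings above (a smaller SSL session cache with a shorter timeout, and a non-blocking entropy source) can be sketched with JDK APIs alone. This is a minimal sketch, not the KMS code: the class name is hypothetical, the cache size and timeout are left to the caller, and `NativePRNGNonBlocking` is the JDK algorithm that reads `/dev/urandom` on Unix-like systems, so the sketch falls back to the platform default where it is unavailable.

```java
import java.security.GeneralSecurityException;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSessionContext;

// Hypothetical helper illustrating the TLS tunings listed on the slide.
class KmsTlsTuningSketch {
    /** Prefers a SecureRandom that never blocks waiting for entropy. */
    static SecureRandom nonBlockingRandom() {
        try {
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            return new SecureRandom();   // platform default (e.g. on Windows)
        }
    }

    /** Builds a TLS context with a bounded server-side session cache. */
    static SSLContext tunedContext(int cacheSize, int timeoutSeconds) {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            // Default key and trust managers; only the random source is overridden.
            ctx.init(null, null, nonBlockingRandom());
            SSLSessionContext sessions = ctx.getServerSessionContext();
            sessions.setSessionCacheSize(cacheSize);      // max cached sessions
            sessions.setSessionTimeout(timeoutSeconds);   // session lifetime in seconds
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available", e);
        }
    }
}
```

For example, `KmsTlsTuningSketch.tunedContext(1000, 300)` caps the cache at 1000 sessions that expire after five minutes; those numbers are placeholders, not recommendations from the slides.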
6. Encryption Zones: Namenode
• FEInfo (FileEncryptionInfo) memory consumption must be decreased.
• Every FEInfo duplicates the key name and key version.
• Key version redundantly includes the key name…
• Need a key re-encryptor that doesn’t use a recursive descent or
impose limits like no renames during re-encrypt of EDEKs.
• Better support for non-superusers to copy encrypted data without
decrypting.
• Need to avoid blocking RPC handlers during an EDEK cache miss.
Throw a retriable exception instead.
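
The last point can be sketched as follows: on a cache miss the handler schedules a refill on another executor and fails fast, so the client retries instead of pinning a handler thread. This is a minimal sketch under stated assumptions; the class, the nested exception type, and the generator function (standing in for the slow KMS round trip) are all hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical illustration of a non-blocking EDEK cache.
class EdekCacheSketch {
    /** Stand-in for a retriable RPC error; the client is expected to back off and retry. */
    static class RetriableException extends RuntimeException {
        RetriableException(String message) { super(message); }
    }

    private final ConcurrentHashMap<String, Queue<byte[]>> cache = new ConcurrentHashMap<>();
    private final Executor refillExecutor;
    private final Function<String, byte[]> generator;   // the slow call, e.g. to the KMS

    EdekCacheSketch(Executor refillExecutor, Function<String, byte[]> generator) {
        this.refillExecutor = refillExecutor;
        this.generator = generator;
    }

    /** Returns a cached EDEK, or throws immediately on a miss. */
    byte[] take(String keyName) {
        Queue<byte[]> edeks = cache.computeIfAbsent(keyName, k -> new ConcurrentLinkedQueue<>());
        byte[] edek = edeks.poll();
        if (edek != null) {
            return edek;
        }
        // Miss: refill off the handler thread, then fail fast.
        // A real implementation would also coalesce concurrent refills for one key.
        refillExecutor.execute(() -> edeks.add(generator.apply(keyName)));
        throw new RetriableException("EDEK cache miss for key " + keyName + "; retry later");
    }
}
```

The first `take` for a key throws, and a later retry succeeds once the background refill has completed.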
7. HDFS Improvements
• Avoid heavyweight niche features. Use external plugins.
• Reduce overhead of block reports. Improve lock usage.
• Reduce decommissioning overhead.
• Reduce block placement and topology overhead.
• Recursive operations should be client-side.
• Reduce excessive garbage for better GC performance.
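
As an illustration of moving recursion to the client, the sketch below walks a tree iteratively and issues one single-level listing call per directory, so the server never performs a long recursive operation on the client's behalf. The `Lister` interface is a hypothetical stand-in for a one-level listing RPC, not a real HDFS API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of a client-driven tree walk.
class ClientSideWalkSketch {
    /** One-level listing call: child paths of a directory, empty for a file. */
    interface Lister {
        List<String> listChildren(String path);
    }

    /** Visits every path under root using an explicit stack instead of recursion. */
    static List<String> walk(Lister fs, String root) {
        List<String> visited = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String path = pending.pop();
            visited.add(path);
            // Each iteration is one short server call rather than one long recursive one.
            for (String child : fs.listChildren(path)) {
                pending.push(child);
            }
        }
        return visited;
    }
}
```

The same loop could apply a per-path operation (delete, chmod, and so on) as it goes, at the cost of more round trips.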