Apache Commons Crypto is a cryptographic library optimized with AES-NI (Advanced Encryption Standard New Instructions). It provides Java APIs at both the cipher level and the Java stream level, so developers can implement high-performance AES encryption/decryption with minimal code and effort. Note that Apache Commons Crypto does not implement cryptographic algorithms such as AES itself; it wraps OpenSSL or JCE, which implement the algorithms.
2. About us
Dapeng Sun @Intel
• Apache Commons Committer
• Apache Sentry PMC
Xianda Ke @Intel
• Apache Commons Crypto
• Apache Pig(Pig on Spark)
• OpenJDK
3. Agenda
• Brief introduction to cryptography
• What’s Commons Crypto?
• Why did we create another wheel?
• Performance secrets
• Securing big data & API samples
4. A glance at cryptographic primitives
• Symmetric encryption(Secret Key) : Confidentiality
DES/3DES, RC4, AES, …
• Asymmetric encryption (Public Key)
Authentication / Key exchange : RSA, Diffie-Hellman, ECC, ...
• Hash algorithms: Integrity
MD5, SHA-1, SHA-2, GMAC (a variant of GCM), …
• Authenticated Encryption(AE): Confidentiality + Integrity
RC4 + HmacMD5, RC4 + HMAC-SHA-1, AES-GCM, …
• Random Number Generator
PRNG(Pseudo) / TRNG(True)
• Mode of operation: blurs the cipher output, so that identical plaintext blocks do not produce identical ciphertext blocks
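The authenticated-encryption primitive listed above (AES-GCM) can be sketched with the standard JCE API. This is only an illustration under stated assumptions: the class and method names (GcmSketch, newKey, newIv, tamperDetected) are hypothetical, and the code calls javax.crypto directly rather than Commons Crypto.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class: AES-GCM gives confidentiality + integrity in one primitive.
class GcmSketch {
    static final int TAG_BITS = 128; // length of the authentication tag
    static final int IV_BYTES = 12;  // commonly used GCM nonce size

    // Generate a fresh 128-bit AES key.
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Generate a random nonce; it must never be reused with the same key.
    static byte[] newIv() {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        return iv;
    }

    // Returns ciphertext with the authentication tag appended.
    static byte[] encrypt(SecretKey key, byte[] iv, byte[] plain) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(plain);
    }

    // Verifies the tag and returns the plaintext; throws if the data was modified.
    static byte[] decrypt(SecretKey key, byte[] iv, byte[] sealed) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(sealed);
    }

    // Flips one bit of the sealed message and reports whether decryption rejects it.
    static boolean tamperDetected(SecretKey key, byte[] iv, byte[] sealed) throws GeneralSecurityException {
        byte[] modified = sealed.clone();
        modified[0] ^= 1;
        try {
            decrypt(key, iv, modified);
            return false;
        } catch (AEADBadTagException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] iv = newIv();
        byte[] sealed = encrypt(key, iv, "big data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, iv, sealed), StandardCharsets.UTF_8));
        System.out.println("tamper detected: " + tamperDetected(key, iv, sealed));
    }
}
```

The integrity half of AE shows up in tamperDetected: a single flipped bit makes decryption fail instead of returning corrupted plaintext.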
5. What is Commons Crypto?(1)
• High-performance Java cryptography library optimized with AES-NI
• It wraps OpenSSL as the engine(via JNI)
6. History of Commons Crypto(1)
• HDFS Crypto codec
• Intel AES-NI
• Native library(OpenSSL) support
• 17x performance improvement
7. History of Commons Crypto(2)
• Promote the advancement to other projects
• Intel’s big data team created Chimera library
• https://github.com/intel-hadoop/Chimera
• Easily contribute towards the advancement of AES-NI
8. History of Commons Crypto(3)
• Worked with the Commons community to contribute it to the Apache community
• https://commons.apache.org/proper/commons-crypto/
9. What is Commons Crypto? (2)
• Low level Cipher API
Only support AES algorithm now
• Stream API
To encrypt Stream/Channel
• Random API
True Random Number Generator
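As a rough picture of what stream-level encryption looks like, here is a minimal sketch that uses the standard JCE stream wrappers (javax.crypto.CipherOutputStream / CipherInputStream) with AES-CTR. It is a stand-in, not Commons Crypto's own API: the library ships its own stream classes, which are not shown here, and the names StreamSketch, randomBytes and cipher are hypothetical. java.security.SecureRandom is used for the key and IV; whether it is backed by a true hardware generator depends on the platform.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class: encrypting and decrypting through Java streams.
class StreamSketch {
    static final String TRANSFORM = "AES/CTR/NoPadding";

    // Random bytes for keys and IVs.
    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        new SecureRandom().nextBytes(bytes);
        return bytes;
    }

    // Build an initialized AES-CTR cipher for the given mode.
    static Cipher cipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(TRANSFORM);
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    // Everything written to the wrapping stream is encrypted on the way to the sink.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(sink, cipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Everything read from the wrapping stream is decrypted on the way from the source.
    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(encrypted), cipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = randomBytes(16); // 128-bit AES key
        byte[] iv = randomBytes(16);  // AES block-sized counter block
        byte[] encrypted = encrypt(key, iv, "stream me".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, iv, encrypted), StandardCharsets.UTF_8));
    }
}
```

CTR mode is used here because it is a stream-friendly mode with no padding, so the ciphertext has the same length as the plaintext.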
11. Don’t end up in a bottleneck…
Some Java cryptographic protocols (e.g. SASL) use inefficient or weak low-level algorithms: 3DES, RC4, …
Hadoop/Spark users own very ‘big’ data…
• User’s data is vulnerable to attack
• Performance of JCE is the bottleneck!
Taking both security and performance into account, JCE is impractical
12. Choosing algorithm for data encryption
• DES is broken
• 3DES is too slow
• RC4 is ‘fast’, but weak (prohibited by RFC 7465)
• AES is secure
18. Using Commons Crypto to secure your big data, with no performance bottleneck
You might ask what makes crypto so fast?
19. Performance Secret (1): AES in hardware (Intel® AES-NI)
AES algorithm: each round consists of several processing steps:
SubBytes (substitute bytes), ShiftRows, MixColumns
All these steps in a round can be done by a SINGLE instruction:
AESENC xmm1, xmm2
20. Performance Secret (2): Pipelining (instruction-level parallelism)
An AESENC instruction takes 7 CPU clock cycles
Where data dependence exists, instructions have to be executed one by one
With pipelining, 6 independent instructions take only 12 clock cycles
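The cycle counts on this slide can be checked with a small calculation. It assumes (this is not stated on the slide) that the pipeline can start one new independent AESENC per clock cycle; the symbols L (latency in cycles) and n (number of instructions) are introduced here for illustration only.

```latex
\begin{align*}
T_{\text{serial}}    &= n \cdot L   = 6 \cdot 7 = 42 \text{ cycles} \\
T_{\text{pipelined}} &= L + (n - 1) = 7 + 5     = 12 \text{ cycles}
\end{align*}
```

Under that assumption, six independent instructions finish about 3.5x faster than six dependent ones, which is why modes that can process blocks independently benefit most from pipelining.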
21. Other performance Secrets:
Well-optimized assembly code in OpenSSL
• Better SSE/AVX registers usage (x86 platform)
• Data locality, etc…
Hardware acceleration in OpenSSL
• Intel® Carry-Less Multiplication Instruction (PCLMULQDQ for GCM)
• Intel® Secure Key (RDRAND, Random Number Generator)
• in other architectures
HotSpot Intrinsics for AES are NOT well-optimized (JDK 8)
22. JDK 9 is becoming fast…
• JDK-8073108 enables PCLMULQDQ for AES-GCM,
~60X gain, still falls far behind Commons Crypto
We’ve contributed two patches to OpenJDK (HotSpot):
• JDK-8143925 x86 AES-CTR, 5~8x gain
• JDK-8152354 x86 AES-CBC Decryption, 15%~50% gain
But we will have to wait a few years before Java 9 is adopted in our production software…
23. Big Data components are using Commons Crypto/AES-NI
• AES support for over-the-wire encryption(SPARK-13331)
• Add encrypted shuffle in spark (SPARK-5682)
• brings 12.5% overall performance gain
• Optimize HDFS Encrypted Transport performance (HDFS-6606)
• brings 17x overall performance gain
• Improve transparent table/CF encryption with Commons Crypto (HBASE-16463)
• showed 4% to 42% gain for HFile encryption using HfilePerformanceEvaluation tool
• Improve performance for RPC encryption (HBASE-16414)
• improves performance by about 9x when delegation tokens are enabled
In development:
• Optimize Hadoop RPC encryption performance (HADOOP-10768)
• Replace Hadoop Common crypto code with Apache Commons Crypto (HADOOP-13635)
27. Status & Future work:
Status:
• v1.0.0 is released. AES-CBC, AES-CTR are supported
• AES-GCM (will be available in the next release)
Future work (to cover more facets):
• Support Asymmetric key algorithms: RSA, DSA
• …
How to contribute:
• SCM: https://github.com/apache/commons-crypto
• Apache Bugtracker (JIRA): https://issues.apache.org/jira/