Apache Direct Memory is an open source implementation of off-heap caching that uses ByteBuffer.allocateDirect to store objects in off-heap memory without degrading JVM performance. It provides a multi-layered caching solution and can be used to build a standalone cache server similar to Memcached. Current use cases include integrating with Ehcache for multi-level caching and implementing an off-heap output stream to process streaming data without filling heap memory. Future work includes benchmarking, improving the API, and integrating with more libraries.
1. JUG Lausanne
8. March 2012
Apache Direct Memory
Reducing Heap Memory Stress
The next battle horse for JVM
performance tuning
About me
• Benoit Perroud
• Apache Direct Memory Committer
• bperroud@apache.org
• @killerwhile
• Software craftsman
• BigData Engineer @
Today's Agenda
• Off Heap Caching
– Java Memory
– Garbage Collector (GC)
– On-heap vs. Off-heap Caching
• Apache Direct Memory
– Design and principles
– Use cases
• Multi-layered cache
• Standalone server “à la Memcached”
– Next steps
• Questions
Before starting
• Sorry for my bad English and my poor French
• Interrupt me anytime
• I have nothing to sell. It's just worth sharing
• Please do ask questions
Java Memory
• Automatic memory allocation
• Garbage collector (GC)
Garbage Collector
• Several types of GC
– Serial GC
– Parallel GC (throughput collector)
– Concurrent Mark & Sweep GC (concurrent
low pause collector)
– G1 GC (low latency concurrent M&S)
Garbage Collector
• But all GCs have stop-the-world
behavior
• Pause duration is proportional to the memory's size
• Resulting in application
unresponsiveness
– A pain when dealing with tight SLAs
Cache On-Heap vs. Off-Heap
• On-heap
– Objects tend to be promoted into tenured
memory
– GC storm effect when using a refreshing
cache
– No overhead (for caching by reference)
On-Heap vs. Off-Heap
• Off-heap
– Object payload no longer affects GC
– Serialization/Deserialization overhead
• Fortunately, a lot of work on serialization has been
done (Protobuf, Avro, Thrift, msgpack,
BSON, ...)
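The serialization cost mentioned above can be made concrete with a minimal sketch. It assumes plain JDK serialization rather than any of the listed libraries, and the class name OffHeapSerialization is hypothetical; it is not part of Direct Memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only (not the Direct Memory API).
class OffHeapSerialization {

    // Serialize the value on-heap, then copy the bytes into a direct (off-heap) buffer.
    static ByteBuffer store(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            byte[] payload = bytes.toByteArray();
            ByteBuffer buffer = ByteBuffer.allocateDirect(payload.length);
            buffer.put(payload);
            buffer.flip(); // make the written bytes readable
            return buffer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copy the bytes back on-heap and deserialize them.
    static Object load(ByteBuffer buffer) {
        byte[] payload = new byte[buffer.remaining()];
        buffer.duplicate().get(payload); // duplicate() leaves the caller's position untouched
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer stored = store("hello off-heap");
        System.out.println(load(stored));
    }
}
```

Every read and write pays for a copy and a (de)serialization pass; that is the price for keeping the payload out of the GC's sight.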
Apache Direct Memory
Apache Direct Memory is a multi-layered
cache implementation featuring
off-heap memory storage to enable
caching of Java objects without
degrading JVM performance.
→ Open source alternative to
Terracotta BigMemory.
Apache Direct Memory
• Apache Software Foundation Incubator project
• Entered the Incubator in fall 2011
• 12 developers ATM, 10+ contributors
• I joined on 1 January 2012
– a nice achievement of my hacky Christmas holiday :)
• Disclaimer: Under heavy development
– I rewrote most of the memory allocation service
– APIs are subject to change, and bugs remain to be found
Design & Principles
• ByteBuffer.allocateDirect is the
foundation of the cache
• ByteBuffers are allocated in big chunks
and then split for internal use
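As a rough illustration of this principle, the sketch below allocates one large direct buffer and carves it into equally sized views with ByteBuffer.slice(). The class name DirectChunkSlicer and the sizes are assumptions chosen for the example; this is not the Direct Memory code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not the Direct Memory API).
class DirectChunkSlicer {

    // Allocate one big off-heap chunk and split it into fixed-size views.
    static List<ByteBuffer> slice(int chunkSize, int sliceSize) {
        ByteBuffer chunk = ByteBuffer.allocateDirect(chunkSize);
        List<ByteBuffer> slices = new ArrayList<>();
        for (int offset = 0; offset + sliceSize <= chunkSize; offset += sliceSize) {
            chunk.limit(offset + sliceSize);
            chunk.position(offset);
            // slice() shares the chunk's memory: no copy, no new allocation
            slices.add(chunk.slice());
        }
        return slices;
    }

    public static void main(String[] args) {
        List<ByteBuffer> slices = slice(1024 * 1024, 4096);
        System.out.println(slices.size() + " slices, direct = " + slices.get(0).isDirect());
    }
}
```

Each slice has its own position and limit, so slices can be handed out independently while the JVM only sees a single native allocation.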
Design & Principles
• Built in layers:
– CachingService
• Serializes objects (pluggable)
– MemoryManagerService
• Computes access statistics
– ByteBufferAllocatorService
• Ultimately deals with the ByteBuffers
ByteBuffers Allocation
2 different allocation strategies
• Merging ByteBuffers allocation
– No memory wasted
– No cost at creation
– Suffers from fragmentation
– Needs synchronization at allocation and
deallocation
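The trade-offs of the merging strategy can be seen in a small sketch. It assumes a first-fit search over a sorted free list and merges neighbouring free regions on release; the class MergingAllocator and its methods are hypothetical and only illustrate the idea, they are not the Direct Memory implementation.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical first-fit allocator with merging of free neighbours (illustration only).
class MergingAllocator {
    private final ByteBuffer chunk;
    // Free regions, keyed by offset, value is the region length.
    private final TreeMap<Integer, Integer> free = new TreeMap<>();

    MergingAllocator(int capacity) {
        chunk = ByteBuffer.allocateDirect(capacity);
        free.put(0, capacity); // creation is cheap: a single free region
    }

    // Returns the offset of an exactly sized region, or -1 if no free region is large enough.
    synchronized int allocate(int size) {
        for (Map.Entry<Integer, Integer> region : free.entrySet()) {
            if (region.getValue() >= size) {
                int offset = region.getKey();
                int length = region.getValue();
                free.remove(offset);
                if (length > size) {
                    free.put(offset + size, length - size); // keep the remainder free
                }
                return offset;
            }
        }
        return -1;
    }

    // Releases a region and merges it with adjacent free regions.
    synchronized void free(int offset, int size) {
        Map.Entry<Integer, Integer> previous = free.lowerEntry(offset);
        if (previous != null && previous.getKey() + previous.getValue() == offset) {
            free.remove(previous.getKey());
            offset = previous.getKey();
            size += previous.getValue();
        }
        Integer nextLength = free.remove(offset + size);
        if (nextLength != null) {
            size += nextLength;
        }
        free.put(offset, size);
    }

    // A view over an allocated region of the shared chunk.
    ByteBuffer view(int offset, int size) {
        ByteBuffer duplicate = chunk.duplicate();
        duplicate.position(offset);
        duplicate.limit(offset + size);
        return duplicate.slice();
    }

    synchronized int freeRegions() {
        return free.size();
    }
}
```

The synchronized methods show why this strategy needs synchronization at allocation and deallocation, and a request can fail even when enough bytes are free in total but scattered across non-adjacent regions.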
ByteBuffers Allocation
• Fixed-size ByteBuffers allocation
– Linux kernel SLAB-style allocation
• Select a set of fixed sizes
• Split bigger buffers (1MB+) into slabs of those sizes
– Allocation is really fast, with good concurrency
• All structures are pre-instantiated
– Creation (or increasing the buffer's size) has a cost
• 1GB split into 128-byte slabs means 8M+ buffers created
– Does not suffer from fragmentation
– Wastes memory if the selected sizes do not fit the data
• Works really well for HDFS, where all blocks are of the same size
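A minimal sketch of the fixed-size strategy follows, assuming one pre-sliced pool per slab size backed by a lock-free queue. The class SlabAllocator, its constructor arguments and the slabCount helper are hypothetical names for illustration, not the Direct Memory API.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical slab-style allocator (illustration only).
class SlabAllocator {
    // One pool of pre-created slabs per slab size; the map is read-only after construction.
    private final NavigableMap<Integer, Queue<ByteBuffer>> pools = new TreeMap<>();

    // Pre-instantiates every slab up front: this is where the creation cost is paid.
    SlabAllocator(int chunkSize, int... slabSizes) {
        for (int slabSize : slabSizes) {
            ByteBuffer chunk = ByteBuffer.allocateDirect(chunkSize);
            Queue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
            for (int offset = 0; offset + slabSize <= chunkSize; offset += slabSize) {
                chunk.limit(offset + slabSize);
                chunk.position(offset);
                pool.add(chunk.slice());
            }
            pools.put(slabSize, pool);
        }
    }

    // Takes a slab from the smallest size class that fits, or returns null if none is available.
    ByteBuffer allocate(int size) {
        Map.Entry<Integer, Queue<ByteBuffer>> pool = pools.ceilingEntry(size);
        return pool == null ? null : pool.getValue().poll();
    }

    // Returns a slab to the pool of its size class.
    void free(ByteBuffer slab) {
        slab.clear();
        pools.get(slab.capacity()).add(slab);
    }

    // Number of slabs obtained when splitting totalBytes into slabs of slabSize bytes.
    static long slabCount(long totalBytes, int slabSize) {
        return totalBytes / slabSize;
    }

    public static void main(String[] args) {
        // 1 GB in 128-byte slabs: 2^30 / 2^7 = 2^23 buffers
        System.out.println(slabCount(1L << 30, 128));
    }
}
```

Allocation and release are just a poll and an add on a concurrent queue, which is why this strategy is fast under contention. A 100-byte payload stored in a 128-byte slab wastes 28 bytes, which is the memory cost of a poorly chosen size.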
Use case 1: Multi-layered
cache
• Idea: the most used objects are cached on-heap,
the rest off-heap, and may overflow to disk.
• Sounds like BigMemory.
• See
net.sf.ehcache.store.offheap.OffHeapStore
• Currently we inject DM into Ehcache the way
BigMemory does. Ouch ;)
• A comparison still needs to be done
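The multi-layer idea can be sketched independently of Ehcache. The example below is an assumption-laden toy: a small on-heap LRU map whose evicted entries are demoted to direct buffers and promoted back on access. The class TwoLevelCache is hypothetical, it stores only strings, it allocates one direct buffer per entry for brevity (unlike the chunked design described earlier), and it has no disk layer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-level cache: hot entries on-heap, the rest off-heap (illustration only).
class TwoLevelCache {
    private final Map<String, ByteBuffer> offHeap = new HashMap<>();
    private final LinkedHashMap<String, String> onHeap;

    TwoLevelCache(final int onHeapCapacity) {
        // Access-ordered LinkedHashMap: the eldest entry is the least recently used one.
        onHeap = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() <= onHeapCapacity) {
                    return false;
                }
                demote(eldest.getKey(), eldest.getValue());
                return true;
            }
        };
    }

    // Move a value off-heap by copying its bytes into a direct buffer.
    private void demote(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip();
        offHeap.put(key, buffer);
    }

    void put(String key, String value) {
        offHeap.remove(key); // drop any stale off-heap copy
        onHeap.put(key, value);
    }

    String get(String key) {
        String value = onHeap.get(key);
        if (value != null) {
            return value;
        }
        ByteBuffer buffer = offHeap.remove(key);
        if (buffer == null) {
            return null;
        }
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        value = new String(bytes, StandardCharsets.UTF_8);
        onHeap.put(key, value); // promote back on-heap, possibly demoting another entry
        return value;
    }

    boolean isOnHeap(String key) {
        return onHeap.containsKey(key);
    }

    boolean isOffHeap(String key) {
        return offHeap.containsKey(key);
    }
}
```

This sketch is not thread-safe; a real multi-layered cache would also need eviction from the off-heap layer and concurrency control.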
Use case 2: Off-heap Output
Stream
• Idea: read the Twitter firehose stream without
filling the precious heap memory
– Otherwise an OOM will lead to unpredictable behavior elsewhere in the
application
• From your socket, write directly off-heap,
OutputStream style
– allocate a fixed-size temporary buffer of your choice
• Read from this stream
– InputAndOutputStream parent class that holds both
OutputStream and InputStream instances
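A minimal sketch of such a stream pair is shown below, assuming a single fixed-capacity direct buffer. The class OffHeapStream and its out(), in() and roundTrip() methods are hypothetical names for illustration; they are not the InputAndOutputStream class from the project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical holder of an OutputStream and InputStream over one direct buffer (illustration only).
class OffHeapStream {
    private final ByteBuffer buffer;

    OffHeapStream(int capacity) {
        buffer = ByteBuffer.allocateDirect(capacity);
    }

    // Bytes written here land directly in off-heap memory.
    OutputStream out() {
        return new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                if (!buffer.hasRemaining()) {
                    throw new IOException("off-heap buffer full");
                }
                buffer.put((byte) b);
            }

            @Override
            public void write(byte[] source, int offset, int length) throws IOException {
                if (length > buffer.remaining()) {
                    throw new IOException("off-heap buffer full");
                }
                buffer.put(source, offset, length);
            }
        };
    }

    // Reads what has been written so far; later writes are not visible to this stream.
    InputStream in() {
        final ByteBuffer view = buffer.duplicate();
        view.flip();
        return new InputStream() {
            @Override
            public int read() {
                return view.hasRemaining() ? (view.get() & 0xFF) : -1;
            }
        };
    }

    // Convenience for the example: write data off-heap, then read it back.
    static byte[] roundTrip(byte[] data) {
        OffHeapStream stream = new OffHeapStream(data.length);
        try {
            stream.out().write(data);
            InputStream in = stream.in();
            byte[] result = new byte[data.length];
            int count = 0;
            int b;
            while (count < result.length && (b = in.read()) != -1) {
                result[count++] = (byte) b;
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the firehose scenario, the socket's bytes would be copied through a small on-heap temporary array into out(), so the heap footprint stays bounded by that array regardless of how much data is buffered.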
Use case 3: Standalone
cache server
• Idea: replace Memcached :)
– But with a native, plain REST API
• DM has all the building blocks to implement
such a server; worth trying
• See the server submodule
Next Steps
• JSR 107
• Real Benchmarks
• Builder patterns
• Integration with more libs (Spring, Guice, …)
• Implementations with DM lib (Cassandra (wip), Lucene,
Tomcat, …)
• Cache Resizing
• Management and monitoring
• ...
• https://issues.apache.org/jira/browse/DIRECTMEMORY
Questions?
• Thanks for your attention