Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)



  1. Dealing with JVM limitations in Apache Cassandra✤ Jonathan Ellis / @spyced
  2. Pain points for Java databases✤ GC✤ GC✤ GC
  3. Pain points for Java databases✤ GC✤ Platform specific code
  4. GC✤ Concurrent and compacting: choose one ✤ G1 ✤ Azul C4 / Zing?
  5. Fragmentation✤ Bloom filter arrays✤ Compression offsets
  6. Automatic mitigation?
  7. Fragmentation, 2✤ Arena allocation for memtables
  8. (Memtables?) write( k1 , c1:v1 )✤ Memory: Memtable✤ Hard drive: Commit log
  9. write( k1 , c1:v )✤ Memory (memtable): k1 c1:v✤ Hard drive (commit log): k1 c1:v
  10. write( k1 , c2:v )✤ Memory (memtable): k1 c1:v c2:v✤ Hard drive (commit log): k1 c1:v | k1 c2:v
  11. write( k2 , c1:v c2:v )✤ Memory (memtable): k1 c1:v c2:v | k2 c1:v c2:v✤ Hard drive (commit log): k1 c1:v | k1 c2:v | k2 c1:v c2:v
  12. write( k1 , c1:v c3:v )✤ Memory (memtable): k1 c1:v c2:v c3:v | k2 c1:v c2:v✤ Hard drive (commit log): k1 c1:v | k1 c2:v | k2 c1:v c2:v | k1 c1:v c3:v
  13. flush, index, cleanup✤ Memory → Hard drive (SSTable): k1 c1:v c2:v c3:v | k2 c1:v c2:v
  14. “Java is a memory hog”✤ Large overhead for typical objects and collections✤ How large?✤ java.lang.instrument.Instrumentation ✤ JAMM: Java Agent for Memory Measurements
  15. org.apache.cassandra.cache.SerializingCache✤ Live objects are about 85% JVM bookkeeping✤ org.apache.cassandra.cache.FreeableMemory using reference counting✤ Considering doing reference-counted, off-heap memtables as well
  16. Don’t forget about young gen✤ Always stop-the-world for ~100ms
  17. Platform-specific code✤ OS✤ JVM
  18. m[un]map✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files✤ Old workaround (pre-1.0): ✤ use PhantomReference to tell when mmap’d file is GC’d (hence unmapped) ✤ Poor user experience and messy corner cases✤ New workaround: ✤ Class.forName("").getMethod("cleaner")
  19. mmap part 2✤ 2GB limit via ByteBuffer: public abstract byte get(int index)✤ Workaround: MmappedSegmentedFile public Iterator<DataInput> iterator(long position)
  20. link✤ Used for snapshots✤ Old workaround: JNA✤ New workaround: supported directly by Java7
  21. mlockall✤ swappiness: pissing off database developers since 2001 (?)✤ mlockall(MCL_CURRENT)
  22. Low-level i/o✤ posix_fadvise✤ mincore/fincore✤ fcntl✤ ... JNA
  23. A plug for JNA✤ static { try { Native.register("c"); ... private static native int mlockall(int flags) throws LastErrorException;
  24. The fallacy of choosing portability over power✤ Applets have been dead for years✤ Python gets it right ✤ import readline
  25. The fallacy of choosing safety over power✤ Allowing munmap would expose developers to segfaults✤ But, relying on the GC to clean up external resources is a well-known antipattern ✤ File.close✤ We need munmap badly enough that we resort to unnatural and unportable code to get it ✤ You haven’t kept us from risking segfaults, you’ve just made us miserable
  26. Compatibility through obscurity?✤ sun.misc.Unsafe✤ Used by high-profile libraries like high-scale-lib
  27. ... even public options
  28. Too negative?
  29. Still true✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click
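
The write path drawn on slides 8 to 13 can be sketched in a few lines. This is a toy model under stated assumptions, not Cassandra's code: the class name `ToyWritePath`, the in-memory list standing in for the on-disk commit log, and the snapshot maps standing in for SSTables are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the write path on slides 8-13 (hypothetical, not Cassandra's classes). */
class ToyWritePath {
    // Stand-in for the on-disk commit log: one appended entry per write.
    final List<String> commitLog = new ArrayList<>();
    // Memtable: row key -> (column name -> value), kept sorted.
    final TreeMap<String, TreeMap<String, String>> memtable = new TreeMap<>();
    // Flushed, no-longer-modified snapshots standing in for SSTables.
    final List<TreeMap<String, TreeMap<String, String>>> sstables = new ArrayList<>();

    /** Arguments after the key alternate column name, value, column name, value, ... */
    void write(String key, String... columnsAndValues) {
        if (columnsAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected column/value pairs");
        }
        // 1. Append the mutation to the log so it survives a crash.
        commitLog.add(key + " " + String.join(",", columnsAndValues));
        // 2. Merge the columns into the in-memory row.
        TreeMap<String, String> row = memtable.computeIfAbsent(key, k -> new TreeMap<>());
        for (int i = 0; i < columnsAndValues.length; i += 2) {
            row.put(columnsAndValues[i], columnsAndValues[i + 1]);
        }
    }

    /** Copy the sorted rows out as one "SSTable", then drop the memtable and the log. */
    void flush() {
        TreeMap<String, TreeMap<String, String>> snapshot = new TreeMap<>();
        memtable.forEach((key, columns) -> snapshot.put(key, new TreeMap<>(columns)));
        sstables.add(snapshot);
        memtable.clear();
        commitLog.clear();
    }
}
```

Replaying the four writes from the slides leaves two merged rows in the memtable and four entries in the log; `flush()` then moves the rows into a single snapshot and empties both.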
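
Slide 7's arena allocation for memtables might look roughly like the following. The idea, as far as it can be read from the slide, is that many small values with the same lifetime are copied into a few large slabs, so a flushed memtable releases whole slabs rather than scattering small holes across the heap. `SlabArena` and its methods are hypothetical names for illustration, and the sketch is single-threaded; a real allocator would need to handle concurrent writers.

```java
import java.nio.ByteBuffer;

/** Minimal single-threaded arena (slab) allocator sketch; names are hypothetical. */
class SlabArena {
    private final int slabSize;
    private ByteBuffer currentSlab;
    private int slabsAllocated = 0;

    SlabArena(int slabSize) {
        this.slabSize = slabSize;
    }

    /** Returns a buffer of exactly {@code size} bytes carved out of the current slab. */
    ByteBuffer allocate(int size) {
        if (size > slabSize) {
            // Oversized requests get their own buffer instead of a slab region.
            return ByteBuffer.allocate(size);
        }
        if (currentSlab == null || currentSlab.remaining() < size) {
            // Current slab is full (or missing): start a new one.
            currentSlab = ByteBuffer.allocate(slabSize);
            slabsAllocated++;
        }
        // Carve out [position, position + size) as an independent view.
        ByteBuffer region = currentSlab.duplicate();
        region.limit(region.position() + size);
        currentSlab.position(currentSlab.position() + size);
        return region.slice();
    }

    /** Copies a value into the arena and returns a read-ready view of it. */
    ByteBuffer copy(byte[] value) {
        ByteBuffer region = allocate(value.length);
        region.put(value);
        region.flip();
        return region;
    }

    int slabsAllocated() {
        return slabsAllocated;
    }
}
```

The trade-off in this sketch is that the unused tail of a slab is wasted whenever the next value does not fit, in exchange for far fewer long-lived objects for the collector to track.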
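
Slide 15 mentions reference counting for off-heap cache memory. A minimal sketch of that pattern follows, assuming a direct `ByteBuffer` as the off-heap region. `RefCountedMemory` is a hypothetical class, not the actual `FreeableMemory`, and because plain Java offers no supported way to free a direct buffer explicitly, the "free" step here only drops the reference; a real implementation would release the native memory at that point.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Reference-counted off-heap region sketch; hypothetical, not Cassandra's FreeableMemory. */
class RefCountedMemory {
    // The creator holds the first reference.
    private final AtomicInteger references = new AtomicInteger(1);
    private volatile ByteBuffer buffer;

    RefCountedMemory(int size) {
        buffer = ByteBuffer.allocateDirect(size);
    }

    /** Tries to take a reference; returns false if the memory was already freed. */
    boolean reference() {
        while (true) {
            int n = references.get();
            if (n <= 0) {
                return false; // too late: the last holder already released it
            }
            if (references.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Releases one reference; the last release frees the memory. */
    void unreference() {
        if (references.decrementAndGet() == 0) {
            // Assumption: a real implementation would free native memory here.
            buffer = null;
        }
    }

    boolean isFreed() {
        return buffer == null;
    }

    /** Callers must hold a reference while using the returned buffer. */
    ByteBuffer buffer() {
        ByteBuffer b = buffer;
        if (b == null) {
            throw new IllegalStateException("memory already freed");
        }
        return b;
    }
}
```

A reader would call `reference()` before touching the buffer and `unreference()` afterwards, so eviction from the cache cannot free memory that is still being read.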
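
The 2GB limit on slide 19 comes from `ByteBuffer` indexing with an `int`. One way around it, sketched below, is to map the file as several segments and translate a `long` position into a segment index plus an `int` offset. `SegmentedMappedFile` is a hypothetical class, and the fixed-size segments are an assumption made for simplicity; the slides do not say how `MmappedSegmentedFile` chooses its segment boundaries.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Read-only file mapped as fixed-size segments, addressed by long position (sketch). */
class SegmentedMappedFile {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;
    private final long length;

    SegmentedMappedFile(Path path, long segmentSize) {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segment size must fit in an int");
        }
        this.segmentSize = segmentSize;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            length = channel.size();
            int count = (int) ((length + segmentSize - 1) / segmentSize);
            segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * segmentSize;
                long size = Math.min(segmentSize, length - start);
                // Each mapping stays valid after the channel is closed.
                segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one byte at an absolute position that may exceed Integer.MAX_VALUE. */
    byte get(long position) {
        if (position < 0 || position >= length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }

    long length() {
        return length;
    }
}
```

With a segment size of 4, position 9 in a 10-byte file resolves to segment 2, offset 1; the same arithmetic applies unchanged when the segment size is close to 2GB.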