Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Published in: Technology

  • 1. Dealing with JVM limitations in Apache Cassandra, Jonathan Ellis / @spyced
  • 2. Pain points for Java databases ✤ GC ✤ GC ✤ GC
  • 3. Pain points for Java databases ✤ GC ✤ Platform-specific code
  • 4. GC ✤ Concurrent and compacting: choose one ✤ G1 ✤ Azul C4 / Zing?
  • 5. Fragmentation ✤ Bloom filter arrays ✤ Compression offsets
  • 6. Automatic mitigation?
  • 7. Fragmentation, 2 ✤ Arena allocation for memtables
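The arena idea on slide 7 can be sketched roughly as follows: small values are copied into large, contiguous regions so that a memtable's data lives (and dies) together instead of being scattered across the heap. This is a minimal, single-threaded illustration, not Cassandra's actual allocator; the class name `ArenaSketch`, the method `copy`, and the region size parameter are all hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of arena (slab) allocation; names are illustrative only
// and this is not thread-safe.
final class ArenaSketch {
    private final int regionSize;
    private ByteBuffer region;   // current region; its position is the bump pointer
    private int regionCount = 0;

    ArenaSketch(int regionSize) {
        if (regionSize <= 0) throw new IllegalArgumentException("regionSize must be positive");
        this.regionSize = regionSize;
    }

    /** Copies src into the arena and returns a buffer that views the copy. */
    ByteBuffer copy(ByteBuffer src) {
        int size = src.remaining();
        if (size > regionSize) {
            // Oversized values get a dedicated buffer rather than a region slot.
            ByteBuffer own = ByteBuffer.allocate(size);
            own.put(src.duplicate());
            own.flip();
            return own;
        }
        if (region == null || region.remaining() < size) {
            // Start a new region; the old one stays alive through its slices.
            region = ByteBuffer.allocate(regionSize);
            regionCount++;
        }
        // Carve out [position, position + size) of the current region.
        ByteBuffer window = region.duplicate();
        window.limit(window.position() + size);
        ByteBuffer slice = window.slice();
        slice.duplicate().put(src.duplicate());     // copy without moving slice's position
        region.position(region.position() + size);  // advance the bump pointer
        return slice;
    }

    int regionCount() {
        return regionCount;
    }
}
```

When the memtable is flushed and dropped, whole regions become garbage at once, which is the property that should reduce old-generation fragmentation.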
  • 8. (Memtables?) write(k1, c1:v1) ✤ Memory: Memtable ✤ Hard drive: Commit log
  • 9. write(k1, c1:v) ✤ Memory (Memtable): k1 c1:v ✤ Hard drive (Commit log): k1 c1:v
  • 10. write(k1, c2:v) ✤ Memory: k1 c1:v c2:v ✤ Hard drive: k1 c1:v; k1 c2:v
  • 11. write(k2, c1:v c2:v) ✤ Memory: k1 c1:v c2:v; k2 c1:v c2:v ✤ Hard drive: k1 c1:v; k1 c2:v; k2 c1:v c2:v
  • 12. write(k1, c1:v c3:v) ✤ Memory: k1 c1:v c2:v c3:v; k2 c1:v c2:v ✤ Hard drive: k1 c1:v; k1 c2:v; k2 c1:v c2:v; k1 c1:v c3:v
  • 13. flush, index, cleanup ✤ Memory: (empty) ✤ Hard drive (SSTable): k1 c1:v c2:v c3:v; k2 c1:v c2:v
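The write path walked through on slides 8 to 13 can be approximated in a few lines: each write is appended to a commit log and merged into a sorted in-memory table, and a flush hands the sorted table off as an SSTable and clears the log. This is a toy model under loud assumptions: `MemtableSketch` and its methods are hypothetical names, the "commit log" and "SSTable" are plain in-memory collections standing in for files on disk, and the index step is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the memtable write path; nothing here touches disk.
final class MemtableSketch {
    // Stand-in for the on-disk commit log: one entry per write, in arrival order.
    final List<String> commitLog = new ArrayList<>();
    // Rows sorted by key, columns sorted by name.
    private TreeMap<String, TreeMap<String, String>> memtable = new TreeMap<>();

    void write(String key, Map<String, String> columns) {
        // 1. Record the write sequentially in the commit log.
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> c : new TreeMap<>(columns).entrySet()) {
            parts.add(c.getKey() + ":" + c.getValue());
        }
        commitLog.add(key + " " + String.join(" ", parts));
        // 2. Merge the columns into the row held in memory.
        memtable.computeIfAbsent(key, k -> new TreeMap<>()).putAll(columns);
    }

    /** Returns the sorted rows as the "SSTable" and resets memtable and log. */
    TreeMap<String, TreeMap<String, String>> flush() {
        TreeMap<String, TreeMap<String, String>> sstable = memtable;
        memtable = new TreeMap<>();
        commitLog.clear(); // cleanup: flushed data no longer needs replaying
        return sstable;
    }
}
```

Replaying the four writes from the slides leaves four commit log entries but only two merged rows in memory, which is why the flushed SSTable is smaller and already sorted.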
  • 14. “Java is a memory hog” ✤ Large overhead for typical objects and collections ✤ How large? ✤ java.lang.instrument.Instrumentation ✤ JAMM: Java Agent for Memory Measurements
  • 15. org.apache.cassandra.cache.SerializingCache ✤ Live objects are about 85% JVM bookkeeping ✤ org.apache.cassandra.cache.FreeableMemory using reference counting ✤ Considering doing reference-counted, off-heap memtables as well
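Reference counting for off-heap memory, as mentioned on slide 15, might look something like the sketch below. It is not the code of `FreeableMemory`; the class `RefCountedMemorySketch` and its methods are hypothetical, and a direct `ByteBuffer` stands in for natively allocated memory, so "freeing" here only drops the reference rather than releasing memory deterministically.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a reference-counted off-heap block.
final class RefCountedMemorySketch {
    // Starts at 1: the creator (for example, the cache) holds the first reference.
    private final AtomicInteger references = new AtomicInteger(1);
    private volatile ByteBuffer buffer;

    RefCountedMemorySketch(int size) {
        // Assumption: a direct buffer stands in for natively allocated memory.
        buffer = ByteBuffer.allocateDirect(size);
    }

    /** Takes a reference; returns false if the block has already been freed. */
    boolean reference() {
        while (true) {
            int n = references.get();
            if (n <= 0) return false;                   // too late, already freed
            if (references.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Drops a reference; the last one out frees the block. */
    void unreference() {
        if (references.decrementAndGet() == 0) {
            buffer = null; // a real implementation would free native memory here
        }
    }

    boolean isFreed() {
        return references.get() <= 0;
    }

    byte getByte(int index) {
        ByteBuffer b = buffer;
        if (b == null) throw new IllegalStateException("memory already freed");
        return b.get(index);
    }

    void setByte(int index, byte value) {
        ByteBuffer b = buffer;
        if (b == null) throw new IllegalStateException("memory already freed");
        b.put(index, value);
    }
}
```

A reader would call `reference()` before touching the block and `unreference()` afterwards, so an eviction that drops the cache's own reference cannot free memory that a reader is still using.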
  • 16. Don’t forget about young gen ✤ Always stop-the-world for ~100ms
  • 17. Platform-specific code ✤ OS ✤ JVM
  • 18. m[un]map ✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files ✤ Old workaround (pre-1.0): ✤ use PhantomReference to tell when mmap’d file is GC’d (hence unmapped) ✤ Poor user experience and messy corner cases ✤ New workaround: ✤ Class.forName("").getMethod("cleaner")
  • 19. mmap part 2 ✤ 2GB limit via ByteBuffer: public abstract byte get(int index) ✤ Workaround: MmappedSegmentedFile ✤ public Iterator<DataInput> iterator(long position)
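The segmenting workaround on slide 19 amounts to mapping a file as several buffers and translating a `long` position into a (segment, offset) pair. The sketch below shows only that translation; it is not `MmappedSegmentedFile`, it exposes a simple `get(long)` rather than an iterator of `DataInput`, and the names `SegmentedMmapSketch`, `open`, and `segmentCount` are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: address a file of any length through int-indexed segments.
final class SegmentedMmapSketch {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;
    private final long length;

    private SegmentedMmapSketch(MappedByteBuffer[] segments, long segmentSize, long length) {
        this.segments = segments;
        this.segmentSize = segmentSize;
        this.length = length;
    }

    /** Maps the file read-only in chunks of at most segmentSize bytes. */
    static SegmentedMmapSketch open(Path file, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segmentSize must be in 1..Integer.MAX_VALUE");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long length = channel.size();
            int count = (int) ((length + segmentSize - 1) / segmentSize);
            MappedByteBuffer[] segments = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * segmentSize;
                long size = Math.min(segmentSize, length - start);
                segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
            // Mappings stay valid after the channel is closed.
            return new SegmentedMmapSketch(segments, segmentSize, length);
        }
    }

    /** Reads one byte at an absolute position that may exceed Integer.MAX_VALUE. */
    byte get(long position) {
        if (position < 0 || position >= length) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }

    long length() {
        return length;
    }

    int segmentCount() {
        return segments.length;
    }
}
```

In practice the segment size would be close to 2GB; a tiny value is convenient only for trying the arithmetic on a small file. Reads that straddle a segment boundary need extra handling that this sketch leaves out.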
  • 20. link ✤ Used for snapshots ✤ Old workaround: JNA ✤ New workaround: supported directly by Java 7
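With Java 7, hard links are available through `java.nio.file.Files.createLink`, so a snapshot can be sketched without JNA. The example below is an illustration only: the class `SnapshotSketch`, the method `snapshot`, and the directory layout are hypothetical and not Cassandra's snapshot code. It assumes the filesystem supports hard links and that both directories are on the same volume.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: snapshot a data directory by hard-linking its files.
final class SnapshotSketch {
    /** Links every regular file in dataDir into snapshotDir; returns the count. */
    static int snapshot(Path dataDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories, including the snapshot dir itself
                }
                // createLink(link, existing): the new name comes first.
                Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                linked++;
            }
        }
        return linked;
    }
}
```

Because a hard link is just another name for the same data, the snapshot costs no extra space at creation time and survives later deletion of the original file name, which suits immutable SSTables.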
  • 21. mlockall ✤ swappiness: pissing off database developers since 2001 (?) ✤ mlockall(MCL_CURRENT)
  • 22. Low-level I/O ✤ posix_fadvise ✤ mincore/fincore ✤ fcntl ✤ ... JNA
  • 23. A plug for JNA ✤ static { try { Native.register("c"); ... private static native int mlockall(int flags) throws LastErrorException;
  • 24. The fallacy of choosing portability over power ✤ Applets have been dead for years ✤ Python gets it right ✤ import readline
  • 25. The fallacy of choosing safety over power ✤ Allowing munmap would expose developers to segfaults ✤ But, relying on the GC to clean up external resources is a well-known antipattern ✤ File.close ✤ We need munmap badly enough that we resort to unnatural and unportable code to get it ✤ You haven’t kept us from risking segfaults, you’ve just made us miserable
  • 26. Compatibility through obscurity? ✤ sun.misc.Unsafe ✤ Used by high-profile libraries like high-scale-lib
  • 27. ... even public options
  • 28. Too negative?
  • 29. Still true ✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click