Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Published in: Technology

0 Comments
15 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
13,133
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
141
Comments
0
Likes
15
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Dealing with JVM limitations in Apache Cassandra. Jonathan Ellis / @spyced
  • 2. Pain points for Java databases✤ GC✤ GC✤ GC
  • 3. Pain points for Java databases✤ GC✤ Platform specific code
  • 4. GC✤ Concurrent and compacting: choose one ✤ G1 ✤ Azul C4 / Zing?
  • 5. Fragmentation✤ Bloom filter arrays✤ Compression offsets
  • 6. Automatic mitigation?✤ http://www.research.ibm.com/people/d/dfb/papers/Bacon03Controlling.pdf✤ http://researcher.ibm.com/files/us-hirzel/pldi10-arraylets.pdf
  • 7. Fragmentation, 2✤ Arena allocation for memtables
  • 8. (Memtables?) write( k1 , c1:v1 ) Memory Memtable Commit log Hard drive
  • 9. write( k1 , c1:v ) Memory k1 c1:v Memtable k1 c1:v Commit log Hard drive
  • 10. write( k1 , c2:v ) Memory k1 c1:v c2:v k1 c1:v k1 c2:v Hard drive
  • 11. write( k2 , c1:v c2:v ) Memory k1 c1:v c2:v k2 c1:v c2:v k1 c1:v k1 c2:v k2 c1:v c2:v Hard drive
  • 12. write( k1 , c1:v c3:v ) Memory k1 c1:v c2:v c3:v k2 c1:v c2:v k1 c1:v k1 c2:v k2 c1:v c2:v k1 c1:v c3:v Hard drive
  • 13. Memory flush index cleanup k1 c1:v c2:v c3:v k2 c1:v c2:v SSTable Hard drive
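The arena allocation mentioned on slide 7 can be sketched as follows. This is a minimal illustration, not Cassandra's implementation: the class name `SlabArena`, the region size, and the copy-on-write-path policy are all assumptions. The idea is that memtable values are copied into a few large regions, so that when the memtable is flushed the heap gets back large contiguous blocks instead of many small, scattered holes.

```java
import java.nio.ByteBuffer;

/** Hypothetical slab-style arena: hands out slices of large, fixed-size regions. */
final class SlabArena {
    private final int regionSize;
    private ByteBuffer current;          // region currently being filled
    private int regionsAllocated = 0;

    SlabArena(int regionSize) {
        this.regionSize = regionSize;
    }

    /** Copies the value into the arena and returns a view over the copy. */
    synchronized ByteBuffer copy(ByteBuffer value) {
        int size = value.remaining();
        if (size > regionSize) {
            // Oversized values get their own buffer instead of wasting a region.
            ByteBuffer own = ByteBuffer.allocate(size);
            own.put(value.duplicate());
            own.flip();
            return own;
        }
        if (current == null || current.remaining() < size) {
            // Start a new region; the old one stays alive while slices reference it.
            current = ByteBuffer.allocate(regionSize);
            regionsAllocated++;
        }
        // Carve out [position, position + size) of the current region as a slice.
        ByteBuffer window = current.duplicate();
        window.limit(window.position() + size);
        ByteBuffer slice = window.slice();
        current.position(current.position() + size);
        slice.put(value.duplicate());
        slice.flip();
        return slice;
    }

    synchronized int regionsAllocated() {
        return regionsAllocated;
    }
}
```

A memtable using such an arena would call `copy` on each incoming column value and drop the whole arena at flush time, so all of its regions become garbage together.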
  • 14. “Java is a memory hog”✤ Large overhead for typical objects and collections✤ How large?✤ java.lang.instrument.Instrumentation ✤ JAMM: Java Agent for Memory Measurements ✤ https://github.com/jbellis/jamm
  • 15. org.apache.cassandra.cache.SerializingCache✤ Live objects are about 85% JVM bookkeeping✤ org.apache.cassandra.cache.FreeableMemory using reference counting✤ Considering doing reference-counted, off-heap memtables as well
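The reference-counting idea behind off-heap cache memory can be sketched roughly like this. It is not the actual `FreeableMemory` class: the name `RefCountedMemory` and its methods are hypothetical, and a direct `ByteBuffer` stands in for natively allocated memory, so the "free" step here only drops the buffer where a real implementation would release the native allocation.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted block of off-heap memory. */
final class RefCountedMemory {
    // Starts at 1: the owner (e.g. the cache) holds the initial reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile ByteBuffer buffer;

    RefCountedMemory(int size) {
        // Stand-in for a native allocation; assumption for this sketch.
        buffer = ByteBuffer.allocateDirect(size);
    }

    /** Takes a reference; returns false if the memory was already released. */
    boolean reference() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops a reference; the last one out releases the memory. */
    void unreference() {
        if (refs.decrementAndGet() == 0) {
            buffer = null; // a real implementation would free native memory here
        }
    }

    boolean isFreed() {
        return refs.get() <= 0;
    }

    byte getByte(int index) {
        ByteBuffer b = buffer;
        if (b == null) {
            throw new IllegalStateException("memory already freed");
        }
        return b.get(index);
    }

    void setByte(int index, byte value) {
        ByteBuffer b = buffer;
        if (b == null) {
            throw new IllegalStateException("memory already freed");
        }
        b.put(index, value);
    }
}
```

Readers would call `reference()` before touching the block and `unreference()` when done, so an eviction that drops the owner's reference cannot release memory that is still being read.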
  • 16. Don’t forget about young gen✤ Always stop-the-world for ~100ms
  • 17. Platform-specific code✤ OS✤ JVM
  • 18. m[un]map✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files✤ Old workaround (pre-1.0): ✤ use PhantomReference to tell when mmap’d file is GC’d (hence unmapped) ✤ Poor user experience and messy corner cases✤ New workaround: ✤ Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
  • 19. mmap part 2✤ 2GB limit via ByteBuffer: public abstract byte get(int index)✤ Workaround: MmappedSegmentedFile public Iterator<DataInput> iterator(long position)
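The 2GB limit comes from `ByteBuffer` indexing with an `int`. One way around it, sketched below under assumptions, is to map a file as several segments and translate a `long` position into a segment index plus an offset. The class `SegmentedBuffer` and its fixed-size segmentation are illustrative only and are not the `MmappedSegmentedFile` code from the slide.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical long-addressable view over several int-addressable buffers. */
final class SegmentedBuffer {
    private final ByteBuffer[] segments;
    private final long segmentSize;

    SegmentedBuffer(ByteBuffer[] segments, long segmentSize) {
        this.segments = segments;
        this.segmentSize = segmentSize;
    }

    /** Maps a file read-only in chunks of at most segmentSize bytes. */
    static SegmentedBuffer map(FileChannel channel, long segmentSize) throws IOException {
        long size = channel.size();
        int count = (int) ((size + segmentSize - 1) / segmentSize);
        ByteBuffer[] segments = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long start = i * segmentSize;
            long length = Math.min(segmentSize, size - start);
            segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
        }
        return new SegmentedBuffer(segments, segmentSize);
    }

    /** Reads one byte at an absolute position that may exceed Integer.MAX_VALUE. */
    byte get(long position) {
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }

    long length() {
        long total = 0;
        for (ByteBuffer segment : segments) {
            total += segment.limit();
        }
        return total;
    }
}
```

With real files the segment size would be at most `Integer.MAX_VALUE`; the constructor accepts any buffers, which keeps the position arithmetic easy to exercise with small heap buffers.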
  • 20. link✤ Used for snapshots✤ Old workaround: JNA✤ New workaround: supported directly by Java7
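The Java 7 support referred to here is `java.nio.file.Files.createLink`, which creates a hard link without JNA. The sketch below shows how a snapshot could be taken with it; the class `HardLinkSnapshot`, the directory layout, and the file names are hypothetical, and it assumes a file system that supports hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical snapshot helper built on Files.createLink (Java 7+). */
final class HardLinkSnapshot {
    /** Hard-links dataFile into snapshotDir under the same file name. */
    static Path snapshot(Path dataFile, Path snapshotDir) {
        try {
            Files.createDirectories(snapshotDir);
            Path link = snapshotDir.resolve(dataFile.getFileName());
            return Files.createLink(link, dataFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo: the snapshot keeps the data readable after the original name is deleted. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("snapshot-demo");
            Path data = dir.resolve("data.db");
            Files.write(data, "row".getBytes(StandardCharsets.UTF_8));
            Path link = snapshot(data, dir.resolve("snapshots"));
            Files.delete(data); // e.g. removed after compaction
            return new String(Files.readAllBytes(link), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a hard link is just another name for the same file contents, taking a snapshot this way copies no data, which suits immutable files such as SSTables.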
  • 21. mlockall✤ swappiness: pissing off database developers since 2001 (?)✤ mlockall(MCL_CURRENT)
  • 22. Low-level i/o✤ posix_fadvise✤ mincore/fincore✤ fcntl✤ ... JNA
  • 23. A plug for JNA✤ https://github.com/twall/jna static { try { Native.register("c"); ... private static native int mlockall(int flags) throws LastErrorException;
  • 24. The fallacy of choosing portability over power✤ Applets have been dead for years✤ Python gets it right ✤ import readline
  • 25. The fallacy of choosing safety over power✤ Allowing munmap would expose developers to segfaults✤ But, relying on the GC to clean up external resources is a well-known antipattern ✤ File.close✤ We need munmap badly enough that we resort to unnatural and unportable code to get it ✤ You haven’t kept us from risking segfaults, you’ve just made us miserable
  • 26. Compatibility through obscurity?✤ sun.misc.Unsafe✤ Used by high-profile libraries like high-scale-lib
  • 27. ... even public options http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
  • 28. Too negative?
  • 29. Still true✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click