Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)


    Presentation Transcript

    • Dealing with JVM limitations in Apache Cassandra (Jonathan Ellis / @spyced)
    • Pain points for Java databases✤ GC✤ GC✤ GC
    • Pain points for Java databases✤ GC✤ Platform specific code
    • GC✤ Concurrent and compacting: choose one ✤ G1 ✤ Azul C4 / Zing?
    • Fragmentation✤ Bloom filter arrays✤ Compression offsets
    • Automatic mitigation?✤ http://www.research.ibm.com/people/d/dfb/papers/Bacon03Controlling.pdf✤ http://researcher.ibm.com/files/us-hirzel/pldi10-arraylets.pdf
    • Fragmentation, 2✤ Arena allocation for memtables
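Arena allocation attacks fragmentation by copying many small values into a few large, fixed-size regions, so the old generation holds a handful of big arrays that die together instead of millions of scattered small ones. The following is a minimal sketch of that idea only; the class name `ArenaAllocator`, the region size, and the method names are hypothetical and are not taken from Cassandra's code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical arena allocator: names and sizes are illustrative only. */
final class ArenaAllocator {
    private final int regionSize;
    private final List<ByteBuffer> regions = new ArrayList<>();
    private ByteBuffer current;

    ArenaAllocator(int regionSize) {
        this.regionSize = regionSize;
    }

    /** Copies value into the current region and returns a slice viewing the copy. */
    synchronized ByteBuffer copy(byte[] value) {
        if (value.length > regionSize) {
            // Oversized values get their own buffer rather than wasting a region.
            return ByteBuffer.wrap(value.clone());
        }
        if (current == null || current.remaining() < value.length) {
            // Start a new region; the tail of the old one is simply left unused.
            current = ByteBuffer.allocate(regionSize);
            regions.add(current);
        }
        int start = current.position();
        current.put(value);
        // Build an independent view limited to the bytes just written.
        ByteBuffer view = current.duplicate();
        view.position(start);
        view.limit(start + value.length);
        return view.slice();
    }

    int regionCount() {
        return regions.size();
    }
}
```

When the memtable that owns the arena is flushed, dropping the allocator releases every region at once, which is the property that keeps the heap from fragmenting.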
    • (Memtables?) write( k1 , c1:v1 ) ✤ Memory: Memtable ✤ Hard drive: Commit log
    • write( k1 , c1:v ) ✤ Memory (Memtable): k1 c1:v ✤ Hard drive (Commit log): k1 c1:v
    • write( k1 , c2:v ) ✤ Memory (Memtable): k1 c1:v c2:v ✤ Hard drive (Commit log): k1 c1:v, k1 c2:v
    • write( k2 , c1:v c2:v ) ✤ Memory (Memtable): k1 c1:v c2:v, k2 c1:v c2:v ✤ Hard drive (Commit log): k1 c1:v, k1 c2:v, k2 c1:v c2:v
    • write( k1 , c1:v c3:v ) ✤ Memory (Memtable): k1 c1:v c2:v c3:v, k2 c1:v c2:v ✤ Hard drive (Commit log): k1 c1:v, k1 c2:v, k2 c1:v c2:v, k1 c1:v c3:v
    • Memory: flush, index, cleanup ✤ Hard drive (SSTable): k1 c1:v c2:v c3:v, k2 c1:v c2:v
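The write path in the diagrams above can be sketched in a few lines: each write is appended to a commit log and merged into a sorted in-memory memtable, and a flush emits the sorted rows as an SSTable. This is a toy model under stated assumptions: the class `WritePathSketch` is hypothetical, the commit log is an in-memory list standing in for a file, and clearing the log on flush is one reading of the "cleanup" label in the last diagram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the memtable / commit log / SSTable flow; not Cassandra code. */
final class WritePathSketch {
    // Stand-in for the on-disk commit log: one appended record per write.
    final List<String> commitLog = new ArrayList<>();

    // Memtable: rows sorted by key, columns sorted by name within each row.
    private TreeMap<String, TreeMap<String, String>> memtable = new TreeMap<>();

    void write(String key, Map<String, String> columns) {
        // Append first so the write could be replayed after a crash.
        commitLog.add(key + " " + columns);
        // Then merge the columns into the row held in memory.
        TreeMap<String, String> row = memtable.get(key);
        if (row == null) {
            row = new TreeMap<>();
            memtable.put(key, row);
        }
        row.putAll(columns);
    }

    /** Returns the sorted rows (the "SSTable") and starts a fresh memtable. */
    TreeMap<String, TreeMap<String, String>> flush() {
        TreeMap<String, TreeMap<String, String>> sstable = memtable;
        memtable = new TreeMap<>();
        // Assumption: flushed data no longer needs replay, so the log is dropped.
        commitLog.clear();
        return sstable;
    }
}
```

Replaying the four writes from the slides leaves four commit log records and two merged rows in the memtable, matching the final diagram before the flush.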
    • “Java is a memory hog”✤ Large overhead for typical objects and collections✤ How large?✤ java.lang.instrument.Instrumentation ✤ JAMM: Java Agent for Memory Measurements ✤ https://github.com/jbellis/jamm
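Measuring "how large?" with `java.lang.instrument.Instrumentation` requires a Java agent, because the JVM only hands out an `Instrumentation` instance through `premain`. The sketch below shows the bare mechanism; the class name `ObjectSizer` is hypothetical, and this is not JAMM's API. It reports only the shallow size of one object, whereas a tool like JAMM also walks the object graph.

```java
import java.lang.instrument.Instrumentation;

/** Hypothetical minimal agent; JAMM offers a fuller version of this idea. */
final class ObjectSizer {
    private static volatile Instrumentation instrumentation;

    /** Called by the JVM before main() when started with -javaagent:<jar>. */
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    /** Shallow size in bytes of a single object, as estimated by the JVM. */
    static long shallowSizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not loaded; start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

To use it, the class would be packaged in a jar whose manifest names it in a `Premain-Class` entry, and the JVM started with `-javaagent:` pointing at that jar.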
    • org.apache.cassandra.cache.SerializingCache✤ Live objects are about 85% JVM bookkeeping✤ org.apache.cassandra.cache.FreeableMemory using reference counting✤ Considering doing reference-counted, off-heap memtables as well
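Reference counting is what makes off-heap memory safe to free explicitly: a reader takes a reference before touching the memory, and the last holder to drop its reference releases it. The sketch below illustrates the pattern with a direct `ByteBuffer`; the class `RefCountedMemory` and its method names are hypothetical and do not reproduce `FreeableMemory`. As an additional assumption, "freeing" here only drops the buffer reference, where a real implementation would release native memory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted off-heap region; illustrative only. */
final class RefCountedMemory {
    // Starts at 1: the creator (for example, the cache) holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile ByteBuffer buffer;

    RefCountedMemory(int size) {
        buffer = ByteBuffer.allocateDirect(size); // allocated outside the Java heap
    }

    /** Tries to take a reference; fails once the count has reached zero. */
    boolean reference() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false; // already freed, the caller must not use it
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops a reference; the last one out releases the memory. */
    void unreference() {
        if (refs.decrementAndGet() == 0) {
            buffer = null; // a real implementation would free native memory here
        }
    }

    ByteBuffer buffer() {
        ByteBuffer b = buffer;
        if (b == null) {
            throw new IllegalStateException("already freed");
        }
        return b;
    }
}
```

The compare-and-set loop in `reference()` matters: a plain increment could resurrect a region whose count had already dropped to zero.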
    • Don’t forget about young gen✤ Always stop-the-world for ~100ms
    • Platform-specific code✤ OS✤ JVM
    • m[un]map✤ Log-structured storage wants to remove old files post-compaction; some platforms disallow deleting open files✤ Old workaround (pre-1.0): ✤ use PhantomReference to tell when mmap’d file is GC'd (hence unmapped) ✤ Poor user experience and messy corner cases✤ New workaround: ✤ Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner")
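The "new workaround" can be sketched as a best-effort helper built on the reflective lookup named on the slide. Everything beyond that lookup is an assumption: the class `MmapCleaner`, the `clean` call on the returned object, and the fallback behavior are illustrative. The trick relies on JDK internals, so newer JDKs with strong encapsulation are expected to refuse the reflective call, in which case the helper returns false and the mapping is left to the GC.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper around the unsupported "cleaner" trick. */
final class MmapCleaner {
    /**
     * Best-effort unmap. Returns false when the JDK refuses reflective access.
     * After a successful call the buffer must never be touched again:
     * reading unmapped memory can crash the JVM.
     */
    static boolean tryUnmap(MappedByteBuffer buffer) {
        try {
            Method cleanerMethod =
                Class.forName("sun.nio.ch.DirectBuffer").getMethod("cleaner");
            Object cleaner = cleanerMethod.invoke(buffer);
            if (cleaner == null) {
                return false;
            }
            cleaner.getClass().getMethod("clean").invoke(cleaner);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // internals unavailable on this JDK
        }
    }

    /** Maps a one-byte temp file, reads it, then attempts to unmap and delete it. */
    static byte demo() {
        try {
            Path file = Files.createTempFile("sstable", ".db");
            Files.write(file, new byte[] {7});
            byte first;
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, 1);
                first = map.get(0);
                tryUnmap(map); // the map is not used after this point
            }
            // May fail on platforms that refuse to delete a still-mapped file.
            file.toFile().delete();
            return first;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```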
    • mmap part 2✤ 2GB limit via ByteBuffer: public abstract byte get(int index)✤ Workaround: MmappedSegmentedFile public Iterator<DataInput> iterator(long position)
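Because `ByteBuffer` is indexed by `int`, a file larger than 2GB has to be mapped as several segments, with a `long` position translated into a segment number and an offset inside that segment. The sketch below shows that translation with fixed-size segments and single-byte reads. The class `SegmentedMmap` is hypothetical, and the layout is a simplification that may differ from how `MmappedSegmentedFile` chooses its segment boundaries.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical fixed-size segmented mapping addressed by a long position. */
final class SegmentedMmap {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;

    SegmentedMmap(FileChannel channel, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segment must fit an int index");
        }
        this.segmentSize = segmentSize;
        long length = channel.size();
        int count = (int) ((length + segmentSize - 1) / segmentSize); // round up
        segments = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long start = i * segmentSize;
            long size = Math.min(segmentSize, length - start); // last one may be short
            segments[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
        }
    }

    /** Reads one byte at an absolute file position that may exceed 2GB. */
    byte get(long position) {
        int segment = (int) (position / segmentSize);
        int offset = (int) (position % segmentSize);
        return segments[segment].get(offset);
    }

    /** Writes bytes 0..9 to a temp file and reads one back through 4-byte segments. */
    static byte demo(long position) {
        try {
            Path file = Files.createTempFile("segments", ".db");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                return new SegmentedMmap(ch, 4).get(position);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The demo uses a 4-byte segment size only so that the boundary arithmetic is exercised on a tiny file; a real mapping would use segments close to `Integer.MAX_VALUE`.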
    • link✤ Used for snapshots✤ Old workaround: JNA✤ New workaround: supported directly by Java7
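With Java 7, a snapshot by hard link needs only `java.nio.file.Files.createLink`. The sketch below is a minimal illustration; the class `SnapshotLink`, the directory layout, and the file names are hypothetical, and it assumes the underlying filesystem supports hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical snapshot helper built on Files.createLink (Java 7+). */
final class SnapshotLink {
    /** Hard-links an existing data file into snapshotDir and returns the link. */
    static Path snapshot(Path sstable, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        Path link = snapshotDir.resolve(sstable.getFileName());
        // Argument order is (new link, existing file).
        return Files.createLink(link, sstable);
    }

    /** Shows that the snapshot survives deletion of the original path. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("data");
            Path sstable = dir.resolve("example-Data.db"); // illustrative name
            Files.write(sstable, "v1".getBytes(StandardCharsets.UTF_8));
            Path link = snapshot(sstable, dir.resolve("snapshots"));
            Files.delete(sstable); // e.g. removed after compaction
            return new String(Files.readAllBytes(link), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A hard link shares the file's data blocks, so the snapshot costs no extra copy and keeps the data alive after the original name is deleted.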
    • mlockall✤ swappiness: pissing off database developers since 2001 (?)✤ mlockall(MCL_CURRENT)
    • Low-level i/o✤ posix_fadvise✤ mincore/fincore✤ fcntl✤ ... JNA
    • A plug for JNA✤ https://github.com/twall/jna static { try { Native.register("c"); ... private static native int mlockall(int flags) throws LastErrorException;
    • The fallacy of choosing portability over power✤ Applets have been dead for years✤ Python gets it right ✤ import readline
    • The fallacy of choosing safety over power✤ Allowing munmap would expose developers to segfaults✤ But, relying on the GC to clean up external resources is a well-known antipattern ✤ File.close✤ We need munmap badly enough that we resort to unnatural and unportable code to get it ✤ You haven’t kept us from risking segfaults, you’ve just made us miserable
    • Compatibility through obscurity?✤ sun.misc.Unsafe✤ Used by high-profile libraries like high-scale-lib
    • ... even public options http://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
    • Too negative?
    • Still true✤ "Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free." -- Cliff Click