• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Supercharging Cassandra - GOTO Amsterdam

Supercharging Cassandra - GOTO Amsterdam






Total Views
Views on SlideShare
Embed Views



4 Embeds 9

http://a0.twimg.com 6
http://paper.li 1
http://www.twylah.com 1
https://twitter.com 1



Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
Post Comment
Edit your comment

    Supercharging Cassandra - GOTO Amsterdam Supercharging Cassandra - GOTO Amsterdam Presentation Transcript

    • Supercharging Cassandra... Tom Wilkie Founder & VP Engineering @tom_wilkie
    • Before the Flood 1990 Small databases BTree indexes BTree File systems RAID Old hardware
    • Two Revolutions 2010 Distributed, shared-nothing databasesWrite-optimised indexes Write-optimised indexesBTree file systems BTree file systems RAID ... RAID New hardware New hardware
    • Bridging the Gap 2011 Distributed, shared-nothing databases Castle Castle ...New hardware New hardware
    • Big Data Applications Memcached Open APIManagement ...Deployment . . . . . . . . .Monitoring ... ... ... ... ... ... ... Acunu Storage Core ... ... Cross-Cluster Management UI
    • 1. Predictability
    • Small random inserts Inserting 3 billion rows Acunu powered Cassandra - ‘standard’ Cassandra -
    • Insert latencyWhile inserting 3 billion rows Acunu powered Cassandra x ‘standard’ Cassandra +
    • Small random range queriesPerformed immediately after inserts Acunu powered Cassandra - ‘standard’ Cassandra -
    • Performance summary Standard Acunu Benefitsinserts rate ~32k/s ~45k/s >1.4x95% latency ~32s ~0.3s >100x gets rate ~100/s ~350/s >3.5x95% latency ~2s ~0.5s >4xrange queries ~0.4/s ~40/s >100x 95% latency ~15s ~2s >7.5x
    • Doubling Array Inserts 2 2 9 9 Buffer arrays in memoryuntil we have > B of them
    • Doubling Array Inserts11 2 9 2 8 9 11 8 8 11 etc...Similar to log-structured merge trees (LSM), cache-oblivious lookahead array (COLA), ...
    • Demohttps://acunu-videos.s3.amazonaws.com/dajs.html
    • 8KB @ 100MB/s, w/ 8ms seek 100 / 5 = 100 IOs/s = 20 updates/s~ log (2^30)/log 100= 5 IOs/update Range Query Update (Size Z) O(logB N) O(Z/B) B-Tree random IOs random IOs O((log N)/B) O(Z/B) Doubling Array sequential IOs sequential IOs ~ log (2^30)/100 8KB @ 100MB/s 13k / 0.2= 0.2 IOs/update = 13k IOs/s = 65k updates/s B = “block size”, say 8KB at 100 bytes/entry ~= 100 entries
    • More Shared memory interface Castle keys Userspace Acunu Kerneluserspace interface values http://goo.gl/wXNDQ In-kernel async, shared memory ring workloads shared bufferskernelspace Streaming interface interface range key buffered key buffered queries insert value insert get value get Doubling Arrays • Opensource (GPLv2, MITdoubling arraymapping layer for user libraries) insert Bloom filters queues key get arrays x range arrays queries management key • insert merges http://bitbucket.org/acunu Arraysmapping layer •modlist btree key Version tree Loadable Kernel Module, insert btree http://goo.gl/gzihe key get btree targeting CentOS’s 2.6.18 range queries value arrays • Cacheblock mapping & http://www.acunu.com/ cacheing layer "Extent" layer prefetcher extent block extent cache blogs/andy-twigg/why- freespace allocator manager flusher & mapper page cache acunu-kernel/linuxs block & Linux Kernel MM layers Block layer Memory manager
    • 2. Monitoring
    • jQuery VisualVM
    • mx4j: Rest-JMX adapter Munin, Nagios etc
    • 3. Operations
    • -bash-3.2$ nodetool...Available commands: ring - Print informations on the token ring join - Join the ring info - Print node informations (uptime, load, ...) cfstats - Print statistics on column families version - Print cassandra version tpstats - Print usage statistics of thread pools drain - Drain the node (stop accepting writes and flush all column families) decommission - Decommission the node compactionstats - Print statistics on compactions disablegossip - Disable gossip (effectively marking the node dead) enablegossip - Reenable gossip disablethrift - Disable thrift server enablethrift - Reenable thrift server netstats [host] - Print network information on provided host (connecting node by defau move <new token> - Move node on the token ring to a new token removetoken status|force|<token> - Show status of current token removal, force completion of setcompactionthroughput <value_in_mb> - Set the MB/s throughput cap for compaction in the sys snapshot [keyspaces...] -t [snapshotName] - Take a snapshot of the specified keyspaces using clearsnapshot [keyspaces...] -t [snapshotName] - Remove snapshots for the specified keyspaces flush [keyspace] [cfnames] - Flush one or more column family repair [keyspace] [cfnames] - Repair one or more column family cleanup [keyspace] [cfnames] - Run cleanup on one or more column family compact [keyspace] [cfnames] - Force a (major) compaction on one or more column family scrub [keyspace] [cfnames] - Scrub (rebuild sstables for) one or more column family invalidatekeycache [keyspace] [cfnames] - Invalidate the key cache of one or more column fami invalidaterowcache [keyspace] [cfnames] - Invalidate the key cache of one or more column fami getcompactionthreshold <keyspace> <cfname> - Print min and max compaction thresholds for a gi cfhistograms <keyspace> <cfname> - Print statistic histograms for a given column family setcachecapacity <keyspace> <cfname> <keycachecapacity> <rowcachecapacity> - Set the key and setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold> - Set the min and ma
    • SH OT S*SN AP * And clones!
    • v0 v1 v2v5 v3 v4 v6
    • Rebuild
    • Disk Layout: RDA random duplicate allocation 4 2 1 4 5 2 5 3 1 3 7 10 7 6 8 9 9 10 6 8 15 12 14 11 13 14 11 12 13 15 16 16
    • Future
    • Memcache + Cassandraget/insert get/put Cass client memcached 100k random inserts/sec! Cassandra memcache Cassandra memcache Castle Castle ... H/W H/W
    • v1 v1 v1 v1v12 v13 v15 v12 v13 v15 v12 v13 v15 v12 v13 v15 v16 v24 v16 v24 v16 v24 v16 v24
    • ~device capacityBeware the “write cliff”...
    • • Castle: Predictable Performance for Big Data• Monitoring: distributed, multi- master tools, give you aggregated and summarised view of your cluster• Snapshots & Clones: addressing real problems with new workloads• RDA: lightening fast rebuilds for massive disks
    • Questions? Tom Wilkie @tom_wilkie tom@acunu.com http://bitbucket.org/acunu http://github.com/acunuhttp://www.acunu.com/download http://www.acunu.com/insights