Cassandra Presentation for San Antonio JUG


Presentation given on 1 April 2010 to the San Antonio Java User Group by Gary Dusbabek.

Published in: Technology

  2. MySQL for Beginners
     - Gary Dusbabek
     - Rackspace
     - April Fools!!!11
  3. Apache
     - Gary Dusbabek
     - Rackspace
  4. What is Cassandra?
     - Key-value store (with some structure)
     - Highly scalable
     - Eventually consistent
     - Distributed
     - Tunable
     - Partitioning
     - Replication
  5. Where did it come from?
     - Created at Facebook
     - Dynamo: distribution architecture
     - BigTable: data model
     - Open-sourced in 2008
     - Apache incubator in early 2009
     - Graduation in March 2010
  6. Who uses it?
     - Rackspace
     - Facebook (of course)
     - Twitter
     - Digg
     - Reddit
     - IBM
     - Others…
  7. What problems does it solve?
     - Reliability at scale
     - No single point of failure (all nodes are identical)
     - Simple scaling (linear)
     - High write throughput
     - Large data sets
  8. What problems can’t it solve?
     - No flexible indices
     - No querying on non-PK values
     - Not good for big binary data (>64 MB) unless you chunk
     - Row contents must fit in available memory
  9. Concepts: CAP
     - CAP Theorem
       - Consistency
       - Availability
       - Partition tolerance
     - Choose two
     - Cassandra chooses A and P, but is tunable to give more C.
  10. Concepts: Denormalization
      - Ditch joins
      - Duplicate data
      - Structure data around queries
      - Normalized
      - Denormalized
  11. Concepts: Replication & Consistency
      - You specify replication factor
      - You specify consistency level for read/write operations
        - ZERO, ONE, QUORUM, ALL, ANY
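To make the interplay of replication factor and consistency level concrete, here is a minimal Python sketch. It is not Cassandra's API: the function names are hypothetical, and it assumes the usual definitions (a quorum is a majority of the replicas; ZERO and ANY do not guarantee that any replica has acknowledged the write). It applies the common rule of thumb that a read sees the latest write when the replicas read plus the replicas written exceed the replication factor.

```python
def quorum(rf: int) -> int:
    """Majority of rf replicas."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond before the operation returns (assumed model)."""
    table = {
        "ZERO": 0,            # fire and forget
        "ANY": 0,             # a stored hint is enough, so no replica is guaranteed
        "ONE": 1,
        "QUORUM": quorum(rf),
        "ALL": rf,
    }
    return table[level]


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when the read set and the write set must overlap in at least one replica."""
    written = replicas_required(write_level, rf)
    read = replicas_required(read_level, rf)
    return written + read > rf


if __name__ == "__main__":
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(w, r, read_sees_latest_write(w, r, rf=3))
```

With RF=3, writing and reading at QUORUM touches two replicas each, so the two sets always share a node; ONE/ONE trades that guarantee for lower latency.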
  12. Ring Topology
      - Storage ring
      - Every node gets a token
        - Defines its place in the storage ring
        - And which keys it is responsible for (its ranges)
      - Diagram: ring of nodes with tokens a, d, g, j; RF=3
  13. Ring Topology
      - Storage ring
      - Every node gets a token
        - Defines its place in the storage ring
        - And which keys it is responsible for (its ranges)
      - Diagram: ring of nodes with tokens a, d, g, j; RF=2
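The two ring slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cassandra code: `replicas_for` is a hypothetical helper, keys and tokens are compared as plain strings (as with an order-preserving partitioner), a node owns the keys between the previous token (exclusive) and its own token (inclusive), and the extra replicas are simply the next nodes clockwise on the ring.

```python
from bisect import bisect_left


def replicas_for(key: str, tokens: list[str], rf: int) -> list[str]:
    """Return the tokens of the nodes that hold `key`, primary first."""
    ring = sorted(tokens)
    # First node whose token is >= key; wrap to the start of the ring if none.
    start = bisect_left(ring, key) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


if __name__ == "__main__":
    tokens = ["a", "d", "g", "j"]
    print(replicas_for("c", tokens, rf=3))  # primary d, then g and j
    print(replicas_for("c", tokens, rf=2))  # primary d, then g
    print(replicas_for("k", tokens, rf=3))  # wraps around to a
```

Lowering RF from 3 to 2 only shortens the walk around the ring; the primary owner of a key does not change.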
  14. Ring: New Node
      - New node
      - Ranges are adjusted
      - Diagram: ring of nodes with tokens a, d, g, j plus new node m; RF=3
  15. Ring: New Node
      - New node
      - Ranges are adjusted
      - Diagram: ring of nodes with tokens a, d, g, j plus new node m; RF=2
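A small Python sketch of "ranges are adjusted", using the tokens from the diagrams. It assumes the same simplified model as before (string tokens, each node's primary range runs from the previous token, exclusive, to its own token, inclusive) and only looks at primary ranges, not at the replica ranges that also shift. `primary_ranges` is a hypothetical helper name.

```python
def primary_ranges(tokens: list[str]) -> dict[str, tuple[str, str]]:
    """Map each node token to its primary range (previous_token, own_token]."""
    ring = sorted(tokens)
    # ring[i - 1] with i == 0 wraps to the last token, closing the ring.
    return {tok: (ring[i - 1], tok) for i, tok in enumerate(ring)}


if __name__ == "__main__":
    before = primary_ranges(["a", "d", "g", "j"])
    after = primary_ranges(["a", "d", "g", "j", "m"])
    for tok in after:
        print(tok, before.get(tok), "->", after[tok])
```

In this model, adding node m takes the slice (j, m] away from node a, whose primary range shrinks from (j, a] to (m, a]; the primary ranges of d, g, and j are untouched.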
  16. Ring Partition
      - Node dies or becomes isolated from the ring
      - Hints
      - Handoff
      - Diagram: ring of nodes with tokens a, d, g, j, m; RF=3
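The hints-and-handoff idea can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the `Node` class, the `write` and `deliver_hints` functions, and the choice of which node stores the hint are hypothetical and much simpler than what Cassandra actually does.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}    # key -> value held by this replica
        self.hints = []   # (target node name, key, value) kept for down replicas


def write(key, value, replicas, hint_holder):
    """Write to every live replica; leave a hint for each replica that is down."""
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            hint_holder.hints.append((node.name, key, value))


def deliver_hints(holder, nodes_by_name):
    """Hand stored writes off to their targets once those targets are reachable."""
    remaining = []
    for target, key, value in holder.hints:
        node = nodes_by_name[target]
        if node.up:
            node.data[key] = value
        else:
            remaining.append((target, key, value))
    holder.hints = remaining


if __name__ == "__main__":
    d, g, j = Node("d"), Node("g"), Node("j")
    g.up = False                      # g is partitioned away from the ring
    write("c", "v1", [d, g, j], hint_holder=d)
    g.up = True                       # g rejoins
    deliver_hints(d, {"d": d, "g": g, "j": j})
    print(g.data)
```

The write succeeds while g is unreachable, and g catches up once the hint is replayed.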
  17. Data Model
      - Keyspace: contains column families
      - ColumnFamily
        - Standard or Super
        - Two levels of indexes (key and column name)
  18. Data Model
      - Column and subcolumn sorting
      - Specify your own comparator:
        - TimeUUID
        - LexicalUUID
        - UTF8
        - Long
        - Bytes
        - CreateYourOwn
  19. Data Model
      - Standard Column Family
  20. Data Model
      - Super Column Family
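One way to picture the data model slides is as nested maps, sketched here in Python. The row keys, column names, and values are made up for illustration, and `sorted_columns` is a hypothetical helper; the point is only the shape (one level of columns for a standard column family, two for a super column family) and the effect of the comparator on column order.

```python
# Standard column family: row key -> {column name -> value}
users = {
    "alice": {"city": "San Antonio", "email": "alice@example.com"},
}

# Super column family: row key -> {super column name -> {subcolumn name -> value}}
timeline = {
    "alice": {
        "2010-04-01": {"source": "web", "text": "hello"},
    },
}


def sorted_columns(row: dict, comparator) -> list:
    """Column names in the order the chosen comparator would store them."""
    return sorted(row, key=comparator)


def utf8(name: str) -> bytes:
    """Compare names as UTF-8 bytes."""
    return name.encode("utf-8")


def as_long(name: str) -> int:
    """Compare names as integers."""
    return int(name)


if __name__ == "__main__":
    counters = {"9": "x", "10": "y", "100": "z"}
    print(sorted_columns(counters, utf8))
    print(sorted_columns(counters, as_long))
```

The same three column names come back in a different order depending on the comparator, which is why the comparator has to match how you intend to slice the columns.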
  21. Inserting: Overview
      - Simple: put(key, col, value)
      - Complex: put(key, [col:value, …, col:value])
      - Batch: multi key.
  22. Inserting: Writes
      - Commit log for durability
      - Memtable: no disk access (no reads or seeks)
      - SSTables are final (become read only)
        - Index
        - Bloom filter
        - Raw data
      - Atomic within a ColumnFamily
      - Bottom line: FAST!!!
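The write path on this slide can be sketched as a toy Python class. This is a deliberately simplified model, not Cassandra's implementation: `ToyStore`, its flush threshold, and the tuple-based "SSTable" are all hypothetical, the commit log is an in-memory list rather than a file, and the index and Bloom filter parts of a real SSTable are left out.

```python
class ToyStore:
    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory: key -> {column -> value}
        self.sstables = []        # immutable snapshots, oldest first
        self.flush_threshold = flush_threshold  # rows held before flushing

    def put(self, key, col, value):
        # 1. Append to the commit log for durability (sequential, no seeks).
        self.commit_log.append((key, col, value))
        # 2. Update the memtable; no existing data is read.
        self.memtable.setdefault(key, {})[col] = value
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable into sorted, read-only tuples.
        frozen = tuple(
            sorted((key, tuple(sorted(cols.items())))
                   for key, cols in self.memtable.items())
        )
        self.sstables.append(frozen)
        self.memtable = {}


if __name__ == "__main__":
    store = ToyStore(flush_threshold=2)
    store.put("a", "x", "1")
    store.put("a", "y", "2")
    store.put("b", "x", "3")   # second row arrives, memtable is flushed
    print(store.sstables)
```

Every step in `put` is an append or an in-memory update, which is the reason the slide can call writes fast.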
  23. Querying: Overview
      - You need a key or keys:
        - Single: key=‘a’
        - Range: key=‘a’ through ‘f’
      - And columns to retrieve:
        - Slice: cols={bar through kite}
        - By name: key=‘b’ cols={bar, cat, llama}
      - Nothing like SQL “WHERE col=‘faz’”
      - But secondary indices are being worked on (see CASSANDRA-749)
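The query shapes on this slide can be mimicked over plain Python dictionaries. This is a sketch under assumptions, not the Thrift API: the three function names are hypothetical, both ends of a slice or key range are treated as inclusive, and keys are compared as strings (a key range only makes sense when the partitioner preserves key order).

```python
from bisect import bisect_left, bisect_right


def slice_columns(row: dict, start: str, finish: str) -> dict:
    """Columns whose names fall between start and finish, inclusive."""
    names = sorted(row)
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return {name: row[name] for name in names[lo:hi]}


def get_by_name(row: dict, wanted: list) -> dict:
    """Only the named columns that exist in the row."""
    return {name: row[name] for name in wanted if name in row}


def key_range(rows: dict, start_key: str, end_key: str) -> dict:
    """Rows whose keys fall between start_key and end_key, inclusive."""
    return {k: rows[k] for k in sorted(rows) if start_key <= k <= end_key}


if __name__ == "__main__":
    rows = {
        "a": {"bar": 1},
        "b": {"apple": 1, "bar": 2, "cat": 3, "kite": 4, "llama": 5},
        "g": {"bar": 9},
    }
    print(slice_columns(rows["b"], "bar", "kite"))
    print(get_by_name(rows["b"], ["bar", "cat", "llama"]))
    print(sorted(key_range(rows, "a", "f")))
```

Each function starts from a row key or a key range; none of them can answer "which rows have col = 'faz'", which is the gap the slide points out.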
  24. Querying: Reads
      - Not as fast as writes
      - Read repair when out of sync
      - New in 0.6:
        - Row cache (avoid sstable lookup)
        - Key cache (avoid index scan)
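Read repair can be pictured with a short Python sketch. It assumes a simplified model in which each replica is a dictionary mapping a key to a `(timestamp, value)` pair and the highest timestamp wins; `read_with_repair` is a hypothetical name, and real read repair works on columns and digests rather than whole values.

```python
def read_with_repair(key, replicas):
    """Return the newest value for key and push it to stale or missing replicas."""
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])  # highest timestamp
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest   # repair the out-of-sync replica
    return newest[1]


if __name__ == "__main__":
    r1 = {"k": (2, "new")}
    r2 = {"k": (1, "old")}
    r3 = {}
    print(read_with_repair("k", [r1, r2, r3]))
    print(r2, r3)
```

After the read, all three replicas hold the newest version, so later reads at a lower consistency level also see it.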
  25. Client API (Low level)
      - Fat Client
        - Maybe too low level, not well-tested
      - Thrift (currently best-supported)
        - Many language bindings
        - Not much of a community
        - No streaming
        - Fast transport
      - Avro
        - Just getting started
        - Shows promise
  26. Client API (High Level)
      - Rapidly changing, getting feature-rich
      - Connection pools
      - Load balancing/Failover
      - Reduces the verbosity of working with Thrift
      - For Java, see Hector
      - Also Ruby, Python, C++, C#, Perl, PHP
  27. Java Bits: JMX
      - Relatively easy to expose objects and services as MBeans
      - Simplifies aspects of cluster and node management
      - Easy monitoring
      - You choose the JMX-enabled system management tool (jconsole is alright)
  28. Java Bits: available libraries
      - Excellent:
        - Google collections
          - Multimap, BiMap, Iterators
        - java.util.concurrent
        - nio files (including mmap)
      - Meh:
        - nio sockets
  29. Java Bits: Heap & GC
      - Cassandra tweaks the default GC settings quite a bit:
        - -XX:+UseParNewGC
        - -XX:+UseConcMarkSweepGC
        - -XX:+CMSParallelRemarkEnabled
        - -XX:TargetSurvivorRatio=90
        - -XX:SurvivorRatio=128
        - -XX:MaxTenuringThreshold=0
        - -XX:+HeapDumpOnOutOfMemoryError
        - -XX:+AggressiveOpts
  30. Java Bits: code management
      - Library versioning
        - No standard way
        - Mostly declarative
        - Not readily queryable
      - Must ship every dependency
      - Or use ant/mvn.
      - Now you have two (or more!) problems.
  31. Java Bits: daemonization
      - Java doesn’t make it easy re: stdout, stderr
      - After setting up, System.out and System.err are close()d
      - Windows: don’t ask
  32. Future Direction
      - Range delete (delete these cols from those keys)
      - Vector clocks (including server-side conflict resolution)
      - Altering keyspace/column family definitions on a live cluster
      - Byte[] keys
      - Compression
      - Multi-tenant support
      - Fewer memory restrictions
  33. Linky
      - Google BigTable
      - Amazon Dynamo
      - Facebook Cassandra
      - Java tuning
      - Me: gdusbabek on Twitter and just about everything else.