Cassandra Presentation for San Antonio JUG


Given to the San Antonio Java User Group by Gary Dusbabek on 1 April 2010.

Published in: Technology

  • Hello World
  • RandomPartitioner: takes the key, uses its MD5 hash as the real key, then stores the row on the appropriate node. OrderPreservingPartitioner: gives cheap range scans. Takes more work.
  • Eric Brewer
  • Need to describe hinted handoff better.
  • Keyspace == like a namespace. CF == like a table. Keyspace and Table are used interchangeably in the code.
  • Key cache: keys whose location is kept in memory to avoid an index scan. Row cache: entire rows kept in memory.
  • Avro: Doug Cutting
  • Mmap – index and data files (read only)
  • Goal is low pause times and high throughput:
      • -XX:TargetSurvivorRatio=90: allows 90% of the survivor spaces to be occupied instead of the default 50%, giving better utilization of survivor space memory.
      • -XX:SurvivorRatio=128: sets the survivor space ratio to 1:128, resulting in small survivor spaces. With smaller survivor spaces, short-lived objects spend less time in the young generation (they die faster).
      • -XX:+AggressiveOpts: turns on point optimizations that are expected to be on by default in later releases. Experimental, and sometimes reveals JDK bugs.
      • -XX:+UseParNewGC -XX:+UseConcMarkSweepGC: parallel young generation collector. Similar to -XX:+UseParallelGC, except that it can be used with the concurrent collector. The benefits show on multiprocessor systems. The concurrent collector takes two short pauses instead of one long pause: the initial mark (directly reachable objects) and the remark (objects missed due to concurrent execution of application threads).
      • -XX:+CMSParallelRemarkEnabled: works with UseParNewGC to decrease the remark pauses.

    1.
    2. MySQL for Beginners
         • Gary Dusbabek
         • Rackspace
         • April Fools!!!11
    3. Apache
         • Gary Dusbabek
         • Rackspace
    4. What is Cassandra?
         • Key-value store (with some structure)
         • Highly scalable
         • Eventually consistent
         • Distributed
         • Tunable
             • Partitioning
             • Replication
    5. Where did it come from?
         • Created at Facebook
             • Dynamo: distribution architecture
             • BigTable: data model
         • Open-sourced in 2008
         • Apache incubator in early 2009
         • Graduation in March 2010
    6. Who uses it?
         • Rackspace
         • Facebook (of course)
         • Twitter
         • Digg
         • Reddit
         • IBM
         • Others…
    7. What problems does it solve?
         • Reliability at scale
         • No single point of failure (all nodes are identical)
         • Simple scaling (linear)
         • High write throughput
         • Large data sets
    8. What problems can’t it solve?
         • No flexible indices
         • No querying on non-PK values
         • Not good for big binary data (>64 MB) unless you chunk
         • Row contents must fit in available memory
    9. Concepts: CAP
         • CAP Theorem
             • Consistency
             • Availability
             • Partition tolerance
         • Choose two
         • Cassandra chooses A and P, but lets you tune for more C.
    10. Concepts: Denormalization
         • Ditch joins
         • Duplicate data
         • Structure data around queries
         • (diagram labels: Normalized, Denormalized)
    11. Concepts: Replication & Consistency
         • You specify the replication factor
         • You specify the consistency level for read/write operations
         • ZERO, ONE, QUORUM, ALL, ANY
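The interplay between replication factor and consistency level comes down to counting replicas. The sketch below is an illustration only, not Cassandra code: the class and method names are made up, and it assumes the usual rule that a quorum is a majority of the replicas and that a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor.

```java
/** Illustrative arithmetic for tunable consistency; all names here are hypothetical. */
final class ConsistencySketch {

    /** Replicas that form a quorum (a majority) for the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must overlap every write set, i.e. R + W > RF. */
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for RF=3: " + q);
        System.out.println("QUORUM write + QUORUM read overlap: " + readSeesLatestWrite(q, q, rf));
        System.out.println("ONE write + ONE read overlap: " + readSeesLatestWrite(1, 1, rf));
    }
}
```

With RF=3, QUORUM reads and writes each touch two replicas, so they always share at least one; ONE/ONE trades that guarantee for lower latency and relies on eventual consistency.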
    12. Ring Topology
         • Storage ring
         • Every node gets a token
             • Defines its place in the storage ring
             • And which keys it is responsible for (its ranges)
         • (diagram: ring of nodes with tokens a, d, g, j; RF=3)
    13. Ring Topology
         • Storage ring
         • Every node gets a token
             • Defines its place in the storage ring
             • And which keys it is responsible for (its ranges)
         • (diagram: ring of nodes with tokens a, d, g, j; RF=2)
    14. Ring: New Node
         • New node
         • Ranges are adjusted
         • (diagram: ring of nodes with tokens a, d, g, j, m; RF=3)
    15. Ring: New Node
         • New node
         • Ranges are adjusted
         • (diagram: ring of nodes with tokens a, d, g, j, m; RF=2)
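The ring slides can be approximated with a sorted map from token to node. This is a simplified sketch, not Cassandra's implementation: `RingSketch`, the `node-x` names, and the "walk clockwise to the next RF nodes" placement are assumptions for illustration. Keys are compared directly against tokens, as an order-preserving partitioner would do; `md5Token` shows the random-partitioner idea of hashing the key first, without claiming to match Cassandra's exact token computation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy storage ring; every name here is hypothetical. */
final class RingSketch {
    // token -> node name, sorted so the ring can be walked in token order
    private final TreeMap<String, String> ring = new TreeMap<>();

    void addNode(String token, String nodeName) {
        ring.put(token, nodeName);
    }

    /** First node whose token is >= key, then the next (rf - 1) nodes clockwise. */
    List<String> replicasFor(String key, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        String token = ring.ceilingKey(key);
        if (token == null) {
            token = ring.firstKey(); // past the highest token: wrap around the ring
        }
        int wanted = Math.min(rf, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    /** Random-partitioner idea: hash the key with MD5 and use the hash as its ring position. */
    static BigInteger md5Token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // interpret the 16 bytes as a non-negative integer
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        for (String t : new String[] {"a", "d", "g", "j"}) {
            ring.addNode(t, "node-" + t);
        }
        System.out.println(ring.replicasFor("e", 3)); // [node-g, node-j, node-a]
        ring.addNode("m", "node-m");                  // new node: ranges are adjusted
        System.out.println(ring.replicasFor("k", 2)); // [node-m, node-a]
    }
}
```

Before `node-m` joins, key `k` wraps around to `node-a`; afterwards it falls in the new node's range, which is the "ranges are adjusted" step on the slides.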
    16. Ring Partition
         • Node dies or becomes isolated from the ring
         • Hints
         • Handoff
         • (diagram: ring of nodes with tokens a, d, g, j, m; RF=3)
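Hinted handoff, roughly: when a replica is unreachable, the write is remembered as a hint and delivered once the replica returns. The sketch below is a deliberately simplified model with hypothetical names (`HintedHandoffSketch`, `Node`, `handOff`). In particular, it keeps all hints in one in-memory map on the coordinator, which is an assumption made for brevity and not a description of where Cassandra stores them.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of hints and handoff; not Cassandra's implementation. */
final class HintedHandoffSketch {

    static final class Node {
        final String name;
        boolean up = true;
        final Map<String, String> data = new HashMap<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Writes held on behalf of unreachable replicas: target node -> pending key/value pairs
    private final Map<Node, Map<String, String>> hints = new HashMap<>();

    /** Writes to every live replica and records a hint for each dead one. Returns direct acks. */
    int write(List<Node> replicas, String key, String value) {
        int acks = 0;
        for (Node replica : replicas) {
            if (replica.up) {
                replica.data.put(key, value);
                acks++;
            } else {
                hints.computeIfAbsent(replica, n -> new LinkedHashMap<String, String>())
                     .put(key, value);
            }
        }
        return acks;
    }

    /** Replays pending hints to a node that has come back. Returns how many writes were delivered. */
    int handOff(Node recovered) {
        Map<String, String> pending = hints.remove(recovered);
        if (pending == null) {
            return 0;
        }
        recovered.data.putAll(pending);
        return pending.size();
    }
}
```

A caller would compare the returned ack count with what its consistency level requires, then call `handOff` when failure detection reports the node alive again.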
    17. Data Model
         • Keyspace: contains column families
         • ColumnFamily
             • Standard or Super
             • Two levels of indexes (key and column name)
    18. Data Model
         • Column and subcolumn sorting
         • Specify your own comparator:
             • TimeUUID
             • LexicalUUID
             • UTF8
             • Long
             • Bytes
             • CreateYourOwn
    19. Data Model
         • Standard Column Family
    20. Data Model
         • Super Column Family
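One way to picture the data model is as nested sorted maps: a standard column family maps a row key to columns sorted by name, and a super column family adds one more level. The sketch below is only a mental model with hypothetical class names; values are plain strings and the comparator is an ordinary `java.util.Comparator`, standing in for the comparator types listed on the slide.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Nested-map picture of the data model; names are illustrative only. */
final class DataModelSketch {

    /** Standard column family: row key -> (column name -> value), columns kept sorted. */
    static final class StandardColumnFamily {
        private final Map<String, SortedMap<String, String>> rows = new HashMap<>();
        private final Comparator<String> columnComparator;

        StandardColumnFamily(Comparator<String> columnComparator) {
            this.columnComparator = columnComparator;
        }

        void put(String key, String column, String value) {
            rows.computeIfAbsent(key, k -> new TreeMap<String, String>(columnComparator))
                .put(column, value);
        }

        SortedMap<String, String> row(String key) {
            SortedMap<String, String> row = rows.get(key);
            return row == null ? new TreeMap<String, String>(columnComparator) : row;
        }
    }

    /** Super column family: row key -> (super column -> (subcolumn -> value)). */
    static final class SuperColumnFamily {
        private final Map<String, SortedMap<String, SortedMap<String, String>>> rows = new HashMap<>();

        void put(String key, String superColumn, String column, String value) {
            rows.computeIfAbsent(key, k -> new TreeMap<String, SortedMap<String, String>>())
                .computeIfAbsent(superColumn, s -> new TreeMap<String, String>())
                .put(column, value);
        }

        SortedMap<String, String> subcolumns(String key, String superColumn) {
            SortedMap<String, SortedMap<String, String>> row = rows.get(key);
            if (row == null || !row.containsKey(superColumn)) {
                return new TreeMap<String, String>();
            }
            return row.get(superColumn);
        }
    }

    public static void main(String[] args) {
        StandardColumnFamily users = new StandardColumnFamily(Comparator.<String>naturalOrder());
        users.put("gary", "state", "TX");
        users.put("gary", "employer", "Rackspace");
        System.out.println(users.row("gary")); // columns come back sorted by name
    }
}
```

The two map levels correspond to the "two levels of indexes (key and column name)" on the slide.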
    21. Inserting: Overview
         • Simple: put(key, col, value)
         • Complex: put(key, [col:value, …, col:value])
         • Batch: multi key.
    22. Inserting: Writes
         • Commit log for durability
         • Memtable: no disk access (no reads or seeks)
         • SSTables are final (become read only)
             • Index
             • Bloom filter
             • Raw data
         • Atomic within a ColumnFamily
         • Bottom line: FAST!!!
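The write path on the slide (commit log, then memtable, then immutable SSTables) can be sketched in a few lines. This is a toy model under stated assumptions: the commit log is an in-memory list standing in for an append-only file, an "SSTable" is just a read-only sorted map, and the flush threshold and all names are hypothetical. The index and Bloom filter parts of an SSTable are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy write path: log first, then memory, flush to read-only tables. Not Cassandra code. */
final class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();                  // stand-in for an append-only file
    private TreeMap<String, String> memtable = new TreeMap<>();                // sorted, in memory
    private final List<SortedMap<String, String>> sstables = new ArrayList<>(); // never modified after flush
    private final int flushThreshold;

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. sequential append for durability
        memtable.put(key, value);         // 2. in-memory update, no disk reads or seeks
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the memtable into a read-only table and starts a fresh one. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableSortedMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed writes no longer need the log
    }

    /** Checks the memtable first, then the flushed tables from newest to oldest. */
    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

The `read` method also hints at why reads are slower than writes: a write touches the log and memory only, while a read may have to consult several flushed tables.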
    23. Querying: Overview
         • You need a key or keys:
             • Single: key=‘a’
             • Range: key=‘a’ through ‘f’
         • And columns to retrieve:
             • Slice: cols={bar through kite}
             • By name: key=‘b’ cols={bar, cat, llama}
         • Nothing like SQL “WHERE col=‘faz’”
         • But secondary indices are being worked on (see CASSANDRA-749)
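Because rows and columns are kept sorted, the query shapes above map naturally onto sorted-map operations. The sketch below uses plain `java.util` collections as a stand-in; it is not the Thrift API, and the class and method names are hypothetical. A key range only makes sense here because the map is ordered by key, which mirrors the note that the order-preserving partitioner is what makes range scans cheap.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;

/** Query shapes expressed over sorted maps; illustration only. */
final class QuerySketch {

    /** Slice: every column from start through end, inclusive. */
    static SortedMap<String, String> slice(NavigableMap<String, String> row, String start, String end) {
        return row.subMap(start, true, end, true);
    }

    /** By name: only the listed columns that exist, in the order requested. */
    static Map<String, String> byName(Map<String, String> row, List<String> names) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            if (row.containsKey(name)) {
                result.put(name, row.get(name));
            }
        }
        return result;
    }

    /** Key range: every row whose key lies between from and to, inclusive. */
    static <V> SortedMap<String, V> keyRange(NavigableMap<String, V> rows, String from, String to) {
        return rows.subMap(from, true, to, true);
    }
}
```

For a row with columns apple, bar, cat, dog, kite, llama, `slice(row, "bar", "kite")` yields bar, cat, dog, kite. There is no equivalent of filtering by column value, which is the missing `WHERE col='faz'` the slide mentions.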
    24. Querying: Reads
         • Not as fast as writes
         • Read repair when out of sync
         • New in 0.6:
             • Row cache (avoid sstable lookup)
             • Key cache (avoid index scan)
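Both caches are bounded in-memory maps placed in front of a slower lookup. The sketch below shows one common way to build such a map in Java, using `LinkedHashMap` in access order with least-recently-used eviction. The LRU policy and the class name are assumptions chosen for illustration; the slides do not say which eviction policy Cassandra uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Small bounded cache with least-recently-used eviction; a generic sketch, not Cassandra's cache. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, least recently used first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once the cache grows past its capacity
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> rowCache = new LruCacheSketch<>(2);
        rowCache.put("a", "row a");
        rowCache.put("b", "row b");
        rowCache.get("a");          // touch "a" so that "b" becomes the eldest entry
        rowCache.put("c", "row c"); // evicts "b"
        System.out.println(rowCache.keySet()); // [a, c]
    }
}
```

A row cache would store whole rows as values; a key cache would store only each key's position in the data file. Note that `LinkedHashMap` is not thread-safe, so a real server would need synchronization or a concurrent structure.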
    25. Client API (Low level)
         • Fat Client
             • Maybe too low level, not well-tested
         • Thrift (currently best-supported)
             • Many language bindings
             • Not much of a community
             • No streaming
             • Fast transport
         • Avro
             • Just getting started
             • Shows promise
    26. Client API (High Level)
         • Rapidly changing, getting feature-rich
         • Connection pools
         • Load balancing/Failover
         • Reduces the verbosity of working with Thrift
         • For Java, see Hector
         • Also Ruby, Python, C++, C#, Perl, PHP
    27. Java Bits: JMX
         • Relatively easy to expose objects and services as MBeans
         • Simplifies aspects of cluster and node management
         • Easy monitoring
         • You choose the JMX-enabled system management tool (jconsole is alright)
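To give a sense of how little code an MBean takes, here is a minimal standard-MBean sketch using only `java.lang.management` and `javax.management`. The `NodeStats` bean, its attribute, and the object name are hypothetical and are not Cassandra's actual MBeans. The one rule to remember is the naming convention: the management interface must be called `<ClassName>MBean`.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Registers a tiny standard MBean and reads it back through the MBean server. */
final class JmxSketch {

    /** Management interface; getters become attributes, other methods become operations. */
    public interface NodeStatsMBean {
        long getWriteCount();
        void reset();
    }

    public static final class NodeStats implements NodeStatsMBean {
        private final AtomicLong writes = new AtomicLong();

        void recordWrite() {
            writes.incrementAndGet();
        }

        @Override
        public long getWriteCount() {
            return writes.get();
        }

        @Override
        public void reset() {
            writes.set(0);
        }
    }

    /** Registers the bean under the given name, reads WriteCount via JMX, then unregisters it. */
    static Object registerAndRead(String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            NodeStats stats = new NodeStats();
            stats.recordWrite();
            stats.recordWrite();
            server.registerMBean(stats, name);
            try {
                return server.getAttribute(name, "WriteCount");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead("example.sketch:type=NodeStats"));
    }
}
```

In a long-running process the bean would stay registered, and a tool such as jconsole would show `WriteCount` as an attribute and `reset` as an operation.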
    28. Java Bits: available libraries
         • Excellent:
             • Google collections (Multimap, BiMap, Iterators)
             • java.util.concurrent
             • nio files (including mmap)
         • Meh:
             • nio sockets
    29. Java Bits: Heap & GC
         • Cassandra tweaks the default GC settings quite a bit:
             • -XX:+UseParNewGC
             • -XX:+UseConcMarkSweepGC
             • -XX:+CMSParallelRemarkEnabled
             • -XX:TargetSurvivorRatio=90
             • -XX:SurvivorRatio=128
             • -XX:MaxTenuringThreshold=0
             • -XX:+HeapDumpOnOutOfMemoryError
             • -XX:+AggressiveOpts
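When experimenting with flags like these, it helps to confirm which options a running JVM actually received. The sketch below lists them using the standard `RuntimeMXBean`; the class name is hypothetical and the code is a generic diagnostic, not part of Cassandra.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Prints the -XX options the current JVM was started with. */
final class JvmFlagsSketch {

    /** Returns only the arguments that begin with -XX:, in the order they were passed. */
    static List<String> xxOptions() {
        List<String> result = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String option : xxOptions()) {
            System.out.println(option);
        }
    }
}
```

These settings reflect the JVMs of 2010. The concurrent mark-sweep collector has since been removed from newer JDKs, so the CMS-related flags will not work there.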
    30. Java Bits: code management
         • Library versioning
             • No standard way
             • Mostly declarative
             • Not readily queryable
         • Must ship every dependency
             • Or use ant/mvn.
             • Now you have two (or more!) problems.
    31. Java Bits: daemonization
         • Java doesn’t make it easy re: stdout, stderr
         • After setting up, System.out and System.err are close()d
         • Windows: don’t ask
    32. Future Direction
         • Range delete (delete these cols from those keys)
         • Vector clocks (including server-side conflict resolution)
         • Altering keyspace/column family definitions on a live cluster
         • Byte[] keys
         • Compression
         • Multi-tenant support
         • Fewer memory restrictions
    33. Linky
         • Google BigTable
         • Amazon Dynamo
         • Facebook Cassandra
         • Java tuning:
         • Me: gdusbabek on twitter and just about everything else.