
C* Summit 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned by Matt Stump


Flash Memory technology, deployed as server-side PCIe or solid state disks (SSDs), is emerging as a critical tool for performance and efficiency in data centers of all scales. This presentation will discuss how the use of Flash impacts Cassandra deployments in terms of configuration, DRAM requirements and performance expectations. Ideas on leveraging C*'s cutting-edge data-center awareness to blend flash and disk storage nodes for cost and workload efficiency will also be shared. Flash media itself will be examined from a physical perspective to understand endurance issues. Data on write amplification under bulk-load and operational workload conditions will be presented to explain the impact to Flash of C*'s Log Structured Merge Tree architecture and the associated compactions. Finally, we will examine strategies to make Cassandra more Flash-aware using both conventional techniques as well as emerging Non-volatile memory (NVM) programming capabilities. Lessons learned from real-world customer deployments will be shared to complete this presentation.

Published in: Technology


  1. #CASSANDRA13 Matthew Stump | Architect @ KISSmetrics. Real-time Large Queries
  2. [No text on this slide]
  3. KISSmetrics Customers Want
     * Churn Prediction
     * A/B Tests
     * Which Blog Posts and Ad Campaigns Attract High Value Customers?
     * User Conversion Funnel
     * Revenue Prediction
     * Customer Acquisition Costs
     * Customer Lifetime Value
  4. Understanding Queries
  5. RowKey  username  first_name  last_name  postal_code
     cstar   cstar     Cassandra   Database   94110
     user2   user2     Some        Guy        94112
  6. Same table as slide 5.
  7. Same table as slide 5.
  8. RowKey
     94110   cstar
     94112   user2  user4  user7  ...
  9. Where Secondary Indexes Break
     * High Cardinality Data
     * Only one index per query
     * Indexes are distributed
     * Only some datatypes; no counters
     * Range queries are expensive
  10. What Do I Want?
      * Index high cardinality data, e.g. counters
      * Complex queries, with multiple clauses
      * Results in < 500 ms for billions of rows
      * Sub-field searching with regular expressions
      * Range queries
  11. Bitmap and Bit-Slice Indexes
  12. [No text on this slide]
  13. RowKey
      94110   cstar
      94112   user2  user4  user7  ...
  14. RowKey
      94110   00001000 01000000 00000000 000000000
      94112   10000110 01000000 00000000 000000000
  15. RowKey
      94110   00001000 01000000 00000000 000000000
      94112   10000110 01000000 00000000 000000000
      hash("cstar") = 4
  16. SELECT * FROM users WHERE zipcode = 94110 OR zipcode = 94112
      Field            Index
      94110            10001010 01000000 00000000 000000000
      94112            10000110 01000000 00000000 000000000
      94112 or 94110   10001110 01000000 00000000 000000000
  17. SELECT * FROM users WHERE Event1 = true AND Event2 = true
      Field               Index
      Event1              10001010 01000000 00000000 000000000
      Event2              10000110 01000000 00000000 000000000
      Event1 and Event2   10000010 01000000 00000000 000000000
  18. SELECT * FROM users WHERE event_counter < 5
      Field           Value   Slice
      event_counter   1       10001010 01000000 00000000 000000000
      event_counter   2       10000110 01000000 00000000 000000000
      Value1 or Value2        10001110 01000000 00000000 000000000
  19. "this is a test string"
  20. ["thi", "s i", "s a", " te", "st ", "str", "ing"]
  21. [0x746869, 0x732069, 0x732061, 0x207465, 0x737420, 0x737472, 0x696e67]
  22. Field        Value            Slice
      text_field   0x207465 " te"   10001010 01000000 00000000 000000000
      text_field   0x696e67 "ing"   10111110 10001000 00000000 000001000
      text_field   0x732061 "s a"   10001010 01000001 00001000 110101110
      text_field   0x732069 "s i"   10001010 01000000 10110011 000000000
      text_field   0x737420 "st "   10001010 01001100 10110111 000000000
      text_field   0x737472 "str"   10001010 01000000 00011010 011000000
      text_field   0x746869 "thi"   10001010 01000000 10110111 000000010
  23. "thi.*ing"
  24. "thi" AND "ing"
  25. 0x746869 AND 0x696e67
  26. Same index table as slide 22.
  27. "th.*ing"
  28. "th" AND "ing"
  29. range("th" + 0x00, "th" + 0xFF) AND "ing"
      range(0x746800, 0x7468FF) AND 0x696e67
  30. Field        Value            Slice
      text_field   0x207465 " te"   10001010 01000000 00000000 000000000
      text_field   0x696e67 "ing"   10111110 10001000 00000000 000001000
      text_field   0x732061 "s a"   10001010 01000001 00001000 110101110
      text_field   0x732069 "s i"   10001010 01000000 10110011 000000000
      text_field   0x737420 "st "   10001010 01001100 10110111 000000000
      text_field   0x737472 "str"   10001010 01000000 00011010 011000000
      text_field   0x746869 "thi"   10001010 01000000 10110111 000000010
      text_field   0x74687A "thz"   10000000 00000001 00011100 000110010
      range(0x746800, 0x7468FF) AND 0x696e67
  31. [No text on this slide]
  32. Implementation
  33. [Diagram labels: Query & Indexing Engine; Queries and Events]
  34. RowKey        Offset 0x00   Offset 0x01   Offset 0x02   Offset 0x03
      event1_0x00   10011000      10011000
      event1_0x01   10011000      10011000      10011000
  35. Results So Far
      * Results returned for an 8-clause query over 4 billion rows in < 2 seconds
      * Full regular expression support
      * Full support for range queries
      * Ability to index any numeric value, or any value which can be hashed
  36. What isn't finished
      * Support for atomic counters
      * "Group By" query aggregation
      * Still working on event processing and distribution
  37. https://github.com/project-z/
  38. mstump@kissmetrics.com | @mattstump
  39. THANK YOU
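
The bitmap index from slides 13 to 18 can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's implementation: the `BitmapIndex` class and its method names are hypothetical, Python integers stand in for the stored bit slices, and bit positions are assigned sequentially rather than with `hash(row_key)` as on slide 15, so the example stays deterministic.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one bitmap (a Python int) per (field, value) pair."""

    def __init__(self):
        self.slices = defaultdict(int)  # (field, value) -> bitmap
        self.positions = {}             # row key -> bit position
        self.row_keys = []              # bit position -> row key

    def _position(self, row_key):
        # Assumption: sequential positions instead of hash(row_key).
        if row_key not in self.positions:
            self.positions[row_key] = len(self.row_keys)
            self.row_keys.append(row_key)
        return self.positions[row_key]

    def add(self, row_key, field, value):
        """Set the row's bit in the slice for (field, value)."""
        self.slices[(field, value)] |= 1 << self._position(row_key)

    def eq(self, field, value):
        """Bitmap of rows where field == value."""
        return self.slices.get((field, value), 0)

    def range(self, field, low, high):
        """OR together every slice of `field` whose value lies in [low, high)."""
        result = 0
        for (name, value), bitmap in self.slices.items():
            if name == field and low <= value < high:
                result |= bitmap
        return result

    def rows(self, bitmap):
        """Translate a result bitmap back into row keys."""
        return [key for pos, key in enumerate(self.row_keys) if (bitmap >> pos) & 1]


index = BitmapIndex()
index.add("cstar", "zipcode", 94110)
for user in ("user2", "user4", "user7"):
    index.add(user, "zipcode", 94112)
index.add("cstar", "event_counter", 1)
index.add("user2", "event_counter", 2)
index.add("user4", "event_counter", 7)

# WHERE zipcode = 94110 OR zipcode = 94112
either_zip = index.rows(index.eq("zipcode", 94110) | index.eq("zipcode", 94112))
# WHERE zipcode = 94112 AND event_counter < 5
zip_and_low = index.rows(index.eq("zipcode", 94112) & index.range("event_counter", 0, 5))
```

Each clause becomes one bitmap, so OR, AND, and range predicates reduce to bitwise operations, and only the final bitmap is translated back into row keys.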

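
Slides 19 to 30 describe text search with the same bit slices: split the string into three-character pieces, encode each piece as an integer, and turn a regular expression into AND and range operations over those values. The sketch below follows that recipe under stated assumptions: the `TrigramIndex` class and the helper names are hypothetical, only ASCII text is handled, and the bitmaps are plain Python integers. The result of the bitwise step is best treated as a candidate set, since matching trigrams alone does not guarantee that they occur in the order the expression requires.

```python
from collections import defaultdict


def chunks(text, size=3):
    """Split text into consecutive size-character pieces, as on slide 20."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def encode(piece, size=3, pad=b"\x00"):
    """Pack up to `size` ASCII characters into an integer, as on slide 21."""
    return int.from_bytes(piece.encode("ascii").ljust(size, pad), "big")


class TrigramIndex:
    """Toy index: one bitmap of row positions per encoded trigram."""

    def __init__(self):
        self.slices = defaultdict(int)  # encoded trigram -> bitmap
        self.rows = []                  # bit position -> indexed text

    def add(self, text):
        position = len(self.rows)
        self.rows.append(text)
        for piece in chunks(text):
            self.slices[encode(piece)] |= 1 << position

    def exact(self, trigram):
        """Bitmap of rows containing this exact trigram."""
        return self.slices.get(encode(trigram), 0)

    def prefix(self, stem):
        """OR every slice between stem + 0x00 and stem + 0xFF (slide 29)."""
        low, high = encode(stem), encode(stem, pad=b"\xff")
        result = 0
        for value, bitmap in self.slices.items():
            if low <= value <= high:
                result |= bitmap
        return result

    def matches(self, bitmap):
        return [row for pos, row in enumerate(self.rows) if (bitmap >> pos) & 1]


index = TrigramIndex()
index.add("this is a test string")
index.add("thz was a test string")
index.add("no match here")

# "thi.*ing"  ->  "thi" AND "ing"
thi_ing = index.matches(index.exact("thi") & index.exact("ing"))
# "th.*ing"   ->  range("th" + 0x00, "th" + 0xFF) AND "ing"
th_ing = index.matches(index.prefix("th") & index.exact("ing"))
```

Here `thi_ing` contains only the first string, while `th_ing` also picks up the "thz" row, which mirrors the extra `0x74687A` entry on slide 30.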