State of Cassandra 2012

Published in: Technology

  1. State of Cassandra, 2012. Jonathan Ellis, Project Chair, Apache Cassandra; CTO, DataStax; @spyced
  2. Some Cassandra users, early 2011 ©2012 DataStax
  3. Some Cassandra users, mid 2012
  4. eBay. Application/Use Case: • Social Signals: like/want/own features for eBay product and item pages • Hunch taste graph for eBay users and items • Many time series use cases. Why Cassandra? • Multi-datacenter • Scalable • Write performance • Distributed counters • Hadoop support [ACE]
  5. Time series data
  6. Multi-datacenter support
  7. Distributed counters
  8. Hadoop support
  9. Disney. Application/Use Case: • Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform. Why Cassandra? • DataStax Enterprise can tackle real-time and search functions in the same cluster • Scalability • 24x7 uptime [NDI]
  10. Multitenancy 1
  11. Multitenancy 1 2
  12. Multitenancy 1 2 3
  13. Multitenancy
  14. Enterprise search
  15. SimpleReach. Application/Use Case: • SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior. Why Cassandra? • Very high velocity data ingest rate and large data volumes • Workload separation between realtime and batch applications [NDE]
  16. SourceNinja. Application/Use Case: • SourceNinja notifies you of performance, security, and bug fixes for the software you depend on. Why Cassandra? • Previous database system could not handle the load; HBase had too many points of failure and was too slow • Fast real-time capabilities, batch analytics on that data, and enterprise search [RDE]
  17. Realtime + search + analytics = DataStax Enterprise
  18. Netflix. Application/Use Case: • General purpose backend for large-scale, highly available, cloud-based web services supporting Netflix Streaming. Why Cassandra? • Highly available, highly robust and no schema change downtime • Highly scalable, optimized for SSD • Much lower cost than previous Oracle and SimpleDB implementations • Flexible data model • Ability to directly influence/implement OSS feature set • Supports local and wide area distributed operations, spanning US and Europe [RCE]
  19. Optimized for SSD
  20. Open source
  21. Use case patterns • Massively scalable • High performance • Reliable/Available
  22. (no text on slide)
  23. [Chart: reads/s and writes/s, y-axis 0 to 35,000, Cassandra 0.6 vs. Cassandra 1.0]
  24. (no text on slide)
  25. Recent Cassandra history • 0.7 (Jan 2011): CREATE COLUMN FAMILY; TTL; secondary (column) indexes • 0.8 (Jun 2011): counters; automatic memtable tuning • 1.0 (Oct 2011): compression; leveled compaction
  26. Present • 1.1 (Apr 2012): self-tuning row + key caches; support for mixed SSD + HDD nodes; row-level isolation
  27. Self-tuning Row Cache [Diagram labels: Without Cache: Client, Merge, SSTables; With Cache: Client, Row Cache]
  28. Mixed SSD/HDD Support [Diagram labels: Client; Cassandra Instance; user_sessions; user_activity; HDD; SSD; Cassandra Node]
  29. Row Level Isolation: UPDATE Users SET login=bar AND password=bar WHERE key=e29b-41d4; SELECT login, password FROM Users WHERE key=e29b-41d4; [Diagram: Login/Passwd values change from Foo/Foo to Bar/Bar; on Cassandra 1.0 the SELECT can return "Bar, Foo", on Cassandra 1.1 it returns "Bar, Bar"]
  30. ACID
  31. Overloading “consistency” • ACID consistency = referential integrity • Distributed system consistency • {consistency, availability, partition tolerance}
  32. Future • 1.2 (Oct 2012?): concurrent schema changes; JBOD support; virtual nodes; CQL3; collections
  33. Concurrent Schema Changes [Diagram: one client issues CREATE TABLE X; ... DROP TABLE X; while another client issues CREATE TABLE Y; ... DROP TABLE Y; against the same Cassandra cluster]
  34. JBOD support [Diagram: Cassandra Instance with HDD1, HDD2, HDD3, HDD4]
  35. JBOD support [Diagram: Cassandra Instance with HDD1, HDD2, HDD3, HDD4; one disk marked X]
  36. Virtual nodes [Diagram: ring without vnodes (ranges A to F) next to ring with vnodes (ranges A to P)]
  37. Virtual nodes [Diagram: ring without vnodes (ranges A to F) next to ring with vnodes (ranges A to P)]
  38. Virtual nodes [Diagram: ring without vnodes (ranges A to F) next to ring with vnodes (ranges A to P)]
  39. Node Rebuild without vnodes [Diagram: ring without vnodes; ranges A to F spread over Node 1 to Node 6]
  40. Node Rebuild with vnodes [Diagram: ring with vnodes; ranges A to P spread over Node 1 to Node 6]
  41. CQL: You got SQL in my NoSQL! CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  42. Strictly “realtime” focused • No joins • No subqueries • No aggregation functions* or GROUP BY • Strictly limited ORDER BY
  43. Example: CFS sblocks: create column family sblocks with comparator = UUIDType and default_validation_class = BytesType and key_validation_class = UUIDType
  44. sblocks in context
  45. sblocks in context
  46. sblocks in CQL3: CREATE TABLE sblocks ( block_id uuid, subblock_id uuid, data blob, PRIMARY KEY (block_id, subblock_id) ); [Table: block_id | subblock_id | data. Block1 | subblock A | data A. Block1 | subblock B | data B. ... Block2 | subblock C | data C. Block2 | subblock D | data D. ... Block3 | subblock E | data E. Block3 | subblock F | data F. ...]
  47. Collections: CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int );
  48. Collections: CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses;
  49. Collections: CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses; [slide overlaid with an X]
  50. Collections: CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> );
  51. Collections: CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> ); UPDATE users SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
  52. Q&A in the DataStax Lounge
  53. Introducing the Cassandra MVPs
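
The two "Node Rebuild" slides (39 and 40) contrast a ring with one token per node against a ring with virtual nodes. The transcript keeps only the diagram labels, so the following is a minimal Python sketch of the idea, not Cassandra's actual placement code. It assumes a simplified replication rule (walk the ring clockwise and take the next distinct nodes) and a replication factor of 3. The function names `build_ring`, `replicas`, and `rebuild_sources`, and the cluster sizes used, are hypothetical and chosen only for illustration.

```python
import random


def build_ring(num_nodes, tokens_per_node, rng=None, token_space=2**32):
    """Return a sorted list of (token, node) pairs describing the ring."""
    if tokens_per_node == 1:
        # Classic layout: one evenly spaced token per node.
        step = token_space // num_nodes
        ring = [(i * step, i) for i in range(num_nodes)]
    else:
        # Vnode layout (assumption): every node gets several random tokens.
        tokens = rng.sample(range(token_space), num_nodes * tokens_per_node)
        ring = [(t, i % num_nodes) for i, t in enumerate(tokens)]
    return sorted(ring)


def replicas(ring, index, rf):
    """Walk clockwise from ring[index] and collect rf distinct nodes."""
    found = []
    i = index
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found


def rebuild_sources(ring, dead_node, rf=3):
    """Nodes that hold a replica of at least one range also held by dead_node."""
    sources = set()
    for index in range(len(ring)):
        owners = replicas(ring, index, rf)
        if dead_node in owners:
            sources.update(n for n in owners if n != dead_node)
    return sources


if __name__ == "__main__":
    plain = build_ring(12, 1)
    vnodes = build_ring(12, 16, random.Random(0))
    print("sources without vnodes:", sorted(rebuild_sources(plain, 0)))
    print("sources with vnodes:   ", sorted(rebuild_sources(vnodes, 0)))
```

In this model, a 12-node ring with one token per node always gives the rebuilt node four possible sources (two neighbours on each side). With many tokens per node, the shared ranges are scattered around the ring, so far more of the cluster can take part in the rebuild.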
