Your SlideShare is downloading. ×
0
State of Cassandra, 2012Jonathan EllisProject Chair, Apache CassandraCTO, DataStax@spyced
Some Cassandra users, early 2011©2012 DataStax
Some Casandra users, mid 2012©2012 DataStax
eBay                     Application/Use Case                     • Social Signals: like/want/own features for            ...
Time series data©2012 DataStax
Multi-datacenter support©2012 DataStax
Distributed counters©2012 DataStax
Hadoop support©2012 DataStax
Disney                     Application/Use Case                     • Meet the data management needs of user              ...
Multitenancy                 1©2012 DataStax
Multitenancy                 1                     2©2012 DataStax
Multitenancy                 1                         2                     3©2012 DataStax
Multitenancy©2012 DataStax
Enterprise search©2012 DataStax
SimpleReach                     Application/Use Case                     • SimpleReach tracks social actions for          ...
SourceNinja                     Application/Use Case                     • SourceNinja notifies you to performance,        ...
Realtime + search + analytics = DataStax Enterprise©2012 DataStax
Netflix                     Application/Use Case                     • General purpose backend for large scale             ...
Optimized for SSD©2012 DataStax
Open source©2012 DataStax
Use case patterns  • Massively scalable  • High performance  • Reliable/Available©2012 DataStax
©2012 DataStax
reads/s            writes/s                                                                       35000                   ...
©2012 DataStax
Recent Cassandra history  • 0.7 (Jan 2011)           •     CREATE COLUMN FAMILY           •     TTL           •     Second...
Present  • 1.1 (Apr 2012)           •     Self-tuning row + key caches           •     Support for mixed SSD + HDD nodes  ...
Self-tuning Row Cache                  SSTables   Without                              Cache      Client                  ...
Mixed SSD/HDD Support                 user_sessions      Client                               Cassandra                 us...
Row Level Isolation                         Login    Passwd   Login    Passwd                          Foo       Foo     F...
ACID©2012 DataStax   28
Overloading “consistency”     • ACID consistency = referential integrity     • Distributed system consistency           • ...
Future  • 1.2 (Oct 2012?)           •     Concurrent schema changes           •     JBOD support           •     Virtual n...
Concurrent Schema Changes                    CREATE TABLE X;                          ...                    DROP TABLE X;...
JBOD support                          Cassandra                           Instance                 HDD1   HDD2      HDD3  ...
JBOD support                          Cassandra                           Instance                 HDD1                   ...
Virtual nodes                      A                          C   D                                             B         ...
Virtual nodes                      A                          C   D                                             B         ...
Virtual nodes                      A                          C   D                                             B         ...
Node Rebuild without vnodes                              Node 1      Node 2      Node 3                                  A...
Node Rebuild with vnodes                                         Node 1   Node 2   Node 3                                 ...
CQL: You got SQL in my NoSQL! CREATE TABLE users (    id uuid PRIMARY KEY,    name text,    state text,    birth_date int ...
Strictly “realtime” focused  • No joins  • No subqueries  • No aggregation functions* or GROUP BY  • Strictly limited ORDE...
Example: CFS sblocks create column family sblocks   with comparator = UUIDType   and default_validation_class = BytesType ...
sblocks in context©2012 DataStax
sblocks in context©2012 DataStax
sblocks in CQL3 CREATE TABLE sblocks (     block_id uuid,     subblock_id uuid,     data blob,                            ...
Collections CREATE TABLE users (    id uuid PRIMARY KEY,    name text,    state text,    birth_date int );©2012 DataStax
Collections CREATE TABLE users (    id uuid PRIMARY KEY,    name text,    state text,    birth_date int ); CREATE TABLE us...
Collections CREATE TABLE users (    id uuid PRIMARY KEY,    name text,    state text,                 X    birth_date int ...
Collections CREATE TABLE users (    id uuid PRIMARY KEY,    name text,    state text,    birth_date int,    email_addresse...
Collections CREATE TABLE users (    id uuid PRIMARY KEY,    name text,    state text,    birth_date int,    email_addresse...
Q&A in the DataStax Lounge©2012 DataStax
Introducing the Cassandra MVPs©2012 DataStax
Upcoming SlideShare
Loading in...5
×

State of Cassandra 2012

2,474

Published on

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,474
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
56
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Transcript of "State of Cassandra 2012"

  1. 1. State of Cassandra, 2012Jonathan EllisProject Chair, Apache CassandraCTO, DataStax@spyced
  2. 2. Some Cassandra users, early 2011©2012 DataStax
  3. 3. Some Casandra users, mid 2012©2012 DataStax
  4. 4. eBay Application/Use Case • Social Signals: like/want/own features for eBay product and item pages • Hunch taste graph for eBay users and items • Many time series use cases Why Cassandra? • Multi-datacenter • Scalable • Write performance • Distributed counters • Hadoop support©2012 DataStax ACE
  5. 5. Time series data©2012 DataStax
  6. 6. Multi-datacenter support©2012 DataStax
  7. 7. Distributed counters©2012 DataStax
  8. 8. Hadoop support©2012 DataStax
  9. 9. Disney Application/Use Case • Meet the data management needs of user facing applications across The Walt Disney Company with a single platform Why Cassandra? • DataStax Enterprise can tackle real-time and search functions in the same cluster • Scalability • 24x7 uptime©2012 DataStax NDI
  10. 10. Multitenancy 1©2012 DataStax
  11. 11. Multitenancy 1 2©2012 DataStax
  12. 12. Multitenancy 1 2 3©2012 DataStax
  13. 13. Multitenancy©2012 DataStax
  14. 14. Enterprise search©2012 DataStax
  15. 15. SimpleReach Application/Use Case • SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior. Why Cassandra? • Very high velocity data ingest rate and large data volumes • Workload separation between realtime and batch applications©2012 DataStax NDE
  16. 16. SourceNinja Application/Use Case • SourceNinja notifies you to performance, security, and bug fixes for the software you depend on Why Cassandra? • Previous database system could not handle load; HBase has too many points of failure and was too slow • Fast real time capabilities, batch analytics on that data, and enterprise search©2012 DataStax RDE
  17. 17. Realtime + search + analytics = DataStax Enterprise©2012 DataStax
  18. 18. Netflix Application/Use Case • General purpose backend for large scale highly available cloud based web services supporting Netflix Streaming Why Cassandra? • Highly available, highly robust and no schema change downtime • Highly scalable, optimized for SSD • Much lower cost than previous Oracle and SimpleDB implementations • Flexible data model • Ability to directly influence/implement OSS feature set • Supports local and wide area distributed operations, spanning US and Europe©2012 DataStax RCE
  19. 19. Optimized for SSD©2012 DataStax
  20. 20. Open source©2012 DataStax
  21. 21. Use case patterns • Massively scalable • High performance • Reliable/Available©2012 DataStax
  22. 22. ©2012 DataStax
  23. 23. reads/s writes/s 35000 30000 25000 20000 15000 10000 5000 Cassandra 0.6 0©2012 DataStax Cassandra 1.0
  24. 24. ©2012 DataStax
  25. 25. Recent Cassandra history • 0.7 (Jan 2011) • CREATE COLUMN FAMILY • TTL • Secondary (column) indexes • 0.8 (Jun 2011) • Counters • Automatic memtable tuning • 1.0 (Oct 2011) • Compression • Leveled compaction©2012 DataStax
  26. 26. Present • 1.1 (Apr 2012) • Self-tuning row + key caches • Support for mixed SSD + HDD nodes • Row-level isolation©2012 DataStax
  27. 27. Self-tuning Row Cache SSTables Without Cache Client Merge Client Row Cache With Cache©2012 DataStax 25
  28. 28. Mixed SSD/HDD Support user_sessions Client Cassandra user_activity Instance user_activity user_sessions HDD SSD Cassandra Node©2012 DataStax 26
  29. 29. Row Level Isolation Login Passwd Login Passwd Foo Foo Foo FooUPDATE Users Bar FooSET login=bar Bar BarAND password=barWHERE key=e29b-41d4 Bar BarSELECT login, passwordFROM Users Bar, Foo Bar, BarWHERE key=e29b-41d4 Cassandra 1.0 Cassandra 1.1©2012 DataStax 27
  30. 30. ACID©2012 DataStax 28
  31. 31. Overloading “consistency” • ACID consistency = referential integrity • Distributed system consistency • {consistency, availability, partition tolerance}©2012 DataStax 29
  32. 32. Future • 1.2 (Oct 2012?) • Concurrent schema changes • JBOD support • Virtual nodes • CQL3 • Collections©2012 DataStax
  33. 33. Concurrent Schema Changes CREATE TABLE X; ... DROP TABLE X; Client Cassandra Cluster Client CREATE TABLE Y; ...©2012 DataStax DROP TABLE Y; 31
  34. 34. JBOD support Cassandra Instance HDD1 HDD2 HDD3 HDD4©2012 DataStax
  35. 35. JBOD support Cassandra Instance HDD1 X HDD2 HDD3 HDD4©2012 DataStax
  36. 36. Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
  37. 37. Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
  38. 38. Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
  39. 39. Node Rebuild without vnodes Node 1 Node 2 Node 3 A B C F E A F B A A F B Ring without vnodes E C D D E F C B D C E D Node 4 Node 5 Node 6©2012 DataStax 35
  40. 40. Node Rebuild with vnodes Node 1 Node 2 Node 3 B E A P K G G K M O C N C D D J D H J F B E A F L A K F P I P Ring with G O VNodes H N I M O E P H C M J L K I H I A B O B L M C N E F D G N J L Node 4 Node 5 Node 6©2012 DataStax 36
  41. 41. CQL: You got SQL in my NoSQL! CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state=‘Texas’ AND birth_date > 1950;©2012 DataStax
  42. 42. Strictly “realtime” focused • No joins • No subqueries • No aggregation functions* or GROUP BY • Strictly limited ORDER BY©2012 DataStax
  43. 43. Example: CFS sblocks create column family sblocks with comparator = UUIDType and default_validation_class = BytesType and key_validation_class = UUIDType©2012 DataStax
  44. 44. sblocks in context©2012 DataStax
  45. 45. sblocks in context©2012 DataStax
  46. 46. sblocks in CQL3 CREATE TABLE sblocks (     block_id uuid,     subblock_id uuid,     data blob, block_id subblock_id data     PRIMARY KEY (block_id, subblock_id) Block1 subblock A data A ); Block1 subblock B data B ... ... ... Block2 subblock C data C Block2 subblock D data D ... ... ... Block3 subblock E data E Block3 subblock F data F ... ... ...©2012 DataStax
  47. 47. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int );©2012 DataStax
  48. 48. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses;©2012 DataStax
  49. 49. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, X birth_date int ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses;©2012 DataStax
  50. 50. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> );©2012 DataStax
  51. 51. Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> ); UPDATE users SET email_addresses = email_addresses + {‘jbellis@gmail.com’, ‘jbellis@datastax.com’};©2012 DataStax
  52. 52. Q&A in the DataStax Lounge©2012 DataStax
  53. 53. Introducing the Cassandra MVPs©2012 DataStax
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×