State of Cassandra 2012
State of Cassandra 2012 Presentation Transcript

  • 1. State of Cassandra, 2012. Jonathan Ellis, Project Chair, Apache Cassandra; CTO, DataStax; @spyced
  • 2. Some Cassandra users, early 2011. ©2012 DataStax
  • 3. Some Cassandra users, mid 2012
  • 4. eBay
      Application/Use Case
        • Social Signals: like/want/own features for eBay product and item pages
        • Hunch taste graph for eBay users and items
        • Many time series use cases
      Why Cassandra?
        • Multi-datacenter
        • Scalable
        • Write performance
        • Distributed counters
        • Hadoop support
      ACE
  • 5. Time series data
  • 6. Multi-datacenter support
  • 7. Distributed counters
  • 8. Hadoop support
  • 9. Disney
      Application/Use Case
        • Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform
      Why Cassandra?
        • DataStax Enterprise can tackle real-time and search functions in the same cluster
        • Scalability
        • 24x7 uptime
      NDI
  • 10. Multitenancy 1
  • 11. Multitenancy 1 2
  • 12. Multitenancy 1 2 3
  • 13. Multitenancy
  • 14. Enterprise search
  • 15. SimpleReach
      Application/Use Case
        • SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.
      Why Cassandra?
        • Very high velocity data ingest rate and large data volumes
        • Workload separation between realtime and batch applications
      NDE
  • 16. SourceNinja
      Application/Use Case
        • SourceNinja notifies you of performance, security, and bug fixes for the software you depend on
      Why Cassandra?
        • Previous database system could not handle the load; HBase had too many points of failure and was too slow
        • Fast real-time capabilities, batch analytics on that data, and enterprise search
      RDE
  • 17. Realtime + search + analytics = DataStax Enterprise
  • 18. Netflix
      Application/Use Case
        • General purpose backend for large scale, highly available, cloud based web services supporting Netflix Streaming
      Why Cassandra?
        • Highly available, highly robust and no schema change downtime
        • Highly scalable, optimized for SSD
        • Much lower cost than previous Oracle and SimpleDB implementations
        • Flexible data model
        • Ability to directly influence/implement OSS feature set
        • Supports local and wide area distributed operations, spanning US and Europe
      RCE
  • 19. Optimized for SSD
  • 20. Open source
  • 21. Use case patterns
      • Massively scalable
      • High performance
      • Reliable/Available
  • 22. (no text on slide)
  • 23. Chart: reads/s and writes/s (axis from 0 to 35,000), Cassandra 0.6 vs. Cassandra 1.0
  • 24. (no text on slide)
  • 25. Recent Cassandra history
      • 0.7 (Jan 2011)
          • CREATE COLUMN FAMILY
          • TTL
          • Secondary (column) indexes
      • 0.8 (Jun 2011)
          • Counters
          • Automatic memtable tuning
      • 1.0 (Oct 2011)
          • Compression
          • Leveled compaction
  • 26. Present
      • 1.1 (Apr 2012)
          • Self-tuning row + key caches
          • Support for mixed SSD + HDD nodes
          • Row-level isolation
  • 27. Self-tuning Row Cache. Diagram: without the cache, a client read merges the row from several SSTables; with the cache, the client read is served from the row cache.
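
The two read paths on this slide can be sketched in Python. This is a toy model under stated assumptions, not Cassandra's code: SSTables are plain dicts, `merge_row` and `CachedReader` are hypothetical names, and memtables, tombstones, cache invalidation on writes, and the self-tuning of the cache size are all left out.

```python
def merge_row(sstables, key):
    """Merge one row from several SSTables; the newest timestamp wins per column."""
    merged = {}
    for sstable in sstables:
        for column, (value, timestamp) in sstable.get(key, {}).items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, _) in merged.items()}


class CachedReader:
    """Hypothetical reader that keeps already-merged rows in a row cache."""

    def __init__(self, sstables):
        self.sstables = sstables
        self.row_cache = {}
        self.merges = 0  # counts how often the SSTable merge had to run

    def read(self, key):
        if key not in self.row_cache:
            self.merges += 1
            self.row_cache[key] = merge_row(self.sstables, key)
        return self.row_cache[key]


# Each SSTable maps row key -> {column: (value, timestamp)}.
sstables = [
    {"user1": {"name": ("Foo", 1)}},
    {"user1": {"name": ("Bar", 2), "state": ("Texas", 2)}},
]
reader = CachedReader(sstables)
first = reader.read("user1")   # merges both SSTables
second = reader.read("user1")  # served from the row cache
print(first, reader.merges)
```

The second read never touches the SSTables, which is the saving the "With Cache" half of the diagram points at.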
  • 28. Mixed SSD/HDD Support. Diagram: a client talks to a Cassandra instance on a single node, with the user_activity and user_sessions column families placed on different disks (HDD and SSD).
  • 29. Row Level Isolation
      UPDATE Users SET login=bar AND password=bar WHERE key=e29b-41d4
      SELECT login, password FROM Users WHERE key=e29b-41d4

      Login, Passwd | Cassandra 1.0 | Cassandra 1.1
      before UPDATE | Foo, Foo      | Foo, Foo
      during UPDATE | Bar, Foo      | Bar, Bar
      after UPDATE  | Bar, Bar      | Bar, Bar
      SELECT result | Bar, Foo      | Bar, Bar
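
The difference between the two columns of this slide can be sketched in Python as a deterministic toy model. The class names and the copy-then-swap approach are assumptions chosen for illustration; the slide does not say how Cassandra 1.1 implements isolation.

```python
class PerColumnRow:
    """Toy model of the 1.0 behaviour: columns are written one at a time."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def update_steps(self, changes):
        for name, value in changes.items():
            self.columns[name] = value
            yield  # a concurrent reader may run here and see a half-applied update

    def read(self):
        return dict(self.columns)


class IsolatedRow:
    """Toy model of the 1.1 behaviour: build a new version, publish it in one step."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def update_steps(self, changes):
        pending = dict(self.columns)
        for name, value in changes.items():
            pending[name] = value
            yield  # readers still see the old version here
        self.columns = pending  # single swap makes every change visible at once

    def read(self):
        return dict(self.columns)


def reads_during_update(row, changes):
    """Read the row after every step of the update, then once more at the end."""
    seen = []
    for _ in row.update_steps(changes):
        seen.append(row.read())
    seen.append(row.read())
    return seen


start = {"login": "Foo", "password": "Foo"}
change = {"login": "Bar", "password": "Bar"}
old_style = reads_during_update(PerColumnRow(start), change)
new_style = reads_during_update(IsolatedRow(start), change)
print(old_style)
print(new_style)
```

Only the per-column model ever exposes the mixed state (login Bar, password Foo) to a reader; the isolated model shows either the old row or the new one.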
  • 30. ACID
  • 31. Overloading “consistency”
      • ACID consistency = referential integrity
      • Distributed system consistency
      • {consistency, availability, partition tolerance}
  • 32. Future
      • 1.2 (Oct 2012?)
          • Concurrent schema changes
          • JBOD support
          • Virtual nodes
          • CQL3
          • Collections
  • 33. Concurrent Schema Changes. Diagram: two clients send schema changes to the same Cassandra cluster; one runs CREATE TABLE X; ... DROP TABLE X; and the other runs CREATE TABLE Y; ... DROP TABLE Y;
  • 34. JBOD support. Diagram: one Cassandra instance with four disks, HDD1 to HDD4.
  • 35. JBOD support. Diagram: the same instance and four disks, with an X marking one disk.
  • 36. Virtual nodes. Diagram: a ring without vnodes (ranges A to F) next to a ring with vnodes (ranges A to P).
  • 37. Virtual nodes. Diagram: a ring without vnodes (ranges A to F) next to a ring with vnodes (ranges A to P).
  • 38. Virtual nodes. Diagram: a ring without vnodes (ranges A to F) next to a ring with vnodes (ranges A to P).
  • 39. Node Rebuild without vnodes. Diagram: six nodes (Node 1 to Node 6) around a ring without vnodes, each node holding three of the ranges A to F.
  • 40. Node Rebuild with vnodes. Diagram: six nodes (Node 1 to Node 6) around a ring with vnodes, each node holding eight of the ranges A to P.
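
Why vnodes help a rebuild can be sketched with a small Python simulation. It is a toy model under stated assumptions: tokens are random floats, replicas are chosen by walking the ring clockwise until `rf` distinct nodes are found, and the cluster size (12 nodes), replication factor (3), and 32 tokens per node are illustrative values not taken from the slides. All function names are hypothetical.

```python
import random


def build_ring(nodes, tokens_per_node, seed=0):
    """Return a sorted list of (token, node); each node owns tokens_per_node tokens."""
    rng = random.Random(seed)
    ring = [(rng.random(), node) for node in nodes for _ in range(tokens_per_node)]
    ring.sort()
    return ring


def replicas(ring, index, rf):
    """Walk clockwise from ring[index] and collect up to rf distinct nodes."""
    owners = []
    for step in range(len(ring)):
        node = ring[(index + step) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


def rebuild_sources(ring, dead, rf):
    """Nodes that share at least one replicated range with the dead node."""
    sources = set()
    for index in range(len(ring)):
        owners = replicas(ring, index, rf)
        if dead in owners:
            sources.update(node for node in owners if node != dead)
    return sources


nodes = [f"node{i}" for i in range(12)]
without_vnodes = rebuild_sources(build_ring(nodes, 1), "node0", rf=3)
with_vnodes = rebuild_sources(build_ring(nodes, 32), "node0", rf=3)
print(len(without_vnodes), len(with_vnodes))
```

With one token per node, only the dead node's four ring neighbours can stream data to the replacement. With many small ranges per node, the set of possible sources grows, so the rebuild load is spread across more of the cluster.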
  • 41. CQL: You got SQL in my NoSQL!
      CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int
      );
      CREATE INDEX ON users(state);
      SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  • 42. Strictly “realtime” focused
      • No joins
      • No subqueries
      • No aggregation functions* or GROUP BY
      • Strictly limited ORDER BY
  • 43. Example: CFS sblocks
      create column family sblocks
          with comparator = UUIDType
          and default_validation_class = BytesType
          and key_validation_class = UUIDType
  • 44. sblocks in context
  • 45. sblocks in context
  • 46. sblocks in CQL3
      CREATE TABLE sblocks (
          block_id uuid,
          subblock_id uuid,
          data blob,
          PRIMARY KEY (block_id, subblock_id)
      );

      block_id | subblock_id | data
      Block1   | subblock A  | data A
      Block1   | subblock B  | data B
      ...      | ...         | ...
      Block2   | subblock C  | data C
      Block2   | subblock D  | data D
      ...      | ...         | ...
      Block3   | subblock E  | data E
      Block3   | subblock F  | data F
      ...      | ...         | ...
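
How the column family from slide 43 lines up with the CQL3 table can be sketched in Python. The assumption here is the usual reading of a compound primary key: the first component (block_id) is the storage row key, and the second (subblock_id) becomes the column name inside that row. The function names are hypothetical, and short strings stand in for UUIDs and blobs.

```python
from collections import defaultdict


def to_cql_rows(column_family):
    """Flatten {row_key: {column_name: value}} into (block_id, subblock_id, data) rows."""
    rows = []
    for block_id, columns in column_family.items():
        for subblock_id in sorted(columns):  # columns are kept ordered by name
            rows.append((block_id, subblock_id, columns[subblock_id]))
    return rows


def to_wide_rows(cql_rows):
    """Group CQL3-style rows back into one wide storage row per block_id."""
    column_family = defaultdict(dict)
    for block_id, subblock_id, data in cql_rows:
        column_family[block_id][subblock_id] = data
    return dict(column_family)


sblocks = {
    "Block1": {"subblock A": "data A", "subblock B": "data B"},
    "Block2": {"subblock C": "data C", "subblock D": "data D"},
}
table = to_cql_rows(sblocks)
for row in table:
    print(row)
```

Both views hold the same data: the wide-row form is what the column family definition describes, and the flattened form is what a CQL3 SELECT on sblocks presents.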
  • 47. Collections
      CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int
      );
  • 48. Collections
      CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int
      );
      CREATE TABLE users_addresses (
          user_id uuid REFERENCES users,
          email text
      );
      SELECT * FROM users NATURAL JOIN users_addresses;
  • 49. Collections (the same statements as slide 48, with an X drawn over them)
      CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int
      );
      CREATE TABLE users_addresses (
          user_id uuid REFERENCES users,
          email text
      );
      SELECT * FROM users NATURAL JOIN users_addresses;
  • 50. Collections
      CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int,
          email_addresses set<text>
      );
  • 51. Collections
      CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int,
          email_addresses set<text>
      );
      UPDATE users SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
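
The effect of the set append on slide 51 can be sketched in Python, with a dict standing in for one users row. This is only an illustration of the semantics: `add_to_set` is a hypothetical helper, and the user name is made up for the example.

```python
def add_to_set(row, column, new_values):
    """Return a copy of the row with new_values unioned into a set column."""
    updated = dict(row)
    updated[column] = set(row.get(column, set())) | set(new_values)
    return updated


user = {"name": "jbellis", "email_addresses": {"jbellis@gmail.com"}}
addresses = {"jbellis@gmail.com", "jbellis@datastax.com"}

once = add_to_set(user, "email_addresses", addresses)
twice = add_to_set(once, "email_addresses", addresses)
print(sorted(twice["email_addresses"]))
```

Because the column is a set, appending an address that is already present changes nothing, and the addresses stay in the users row itself instead of in a second table that would need a join.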
  • 52. Q&A in the DataStax Lounge
  • 53. Introducing the Cassandra MVPs