State of Cassandra 2012
Upcoming SlideShare
Loading in...5
×
 

State of Cassandra 2012

on

  • 2,879 views

 

Statistics

Views

Total Views
2,879
Slideshare-icon Views on SlideShare
2,879
Embed Views
0

Actions

Likes
4
Downloads
54
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    State of Cassandra 2012 State of Cassandra 2012 Presentation Transcript

    • State of Cassandra, 2012Jonathan EllisProject Chair, Apache CassandraCTO, DataStax@spyced
    • Some Cassandra users, early 2011©2012 DataStax
    • Some Casandra users, mid 2012©2012 DataStax
    • eBay Application/Use Case • Social Signals: like/want/own features for eBay product and item pages • Hunch taste graph for eBay users and items • Many time series use cases Why Cassandra? • Multi-datacenter • Scalable • Write performance • Distributed counters • Hadoop support©2012 DataStax ACE
    • Time series data©2012 DataStax
    • Multi-datacenter support©2012 DataStax
    • Distributed counters©2012 DataStax
    • Hadoop support©2012 DataStax
    • Disney Application/Use Case • Meet the data management needs of user facing applications across The Walt Disney Company with a single platform Why Cassandra? • DataStax Enterprise can tackle real-time and search functions in the same cluster • Scalability • 24x7 uptime©2012 DataStax NDI
    • Multitenancy 1©2012 DataStax
    • Multitenancy 1 2©2012 DataStax
    • Multitenancy 1 2 3©2012 DataStax
    • Multitenancy©2012 DataStax
    • Enterprise search©2012 DataStax
    • SimpleReach Application/Use Case • SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior. Why Cassandra? • Very high velocity data ingest rate and large data volumes • Workload separation between realtime and batch applications©2012 DataStax NDE
    • SourceNinja Application/Use Case • SourceNinja notifies you to performance, security, and bug fixes for the software you depend on Why Cassandra? • Previous database system could not handle load; HBase has too many points of failure and was too slow • Fast real time capabilities, batch analytics on that data, and enterprise search©2012 DataStax RDE
    • Realtime + search + analytics = DataStax Enterprise©2012 DataStax
    • Netflix Application/Use Case • General purpose backend for large scale highly available cloud based web services supporting Netflix Streaming Why Cassandra? • Highly available, highly robust and no schema change downtime • Highly scalable, optimized for SSD • Much lower cost than previous Oracle and SimpleDB implementations • Flexible data model • Ability to directly influence/implement OSS feature set • Supports local and wide area distributed operations, spanning US and Europe©2012 DataStax RCE
    • Optimized for SSD©2012 DataStax
    • Open source©2012 DataStax
    • Use case patterns • Massively scalable • High performance • Reliable/Available©2012 DataStax
    • ©2012 DataStax
    • reads/s writes/s 35000 30000 25000 20000 15000 10000 5000 Cassandra 0.6 0©2012 DataStax Cassandra 1.0
    • ©2012 DataStax
    • Recent Cassandra history • 0.7 (Jan 2011) • CREATE COLUMN FAMILY • TTL • Secondary (column) indexes • 0.8 (Jun 2011) • Counters • Automatic memtable tuning • 1.0 (Oct 2011) • Compression • Leveled compaction©2012 DataStax
    • Present • 1.1 (Apr 2012) • Self-tuning row + key caches • Support for mixed SSD + HDD nodes • Row-level isolation©2012 DataStax
    • Self-tuning Row Cache SSTables Without Cache Client Merge Client Row Cache With Cache©2012 DataStax 25
    • Mixed SSD/HDD Support user_sessions Client Cassandra user_activity Instance user_activity user_sessions HDD SSD Cassandra Node©2012 DataStax 26
    • Row Level Isolation Login Passwd Login Passwd Foo Foo Foo FooUPDATE Users Bar FooSET login=bar Bar BarAND password=barWHERE key=e29b-41d4 Bar BarSELECT login, passwordFROM Users Bar, Foo Bar, BarWHERE key=e29b-41d4 Cassandra 1.0 Cassandra 1.1©2012 DataStax 27
    • ACID©2012 DataStax 28
    • Overloading “consistency” • ACID consistency = referential integrity • Distributed system consistency • {consistency, availability, partition tolerance}©2012 DataStax 29
    • Future • 1.2 (Oct 2012?) • Concurrent schema changes • JBOD support • Virtual nodes • CQL3 • Collections©2012 DataStax
    • Concurrent Schema Changes CREATE TABLE X; ... DROP TABLE X; Client Cassandra Cluster Client CREATE TABLE Y; ...©2012 DataStax DROP TABLE Y; 31
    • JBOD support Cassandra Instance HDD1 HDD2 HDD3 HDD4©2012 DataStax
    • JBOD support Cassandra Instance HDD1 X HDD2 HDD3 HDD4©2012 DataStax
    • Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
    • Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
    • Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
    • Node Rebuild without vnodes Node 1 Node 2 Node 3 A B C F E A F B A A F B Ring without vnodes E C D D E F C B D C E D Node 4 Node 5 Node 6©2012 DataStax 35
    • Node Rebuild with vnodes Node 1 Node 2 Node 3 B E A P K G G K M O C N C D D J D H J F B E A F L A K F P I P Ring with G O VNodes H N I M O E P H C M J L K I H I A B O B L M C N E F D G N J L Node 4 Node 5 Node 6©2012 DataStax 36
    • CQL: You got SQL in my NoSQL! CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state=‘Texas’ AND birth_date > 1950;©2012 DataStax
    • Strictly “realtime” focused • No joins • No subqueries • No aggregation functions* or GROUP BY • Strictly limited ORDER BY©2012 DataStax
    • Example: CFS sblocks create column family sblocks with comparator = UUIDType and default_validation_class = BytesType and key_validation_class = UUIDType©2012 DataStax
    • sblocks in context©2012 DataStax
    • sblocks in context©2012 DataStax
    • sblocks in CQL3 CREATE TABLE sblocks (     block_id uuid,     subblock_id uuid,     data blob, block_id subblock_id data     PRIMARY KEY (block_id, subblock_id) Block1 subblock A data A ); Block1 subblock B data B ... ... ... Block2 subblock C data C Block2 subblock D data D ... ... ... Block3 subblock E data E Block3 subblock F data F ... ... ...©2012 DataStax
    • Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int );©2012 DataStax
    • Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses;©2012 DataStax
    • Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, X birth_date int ); CREATE TABLE users_addresses ( user_id uuid REFERENCES users, email text ); SELECT * FROM users NATURAL JOIN users_addresses;©2012 DataStax
    • Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> );©2012 DataStax
    • Collections CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int, email_addresses set<text> ); UPDATE users SET email_addresses = email_addresses + {‘jbellis@gmail.com’, ‘jbellis@datastax.com’};©2012 DataStax
    • Q&A in the DataStax Lounge©2012 DataStax
    • Introducing the Cassandra MVPs©2012 DataStax