©2013 DataStax Confidential. Do not distribute without consent.
CTO, DataStax
Jonathan Ellis

Project Chair, Apache Cassandra
Cassandra 2.1 (mostly)
1
Five Years of Cassandra
Jun-09 Mar-10 Jan-11 Nov-11 Sep-12 Jul-13
0.1 0.3 0.6 0.7 1.0 1.2
...
2.0
DSE
Jul-08
•Massively scalable
•High performance
•Reliable/Available
Core values Cassandra HBase Redis MySQL
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
state text,	
birth_date int	
);	
!
!
CREATE INDEX ON users(state);	
!
SELECT * FROM users 	
WHERE state=‘Texas’	
AND birth_date > 1950;
New Core Value
•Massively scalable
•High performance
•Reliable/Available
•Productivity + ease of use
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
state text,	
birth_date int	
);
Collections
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
state text,	
birth_date int	
);
CREATE TABLE users_addresses (	
user_id uuid REFERENCES users,	
email text	
);	
!
SELECT *	
FROM users NATURAL JOIN users_addresses;	
Collections
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
state text,	
birth_date int	
);
CREATE TABLE users_addresses (	
user_id uuid REFERENCES users,	
email text	
);	
!
SELECT *	
FROM users NATURAL JOIN users_addresses;	
X
Collections
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
state text,	
birth_date int,	
email_addresses set<text>	
);
Collections
UPDATE users	
SET email_addresses = email_addresses +
{‘jbellis@gmail.com’, ‘jbellis@datastax.com’};	
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
state text,	
birth_date int,	
email_addresses set<text>	
);
Collections
Cassandra 2.0
Race condition
SELECT name!
FROM users!
WHERE username = 'pmcfadin';
Race condition
SELECT name!
FROM users!
WHERE username = 'pmcfadin';
(0 rows) SELECT name!
FROM users!
WHERE username = 'pmcfadin';
Race condition
SELECT name!
FROM users!
WHERE username = 'pmcfadin';
(0 rows) SELECT name!
FROM users!
WHERE username = 'pmcfadin';
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00');
(0 rows)
Race condition
SELECT name!
FROM users!
WHERE username = 'pmcfadin';
(0 rows) SELECT name!
FROM users!
WHERE username = 'pmcfadin';
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00');
(0 rows)
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ea24e13ad9...',!
'2011-06-20 13:50:01');
Race condition
This one wins
SELECT name!
FROM users!
WHERE username = 'pmcfadin';
(0 rows) SELECT name!
FROM users!
WHERE username = 'pmcfadin';
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00');
(0 rows)
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ea24e13ad9...',!
'2011-06-20 13:50:01');
Lightweight transactions
Lightweight transactions
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00')!
IF NOT EXISTS;
Lightweight transactions
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00')!
IF NOT EXISTS;
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ea24e13ad9...',!
'2011-06-20 13:50:01')!
IF NOT EXISTS;
Lightweight transactions
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00')!
IF NOT EXISTS;
[applied]!
-----------!
True
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ea24e13ad9...',!
'2011-06-20 13:50:01')!
IF NOT EXISTS;
Lightweight transactions
[applied] | username | created_date | name !
-----------+----------+----------------+----------------!
False | pmcfadin | 2011-06-20 ... | Patrick McFadin
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ba27e03fd9...',!
'2011-06-20 13:50:00')!
IF NOT EXISTS;
[applied]!
-----------!
True
INSERT INTO users !
(username, name, email,!
password, created_date)!
VALUES ('pmcfadin',!
'Patrick McFadin',!
['patrick@datastax.com'],!
'ea24e13ad9...',!
'2011-06-20 13:50:01')!
IF NOT EXISTS;
Atomic log appends with LWT
CREATE TABLE log (!
log_name text,!
seq int static,!
logged_at timeuuid,!
entry text,!
primary key (log_name, logged_at)!
);!
!
INSERT INTO log (log_name, seq) !
VALUES ('foo', 0);
Atomic log appends with LWT
BEGIN BATCH!
!
UPDATE log SET seq = 1!
WHERE log_name = 'foo'!
IF seq = 0;!
!
INSERT INTO log (log_name, logged_at, entry)!
VALUES ('foo', now(), 'test');!
!
APPLY BATCH;!
Details
•http://www.datastax.com/dev/blog/lightweight-transactions-in-
cassandra-2-0
•Paxos state is durable + quorum based
•Paxos made Simple
•Immediate consistency with no leader election or failover
•ConsistencyLevel.SERIAL
•4 round trips vs 1 for normal updates
•http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-
hopeful-consistency-by-christos-kalantzis
Reads in a cluster
Client Coordinator
40%

busy
90%
busy
30%
busy
Reads in a cluster
Client Coordinator
40%

busy
90%
busy
30%
busy
Reads in a cluster
Client Coordinator
40%

busy
90%
busy
30%
busy
Reads in a cluster
Client Coordinator
40%

busy
90%
busy
30%
busy
Reads in a cluster
Client Coordinator
40%

busy
90%
busy
30%
busy
A failure
Client Coordinator
40%

busy
90%
busy
30%
busy
A failure
Client Coordinator
40%

busy
90%
busy
30%
busy
A failure
Client Coordinator
40%

busy
90%
busy
30%
busy
A failure
Client Coordinator
40%

busy
90%
busy
30%
busy
X
A failure
Client Coordinator
40%

busy
90%
busy
30%
busy
Xtimeout
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
X
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
X
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
X
Rapid read protection
Client Coordinator
40%

busy
90%
busy
30%
busy
Xsuccess
Rapid Read Protection
NONE
Latency (mid-compaction)
Cold data
10,000 req/s 5,000 req/s
4,000 req/s 10 req/s
Cold data
10,000 req/s 5,000 req/s
4,000 req/s 10 req/s
Cold data compaction
10 req/s
10,000 req/s
Cassandra 2.1
User defined types
CREATE TYPE address (	
street text,	
city text,	
zip_code int,	
phones set<text>	
)	
!
CREATE TABLE users (	
id uuid PRIMARY KEY,	
name text,	
addresses map<text, address>	
)	
!
SELECT id, name, addresses.city, addresses.phones FROM users;	
!
id | name | addresses.city | addresses.phones	
--------------------+----------------+--------------------------	
63bf691f | jbellis | Austin | {'512-4567', '512-9999'}
Collection indexing
CREATE TABLE songs (	
id uuid PRIMARY KEY,	
artist text,	
album text,	
title text,	
data blob,	
tags set<text>	
);	
!
CREATE INDEX song_tags_idx ON songs(tags);	
!
SELECT * FROM songs WHERE tags CONTAINS 'blues';	
!
id | album | artist | tags | title	
----------+---------------+-------------------+-----------------------+------------------	
5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind	
!
!
(UDT indexing?)
Counters++
Counters++
•simpler implementation, no more edge cases
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
•significantly less garbage and internode traffic generated
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
•significantly less garbage and internode traffic generated
•better performance for 99% of uses
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
•significantly less garbage and internode traffic generated
•better performance for 99% of uses
•RF>1, replicate_on_write=true
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
•significantly less garbage and internode traffic generated
•better performance for 99% of uses
•RF>1, replicate_on_write=true
•topology changes not leading to data loss (#4071)
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
•significantly less garbage and internode traffic generated
•better performance for 99% of uses
•RF>1, replicate_on_write=true
•topology changes not leading to data loss (#4071)
•commitlog now 100% safe to replay (#4417)
Counters++
•simpler implementation, no more edge cases
•possible to properly repair now
•significantly less garbage and internode traffic generated
•better performance for 99% of uses
•RF>1, replicate_on_write=true
•topology changes not leading to data loss (#4071)
•commitlog now 100% safe to replay (#4417)
•Internal format overhaul still coming in 3.0 (#6506)
What hasn’t changed
What hasn’t changed
•same API
What hasn’t changed
•same API
•same average throughput
What hasn’t changed
•same API
•same average throughput
•same restrictions on mixing counter and non-counter columns
What hasn’t changed
•same API
•same average throughput
•same restrictions on mixing counter and non-counter columns
•same restrictions on mixing counter and non-counter updates
What hasn’t changed
•same API
•same average throughput
•same restrictions on mixing counter and non-counter columns
•same restrictions on mixing counter and non-counter updates
•same restrictions on counter deletes
What hasn’t changed
•same API
•same average throughput
•same restrictions on mixing counter and non-counter columns
•same restrictions on mixing counter and non-counter updates
•same restrictions on counter deletes
•same retry limitations
Writes (low contention)
Writes (high contention)
Data directories (2.0)
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-CompressionInfo.db	
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Data.db	
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Filter.db	
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Index.db	
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Statistics.db	
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Summary.db	
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-TOC.txt
Data directories (2.1)
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-CompressionInfo.db	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Data.db	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Filter.db	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Index.db	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Statistics.db	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Summary.db	
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-TOC.txt
Inefficient bloom filters
+
= ?
+
=
Inefficient bloom filters
+
=
Inefficient bloom filters
Inefficient bloom filters
HyperLogLog applied
HLL and compaction
HLL and compaction
HLL and compaction
More-efficient repair
More-efficient repair
More-efficient repair
More-efficient repair
More-efficient repair
More-efficient repair
More-efficient repair
More-efficient repair
More-efficient repair
Implications for LCS (and STCS)
The new query cache
The new row cache
CREATE TABLE notifications (	
target_user text,	
notification_id timeuuid,	
source_id uuid,	
source_type text, 	
activity text,	
PRIMARY KEY (target_user, notification_id)	
)	
WITH CLUSTERING ORDER BY (notification_id DESC)	
AND caching = 'rows_only'	
AND rows_per_partition_to_cache = '3';	
!
The new row cache
target_user notification_id source_id source_type activity
nick e1bd2bcb- d972b679- photo jbellis liked
nick 321998c- d972b679- photo rbranson commented
nick ea1c5d35- 88a049d5- user
mbulman created
account
nick 5321998c- 64613f27- photo jbellis commented
nick 07581439- 076eab7e- user
thobbs created
account
rbranson 1c34467a- f04e309f- user
jbellis created
account
The new row cache
target_user notification_id source_id source_type activity
nick e1bd2bcb- d972b679- photo jbellis liked
nick 321998c- d972b679- photo rbranson commented
nick ea1c5d35- 88a049d5- user
mbulman created
account
nick 5321998c- 64613f27- photo jbellis commented
nick 07581439- 076eab7e- user
thobbs created
account
rbranson 1c34467a- f04e309f- user
jbellis created
account
Read performance
Reads post-compaction
Questions?

Cassandra 2.1