Cassandra
Nick Bailey
@nickmbailey
nick@datastax.com
Thursday, May 30, 13
©2012 DataStax
Introduction
Why does Cassandra Exist?
Analytics
+
Real Time
Big Data
Architecture
Dynamo
+
BigTable
Why do people like Cassandra?
Availability
Scalability
Performance
Multi Datacenter Support
Hadoop Support
Hadoop Support
• InputFormat
• Run tasktrackers/datanodes locally
• Run namenode/jobtracker anywhere
Data Locality
Workload Partitioning
Data Modeling
Keyspace,
Column Families
Database,
Tables
Column Family =
Row Key + Columns (name, value)
...
Static Column Families
Dynamic Column Families
Static - Users Column Family
Row Key   | Columns (name: value)
----------+------------------------------------------------------
g_m_bluth | password: banana stand, name: George Michael
tobias_f  | password: c_weathers, name: Tobias, phone: 512-7777
Dynamic - Friend Column Family
Row Key   | Columns
----------+------------------------------------------------------
g_m_bluth | <date>:ann_v, <date>:maeby
tobias_f  | <date>:barry_z, <date>:carl_w, <date>:lindsay, ...
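A rough way to picture the two slides above (a toy model in Python, not Cassandra's storage engine or any driver API): a column family maps each row key to columns kept sorted by name. In the static Users rows the column names are fixed fields; in the dynamic Friend rows the names themselves are the data. The `ColumnFamily` class and the concrete dates standing in for `<date>` are made up for illustration.

```python
class ColumnFamily:
    """Toy model: row key -> {column name: value}, read back sorted by name."""

    def __init__(self):
        self.rows = {}

    def insert(self, row_key, name, value=""):
        # Writing an existing column name simply overwrites its value.
        self.rows.setdefault(row_key, {})[name] = value

    def get_row(self, row_key):
        # Cassandra keeps columns sorted by name; here we sort on read for brevity.
        return sorted(self.rows.get(row_key, {}).items())


# Static: every row uses the same small set of column names.
users = ColumnFamily()
users.insert("g_m_bluth", "password", "banana stand")
users.insert("g_m_bluth", "name", "George Michael")
users.insert("tobias_f", "password", "c_weathers")
users.insert("tobias_f", "name", "Tobias")
users.insert("tobias_f", "phone", "512-7777")

# Dynamic: column names carry data ("<date>:<friend>"); values can stay empty.
friends = ColumnFamily()
friends.insert("g_m_bluth", "2013-05-01:ann_v")
friends.insert("g_m_bluth", "2013-04-15:maeby")

print(users.get_row("tobias_f"))
print(friends.get_row("g_m_bluth"))
```

Because column names sort, the dynamic row comes back in date order, which is what lets a single read return a date-ordered slice of friends.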
Time Series Data
• Event logs
• Metrics
• Sensor Data
• Etc.
Time Series - Login CF
Row Key   | Columns (name: value)
----------+------------------------------------------------------
g_m_bluth | 1369633061: United States, 1369625839: Mexico, ...
tobias_f  | 1369932413: Canada, 1369681738: United States, ...
What Else?
Counter Columns
• Inc/Dec operations
• Not idempotent
• Possibility of overcounting
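The retry problem behind "not idempotent" can be sketched without a cluster: if a counter increment is applied but its acknowledgement is lost, a client that retries applies it twice, whereas retrying an ordinary column write just rewrites the same value. `FlakyReplica` and `with_retry` below are hypothetical stand-ins, not driver classes.

```python
class FlakyReplica:
    """Simulated replica: applies every write, but the first ack is 'lost'."""

    def __init__(self):
        self.counter = 0      # a counter column
        self.regular = None   # an ordinary column
        self.calls = 0

    def _ack(self):
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("write applied, but the acknowledgement was lost")

    def increment(self, delta):
        self.counter += delta  # applied before the ack goes missing
        self._ack()

    def overwrite(self, value):
        self.regular = value
        self._ack()


def with_retry(operation, *args):
    """Naive client policy: retry once after a timeout."""
    try:
        operation(*args)
    except TimeoutError:
        operation(*args)


counter_replica = FlakyReplica()
with_retry(counter_replica.increment, 1)
print(counter_replica.counter)   # 2: one logical increment, counted twice

regular_replica = FlakyReplica()
with_retry(regular_replica.overwrite, 1)
print(regular_replica.regular)   # 1: retrying an overwrite is harmless
```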
Expiring Columns
• TTL - Time to live
• Set per column
• Possibly an anti-pattern (we’ll get to that later)
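Conceptually, each expiring column carries its own TTL and stops being returned once its write time plus TTL has passed (in CQL the TTL is given per write with `USING TTL <seconds>`). The toy `ExpiringRow` class below is a hypothetical in-memory sketch with an explicit `now` argument instead of a real clock; it ignores how Cassandra later purges expired data.

```python
class ExpiringRow:
    """Toy row where each column may carry its own TTL in seconds."""

    def __init__(self):
        self.columns = {}  # name -> (value, expires_at or None)

    def insert(self, name, value, now, ttl=None):
        expires_at = now + ttl if ttl is not None else None
        self.columns[name] = (value, expires_at)

    def get(self, name, now):
        value, expires_at = self.columns.get(name, (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # expired: reads behave as if the column were deleted
        return value


row = ExpiringRow()
row.insert("session", "abc123", now=1000, ttl=60)  # TTL set on this column only
row.insert("name", "Tobias", now=1000)             # no TTL: never expires
print(row.get("session", now=1030))  # abc123
print(row.get("session", now=1060))  # None
print(row.get("name", now=10**9))    # Tobias
```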
Secondary Indexes
• SELECT * FROM Users WHERE name = 'Nick';
• Only supports '=' clauses (for the first condition)
• Often misused
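One way to see why a secondary index serves equality lookups: conceptually it is a second map from the indexed value to the row keys holding it, so `name = 'Nick'` becomes a single lookup in that map, while a range on `name` has no ordered structure to walk in this model. The class below is a hypothetical in-memory sketch, not Cassandra's on-disk layout (its built-in indexes are local to each node).

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Toy column family with one secondary index on `indexed_column`."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def insert(self, row_key, columns):
        old = self.rows.get(row_key, {})
        if self.indexed_column in old:
            # The indexed value may change, so drop the stale index entry first.
            self.index[old[self.indexed_column]].discard(row_key)
        merged = {**old, **columns}
        self.rows[row_key] = merged
        if self.indexed_column in merged:
            self.index[merged[self.indexed_column]].add(row_key)

    def select_eq(self, value):
        """Roughly: SELECT * FROM ... WHERE <indexed_column> = value"""
        keys = self.index.get(value, set())
        return {key: self.rows[key] for key in sorted(keys)}


people = IndexedColumnFamily("name")
people.insert("nickmbailey", {"name": "Nick"})
people.insert("g_m_bluth", {"name": "George Michael"})
people.insert("nick_2", {"name": "Nick"})       # hypothetical second Nick
print(sorted(people.select_eq("Nick")))         # ['nick_2', 'nickmbailey']

people.insert("nick_2", {"name": "Nicholas"})   # index entry moves with the value
print(sorted(people.select_eq("Nick")))         # ['nickmbailey']
```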
CQL
Cassandra Query Language
CREATE COLUMNFAMILY songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob);
INSERT INTO songs (id, title, artist, album)
VALUES ('a3e64f8f...', 'La Grange', 'ZZ Top', 'Tres Hombres');
SELECT * FROM songs;
id          | album        | artist         | title
-------------+--------------+----------------+----------------
2b09185b... |    Roll Away | Back Door Slam | Outside Woman...
8a172618... | We Must Obey |      Fu Manchu | Moving in Ste...
a3e64f8f... | Tres Hombres |         ZZ Top | La Grange
How do I start?
Define your questions
SELECT time, location FROM logins
WHERE user = 'nickmbailey'
ORDER BY time DESC LIMIT 10;
WHERE user = 'nickmbailey'
Row Key
ORDER BY time DESC LIMIT 10;
Store columns in chronological order
CREATE COLUMNFAMILY logins (
  user text,
  time timestamp,
  location text,
  PRIMARY KEY (user, time));
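With `user` as the row key and `time` as the clustering column, each user's logins live in one row with columns sorted by time, so the 10 most recent logins are one contiguous slice at the end of a single row. A toy sketch of what `ORDER BY time DESC LIMIT 10` amounts to under that layout (hypothetical `LoginsCF` class, plain Python, not the driver API):

```python
from bisect import insort


class LoginsCF:
    """Toy model of logins with PRIMARY KEY (user, time)."""

    def __init__(self):
        self.rows = {}  # user (row key) -> (sorted list of times, {time: location})

    def insert(self, user, time, location):
        times, values = self.rows.setdefault(user, ([], {}))
        if time not in values:
            insort(times, time)   # columns stay in chronological order
        values[time] = location   # the same (user, time) again is an overwrite

    def latest(self, user, limit=10):
        """SELECT time, location FROM logins WHERE user = ? ORDER BY time DESC LIMIT ?"""
        times, values = self.rows.get(user, ([], {}))
        newest = times[max(len(times) - limit, 0):]  # one contiguous slice at the end
        return [(t, values[t]) for t in reversed(newest)]


logins = LoginsCF()
logins.insert("g_m_bluth", 1369633061, "United States")
logins.insert("g_m_bluth", 1369625839, "Mexico")
logins.insert("tobias_f", 1369932413, "Canada")
logins.insert("tobias_f", 1369681738, "United States")
print(logins.latest("g_m_bluth"))
print(logins.latest("tobias_f", limit=1))
```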
What about?
SELECT time FROM logins
WHERE user = 'nickmbailey'
AND location = 'United States';
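Row Key   | Columns (name: value)
----------+----------------------------------------------------------------------
g_m_bluth | 1369633061: United States, 1369625839: Mexico, ...
          | 1369622839: Canada, 1369422839: Canada, 1368422839: Canada, ...
          | 1368421839: Canada, 1367421839: United States, 1367411839: Mexico, ...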
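The problem with the query above: `location` is not part of the sort order, so the only way to answer it from this row is to read every column and filter. The toy function below (hypothetical, plain Python) counts how many columns it touches on the row from the slide; that count grows with the user's whole login history, not with the number of matches.

```python
def times_in_location(row, location):
    """row: (time, location) pairs sorted by time, as with PRIMARY KEY (user, time).
    Returns the matching times and how many columns had to be read."""
    scanned = 0
    matches = []
    for time, loc in row:
        scanned += 1  # no way to skip ahead: location is not the sort key
        if loc == location:
            matches.append(time)
    return matches, scanned


row = sorted([
    (1369633061, "United States"), (1369625839, "Mexico"),
    (1369622839, "Canada"), (1369422839, "Canada"),
    (1368422839, "Canada"), (1368421839, "Canada"),
    (1367421839, "United States"), (1367411839, "Mexico"),
])
matches, scanned = times_in_location(row, "United States")
print(matches, scanned)  # [1367421839, 1369633061] 8
```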
CREATE COLUMNFAMILY logins (
  user text,
  time timestamp,
  location text,
  PRIMARY KEY (user, location));
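Row Key   | Columns (name: value)
----------+------------------------------------------------------
g_m_bluth | United States: 1369633061, Canada: 1369622839, ...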
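What this layout trades away: with (user, location) as the complete primary key, a new login from a location the user has already used overwrites the earlier time instead of adding a column. The row then answers "when did this user last log in from X?" with a single-column read, but keeps only one time per location. A toy sketch of that upsert behaviour (hypothetical names, in-memory only; Cassandra resolves overwrites by write timestamp, so the sketch assumes logins are written in time order):

```python
class LoginsByLocation:
    """Toy model of logins with PRIMARY KEY (user, location)."""

    def __init__(self):
        self.rows = {}  # user (row key) -> {location: time}

    def insert(self, user, time, location):
        # (user, location) is the whole primary key, so this is an upsert:
        # the last write for a location replaces the earlier time.
        self.rows.setdefault(user, {})[location] = time

    def last_login_from(self, user, location):
        """SELECT time FROM logins WHERE user = ? AND location = ?"""
        return self.rows.get(user, {}).get(location)


by_location = LoginsByLocation()
for time, location in [(1367421839, "United States"),
                       (1369622839, "Canada"),
                       (1369633061, "United States")]:  # written in time order
    by_location.insert("g_m_bluth", time, location)

print(by_location.last_login_from("g_m_bluth", "United States"))  # 1369633061
print(len(by_location.rows["g_m_bluth"]))  # 2: the older US login was overwritten
```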
To Normalize or Not
SELECT time, location FROM.....
+
SELECT city, state, zip.... FROM locations.....
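Row Key   | Columns (name: value)
----------+------------------------------------------------------
g_m_bluth | 1369633061: <United States, Austin, Texas, 78701>
          | 1369625839: <Mexico, Tijuana, 88191>
          | 1358633061: <United States, Austin, Texas, 78701>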
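The denormalized layout trades storage for reads: the location details are copied into every login column, so one row read answers what the normalized design needs two queries for (Cassandra has no joins). The sketch below compares both designs on the slide's example values; the location ids, the `None` for the missing Mexican state, and the helper names are hypothetical.

```python
# Normalized: login columns hold a location id; details live in a second table.
locations = {
    "loc_austin": ("United States", "Austin", "Texas", "78701"),
    "loc_tijuana": ("Mexico", "Tijuana", None, "88191"),
}
logins_normalized = {
    "g_m_bluth": [(1369633061, "loc_austin"),
                  (1369625839, "loc_tijuana"),
                  (1358633061, "loc_austin")],
}

# Denormalized: every login column carries its own copy of the details.
logins_denormalized = {
    "g_m_bluth": [(1369633061, ("United States", "Austin", "Texas", "78701")),
                  (1369625839, ("Mexico", "Tijuana", None, "88191")),
                  (1358633061, ("United States", "Austin", "Texas", "78701"))],
}


def history_normalized(user):
    rows = logins_normalized[user]                # query 1: the login row
    ids = {loc_id for _, loc_id in rows}
    details = {i: locations[i] for i in ids}      # query 2: the location rows
    return [(t, details[i]) for t, i in rows], 2  # result plus number of queries


def history_denormalized(user):
    return list(logins_denormalized[user]), 1     # one query: details are inline


print(history_normalized("g_m_bluth")[1], history_denormalized("g_m_bluth")[1])  # 2 1
```

The cost shows up on writes and storage instead: the Austin details are stored twice, and changing them would mean rewriting every copy.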
Anti Patterns
Batched Writes
• Failure case is suboptimal
• Increased chance of failure
• Tune to your workload
BOP/OPP (ByteOrderedPartitioner / OrderPreservingPartitioner)
• You don’t really need it
• Your Ops Team will hate you
• Really, you don’t need it.
Super Columns
• Performance penalty
  • Speed
  • Memory
• Replaced by CQL3
Read Before Write
• Race conditions
• Hurts performance
  • Cache
  • IO
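The race condition is the classic lost update: two clients read the same value, each computes a new one, and the later write silently discards the earlier one. The deterministic interleaving below reproduces it with a plain dict standing in for one column (`page_views` is a hypothetical name). For increments specifically, counter columns let you update without reading first.

```python
# A plain dict stands in for one column holding a page-view total.
store = {"page_views": 10}

# Two clients each do read -> modify -> write, interleaved like this:
a_read = store["page_views"]       # client A reads 10
b_read = store["page_views"]       # client B reads 10
store["page_views"] = a_read + 1   # A writes 11
store["page_views"] = b_read + 1   # B writes 11, silently discarding A's update

print(store["page_views"])  # 11, although two increments were requested
```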
Queues
• More generally, many deletes within a row
• A delete in Cassandra actually writes a tombstone
• Read 1000 tombstones in order to find 10 columns
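A rough model of why queue-like rows degrade: deletes leave tombstones in place, so a read that wants the first 10 live columns must still walk past every tombstone ahead of them. The toy sketch below (hypothetical names; real tombstones are only purged by compaction after `gc_grace_seconds`) reproduces the 1000-tombstones-for-10-columns case.

```python
TOMBSTONE = object()  # marker left behind by a delete


def first_live(row, n):
    """Return the first n live columns and how many cells were read to find them."""
    live, read = [], 0
    for name, value in row:  # columns in sorted order (here: enqueue order)
        read += 1
        if value is TOMBSTONE:
            continue  # deleted, but still has to be read and skipped
        live.append((name, value))
        if len(live) == n:
            break
    return live, read


# A queue row: 1010 messages enqueued, the oldest 1000 already consumed (deleted).
row = [(i, f"msg-{i}") for i in range(1010)]
for i in range(1000):
    row[i] = (i, TOMBSTONE)

live, read = first_live(row, 10)
print(len(live), read)  # 10 1010
```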
Use Cases
eBay
http://www.youtube.com/watch?v=F-fYqPu2ciQ
eBay
• dozens of nodes
• 200 TB+ of storage
eBay
• Social Signals
• Hunch Taste Graph
• Various Time Series
Social Signals
• Like, Own, Want
• Need:
  • scalable counters
  • high-performance writes
• Want to find the most popular items in a given category
Social Signals
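ItemCount
Row Key   | Columns (name: value)
----------+--------------------------------
item_id_1 | like: 300, own: 104, want: 105
item_id_2 | ...

UserCount
Row Key   | Columns (name: value)
----------+--------------------------------
user_id_1 | like: 50, own: 10, want: 75
user_id_2 | ...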
Social Signals
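ItemLike
Row Key   | Columns (name: value)
----------+----------------------------------------------
item_id_1 | user_id_1: <time>, user_id_2: <time>, ...
item_id_2 | ...

UserLike
Row Key   | Columns (name: value)
----------+----------------------------------------------
user_id_1 | <time>: <item_id>, <time>: <item_id>, ...
user_id_2 | ...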
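Each "like" fans out to the four column families above. A toy sketch of that write path, with plain dicts standing in for the column families and a hypothetical `record_like` helper (in Cassandra the ItemCount and UserCount updates would be counter-column increments and the other two ordinary inserts):

```python
from collections import defaultdict

item_count = defaultdict(lambda: defaultdict(int))  # item_id -> {signal: count}
user_count = defaultdict(lambda: defaultdict(int))  # user_id -> {signal: count}
item_like = defaultdict(dict)                       # item_id -> {user_id: time}
user_like = defaultdict(dict)                       # user_id -> {time: item_id}


def record_like(user_id, item_id, time):
    item_count[item_id]["like"] += 1    # counter increment in ItemCount
    user_count[user_id]["like"] += 1    # counter increment in UserCount
    item_like[item_id][user_id] = time  # who liked this item, and when
    user_like[user_id][time] = item_id  # column name is the time, so Cassandra keeps it sorted


record_like("user_id_1", "item_id_1", 1369633061)
record_like("user_id_2", "item_id_1", 1369633100)
print(item_count["item_id_1"]["like"])          # 2
print(sorted(user_like["user_id_1"].items()))   # [(1369633061, 'item_id_1')]
```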
Social Signals - Possibilities
• Store aggregated counts per category
• Column names are counts
• Get top N items in a category
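One way to read "column names are counts": keep a row per category whose column names are (count, item_id) pairs, so the columns are already sorted by popularity and the top N is a slice from the end of the row. The sketch below is a hypothetical in-memory version (the `CategoryTopN` class and the "guitars" category are made up). Note that when a count changes, the old column has to be found and deleted before the new one is written, which brings back the read-before-write and tombstone concerns from the anti-patterns above.

```python
from bisect import insort


class CategoryTopN:
    """Toy model: category row whose column names are (count, item_id)."""

    def __init__(self):
        self.rows = {}     # category -> sorted list of (count, item_id)
        self.current = {}  # (category, item_id) -> current count

    def set_count(self, category, item_id, count):
        row = self.rows.setdefault(category, [])
        old = self.current.get((category, item_id))
        if old is not None:
            row.remove((old, item_id))  # delete the stale column (a tombstone in Cassandra)
        insort(row, (count, item_id))
        self.current[(category, item_id)] = count

    def top(self, category, n):
        row = self.rows.get(category, [])
        return [item for _, item in reversed(row[max(len(row) - n, 0):])]


popular = CategoryTopN()
popular.set_count("guitars", "item_id_1", 300)
popular.set_count("guitars", "item_id_2", 120)
popular.set_count("guitars", "item_id_3", 80)
popular.set_count("guitars", "item_id_3", 450)  # item 3 becomes the most liked
print(popular.top("guitars", 2))  # ['item_id_3', 'item_id_1']
```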
Questions?
Come to the Summit!
Ask me for a discount code
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013

Introduction to Cassandra and Data Modeling