Introduction to Cassandra and Data Modeling

Cassandra
Nick Bailey
@nickmbailey
nick@datastax.com
Thursday, May 30, 13

©2012 DataStax
Introduction

Why does Cassandra Exist?

Analytics + Real Time
Big Data

Architecture

Dynamo + BigTable

Why do people like Cassandra?

Availability

Scalability

Performance

Multi Datacenter Support

Hadoop Support
• InputFormat
• Run TaskTrackers/DataNodes locally
• Run NameNode/JobTracker anywhere

Data Locality
Workload Partitioning

Data Modeling

Keyspace, Column Families

Database, Tables

Column Family = Row Key + Columns (name, value) ...
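One way to picture "Row Key + Columns (name, value)" is as a map of maps: each row key points to a set of named columns, kept sorted by column name. The following is a minimal in-memory sketch of that idea only. The `ColumnFamily` class and its methods are hypothetical names for illustration; this is not a Cassandra client API and says nothing about the real storage engine.

```python
class ColumnFamily:
    """Toy model: row key -> {column name: value}. Illustration only."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, column_name, value):
        # Writing an existing column name overwrites its value.
        self.rows.setdefault(row_key, {})[column_name] = value

    def get_row(self, row_key):
        # Columns are returned sorted by column name.
        return sorted(self.rows.get(row_key, {}).items())


users = ColumnFamily("Users")
users.insert("g_m_bluth", "password", "banana stand")
users.insert("g_m_bluth", "name", "George Michael")
users.insert("tobias_f", "phone", "512-7777")

print(users.get_row("g_m_bluth"))
```

Rows do not need to share the same columns: `tobias_f` has a `phone` column here while `g_m_bluth` does not.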

Static Column Families
Dynamic Column Families

Static - Users Column Family
Row Key
g_m_bluth    password: banana stand    name: George Michael
tobias_f     password: c_weathers      name: Tobias    phone: 512-7777

Dynamic - Friend Column Family
Row Key
g_m_bluth <date>:ann_v <date>:maeby
tobias_f <date>:barry_z <date>:carl_w <date>:lindsay ...

Time Series Data
• Event logs
• Metrics
• Sensor Data
• Etc.

Time Series - Login CF
Row Key
g_m_bluth    1369633061: United States    1369625839: Mexico    ...
tobias_f     1369932413: Canada    1369681738: United States    ...

What Else?

Counter Columns
• Increment/decrement operations
• Not idempotent
• Possibility of overcounting
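Why "not idempotent" leads to overcounting can be sketched with a client that resends an operation because it never saw the acknowledgement. This is a toy simulation under that assumption; `CounterColumn`, `RegularColumn`, and `send_with_retry` are hypothetical names, not part of any Cassandra driver.

```python
class CounterColumn:
    """Toy counter: each increment changes the stored value."""

    def __init__(self):
        self.value = 0

    def increment(self, delta=1):
        self.value += delta


class RegularColumn:
    """Toy regular column: a write simply sets the value."""

    def __init__(self):
        self.value = None

    def write(self, value):
        self.value = value


def send_with_retry(operation, attempts):
    # Assumption: every attempt reaches the server, but the client missed
    # the acknowledgement and resent the same operation.
    for _ in range(attempts):
        operation()


likes = CounterColumn()
send_with_retry(lambda: likes.increment(1), attempts=2)

name = RegularColumn()
send_with_retry(lambda: name.write("George Michael"), attempts=2)

print(likes.value)  # 2, although the user liked the item once
print(name.value)   # George Michael, unaffected by the retry
```

Retrying the regular write is harmless because it stores the same value again; retrying the increment applies it twice.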

Expiring Columns
• TTL - Time to live
• Set per column
• Possibly an anti-pattern (we’ll get to that later)
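The "set per column" behaviour can be illustrated with a toy row in which each column optionally carries its own expiry time. This is a sketch only: `ExpiringRow` is a hypothetical name, and the clock is passed in explicitly (`now`, in seconds) so the example is deterministic.

```python
class ExpiringRow:
    """Toy row where each column may carry its own TTL (in seconds)."""

    def __init__(self):
        self._columns = {}  # column name -> (value, expires_at or None)

    def insert(self, name, value, now, ttl=None):
        # A column without a TTL never expires.
        expires_at = None if ttl is None else now + ttl
        self._columns[name] = (value, expires_at)

    def get(self, name, now):
        value, expires_at = self._columns.get(name, (None, None))
        if expires_at is not None and now >= expires_at:
            return None  # the column has expired and is no longer visible
        return value


row = ExpiringRow()
row.insert("session", "abc123", now=100, ttl=60)
row.insert("name", "Tobias", now=100)

print(row.get("session", now=159))  # abc123
print(row.get("session", now=160))  # None
print(row.get("name", now=160))     # Tobias
```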

Secondary Indexes
• SELECT * FROM Users WHERE name = 'Nick';
• Only support '=' clauses (for the first condition)
• Often misused

CQL
Cassandra Query Language

How do I start?

Define your questions

SELECT time, location FROM logins
WHERE user = 'nickmbailey'
ORDER BY time DESC LIMIT 10;

WHERE user = 'nickmbailey'
Row Key

ORDER BY time DESC LIMIT 10;
Store columns in chronological order
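The mapping from the query to the layout (user as the row key, login times as sorted column names) can be sketched in memory. This is an illustration under those assumptions; `LoginRow` is a hypothetical name and nothing here talks to Cassandra.

```python
import bisect


class LoginRow:
    """Toy wide row for one user: column name = login time, value = location."""

    def __init__(self):
        self._times = []       # column names (epoch seconds), kept sorted
        self._locations = {}   # column name -> column value

    def record(self, time, location):
        # Insert the column at its sorted position; rewriting a time overwrites it.
        if time not in self._locations:
            bisect.insort(self._times, time)
        self._locations[time] = location

    def latest(self, limit):
        # ORDER BY time DESC LIMIT n: take the last n columns, newest first.
        if limit <= 0:
            return []
        return [(t, self._locations[t]) for t in reversed(self._times[-limit:])]


logins = {"g_m_bluth": LoginRow()}  # WHERE user = ... selects the row by key
logins["g_m_bluth"].record(1369625839, "Mexico")
logins["g_m_bluth"].record(1369633061, "United States")

print(logins["g_m_bluth"].latest(10))
```

Because the columns are already in time order, the ten most recent logins are a contiguous slice at one end of the row, with no sorting at read time.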

-- Column types are assumed; the slide omits them.
CREATE COLUMNFAMILY logins (
    user text,
    time timestamp,
    location text,
    PRIMARY KEY (user, time));

What about?

SELECT time FROM logins
WHERE user = 'nickmbailey'
AND location = 'United States';

g_m_bluth
    1369633061: United States
    1369625839: Mexico
    ....
    1369622839: Canada
    1369422839: Canada
    1368422839: Canada
    ....
    1368421839: Canada
    1367421839: United States
    1367411839: Mexico
    ....

-- Column types are assumed; the slide omits them.
CREATE COLUMNFAMILY logins (
    user text,
    time timestamp,
    location text,
    PRIMARY KEY (user, location));

g_m_bluth
    United States: 1369633061
    Canada: 1369622839
    ....
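One consequence of keying on (user, location) is that a row can hold only one column per location, so a later login from the same place replaces the earlier one. The sketch below compares the two primary keys on the same writes. It is a toy model: `write_logins` is a hypothetical helper, and "later write wins" is modelled simply by write order.

```python
def write_logins(logins, clustering_column):
    """Toy table with primary key (user, <clustering_column>).

    Returns {user: {clustering value: value of the remaining column}}.
    """
    other = "location" if clustering_column == "time" else "time"
    rows = {}
    for login in logins:
        row = rows.setdefault(login["user"], {})
        # Same primary key: the later write replaces the earlier one.
        row[login[clustering_column]] = login[other]
    return rows


logins = [
    {"user": "g_m_bluth", "time": 1369422839, "location": "Canada"},
    {"user": "g_m_bluth", "time": 1369622839, "location": "Canada"},
    {"user": "g_m_bluth", "time": 1369633061, "location": "United States"},
]

by_time = write_logins(logins, "time")
by_location = write_logins(logins, "location")

print(len(by_time["g_m_bluth"]))   # 3 columns, one per login
print(by_location["g_m_bluth"])    # one column per location
```

With the location key, looking up a location is a direct column read, but only the most recent login time per location survives.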

To Normalize or Not

g_m_bluth
    1369633061: <United States, Austin, Texas, 78701>
    1369625839: <Mexico, Tijuana, 88191>
    1358633061: <United States, Austin, Texas, 78701>

Queues
• More generally, many deletes within a row
• A delete in Cassandra is actually a tombstone
• Read 1000 tombstones in order to find 10 columns
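The cost described above can be reproduced with a toy row in which a delete writes a marker instead of removing the column. This is a sketch under that assumption only; `QueueRow` and `TOMBSTONE` are hypothetical names, and real tombstone handling (for example, eventual cleanup) is not modelled.

```python
TOMBSTONE = object()  # marker written in place of a deleted column


class QueueRow:
    """Toy queue stored as one row; column names are sequence numbers."""

    def __init__(self):
        self._columns = {}  # column name -> value or TOMBSTONE

    def push(self, name, value):
        self._columns[name] = value

    def delete(self, name):
        # A delete does not remove the column; it overwrites it with a marker.
        self._columns[name] = TOMBSTONE

    def read_first(self, limit):
        """Return up to `limit` live columns and the number of tombstones read."""
        live, tombstones_read = [], 0
        for name in sorted(self._columns):
            if len(live) >= limit:
                break
            value = self._columns[name]
            if value is TOMBSTONE:
                tombstones_read += 1
            else:
                live.append((name, value))
        return live, tombstones_read


queue = QueueRow()
for i in range(1010):
    queue.push(i, f"job-{i}")
for i in range(1000):
    queue.delete(i)  # consume the first 1000 items

live, tombstones_read = queue.read_first(10)
print(len(live), tombstones_read)  # 10 1000
```

The read returns only ten items, yet it has to step over every deleted column that sorts before them.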

Social Signals
• Like, Own, Want
• Need:
  • scalable counters
  • high performance writes
  • want to find most popular items in a given category

Social Signals
ItemCount
Row Key
item_id_1    like: 300    own: 104    want: 105
item_id_2    ...    ...    ...

UserCount
Row Key
user_id_1    like: 50    own: 10    want: 75
user_id_2    ...    ...    ...

Social Signals
ItemLike
Row Key
item_id_1    user_id_1: <time>    user_id_2: <time>    ...
item_id_2    ...    ...    ...

UserLike
Row Key
user_id_1    <time>: <item_id>    <time>: <item_id>    ...
user_id_2    ...    ...    ...

Social Signals - Possibilities
• Store aggregated counts per category
• Column names are counts
• Get top N items in a category
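The "column names are counts" idea can be sketched as one row per category whose column names are (count, item_id) pairs kept in sorted order, so the top N items are the last N columns. This is only one possible reading of the bullets above. `CategoryRanking` is a hypothetical name, and every count except the 300 likes for item_id_1 is made up for the example.

```python
import bisect


class CategoryRanking:
    """Toy row for one category: column names are (count, item_id), kept sorted."""

    def __init__(self):
        self._columns = []   # sorted list of (count, item_id)
        self._current = {}   # item_id -> count currently stored

    def set_count(self, item_id, count):
        # Changing a count means removing the old column and inserting a new one.
        old = self._current.get(item_id)
        if old is not None:
            self._columns.remove((old, item_id))
        bisect.insort(self._columns, (count, item_id))
        self._current[item_id] = count

    def top(self, n):
        # The highest counts sit at the end of the sorted columns.
        if n <= 0:
            return []
        return [(item, count) for count, item in reversed(self._columns[-n:])]


likes = CategoryRanking()
likes.set_count("item_id_1", 300)
likes.set_count("item_id_2", 120)  # hypothetical count
likes.set_count("item_id_3", 450)  # hypothetical count
likes.set_count("item_id_2", 500)  # item_id_2 becomes the most liked

print(likes.top(2))
```

Each count update removes one column and adds another, so in this layout frequent updates also mean frequent deletes within the row.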

Questions?

Come to the Summit!
Ask me for a discount code
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013
