The document provides an introduction to Cassandra presented by Duy Hai Doan. It discusses Cassandra's history and key features, including linear scalability, availability, support for multiple data centers, operational simplicity, and analytics capabilities. It also covers Cassandra's architecture, including the cluster layer based on Dynamo, the data-store layer based on BigTable, data distribution, replication, consistency levels, and the write path. The "last write wins" conflict-resolution model is explained, along with CQL basics and modeling one-to-many relationships with clustered tables.
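The "last write wins" rule mentioned above can be sketched in a few lines. This toy resolver (names like `Cell` and `resolve_lww` are illustrative, not driver API) picks, among conflicting replica copies of a column, the one with the highest write timestamp:

```python
from dataclasses import dataclass

@dataclass
class Cell:
    """A single column value with its write timestamp (microseconds)."""
    value: str
    timestamp: int

def resolve_lww(replica_cells):
    """Pick the winning cell among conflicting replica copies.

    Cassandra resolves concurrent writes to the same column by
    comparing write timestamps: the highest timestamp wins.
    """
    return max(replica_cells, key=lambda c: c.timestamp)

# Two replicas saw different writes to the same column:
winner = resolve_lww([Cell("alice@old.com", 100), Cell("alice@new.com", 250)])
```

The real implementation also breaks timestamp ties deterministically, but the timestamp comparison is the essential mechanism.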
DOAN DuyHai – Cassandra: real world best use-cases and worst anti-patterns - ... NoSQLmatters
This document discusses Cassandra use cases and anti-patterns. It describes queue-like designs, intensive updates on the same column, and designing around a dynamic schema as anti-patterns that can lead to failures. Rate limiting, fraud prevention, and account validation are provided as examples of good use cases. Key-value modeling, clustering, compaction strategies, and time-to-live features are also overviewed.
Speaker: Charlie Swanson
Learn how MongoDB answers your queries from a query system engineer. If you've ever had a performance problem with a query but didn't know how to find the cause, or if you've ever needed to confirm that your shiny new index is being put to work, the explain command is an excellent place to start. MongoDB's explain system is a powerful tool for solving this type of problem, but can be intimidating and unwieldy to use. In this talk, we will discuss how the explain command works and break down its output into digestible pieces.
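Explain output is indeed unwieldy; a small helper that condenses it makes the talk's point concrete. This sketch reads the standard `executionStats` counters from an explain result (the sample document here is hand-made for illustration, not real server output):

```python
def summarize_explain(explain_doc):
    """Condense an explain result into the numbers that matter most.

    `explain_doc` is expected to look like the output of
    db.collection.explain("executionStats"), which nests its counters
    under an "executionStats" sub-document.
    """
    stats = explain_doc["executionStats"]
    examined = stats["totalKeysExamined"] + stats["totalDocsExamined"]
    returned = stats["nReturned"]
    return {
        "returned": returned,
        "examined": examined,
        # A low returned/examined ratio hints at a missing or unused index.
        "efficiency": returned / examined if examined else 1.0,
    }

# Illustrative explain output for a query that examined far more than it returned:
sample = {"executionStats": {"totalKeysExamined": 1000,
                             "totalDocsExamined": 1000,
                             "nReturned": 10}}
report = summarize_explain(sample)
```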
Speaker: André Spiegel
Many applications require processes that load large amounts of data into MongoDB. It is easy to get these processes wrong, resulting in hours or days of loading time when it could be done in minutes. This talk identifies common mistakes and pitfalls and shows design patterns that can dramatically improve performance. The patterns introduced here can be used with any tool or programming language.
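One of the classic fixes the talk alludes to is inserting in large batches instead of one document per round trip. A minimal, language-agnostic version of the batching pattern (with pymongo this chunk would go to `collection.insert_many(chunk, ordered=False)`; here we only count the round trips):

```python
from itertools import islice

def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# 2500 documents in batches of 1000 means 3 round trips instead of 2500.
docs = ({"_id": i} for i in range(2500))
round_trips = sum(1 for _ in batches(docs, 1000))
```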
Powerful Analysis with the Aggregation Pipeline – MongoDB
Speaker: Asya Kamsky
Think you need to move your data "elsewhere" to do powerful analysis? Think again. The most efficient way to analyze your data is where it already lives. MongoDB Aggregation Pipeline has been getting more and more powerful and using new stages, expressions and tricks we can do extensive analysis of our data inside MongoDB Server.
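To make the pipeline idea tangible, here is a toy, in-memory stand-in for a `$match` → `$group`/`$sum` pipeline. The function name and signature are invented for illustration; only the stage semantics mirror MongoDB's:

```python
from collections import defaultdict

def run_pipeline(docs, match, group_key, sum_field):
    """A toy, in-memory version of a $match -> $group/$sum pipeline.

    This mimics the shape of
      [{"$match": match},
       {"$group": {"_id": "$" + group_key,
                   "total": {"$sum": "$" + sum_field}}}]
    purely in Python, so the behaviour is easy to see and test.
    """
    totals = defaultdict(float)
    for d in docs:
        if all(d.get(k) == v for k, v in match.items()):   # $match
            totals[d[group_key]] += d[sum_field]           # $group + $sum
    return dict(totals)

orders = [
    {"status": "paid", "city": "Lyon", "amount": 10.0},
    {"status": "paid", "city": "Lyon", "amount": 5.0},
    {"status": "open", "city": "Lyon", "amount": 99.0},
    {"status": "paid", "city": "Nice", "amount": 7.0},
]
result = run_pipeline(orders, {"status": "paid"}, "city", "amount")
```

The point of the talk, of course, is that MongoDB runs this where the data lives instead of shipping documents to the client.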
The document discusses several topics related to education including communities of practice, epistemic frames, reflection-in-action, ways of knowing and doing, and epistemic network analysis. It also includes examples of codes used in epistemic network analysis and diagrams showing connections between codes across utterances and stanzas in a chat discourse.
Building Data Driven Products at LinkedIn – Mitul Tiwari
This document discusses building data products at LinkedIn using Hadoop. It describes how LinkedIn builds recommendations products like "People You May Know" by processing member connection data with Hadoop tools. The workflow involves using Kafka to transfer data to HDFS, Pig and MapReduce to process the data, Azkaban to manage Hadoop jobs, and Voldemort to store results and serve recommendations to members. Triangle closing algorithms in Pig are used to find common connections between members and predict potential new connections. The results are pushed to production systems to power features like "People You May Know" recommendations.
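The triangle-closing idea scores pairs of unconnected members by how many connections they share. A minimal sketch of that scoring step (in Python rather than Pig, with an invented graph for illustration):

```python
def triangle_closing_scores(connections):
    """Score non-connected pairs by how many connections they share.

    `connections` maps a member to the set of members they are
    connected to. Pairs with many common connections are good
    "People You May Know" candidates.
    """
    scores = {}
    members = list(connections)
    for i, a in enumerate(members):
        for b in members[i + 1:]:
            if b in connections[a]:
                continue  # already connected, nothing to recommend
            common = len(connections[a] & connections[b])
            if common:
                scores[(a, b)] = common
    return scores

graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
}
scores = triangle_closing_scores(graph)
```

At LinkedIn scale this pairwise loop is what the Pig/MapReduce jobs parallelize.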
This document provides instructions for installing MongoDB on Windows and CentOS. It outlines 5 steps for installing on Windows which include downloading MongoDB, creating a data folder, extracting the download package, connecting to MongoDB using mongo.exe, and testing with sample data. It also outlines 5 steps for installing on CentOS that mirror the Windows steps. The document then discusses additional MongoDB concepts like connecting to databases, creating collections and inserting documents, using cursors, querying for specific documents, and core CRUD operations.
This document discusses user-defined functions and materialized views in Cassandra. It provides information on how to create user-defined functions and user-defined aggregates, including the syntax and best practices. It also covers how user-defined functions and aggregates are executed. The document then discusses materialized views, including why they are useful and how they work at a high level. It provides the syntax for creating materialized views and describes how updates are handled.
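A user-defined aggregate executes as described here: a state function is folded over every matching row, then an optional final function converts the accumulated state into the result. A miniature model of that execution scheme (the helper `run_uda` is illustrative, not Cassandra API):

```python
def run_uda(rows, initcond, state_func, final_func=lambda s: s):
    """How a Cassandra user-defined aggregate executes, in miniature:
    the state function is folded over every row, then the optional
    final function turns the accumulated state into the result."""
    state = initcond
    for row in rows:
        state = state_func(state, row)
    return final_func(state)

# An "average" aggregate: state is (count, sum), finalised to sum/count.
avg = run_uda(
    rows=[4.0, 8.0, 12.0],
    initcond=(0, 0.0),
    state_func=lambda s, v: (s[0] + 1, s[1] + v),
    final_func=lambda s: s[1] / s[0],
)
```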
This document summarizes Cassandra drivers and tools. It discusses the Java driver architecture including connection pooling, load balancing policies, and automatic paging. It also demonstrates Cassandra Unit for testing, the Java driver object mapper module, and Achilles object mapper with features like dirty checking. Live coding examples are provided for these tools.
SASI, Cassandra on the full text search ride at Voxxed Days Belgrade 2016 – Duyhai Doan
The document discusses Apache Cassandra's SASI (SSTable Attached Secondary Index). It provides a 5 minute introduction to Cassandra, introduces SASI and how it follows the SSTable lifecycle, describes how SASI works at the cluster level for distributed queries and indexing, and details the local read/write process including data structures and query planning. Some benchmarks are shown for full table scans on a large dataset using SASI with Spark. The key advantages and use cases for SASI are discussed along with its limitations compared to dedicated search engines.
This document provides an introduction to Cassandra including:
1) An overview of Cassandra's key architecture including its linear scalability, continuous availability across data centers, and operational simplicity.
2) A discussion of Cassandra's data model including its use of Last Write Wins for conflict resolution and examples of modeling one-to-many relationships using clustered tables.
3) Details on Cassandra's consistency levels and how they impact availability and durability of writes and reads.
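The consistency-level trade-off in point 3 boils down to simple arithmetic: a read is guaranteed to see the latest write when the write and read replica sets overlap, i.e. when W + R > RF, and QUORUM is a strict majority of replicas. A sketch of that arithmetic:

```python
def quorum(replication_factor):
    """Number of replicas that must answer for QUORUM (strict majority)."""
    return replication_factor // 2 + 1

def is_strongly_consistent(write_replicas, read_replicas, replication_factor):
    """Reads see the latest write when the write and read replica sets
    must overlap, i.e. when W + R > RF."""
    return write_replicas + read_replicas > replication_factor

rf = 3
w = r = quorum(rf)                          # QUORUM writes and QUORUM reads
strong = is_strongly_consistent(w, r, rf)   # 2 + 2 > 3: always consistent
one_one = is_strongly_consistent(1, 1, rf)  # ONE/ONE: fast but eventual
```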
This document summarizes a presentation about the KillrChat messaging application. KillrChat is a scalable messaging app built using AngularJS, Spring, and Cassandra. It demonstrates denormalization and provides an exercise for attendees to work with user and chat room management, as well as chat messages. The document outlines the architecture, data models, and solutions for handling concurrent requests to avoid inconsistencies through the use of lightweight transactions in Cassandra.
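The lightweight transactions used here are compare-and-set operations; for example, `INSERT ... IF NOT EXISTS` only applies when the row is absent, which prevents two concurrent sign-ups from silently overwriting each other. An in-memory stand-in for that behaviour (class and method names are illustrative):

```python
class TinyTable:
    """In-memory stand-in for a Cassandra table, just enough to show
    the compare-and-set behaviour of INSERT ... IF NOT EXISTS."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, key, row):
        """Return (applied, existing_row), like the [applied] column
        a lightweight transaction returns to the client."""
        if key in self.rows:
            return False, self.rows[key]
        self.rows[key] = row
        return True, None

users = TinyTable()
first, _ = users.insert_if_not_exists("jdoe", {"email": "jdoe@a.com"})
second, existing = users.insert_if_not_exists("jdoe", {"email": "x@b.com"})
```

In real Cassandra this guarantee is obtained via a Paxos round among replicas, which is why lightweight transactions cost noticeably more than plain writes.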
This document discusses using Spark with Apache Cassandra for various use cases including loading data from various sources, performing analytics, and sanitizing, validating, and transforming data. It provides examples of using Spark jobs to import data, clean data, perform schema migrations, and run analytics queries. It also covers aspects of the connector architecture like data locality, failure handling, and cross data center operations. The document concludes with discussing a benchmark that used Spark and Cassandra to perform parallel data ingestion and top-K queries on 3.2 billion rows of data with SASI indices.
This document provides an introduction and overview of Cassandra including:
- Cassandra's history as a NoSQL database created at Facebook and open sourced in 2008.
- Key features of Cassandra including linear scalability, continuous availability, ability to span multiple data centers, and operational simplicity.
- A high-level overview of Cassandra's architecture including its use of Dynamo and BigTable papers for the cluster and data storage layers.
- Concepts related to Cassandra's data model including data distribution, token ranges, replication, write path, and "last write wins" consistency.
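The token-range and replication concepts in the last bullet can be sketched together: a partition key hashes to a token on a ring, the node owning that range is the primary replica, and the next distinct nodes clockwise hold the copies. A simplified model (MD5 stands in for Murmur3, and placement mirrors SimpleStrategy, ignoring racks and data centers):

```python
import bisect
import hashlib

def token_of(partition_key):
    """Hash a partition key onto a 0..2**64 ring (stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")

def replicas_for(partition_key, ring, rf):
    """`ring` is a sorted list of (token, node) pairs. The primary
    replica is the first node at or after the key's token (wrapping
    around), and the remaining rf - 1 distinct nodes follow clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_of(partition_key)) % len(ring)
    picked = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in picked:
            picked.append(node)
        if len(picked) == rf:
            break
    return picked

ring = [(2**62, "node-A"), (2**63, "node-B"), (3 * 2**62, "node-C")]
owners = replicas_for("user:42", ring, rf=2)
```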
Fast track to getting started with DSE Max @ ING – Duyhai Doan
This document provides an overview of Apache Spark and Apache Cassandra and how they can be used together. It begins with introductions to Spark, describing its core concepts like RDDs and transformations. It then introduces Cassandra and covers concepts like data distribution and token ranges. The remainder discusses the Spark Cassandra connector, covering how it allows reading and writing Cassandra data from Spark and maintaining data locality. It also discusses use cases, failure handling, and cross-datacenter/cluster operations.
The presentation introduces KillrChat, a scalable messaging app built using Cassandra to demonstrate denormalization. It discusses the technology stack including Cassandra, Spring Boot, and AngularJS. It then covers the data models and solutions for various entities like users, chat rooms, and messages to handle concurrent modifications using lightweight transactions. Real-time features are implemented with WebSockets. The presentation provides a hands-on exercise for attendees and highlights how to build a real application with the Cassandra ecosystem.
The document describes the KillrChat application, which is a scalable chat application built with AngularJS, Cassandra, and Spring Boot. It discusses the application architecture including using Cassandra for distributed data storage and scaling out via a message broker. It also summarizes the key components of the application including controllers, services, REST resources, directives, and how data is distributed in Cassandra.
This document summarizes a presentation about using Spark with Apache Cassandra. It discusses using Spark jobs to load and transform data in Cassandra for purposes such as data import, cleaning, schema migration and analytics. It also covers aspects of the connector architecture like data locality, failure handling and cross-cluster operations. Examples are given of using Spark and Cassandra together for parallel data ingestion and top-K queries on a large dataset.
Datastax Day 2016: introduction to Apache Cassandra – Duyhai Doan
This document provides an overview of Apache Cassandra and discusses its key features. It describes how Cassandra distributes and replicates data across multiple nodes for continuous availability and linear scalability. It also covers Cassandra's consistency model and how consistency levels can be tuned to balance availability and durability. The document lists Cassandra's features like collections, user-defined types, materialized views, and JSON support for flexible data modeling.
Cassandra and Spark, closing the gap between NoSQL and analytics – codemotio... – Duyhai Doan
This document discusses how Spark and Cassandra can be used together. It begins with an introduction to Spark and Cassandra individually, explaining their architectures and key features. It then details the Spark-Cassandra connector, describing how Cassandra tables can be exposed as Spark RDDs and DataFrames. Various use cases for Spark and Cassandra are presented, including data cleaning, schema migration, and analytics. The document emphasizes the importance of data locality when performing joins and writes between Spark and Cassandra. Code examples are provided for common tasks like data cleaning, migration, and analytics.
Spark Cassandra integration, theory and practice – Duyhai Doan
This document discusses Spark and Cassandra integration. It begins with an introduction to Spark, describing it as a general data processing framework that is faster than Hadoop. It then discusses the Cassandra database and its data distribution using token ranges. The document provides examples of using the Spark/Cassandra connector for reading and writing data between Spark and Cassandra, including techniques for ensuring data locality. It discusses best practices for cluster deployment and handling failures while maintaining data locality. Finally, it presents some use cases for using Spark/Cassandra including data cleaning, schema migration, and analytics.
This document provides building code and construction information for a proposed remodel of a 2,377 square foot KFC restaurant located in Surprise, Arizona. The scope of work includes renovations to the dining area including new seating, flooring, wall finishes and lighting. Restroom remodels are planned to provide accessibility compliance upgrades. Exterior alterations consist of a new facade, finishes and lighting. Site improvements will modify sidewalks and parking areas to comply with accessibility standards. Construction will conform to the 2006 International Building Code and other applicable local codes.
This document discusses Libon's migration of contact data from an SQL database to Cassandra. It began with billions of contact records stored relationally in Oracle, where performance became unpredictable at scale. Tuning Oracle helped, but new challenges like high availability and multi-datacenter support remained. The migration strategy involved writing to both databases, migrating old data, and switching fully to Cassandra, with no downtime and a safe rollback path. Business-code refactoring kept the existing tests passing by modifying services and repositories to work with the new Cassandra data model.
Cassandra nice use cases and worst anti patterns – NoSQL matters Barcelona – Duyhai Doan
This document summarizes a presentation on Cassandra use cases and anti-patterns. It discusses several anti-patterns to avoid such as queue-like designs, intensive updates on the same column, and designing around a dynamic schema. It also provides examples of good use cases such as rate limiting, anti-fraud detection, and account validation. The document contains an agenda, descriptions of each anti-pattern and their level of failure, as well as explanations and demonstrations of the example use cases.
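The rate-limiting use case typically maps to one counter row per (client, time window), expired automatically via TTL. A sketch of that fixed-window scheme, with the TTL simulated by timestamps instead of a real table (class and parameter names are invented for illustration):

```python
import time

class FixedWindowRateLimiter:
    """Fixed-window rate limiter in the spirit of the Cassandra use
    case: one counter per (client, window), dropped when the window
    passes, which a TTL would do automatically in a real table."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (client, window_index) -> count

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        bucket = (client, int(now // self.window))
        count = self.counters.get(bucket, 0)
        if count >= self.limit:
            return False
        self.counters[bucket] = count + 1
        return True

limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
answers = [limiter.allow("api-key-1", now=1000.0) for _ in range(4)]
```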
This document discusses Cassandra and the Datastax Academy. It provides examples of companies using Cassandra as infrastructure including ING, Netflix, Sony, and Microsoft. It also discusses the increasing SQL support in Cassandra, such as user defined functions, materialized views, and secondary indexes. The document notes that skills in Cassandra are in high demand but difficult to find. It promotes the Datastax Academy as a free solution to this problem, offering self-paced courses, instructor-led training, and O'Reilly certification to boost careers.
There are a few options for performing more complex queries in Cassandra beyond the restrictions of the WHERE clause:
1. Denormalize/duplicate data across tables to allow querying on different columns. For example, have one table keyed on user ID and another keyed on message date to allow filtering by date.
2. Offload complex queries to an external search index like Solr or Elasticsearch that can handle full-text and complex queries, and keep Cassandra as the system of record.
3. Use Spark/Hive on Cassandra to run queries across the cluster and leverage their more powerful query engines.
4. Consider a different database if your queries require joins or complex WHERE clauses, or don't map well to Cassandra's partition-based data model.
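Option 1 above means every write goes to one table per query pattern. A small in-memory model of that double-write discipline (table and key names are illustrative):

```python
class MessageStore:
    """Denormalized writes: every message is stored twice, once per
    query pattern, mirroring the two-table approach in option 1."""

    def __init__(self):
        self.by_user = {}   # like a messages_by_user table (PK: user_id)
        self.by_date = {}   # like a messages_by_date table (PK: day)

    def save(self, user_id, day, text):
        # One logical write fans out to both "tables".
        self.by_user.setdefault(user_id, []).append((day, text))
        self.by_date.setdefault(day, []).append((user_id, text))

    def messages_of(self, user_id):
        return [t for _, t in self.by_user.get(user_id, [])]

    def messages_on(self, day):
        return [t for _, t in self.by_date.get(day, [])]

store = MessageStore()
store.save("u1", "2015-06-01", "hello")
store.save("u2", "2015-06-01", "hi")
store.save("u1", "2015-06-02", "bye")
```

Storage is duplicated, but each query hits exactly one partition, which is the trade Cassandra modeling is built around.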
This document provides an introduction and overview of Cassandra including:
- Cassandra's history as a NoSQL database created at Facebook and open sourced in 2008
- Key features of Cassandra including linear scalability, continuous availability, support for multiple data centers, operational simplicity, and analytics capabilities
- Details on Cassandra's architecture including its cluster layer based on Amazon Dynamo and data store layer based on Google BigTable
- Explanations of Cassandra's data distribution, token ranges, replication, coordinator nodes, tunable consistency levels, and write path
- Descriptions of Cassandra's data model, including "last write wins" semantics, with examples of CRUD operations and table schemas
Cassandra nice use cases and worst anti patterns – Duyhai Doan
This document discusses Cassandra use cases and anti-patterns. Some good use cases include rate limiting, fraud prevention, account validation, and storing sensor time series data. Poor designs include using Cassandra like a queue, storing null values, intensive updates to the same column, and dynamically changing the schema. The document provides examples and explanations of how to properly implement these scenarios in Cassandra.
The document discusses Cassandra architecture and operations. It provides an overview of key Cassandra concepts like data distribution across nodes, replication, consistency levels, and the write and read paths. It also covers topics like compaction strategies, best practices for configuration, and operational recommendations.
This document summarizes the Cassandra Java driver and tools. It discusses the driver's architecture including connection pooling, request pipelining, load balancing policies, and automatic failover. It also covers using statements, asynchronous reads, the query builder, and the object mapper. Lastly, it discusses new automatic paging functionality in the driver.
Cassandra introduction apache con 2014 budapestDuyhai Doan
This document provides an introduction and summary of Cassandra presented by Duy Hai Doan. It discusses Cassandra's history as a NoSQL database created at Facebook and open sourced in 2008. The key architecture of Cassandra including its data distribution across nodes, replication for failure tolerance, and consistency models for reads and writes is summarized.
This document discusses query optimization in database systems. It explains that a query optimizer is needed because there are many possible ways to execute a query with different tables and joins. The optimizer uses statistics, cost modeling, and explores the search space of options to pick the most efficient plan. It also shows how database internals knowledge like indexes, joins, and parallelism can help the optimizer generate better execution plans.
This document discusses Achilles, an object mapper for Cassandra. It provides a live demo of Achilles' main API for insert, update, remove, and find operations. The document also outlines Achilles' documentation, slice and typed queries, native queries, options, and roadmap including asynchronous support and integration with Elasticsearch.
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...NoSQLmatters
Apache Spark is a general data processing framework which allows you perform map-reduce tasks (but not only) in memory. Apache Cassandra is a highly available and massively scalable NoSQL data-store. By combining Spark flexible API and Cassandra performance, we get an interesting alternative to the Hadoop eco-system for both real-time and batch processing. During this talk we will highlight the tight integration between Spark & Cassandra and demonstrate some usages with live code demo.
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisDuyhai Doan
This document provides an overview of Spark and its integration with Cassandra for real-time data processing. It begins with introductions of the speaker and Datastax. It then discusses what Spark and Cassandra are, including their architectures and key characteristics like Spark being fast, easy to use, and supporting multiple languages. The document demonstrates basic Spark code and how RDDs work. It covers the Spark and Cassandra connectors and how they provide locality-aware joins. It also discusses use cases and deployment options. Finally, it considers future improvements like leveraging Solr for local filtering to improve data locality during joins.
"Real-time data processing with Spark & Cassandra", jDays 2015 Speaker: "Duy-...hamidsamadi
This document provides an overview of Spark and its integration with Cassandra for real-time data processing. It introduces Spark and its characteristics like being fast, easy to use, and having a rich API. It then discusses Cassandra's data distribution using token ranges and how Spark partitions data to maximize data locality when reading from and writing to Cassandra. The document demonstrates the Spark-Cassandra connector architecture and how it exposes Cassandra tables as RDDs and DataFrames while pushing predicates down for filtering. It also provides examples of using the connector API to read and write data and ensuring data locality.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
Cassandra data structures and algorithmsDuyhai Doan
This document discusses Cassandra data structures and algorithms. It begins with an introduction and agenda, then covers Cassandra's use of CRDTs, bloom filters, and Merkle trees for its data model. It explains how Cassandra columns can be modeled as a CRDT join semilattice and proves their eventual convergence. The document also covers Cassandra's write path, read path optimized with bloom filters, and the math behind bloom filter probabilities.
Crate.io offers a solution for the Internet of Things (IoT) and its requirements. CrateDB is a distributed SQL database that can handle huge amounts of diverse data from IoT devices in real-time. It features automatic scaling, high availability, dynamic schemas, and geospatial querying capabilities. CrateDB uses a column-oriented approach which optimizes common operations on sets of data like aggregations and counting, unlike row-oriented databases. Real-world use cases of CrateDB were presented from companies handling wind turbine data, vehicle telemetry, security monitoring, and more. A live demo then showed how CrateDB can simplify IoT architectures by replacing queueing systems and multiple databases with a single scalable
Cassandra Summit 2014: Real Data Models of Silicon ValleyDataStax Academy
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
KillrChat: Building Your First Application in Apache Cassandra (English)DataStax Academy
KillrChat is a scalable messaging app built using AngularJS, Spring, and Cassandra. It demonstrates how to build a real messaging application using the Cassandra database and handle features like user accounts, chat rooms, joining/leaving rooms, and messaging in a scalable way. The presentation covered the architecture, data models for users, rooms, and messages, and how to handle concurrent modifications to data using lightweight transactions in Cassandra.
This document provides an introduction to the Python programming language. It discusses why Python is useful, highlighting that it is easy to read and learn, has a powerful interactive interpreter, and is scalable and high-level. It also outlines key features like being procedural, object-oriented, and dynamically typed. The document then discusses popular domains where Python is used, like web development, machine learning, and data analysis. It covers execution modes, variables, data types, operators, conditional execution, functions, and building a "Who Wants to Be a Millionaire" game in Python.
The document discusses Dynamic Data Exchange (DDE), an early Windows API that allows data sharing between applications; it outlines Business Information Server (BIS) support for DDE including initiating conversations, reading/writing data, and executing commands; examples are provided for using DDE between BIS and applications like Excel, Word, and Visual Basic.
Cassandra Community Webinar | Data Model on FireDataStax
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps and understand the tradeoffs. Many time's just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
PHP performance 101: so you need to use a databaseLeon Fayer
Being involved in performance audits on systems of every size, from start-up sites hacked together overnight, to a ginormous applications built by world-recognized brand companies, I’ve seen a lot of interesting (and sometimes very unique) performance issues in every level of the stack: code, architecture, databases (sometimes all of the above). But there are a few particular, very “Performance 101″, issues that (unfortunately) appear in a lot of code bases. In this talk I present the most common database-related performance bottlenecks that can happen in most PHP applications.
The Ring programming language version 1.4.1 book - Part 14 of 31Mahmoud Samir Fayed
This document discusses creating a 2D game engine in Ring for desktop and mobile games. It describes the different layers of the project including the games layer, game engine classes, interface to graphics libraries, and graphics library bindings. Key classes for the engine are described like Game, GameObject, Sprite, Text, Animate, Sound and Map. The engine is designed to use declarative programming in the games layer to create games. Examples of games that could be built with it include Stars Fighter, Flappy Bird 3000, and Super Man 2016. The interface layers allow switching between Allegro and SDL graphics libraries.
Similar to Cassandra introduction at FinishJUG (20)
This document provides an overview of big data concepts for a new project in 2017. It discusses distributed systems theories like time ordering, latency, failure and consensus. It also covers data sharding, replication, and the CAP theorem. Key points include how latency is impacted by network delays, different failure modes, and that the CAP theorem states that a distributed system can only guarantee two of consistency, availability, and partition tolerance at once.
Big data 101 for beginners riga dev daysDuyhai Doan
This document provides an overview and introduction to big data concepts for a new project in 2017. It discusses distributed systems theories like time ordering, latency, failure modes, and consensus protocols. It also covers data sharding and replication techniques. The document explains the CAP theorem and how it relates to consistency and availability. Finally, it discusses different distributed systems architectures like master/slave versus masterless designs.
This document provides an overview of DataStax Enterprise, a database platform for cloud applications. It discusses key features of DataStax Enterprise including that it is certified for production, offers automatic management services for configuration and administration through OpsCenter, and provides 24/7 expert support. The document also summarizes various DataStax Enterprise technologies and capabilities like advanced replication, tiered storage, security features, and integration with search, analytics, and graph databases.
Datastax day 2016 : Cassandra data modeling basicsDuyhai Doan
This document discusses data modeling with Apache Cassandra. It covers:
1. The objectives of data modeling like reducing query latency and avoiding disasters
2. Choosing the right partition key which is the main entry point for queries and helps distribute data
3. Using clustering columns to simulate one-to-many relationships and enable sorting and range queries
4. Other critical details like avoiding huge partitions, sub-partitioning techniques, and how deletes create tombstones
This document discusses Apache Cassandra and its features and use cases. It provides an overview of Cassandra's key characteristics like massive scalability, extreme availability, and rich data modeling. Example use cases mentioned include messaging, collections/playlists, fraud detection, recommendations, and IoT sensor data. New features introduced in Cassandra in 2016 are also summarized, such as delete by range, materialized views, atomic UDT updates, a new SASI index, and support for GROUP BY queries.
Spark zeppelin-cassandra at synchrotronDuyhai Doan
This document discusses using Spark, Cassandra, and Zeppelin for storing and aggregating metrics data from a particle accelerator project called HDB++. It provides an overview of the HDB++ project, how it previously used MySQL but now stores data in Cassandra. It describes the Spark jobs that are run to load metrics data from Cassandra and generate statistics that are written back to Cassandra. It also demonstrates visualizing the data using Zeppelin and discusses some tricks and traps to be aware of when using this stack.
Sasi, cassandra on full text search rideDuyhai Doan
This document discusses SASI (SSTable Attached Secondary Index), a new secondary index for Apache Cassandra that follows the SSTable lifecycle. It describes how SASI works, including its in-memory and on-disk structures. It also covers SASI's query planning optimizations and provides some benchmark results showing SASI's performance improvements over full scans. While SASI is not as full-featured as search engines, it can cover many search use cases within Cassandra.
Cassandra 3 new features @ Geecon Krakow 2016Duyhai Doan
Duyhai Doan gave a presentation on new features in Cassandra 3.0, including materialized views, user defined functions, user defined aggregates, and the new SASI full text search index. Materialized views allow pre-computing common queries to improve performance. User defined functions and aggregates enable pushing computation to the server. The new SASI index provides improved full text search capabilities in Cassandra.
This document provides an introduction to Cassandra including:
- Datastax is a company that contributes to Apache Cassandra and sells Datastax Enterprise.
- Cassandra was created at Facebook and is now open source software with the current version being 3.2.
- Cassandra's key features include linear scalability, continuous availability, multi-datacenter support, operational simplicity, and Spark integration.
Apache zeppelin the missing component for the big data ecosystemDuyhai Doan
Duy Hai Doan presented Apache Zeppelin, an open-source web-based notebook that allows users to interact with data. Zeppelin provides a front-end GUI and display system for data analysis tools and uses interpreters to connect to back-end systems like Spark, Cassandra, and Flink. Doan demonstrated Zeppelin's notebook interface, display options, and how users can write their own interpreters to connect new systems to Zeppelin. Future plans for Zeppelin include improving usability, adding authentication and authorization, and developing more interpreters and visualizations.
Apache zeppelin, the missing component for the big data ecosystemDuyhai Doan
Apache Zeppelin is a web-based notebook that allows users to interact with data via interpreters like Spark, SQL, and Cassandra. It provides a GUI for data scientists to write code and visualizations in notebooks. Zeppelin has a modular architecture that allows new interpreters to be easily added. It also includes features like scheduling, sharing, and exporting of notebooks.
Distributed algorithms for big data @ GeeConDuyhai Doan
This document discusses distributed algorithms for big data. It begins with an overview of HyperLogLog for estimating cardinality and counting distinct elements in a large data set. It then explains how HyperLogLog works by using a hash function to distribute the data across buckets and applying the LogLog algorithm to each bucket before taking the harmonic mean. The document also covers Paxos for distributed consensus, explaining the phases of prepare, promise, accept and learn to reach agreement in the presence of failures.
Spark cassandra connector.API, Best Practices and Use-CasesDuyhai Doan
- The document discusses Spark/Cassandra connector API, best practices, and use cases.
- It describes the connector architecture including support for Spark Core, SQL, and Streaming APIs. Data is read from and written to Cassandra tables mapped as RDDs.
- Best practices around data locality, failure handling, and cross-region/cluster operations are covered. Locality is important for performance.
- Use cases include data cleaning, schema migration, and analytics like joins and aggregation. The connector allows processing and analytics on Cassandra data with Spark.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures and what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure that happened in February 2022. We'll see, what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on Ukraine experience
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsDianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for
seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Keywords: AI, Containeres, Kubernetes, Cloud Native
Event Link: https://meine.doag.org/events/cloudland/2024/agenda/#agendaId.4211
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
AppSec PNW: Android and iOS Application Security with MobSFAjin Abraham
Mobile Security Framework - MobSF is a free and open source automated mobile application security testing environment designed to help security engineers, researchers, developers, and penetration testers to identify security vulnerabilities, malicious behaviours and privacy concerns in mobile applications using static and dynamic analysis. It supports all the popular mobile application binaries and source code formats built for Android and iOS devices. In addition to automated security assessment, it also offers an interactive testing environment to build and execute scenario based test/fuzz cases against the application.
This talk covers:
Using MobSF for static analysis of mobile applications.
Interactive dynamic security assessment of Android and iOS applications.
Solving Mobile app CTF challenges.
Reverse engineering and runtime analysis of Mobile malware.
How to shift left and integrate MobSF/mobsfscan SAST and DAST in your build pipeline.
What is an RPA CoE? Session 2 – CoE RolesDianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
GlobalLogic Java Community Webinar #18 “How to Improve Web Application Perfor...GlobalLogic Ukraine
Під час доповіді відповімо на питання, навіщо потрібно підвищувати продуктивність аплікації і які є найефективніші способи для цього. А також поговоримо про те, що таке кеш, які його види бувають та, основне — як знайти performance bottleneck?
Відео та деталі заходу: https://bit.ly/45tILxj
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
In our second session, we shall learn all about the main features and fundamentals of UiPath Studio that enable us to use the building blocks for any automation project.
📕 Detailed agenda:
Variables and Datatypes
Workflow Layouts
Arguments
Control Flows and Loops
Conditional Statements
💻 Extra training through UiPath Academy:
Variables, Constants, and Arguments in Studio
Control Flow in Studio
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...AlexanderRichford
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation Functions to Prevent Interaction with Malicious QR Codes.
Aim of the Study: The goal of this research was to develop a robust hybrid approach for identifying malicious and insecure URLs derived from QR codes, ensuring safe interactions.
This is achieved through:
Machine Learning Model: Predicts the likelihood of a URL being malicious.
Security Validation Functions: Ensures the derived URL has a valid certificate and proper URL format.
This innovative blend of technology aims to enhance cybersecurity measures and protect users from potential threats hidden within QR codes 🖥 🔒
This study was my first introduction to using ML which has shown me the immense potential of ML in creating more secure digital environments!
2. @doanduyhai
Who Am I ?!
Duy Hai DOAN
Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
☞ duy_hai.doan@datastax.com
☞ @doanduyhai
2
3. @doanduyhai
Datastax!
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
3
11. @doanduyhai
Multi-DC usages!
Prod data copy for testing/benchmarking
[Diagram: an 8-node production cluster (n1 … n8) replicating a data copy to a tiny 3-node test cluster (n1 … n3)]
Use LOCAL consistency on the production DC
NEVER WRITE to the test DC !!!
11
34. @doanduyhai
Consistency summary!
ONE Read + ONE Write
☞ available for read/write even with (N-1) replicas down
QUORUM Read + QUORUM Write
☞ available for read/write even with ⌊(N-1)/2⌋ replicas down
34
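The availability/consistency arithmetic behind this summary can be sketched in a few lines (illustrative only, not Cassandra code): with replication factor N, reading R replicas and writing W replicas gives strong consistency whenever R + W > N.

```python
# Illustrative sketch of tunable-consistency math (not Cassandra internals).

def quorum(n: int) -> int:
    """Number of replicas forming a majority (QUORUM) for replication factor n."""
    return n // 2 + 1

def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """True when any read replica set must overlap any write replica set."""
    return r + w > n

N = 3  # replication factor
# ONE + ONE: fast and highly available, but reads may miss the latest write
print(is_strongly_consistent(1, 1, N))
# QUORUM + QUORUM: read and write sets always overlap
print(is_strongly_consistent(quorum(N), quorum(N), N))
# Replicas that may be down while QUORUM reads/writes still succeed:
print(N - quorum(N))
```

With N = 3 this prints `False`, `True`, `1`: QUORUM + QUORUM stays strongly consistent while tolerating one replica down.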
42. @doanduyhai
Last Write Win (LWW)!
jdoe
age
name
33 John DOE
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
#partition
42
43. @doanduyhai
Last Write Win (LWW)!
jdoe
age (t1) name (t1)
33 John DOE
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
auto-generated timestamp
43
44. @doanduyhai
Last Write Win (LWW)!
UPDATE users SET age = 34 WHERE login = 'jdoe';
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
SSTable1 SSTable2
44
45. @doanduyhai
Last Write Win (LWW)!
DELETE age FROM users WHERE login = 'jdoe';
jdoe
age (t3)
✕
tombstone
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
SSTable1 SSTable2 SSTable3
45
46. @doanduyhai
Last Write Win (LWW)!
SELECT age FROM users WHERE login = 'jdoe';
???
SSTable1 SSTable2 SSTable3
jdoe
age (t3)
✕
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
46
47. @doanduyhai
Last Write Win (LWW)!
SELECT age FROM users WHERE login = 'jdoe';
✓✕✕
SSTable1 SSTable2 SSTable3
jdoe
age (t3)
✕
jdoe
age (t1) name (t1)
33 John DOE
jdoe
age (t2)
34
47
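The read resolution shown above can be sketched as a timestamp-based merge (an illustrative model, not Cassandra internals): for each cell, the fragment with the highest timestamp wins, and a tombstone at the highest timestamp means the cell is deleted.

```python
# Illustrative last-write-win merge across SSTable fragments
# (a model of the behavior, not Cassandra internals).

TOMBSTONE = object()  # sentinel standing in for a deletion marker

def merge_cell(fragments):
    """Return the live value of a cell, or None if the latest write is a delete.
    fragments: iterable of (timestamp, value) pairs from different SSTables."""
    ts, value = max(fragments, key=lambda f: f[0])  # highest timestamp wins
    return None if value is TOMBSTONE else value

# SSTable1: INSERT age=33 at t1; SSTable2: UPDATE age=34 at t2;
# SSTable3: DELETE age at t3 (tombstone)
print(merge_cell([(1, 33), (2, 34), (3, TOMBSTONE)]))  # tombstone wins
print(merge_cell([(1, 33), (2, 34)]))                  # latest write wins
```

The first query resolves to the tombstone (the cell reads as deleted), the second to 34, matching the ✓/✕ marks on the slide.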
49. @doanduyhai
Historical data!
You want to keep data history ?
• do not rely on the internally generated timestamp !!! (LWW keeps only the latest value)
• ☞ model it as time-series data instead
[Diagram: SSTable1 and SSTable2 holding dated columns date1(t1) … date11(t11) for the same partition id]
49
50. @doanduyhai
CRUD operations!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
50
52. @doanduyhai
What about joins ?!
How can I join data between tables ?
How can I model 1 – N relationships ?
How to model a mailbox ?
User 1 ── n Emails
52
55. @doanduyhai
Queries!
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id = '2014-09-25 16:00:00';
Get message by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
and message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
55
56. @doanduyhai
Queries!
Get message by message_id only ?
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get message by date interval only ?
SELECT * FROM mailbox
WHERE message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
❓
❓
56
57. @doanduyhai
Queries!
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get message by date interval only (#partition not provided)
SELECT * FROM mailbox
WHERE message_id <= '2014-09-25 16:00:00'
and message_id >= '2014-09-20 16:00:00';
57
61. @doanduyhai
Queries!
SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe';
Get message by user range (range query on #partition)
SELECT * FROM mailbox WHERE login like '%doe%';
Get message by user pattern (non-exact match on #partition)
61
62. @doanduyhai
WHERE clause restrictions!
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition
Only exact match (=) on the #partition; range queries (<, ≤, >, ≥) are not allowed
• ☞ they would require a full cluster scan
On clustering columns, only exact match and range queries (<, ≤, >, ≥)
WHERE clause only possible
• on columns defined in the PRIMARY KEY
• on indexed columns
62
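These restrictions follow directly from the storage model. A rough mental model (illustrative sketch with made-up data, not Cassandra internals): a table with PRIMARY KEY((login), message_id) behaves like a hash map of partitions, each holding rows sorted by the clustering column.

```python
# Illustrative model of partition key + clustering column lookups
# (made-up data; not Cassandra internals).
import bisect

table = {
    # partition key -> rows sorted by clustering column: (message_id, content)
    "jdoe": [("2014-09-20 10:00:00", "hello"),
             ("2014-09-25 16:00:00", "world")],
    "hsue": [("2014-09-22 09:00:00", "hi")],
}

def select_range(login, lo, hi):
    """Exact match on the partition key, range scan on the clustering column:
    one hash lookup, then a contiguous slice of a sorted list."""
    rows = table[login]                                   # O(1) partition lookup
    keys = [message_id for message_id, _ in rows]
    return rows[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]

print(select_range("jdoe", "2014-09-20 16:00:00", "2014-09-25 16:00:00"))
# Querying by message_id alone would mean scanning every partition on every
# node -- which is exactly why Cassandra rejects such WHERE clauses.
```

This is why range queries on the clustering column are cheap (a contiguous slice on one node) while any query that omits the partition key is refused.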
64. @doanduyhai
WHERE clause restrictions!
What if I want to perform « arbitrary » WHERE clause ?
• search form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL !
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)
64
65. @doanduyhai
WHERE clause restrictions!
What if I want to perform « arbitrary » WHERE clause ?
• search form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL !
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
65
66. @doanduyhai
Collections & maps!
CREATE TABLE users (
login text,
name text,
age int,
friends set<text>,
hobbies list<text>,
languages map<int, text>,
…
PRIMARY KEY(login));
66
Keep the cardinality low ≈ 1000
67. @doanduyhai
User Defined Type (UDT)!
CREATE TABLE users (
login text,
…
street_number int,
street_name text,
postcode int,
country text,
…
PRIMARY KEY(login));
Instead of
67
68. @doanduyhai
User Defined Type (UDT)!
CREATE TYPE address (
street_number int,
street_name text,
postcode int,
country text);
CREATE TABLE users (
login text,
…
location frozen <address>,
…
PRIMARY KEY(login));
68
70. @doanduyhai
UDT update!
UPDATE users SET location =
{
street_number: 125,
street_name: 'Congress Avenue',
postcode: 95054,
country: 'USA'
}
WHERE login = 'jdoe';
Can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)
70
71. @doanduyhai
From SQL to CQL!
Normalized
User 1 ── n Comment
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author_login text, // typical join id
content text,
PRIMARY KEY((article_id), comment_id));
71
72. @doanduyhai
From SQL to CQL
1 SELECT
- 10 last comments
- 10 author_login
What to do with 10 author_login ???
User 1 ── n Comment
72
73. @doanduyhai
From SQL to CQL
1 SELECT
- 10 last comments
- 10 author_login
What to do with 10 author_login ???
10 extra SELECTs → the N+1 SELECT problem!
User (1) ⟶ (n) Comment
73
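The N+1 pattern the slide warns about looks like this in practice (a sketch; the users table is assumed from earlier slides):

```sql
-- 1 query for the comments...
SELECT comment_id, author_login, content
FROM comments WHERE article_id = ? LIMIT 10;

-- ...then 1 extra query per distinct author_login, up to 10 round-trips
SELECT name, age FROM users WHERE login = 'alice';
SELECT name, age FROM users WHERE login = 'bob';
-- ...and so on for each remaining author
```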
74. @doanduyhai
From SQL to CQL!
De-normalized model: User (1) ⟶ (n) Comment
CREATE TABLE comments (
article_id uuid,
comment_id timeuuid,
author frozen<person>, // person is UDT
content text,
PRIMARY KEY((article_id), comment_id));
74
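With the author embedded as a UDT, the same read scenario collapses back to a single query. A sketch, assuming a person UDT carrying only the display fields (field names are illustrative):

```sql
-- person is an application-defined UDT, for example:
CREATE TYPE person (
  login text,
  firstname text,
  lastname text);

-- one query now returns the comments AND their author details
SELECT comment_id, author, content
FROM comments WHERE article_id = ? LIMIT 10;
```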
76. @doanduyhai
Data modeling best practices!
Start by queries
• identify core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off
76
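"1 read scenario ≈ 1 SELECT" typically means one table per query path, with the data duplicated between them. A hedged sketch (table and column names are illustrative, not from the slides):

```sql
-- read path 1: look up a user by login
CREATE TABLE users_by_login (
  login text,
  email text,
  name text,
  PRIMARY KEY(login));

-- read path 2: look up the same user by email (denormalized copy)
CREATE TABLE users_by_email (
  email text,
  login text,
  name text,
  PRIMARY KEY(email));
```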
78. @doanduyhai
Data modeling best practices!
John DOE, male
birthdate: 21/02/1981
subscribed since 03/06/2011
☉ San Mateo, CA
"Impossible is not John DOE"
Full detail is read from the User table on click
78
82. @doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (extra SELECT required)
• or de-normalize and update everywhere upon data mutation
But keep those scenarios rare (5%-10% max); focus on the other 90%
Example: Twitter tweet deletion
82
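To make the tweet-deletion example concrete: if tweets are denormalized into each follower's timeline, a delete must be propagated to every copy by the application. A sketch (the table layout is illustrative, not from the slides):

```sql
-- denormalized timeline: each follower holds a copy of the tweet
CREATE TABLE timeline (
  follower_login text,
  tweet_id timeuuid,
  author_login text,
  content text,
  PRIMARY KEY((follower_login), tweet_id));

-- deleting one tweet = one DELETE per follower, issued by the application
DELETE FROM timeline WHERE follower_login = 'alice' AND tweet_id = ?;
DELETE FROM timeline WHERE follower_login = 'bob' AND tweet_id = ?;
```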
84. @doanduyhai
Lightweight Transaction (LWT)!
What ? ☞ make operations linearizable
Why ? ☞ solve a class of race conditions in Cassandra that
would otherwise require an external lock manager
84
85. @doanduyhai
Lightweight Transaction (LWT)!
Two concurrent clients check-then-insert the same account:
Client 1: SELECT * FROM account WHERE id = 'jdoe'; → (0 rows)
Client 2: SELECT * FROM account WHERE id = 'jdoe'; → (0 rows)
Client 1: INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com');
Client 2: INSERT INTO account (id, email) VALUES ('jdoe', 'jdoe@fiction.com'); ← winner
85
86. @doanduyhai
Lightweight Transaction (LWT)!
How ? ☞ by implementing the Paxos protocol in Cassandra
Syntax ?
INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com')
IF NOT EXISTS;
UPDATE account SET email = 'jdoe@fiction.com'
WHERE id = 'jdoe' IF email = 'john_doe@fiction.com';
86
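A conditional statement reports back through a synthetic [applied] column, so the client knows whether it won the race (a sketch; the row values are illustrative):

```sql
INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com')
IF NOT EXISTS;
--  [applied]
-- -----------
--       True

-- a second conditional insert on the same id loses,
-- and the existing row is returned alongside [applied] = False
INSERT INTO account (id, email) VALUES ('jdoe', 'other@fiction.com')
IF NOT EXISTS;
--  [applied] | id   | email
-- -----------+------+----------------------
--      False | jdoe | john_doe@fiction.com
```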