Cassandra introduction @ NantesJUG

@doanduyhai
Introduction to Cassandra
DuyHai DOAN, Technical Advocate

Who Am I ?
Duy Hai DOAN
Cassandra technical advocate
•  talks, meetups, confs
•  open-source devs (Achilles, …)
•  OSS Cassandra point of contact
☞ duy_hai.doan@datastax.com
☞ @doanduyhai

DataStax
•  Founded in April 2010
•  We contribute a lot to Apache Cassandra™
•  400+ customers (25 of the Fortune 100), 200+ employees
•  Headquarters in the San Francisco Bay Area
•  EU headquarters in London, offices in France and Germany
•  DataStax Enterprise = OSS Cassandra + extra features

Agenda
Architecture
•  Cluster, Replication, Consistency
Data model
•  Last Write Wins (LWW), CQL basics, From SQL to CQL, Lightweight Transaction
DSE
Use Cases

Cassandra history
NoSQL database
•  created at Facebook
•  open-sourced in 2008
•  current version = 2.1
•  column-oriented ☞ distributed table

Cassandra 5 key facts
Key fact 1: linear scalability
[Figure: Cassandra (C*) cluster sizes, from 3 nodes, ≈3 GB (NetcoSports) up to 1k+ nodes, PB+, with "YOU" marked on the scale]

Key fact 2: continuous availability (≈100% up-time)
•  resilient architecture (Dynamo)

Key fact 3: multi-data centers
•  out-of-the-box (config only)
•  AWS conf for multi-region DCs
•  GCE/CloudStack support
•  Microsoft Azure

Multi-DC usages
Data-locality, disaster recovery
[Figure: two rings, New York (DC1, nodes n1–n8) and London (DC2, nodes n1–n5), linked by async replication]

Multi-DC usages
Workload segregation/virtual DC
[Figure: two rings in the same room, Production (Live, nodes n1–n8) and Analytics (Spark/Hadoop, nodes n1–n5), linked by async replication]

Multi-DC usages
Prod data copy for testing/benchmarking
[Figure: production ring (nodes n1–n8) with a data copy to "My tiny test cluster" (nodes n1–n3); labels: "Use LOCAL consistency", "NEVER WRITE HERE !!!"]

Key fact 4: operational simplicity
•  1 node = 1 process + 2 config files (main + IP)
•  deployment automation
•  OpsCenter for monitoring

Key fact 5: analytics combo
•  Cassandra + Spark = awesome !
•  realtime streaming/analytics/aggregation …

Cassandra architecture
Cluster
Replication
Consistency

Cassandra architecture
Cluster layer
•  Amazon Dynamo paper
•  masterless architecture

Data-store layer
•  Google Bigtable paper
•  columns/column families

Data distribution
Random: hash of #partition → token = hash(#p)

Hash range: ]-X, X]

X = huge number (2^64/2)

[Figure: ring of 8 nodes, n1–n8]

Token Ranges
A: ]0, X/8]
B: ]X/8, 2X/8]
C: ]2X/8, 3X/8]
D: ]3X/8, 4X/8]
E: ]4X/8, 5X/8]
F: ]5X/8, 6X/8]
G: ]6X/8, 7X/8]
H: ]7X/8, X]
[Figure: ring of 8 nodes n1–n8 with token ranges A–H]
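A minimal sketch (not from the talk) of how a partition key could be mapped to a token and then to a node. It assumes MD5 as a stand-in hash, equal-sized ranges over the whole ]-X, X] interval (the slide splits ]0, X] for simplicity), and hypothetical helper names `token`, `build_ring` and `owner`.

```python
import hashlib
from bisect import bisect_left

X = 2**63  # X = 2^64 / 2, so tokens fall in ]-X, X]

def token(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def build_ring(nodes):
    """Return the upper bound of each node's range, splitting ]-X, X] into equal slices."""
    step = (2 * X) // len(nodes)
    bounds = [-X + step * (i + 1) for i in range(len(nodes))]
    bounds[-1] = X  # the last range always closes at X
    return bounds

def owner(nodes, bounds, tok):
    """Return the node whose range ]lower, upper] contains the token."""
    return nodes[bisect_left(bounds, tok)]

nodes = [f"n{i}" for i in range(1, 9)]
bounds = build_ring(nodes)
for user in ["user_id1", "user_id2", "user_id3", "user_id4", "user_id5"]:
    print(user, "->", owner(nodes, bounds, token(user)))
```

Because the hash spreads keys evenly over the token space, each node ends up with roughly the same share of rows, which is what makes the table "distributed".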

Distributed Table
[Figure: ring of 8 nodes n1–n8 with token ranges A–H; rows user_id1 … user_id5 are spread over the nodes according to their token]

Linear scalability
[Figure: the ring grows from 8 nodes (n1–n8) to 10 nodes (n1–n10)]

Failure tolerance
Replication Factor (RF) = 3
[Figure: ring of 8 nodes n1–n8 with token ranges A–H; three consecutive nodes, marked 1, 2 and 3, store {B, A, H}, {C, B, A} and {D, C, B}, so range B has 3 replicas]
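A minimal sketch (not from the talk) of replica placement with RF = 3. It assumes the simplest strategy, where a range is stored on its primary node plus the next RF-1 nodes clockwise; `RING`, `replicas_of` and `ranges_held` are hypothetical names, and nodes are identified by their position on the ring rather than by the n1–n8 labels.

```python
# Node at position i on the ring is the primary owner of range RING[i] (assumption).
RING = ["A", "B", "C", "D", "E", "F", "G", "H"]

def replicas_of(range_name, rf):
    """Positions of the nodes storing a range: its primary plus the next rf-1 nodes clockwise."""
    start = RING.index(range_name)
    return [(start + k) % len(RING) for k in range(rf)]

def ranges_held(node_position, rf):
    """Ranges stored on a node: its own primary range plus the rf-1 preceding ones."""
    return [RING[(node_position - k) % len(RING)] for k in range(rf)]

# The three replicas of range B and what each of them stores.
for position in replicas_of("B", rf=3):
    print(position, ranges_held(position, rf=3))
```

With this placement, losing any single node still leaves two copies of every range it held.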

Coordinator node
Incoming requests (read/write)

Coordinator node handles the request
Every node can be coordinator → masterless
[Figure: ring of 8 nodes n1–n8; a request arrives at the coordinator node, which contacts replicas 1, 2 and 3]

Consistency
Tunable at runtime
•  ONE
•  QUORUM (strict majority w.r.t. RF)
•  ALL

Applies to both reads & writes

Consistency in action
RF = 3, the replicas hold the old value A and a client writes the new value B

•  Write ONE, Read ONE: replicas B A A (data replication in progress …) ☞ Read ONE: A
•  Write ONE, Read QUORUM: replicas B A A ☞ Read QUORUM: A
•  Write ONE, Read ALL: replicas B A A ☞ Read ALL: B
•  Write QUORUM, Read ONE: replicas B B A ☞ Read ONE: A
•  Write QUORUM, Read QUORUM: replicas B B A ☞ Read QUORUM: B
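The five scenarios above follow one rule: a read is guaranteed to see the latest write only when the replicas contacted by the read and by the write must overlap, i.e. when (write replicas + read replicas) > RF. A minimal sketch, with hypothetical helper names:

```python
def required_replicas(level: str, rf: int) -> int:
    """Number of replicas that must acknowledge a request at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # strict majority w.r.t. RF
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")

def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must intersect every write set."""
    return required_replicas(write_level, rf) + required_replicas(read_level, rf) > rf

for write_level, read_level in [("ONE", "ONE"), ("ONE", "QUORUM"), ("ONE", "ALL"),
                                ("QUORUM", "ONE"), ("QUORUM", "QUORUM")]:
    guaranteed = read_sees_latest_write(write_level, read_level, rf=3)
    print(f"Write {write_level}, Read {read_level}: latest value guaranteed = {guaranteed}")
```

When the rule does not hold, the read can still return the new value by luck; it is simply not guaranteed.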

Consistency trade-off

Consistency level
ONE
•  fast, may not read latest written value

QUORUM
•  strict majority w.r.t. Replication Factor
•  good balance

ALL
•  paranoid
•  slow, no high availability

Consistency summary

ONE read + ONE write
☞ available for reads/writes even with (N-1) replicas down

QUORUM read + QUORUM write
☞ available for reads/writes even with 1+ replicas down
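The availability figures in this summary can be derived from the replication factor. A minimal sketch, assuming N on the slide means the replication factor (RF) and using a hypothetical helper `max_replicas_down`:

```python
def max_replicas_down(level: str, rf: int) -> int:
    """How many replicas of a partition may be down while requests at this level still succeed."""
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]
    return rf - needed

for rf in (3, 5):
    for level in ("ONE", "QUORUM", "ALL"):
        print(f"RF={rf}, {level}: tolerates {max_replicas_down(level, rf)} replica(s) down")
```

With RF = 3, QUORUM tolerates exactly 1 replica down; the "1+" grows with RF (2 replicas with RF = 5).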

Data model
Last Write Wins
CQL basics
From SQL to CQL
Lightweight Transaction

Cassandra Write Path
1. Append the write to the commit log on disk (Commit log 1, Commit log 2, …, Commit log n)
2. Apply the write in memory, to the MemTable of the target table (one MemTable per table: Table 1, Table 2, …, Table N)
3. Flush the MemTables to disk as SSTables (SSTable 1, SSTable 2, SSTable 3); memory is then free for new MemTables
[Figure: as writes continue, new MemTables fill up and each table accumulates several SSTables on disk]
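A minimal sketch (not from the talk) of the three steps of the write path for a single table. The class `TableStore`, its `flush_threshold` parameter and the sample rows are illustrative assumptions; a real node flushes based on memory usage and recycles commit log segments rather than clearing a list.

```python
class TableStore:
    """Toy storage engine for one table: commit log + MemTable + SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # step 1: append-only log, replayed after a crash
        self.memtable = {}     # step 2: latest writes, kept in memory
        self.sstables = []     # step 3: immutable, sorted files on disk
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the MemTable as a sorted, immutable SSTable and free memory."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

store = TableStore(flush_threshold=2)
for login, age in [("jdoe", 33), ("hsue", 27), ("asmith", 41)]:
    store.write(login, age)
print("SSTables:", store.sstables)
print("MemTable:", store.memtable)
print("Commit log:", store.commit_log)
```

Writes never modify an existing SSTable, which is why updates and deletes end up in newer SSTables and need timestamps to be reconciled.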

Last Write Wins (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);

#partition: jdoe
•  age (t1) = 33
•  name (t1) = John DOE
t1 = auto-generated timestamp

UPDATE users SET age = 34 WHERE login = 'jdoe';

SSTable1 ☞ jdoe: age (t1) = 33, name (t1) = John DOE
SSTable2 ☞ jdoe: age (t2) = 34

DELETE age FROM users WHERE login = 'jdoe';

SSTable1 ☞ jdoe: age (t1) = 33, name (t1) = John DOE
SSTable2 ☞ jdoe: age (t2) = 34
SSTable3 ☞ jdoe: age (t3) = ✕ (tombstone)

SELECT age FROM users WHERE login = 'jdoe';

✕ jdoe: age (t1) = 33, name (t1) = John DOE
✕ jdoe: age (t2) = 34
✓ jdoe: age (t3) = ✕ (tombstone)
☞ the cell with the most recent timestamp wins: age (t3) is a tombstone, so no age is returned

Compaction
SSTable1 ☞ jdoe: age (t1) = 33, name (t1) = John DOE
SSTable2 ☞ jdoe: age (t2) = 34
SSTable3 ☞ jdoe: age (t3) = ✕ (tombstone)

New SSTable ☞ jdoe: age (t3) = ✕ (tombstone), name (t1) = John DOE
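A minimal sketch (not from the talk) of Last Write Wins for the jdoe partition: each cell carries a timestamp, a read keeps the most recent cell per column, and compaction writes that merged view into a new SSTable. The names `TOMBSTONE`, `merge` and `read`, and the use of integers 1, 2, 3 for t1, t2, t3, are illustrative assumptions.

```python
TOMBSTONE = object()  # marker written by DELETE

def merge(sstables):
    """Keep, for each column, the cell with the highest timestamp (what compaction writes out)."""
    merged = {}
    for sstable in sstables:
        for column, (value, timestamp) in sstable.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged

def read(sstables, column):
    """Return the live value of a column, or None if it is absent or deleted."""
    cell = merge(sstables).get(column)
    if cell is None or cell[0] is TOMBSTONE:
        return None
    return cell[0]

sstable1 = {"age": (33, 1), "name": ("John DOE", 1)}  # INSERT at t1
sstable2 = {"age": (34, 2)}                            # UPDATE at t2
sstable3 = {"age": (TOMBSTONE, 3)}                     # DELETE at t3

print(read([sstable1, sstable2], "age"))            # before the DELETE
print(read([sstable1, sstable2, sstable3], "age"))  # after the DELETE
print(read([sstable1, sstable2, sstable3], "name"))
```

The tombstone has to be kept in the new SSTable for a while; dropping it too early would let an older value reappear from a replica that missed the delete.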

Historical data
You want to keep data history ?
•  do not use internal generated timestamp !!!
•  ☞ time-series data modeling
[Figure: table history, partition id; SSTable1 holds columns date1 (t1), date2 (t2), …, date9 (t9), SSTable2 holds date10 (t10), date11 (t11), …]

CRUD operations

UPDATE users SET age = 34 WHERE login = 'jdoe';

DELETE age FROM users WHERE login = 'jdoe';

Simple Table

CREATE TABLE users (
    login text,
    name text,
    age int,
    …
    PRIMARY KEY(login));

login = partition key (#partition)

What about joins ?
How can I join data between tables ?
How can I model 1 – N relationships ?

How to model a mailbox ?
[Figure: User 1 – n Emails]

Clustered table (1 – N)

CREATE TABLE mailbox (
    login text,
    message_id timeuuid,
    interlocutor text,
    message text,
    PRIMARY KEY((login), message_id));

•  login = partition key
•  message_id = clustering column (sorted)
•  (login, message_id) ☞ uniqueness

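On disk layout
[Figure: rows on disk across SSTable1 and SSTable2; partition jdoe stores message_id1, message_id2, …, message_id104 as sorted columns; partition hsue and a second fragment of partition jdoe appear as further rows]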

Queries
Get message by user and message_id (date)

SELECT * FROM mailbox WHERE login = 'jdoe'
    and message_id = '2014-09-25 16:00:00';

Get message by user and date interval

SELECT * FROM mailbox WHERE login = 'jdoe'
    and message_id <= '2014-09-25 16:00:00'
    and message_id >= '2014-09-20 16:00:00';

Queries
Get message by message_id only ? ❓ (#partition not provided)

SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';

Get message by date interval only ? ❓ (#partition not provided)

SELECT * FROM mailbox WHERE message_id <= '2014-09-25 16:00:00'
    and message_id >= '2014-09-20 16:00:00';

Without #partition
No #partition
☞ no token
☞ where are my data ?
[Figure: ring of 8 nodes, each marked with a question mark]

The importance of #partition
In RDBMS, no primary key
☞ full table scan 😭

With Cassandra, no partition key
☞ full CLUSTER scan 😱

Queries
Get message by user range (range query on #partition)

SELECT * FROM mailbox WHERE login >= 'hsue' and login <= 'jdoe';

Get message by user pattern (non exact match on #partition)

SELECT * FROM mailbox WHERE login like '%doe%';

WHERE clause restrictions
All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition
Only exact match (=) on #partition, range queries (<, ≤, >, ≥) not allowed
•  ☞ full cluster scan

On clustering columns, only range queries (<, ≤, >, ≥) and exact match

WHERE clause only possible
•  on columns defined in PRIMARY KEY
•  on indexed columns
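A toy validator (not from the talk, and not how Cassandra itself plans queries) that applies the restrictions listed above to the mailbox examples. The function `check_where` and its arguments are hypothetical; allowing only exact match on indexed columns is an assumption of this sketch.

```python
RANGE_OPS = {"<", "<=", ">", ">="}

def check_where(restrictions, partition_key, clustering_columns, indexed_columns=()):
    """restrictions: list of (column, operator) pairs. Returns (allowed, reason)."""
    # Every partition key column must be provided, with exact match only.
    for pk in partition_key:
        ops = [op for column, op in restrictions if column == pk]
        if not ops:
            return False, f"#partition column '{pk}' not provided"
        if any(op != "=" for op in ops):
            return False, f"only exact match (=) is allowed on #partition column '{pk}'"
    for column, op in restrictions:
        if column in partition_key:
            continue
        if column in clustering_columns:
            if op != "=" and op not in RANGE_OPS:
                return False, f"operator '{op}' not allowed on clustering column '{column}'"
        elif column in indexed_columns:
            if op != "=":  # assumption: indexed columns accept exact match only
                return False, f"operator '{op}' not allowed on indexed column '{column}'"
        else:
            return False, f"column '{column}' is neither in PRIMARY KEY nor indexed"
    return True, "ok"

mailbox = dict(partition_key=["login"], clustering_columns=["message_id"])
print(check_where([("login", "="), ("message_id", "<="), ("message_id", ">=")], **mailbox))
print(check_where([("message_id", "=")], **mailbox))
print(check_where([("login", ">="), ("login", "<=")], **mailbox))
print(check_where([("login", "LIKE")], **mailbox))
```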

What if I want to perform « arbitrary » WHERE clause ?
•  search form scenario, dynamic search fields

DO NOT RE-INVENT THE WHEEL !
☞ Apache Solr (Lucene) integration (DataStax Enterprise)
☞ Same JVM, 1-cluster-2-products (Solr & Cassandra)

SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';

SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';

Collections & maps

CREATE TABLE users (
    login text,
    name text,
    age int,
    friends set<text>,
    hobbies list<text>,
    languages map<int, text>,
    …
    PRIMARY KEY(login));

Keep the cardinality low ≈ 1000

User Defined Type (UDT)
Instead of

CREATE TABLE users (
    login text,
    …
    street_number int,
    street_name text,
    postcode int,
    country text,
    …
    PRIMARY KEY(login));

User Defined Type (UDT)

CREATE TYPE address (
    street_number int,
    street_name text,
    postcode int,
    country text);

CREATE TABLE users (
    login text,
    …
    location frozen <address>,
    …
    PRIMARY KEY(login));

UDT insert

INSERT INTO users(login, name, location) VALUES (
    'jdoe',
    'John DOE',
    {
        street_number: 124,
        street_name: 'Congress Avenue',
        postcode: 95054,
        country: 'USA'
    });

UDT update

UPDATE users SET location =
    {
        street_number: 125,
        street_name: 'Congress Avenue',
        postcode: 95054,
        country: 'USA'
    }
    WHERE login = 'jdoe';

Can be nested ☞ store documents
•  but no dynamic fields (or use map<text, blob>)

From SQL to CQL
Normalized
[Figure: User 1 – n Comment]

CREATE TABLE comments (
    article_id uuid,
    comment_id timeuuid,
    author_login text, // typical join id
    content text,
    PRIMARY KEY((article_id), comment_id));

From SQL to CQL
[Figure: User 1 – n Comment]
1 SELECT
-  10 last comments
-  10 author_login

What to do with 10 author_login ???
10 extra SELECT → N+1 SELECT problem !

From SQL to CQL
De-normalized
[Figure: User 1 – n Comment]

CREATE TABLE comments (
    article_id uuid,
    comment_id timeuuid,
    author frozen<person>, // person is UDT
    content text,
    PRIMARY KEY((article_id), comment_id));
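A minimal sketch (not from the talk) contrasting the two models by counting queries. `FakeSession` is a hypothetical in-memory stand-in for a driver session, and the table contents are made-up sample data.

```python
class FakeSession:
    """Counts SELECTs against in-memory tables (stand-in for a real driver session)."""

    def __init__(self, tables):
        self.tables = tables
        self.select_count = 0

    def select(self, table, key):
        self.select_count += 1
        return self.tables[table][key]

users = {f"user{i}": {"firstname": f"First{i}", "lastname": f"Last{i}"} for i in range(10)}
tables = {
    "users": users,
    # Normalized: each comment only stores the author's login (join id).
    "comments": {"article1": [{"content": f"comment {i}", "author_login": f"user{i}"}
                              for i in range(10)]},
    # De-normalized: each comment embeds a copy of the author (the person UDT).
    "comments_denormalized": {"article1": [{"content": f"comment {i}", "author": users[f"user{i}"]}
                                           for i in range(10)]},
}

def load_normalized(session, article_id):
    comments = session.select("comments", article_id)           # 1 SELECT
    return [(c["content"], session.select("users", c["author_login"]))  # + N SELECTs
            for c in comments]

def load_denormalized(session, article_id):
    return [(c["content"], c["author"])
            for c in session.select("comments_denormalized", article_id)]  # 1 SELECT

normalized_session, denormalized_session = FakeSession(tables), FakeSession(tables)
normalized_result = load_normalized(normalized_session, "article1")
denormalized_result = load_denormalized(denormalized_session, "article1")
print(normalized_session.select_count, denormalized_session.select_count)
```

Both calls return the same data; the de-normalized model pays for the single read with duplicated author data at write time.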

Data modeling best practices
Start by queries
•  identify core functional read paths
•  1 read scenario ≈ 1 SELECT

Denormalize
•  wisely, only duplicate necessary & immutable data
•  functional/technical trade-off

Person UDT
- firstname/lastname
- date of birth
- gender
- mood
- location

John DOE, male
birthdate: 21/02/1981
subscribed since 03/06/2011
☉ San Mateo, CA
"Impossible is not John DOE"
Full detail read from User table on click

Data modeling trade-off
What if ...
•  not possible to de-normalize with immutable data ?
•  have to duplicate mutable data ?

2 strategies
•  either accept normalizing some data (extra SELECT required)
•  or de-normalize and update everywhere upon data mutation

But always keep those scenarios rare (5%-10% max), focus on the 90%

Example: Twitter tweet deletion

Lightweight Transaction (LWT)
What ? ☞ make operations linearizable

Why ? ☞ solve a class of race conditions in Cassandra that would otherwise require installing an external lock manager

Race condition without LWT: two clients create the same account
Both clients: SELECT * FROM account WHERE id = 'jdoe'; ☞ (0 rows)
Client A: INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com');
Client B: INSERT INTO account (id, email) VALUES ('jdoe', 'jdoe@fiction.com'); ☞ winner

How ? ☞ implementing Paxos protocol on Cassandra

Syntax ?

INSERT INTO account (id, email) VALUES ('jdoe', 'john_doe@fiction.com')
    IF NOT EXISTS;

UPDATE account SET email = 'jdoe@fiction.com'
    WHERE id = 'jdoe'
    IF email = 'john_doe@fiction.com';
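A minimal sketch (not from the talk) of the race that IF NOT EXISTS removes. `AccountTable` is a hypothetical single-process stand-in: the atomicity that Paxos provides across replicas is modeled here by a plain method call, so this shows the semantics only, not the protocol.

```python
class AccountTable:
    """In-memory stand-in for the account table."""

    def __init__(self):
        self.rows = {}

    def select(self, account_id):
        return self.rows.get(account_id)

    def insert(self, account_id, email):
        """Plain INSERT: an upsert, the last write silently wins."""
        self.rows[account_id] = email

    def insert_if_not_exists(self, account_id, email):
        """INSERT … IF NOT EXISTS: check and write as one atomic step; returns 'applied'."""
        if account_id in self.rows:
            return False
        self.rows[account_id] = email
        return True

# Without LWT: both clients check first, see 0 rows, then both insert.
plain = AccountTable()
seen_by_a = plain.select("jdoe")
seen_by_b = plain.select("jdoe")
if seen_by_a is None:
    plain.insert("jdoe", "john_doe@fiction.com")
if seen_by_b is None:
    plain.insert("jdoe", "jdoe@fiction.com")  # overwrites client A without any error
print(plain.rows)

# With LWT: only the first insert is applied, the second client is told so.
lwt = AccountTable()
applied_a = lwt.insert_if_not_exists("jdoe", "john_doe@fiction.com")
applied_b = lwt.insert_if_not_exists("jdoe", "jdoe@fiction.com")
print(applied_a, applied_b, lwt.rows)
```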

Recommendations
•  insert with LWT ☞ delete must use LWT

INSERT INTO my_table … IF NOT EXISTS
☞ DELETE FROM my_table … IF EXISTS

•  LWT expensive (4 round-trips), do not abuse
•  only for 1% – 5% use cases

[Figure: the LWT round-trips, numbered 1 to 4; labels: "Queue-in", "Consensus", "Compare Swap / Learn"]

DSE (DataStax Enterprise)
Security
Analytics (Spark & Hadoop)
Search (Solr)

Use Cases
•  Messaging
•  Collections/Playlists
•  Fraud detection
•  Recommendation/Personalization
•  Internet of things/Sensor data

Thank You
@doanduyhai
duy_hai.doan@datastax.com
https://academy.datastax.com/
