Cassandra EU - Data model on fire

#CASSANDRAEU

Data Model on Fire

Patrick McFadin | Chief Evangelist DataStax
@PatrickMcFadin

Friday, October 18, 2013

Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune
Load test baby!



Light Weight Transactions


The race is on

T0, Process 1:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T1, Process 2:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T2, Process 1: Got nothing! Good to go!
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');

T3, Process 2: This one wins
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');

Solution LWT

T0, Process 1:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;

T1:
 [applied]
-----------
      True

•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success

Solution LWT

T2, Process 2:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;

T3:
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin

•applied = false: Rejected
•No record stomping!
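
The two timelines can be compared with a toy model in Python. This is only a sketch: `ToyUsers`, `insert` and `insert_if_not_exists` are hypothetical names, not a Cassandra or driver API, and a `threading.Lock` stands in for the exclusivity that Paxos provides in the real thing.

```python
import threading


class ToyUsers:
    """Toy in-memory stand-in for the users table (hypothetical, not a driver API)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stands in for Paxos exclusivity

    def select(self, username):
        return self._rows.get(username)

    def insert(self, username, row):
        # Plain INSERT behaves like an upsert: the last writer silently wins.
        self._rows[username] = row

    def insert_if_not_exists(self, username, row):
        # INSERT ... IF NOT EXISTS: the check and the write happen as one step.
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                return False, existing  # applied = False, plus the current row
            self._rows[username] = row
            return True, None           # applied = True


patrick = {"firstname": "Patrick", "lastname": "McFadin"}
paul = {"firstname": "Paul", "lastname": "McFadin"}

# The race: both processes read first, see nothing, then both write.
racy = ToyUsers()
seen_by_p1 = racy.select("pmcfadin")  # T0
seen_by_p2 = racy.select("pmcfadin")  # T1
racy.insert("pmcfadin", patrick)      # T2
racy.insert("pmcfadin", paul)         # T3, stomps the first record

# The LWT version: the second insert is rejected and sees the existing row.
safe = ToyUsers()
first = safe.insert_if_not_exists("pmcfadin", patrick)
second = safe.insert_if_not_exists("pmcfadin", paul)

print("racy table holds:", racy.select("pmcfadin")["firstname"])
print("safe table holds:", safe.select("pmcfadin")["firstname"])
print("applied flags:", first[0], second[0])
```

The toy model ignores replication and latency entirely; it only shows why "read, then write" is not the same as "write if absent".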


LWT Fine Print


•Light Weight Transactions solve edge conditions
•They have latency cost.
• Be aware
• Load test
• Consider in your data model

•Now go shut down that ZooKeeper mess you have!


Form Versioning: Revisited


Form Versioning Pt 1
•From “Next top data model”
•Great idea, but edge conditions
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form


1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});

2. Lock for one user (Danger Zone)
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});



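1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;

Exclusive lock

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

Accepted

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';

Rejected
(sorry dude)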


•Old way: Edge cases with problems
• Use external locking?
• Take your chances?

•New way: Managed expectations (LWT)
• Exclusive by existence check
• Continued with IF clause
• Downside: More latency
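
The "continued with IF clause" idea is a compare-and-set. A minimal Python sketch follows; `update_if` is a hypothetical helper working on a plain dict, not a driver call, and it leaves out the atomicity that the real IF clause gets from Paxos.

```python
def update_if(row, condition, changes):
    """Apply `changes` only when every (column, value) pair in `condition` matches.

    Returns True when applied, mirroring the applied flag of an LWT.
    Toy model only: nothing here makes the check-and-set atomic.
    """
    if all(row.get(col) == expected for col, expected in condition.items()):
        row.update(changes)
        return True
    return False


# One version row of the form, with the lock held by pmcfadin.
form = {
    "version_number": 1,
    "locked_by": "pmcfadin",
    "EmailAddress<text>": "Email Address: ",
}

accepted = update_if(
    form, {"locked_by": "pmcfadin"}, {"EmailAddress<text>": "Primary Email Address: "}
)
rejected = update_if(
    form, {"locked_by": "dude"}, {"EmailAddress<text>": "Email Adx: "}
)

print(accepted, rejected, form["EmailAddress<text>"])
```

Whoever holds the lock keeps passing the condition; everyone else gets a rejection instead of silently overwriting the form.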



Fire: Bring it


Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
• Single pass compaction
• Hints to reduce SSTable reads
• Faster index reads from off-heap


Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!

Avg Access Time* | Rotation Speed
-----------------+---------------
12ms             | 7200 RPM
7ms              | 10k RPM
5ms              | 15k RPM
.04ms            | SSD

Shared storage == Great sadness
* Source: www.tomshardware.com
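
The "5 seeks = 60ms" arithmetic, worked for each drive type in the table. This is a rough sketch that assumes one seek per SSTable touched and ignores caching; `disk_time_ms` is a hypothetical helper name.

```python
# Average access times from the table above, in milliseconds.
ACCESS_TIME_MS = {"7200 RPM": 12.0, "10k RPM": 7.0, "15k RPM": 5.0, "SSD": 0.04}


def disk_time_ms(sstables_read, drive):
    """Rough disk time for one read, assuming one seek per SSTable touched."""
    return sstables_read * ACCESS_TIME_MS[drive]


for drive in ACCESS_TIME_MS:
    five = disk_time_ms(5, drive)
    one = disk_time_ms(1, drive)
    print(f"{drive:>8}: 5 seeks = {five:g} ms, 1 seek = {one:g} ms")
```

Dropping from five SSTables per read to one saves far more on spinning disks than any other knob in this list.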



Quick Diversion


•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
• per read
• per write
• SSTable flush
• Compaction
nodetool cfhistograms <keyspace> <table>



How do I even read this thing!


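Histograms How to

nodetool cfhistograms videodb users
videodb/users histograms

Offset | SSTables | Write Latency (micros) | Read Latency (micros) | Partition Size (bytes) | Cell Count
-------+----------+------------------------+-----------------------+------------------------+-----------
1      | 107      | 0                      | 0                     | 0                      | 0
2      | 0        | 0                      | 0                     | 0                      | 0
10     | 0        | 0                      | 0                     | 0                      | 5
250    | 0        | 5                      | 0                     | 0                      | 0
800    | 0        | 10                     | 50                    | 0                      | 0
1250   | 0        | 0                      | 300                   | 5                      | 0

•Offset is a unit-less column
•Units are assigned by each column
•Numerical buckets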

Histograms How to

Offset | SSTables
-------+---------
1      | 107
2      | 2
10     | 0
250    | 0
800    | 0
1250   | 0

•Per read. How many seeks?
•Offset is number of SSTables read
•Less == lower read latency
•107 reads took 1 seek to satisfy

Histograms How to

Offset | Write Latency (micros)
-------+-----------------------
1      | 0
2      | 0
10     | 0
250    | 5
800    | 10
1250   | 0

•Per write. How fast?
•Offset is microseconds

Histograms How to

Offset | Read Latency (micros)
-------+----------------------
1      | 0
2      | 0
10     | 0
250    | 0
800    | 50
1250   | 300

•Per read. How fast?
•Offset is microseconds

Histograms How to

Offset | Partition Size (bytes)
-------+-----------------------
1      | 0
2      | 0
10     | 0
250    | 0
800    | 0
1250   | 5

•Per partition (storage row)
•Offset is size in bytes
•5 partitions are 1250 bytes

Histograms How to

Offset | Cell Count
-------+-----------
1      | 0
2      | 0
10     | 5
250    | 0
800    | 0
1250   | 0

•Per partition (storage row)
•Offset is count of cells in partition
•5 partitions have 10 cells
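
Reading the sample by machine makes the "one Offset, five meanings" idea concrete. The sketch below parses the small sample into one bucket map per column. The whitespace-separated row layout in `SAMPLE` is an assumed plain-text rendering of the slide, not guaranteed to match the tool's exact output, and `parse_histogram` is a hypothetical helper.

```python
# Rows: offset, SSTables, write latency, read latency, partition size, cell count.
SAMPLE = """\
1     107  0   0    0  0
2     2    0   0    0  0
10    0    0   0    0  5
250   0    5   0    0  0
800   0    10  50   0  0
1250  0    0   300  5  0
"""

COLUMNS = ["SSTables", "Write Latency", "Read Latency", "Partition Size", "Cell Count"]


def parse_histogram(text):
    """Return {column: {offset: count}}, keeping only non-empty buckets."""
    result = {name: {} for name in COLUMNS}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue  # skip blank or malformed lines
        offset = int(fields[0])
        for name, raw in zip(COLUMNS, fields[1:]):
            count = int(raw)
            if count:
                result[name][offset] = count
    return result


hist = parse_histogram(SAMPLE)
for name, buckets in hist.items():
    # The offset means something different for every column:
    # SSTables read, microseconds, bytes, or cells.
    print(f"{name}: {buckets}")
```

Each column is its own histogram that happens to share the bucket boundaries in the Offset column.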

Histograms + Data Model
•Your data model is the key to success
•How do you ensure that?
Test
Measure
Repeat



Real World Example
•Real Customer
•Needed very tight SLA on reads

Problem

•Read response highly variable
•Loading data increases latency



Offset | SSTables | Write Latency (micros) | Read Latency (micros) | Partition Size (bytes) | Cell Count
1 | 2016550 | 0 | 0 | 0 | 0
2 | 2064495 | 0 | 0 | 0 | 0
3 | 434526 | 0 | 0 | 0 | 0
4 | 51084 | 0 | 0 | 0 | 0
5 | 0 | 0 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0
8 | 0 | 0 | 0 | 0 | 0
10 | 0 | 0 | 0 | 0 | 1629
12 | 0 | 0 | 0 | 0 | 2971
14 | 0 | 0 | 0 | 0 | 1286
17 | 0 | 0 | 0 | 0 | 68
20 | 0 | 0 | 0 | 0 | 188
24 | 0 | 0 | 0 | 0 | 101
29 | 0 | 0 | 0 | 0 | 50799
35 | 0 | 0 | 0 | 0 | 269
42 | 0 | 0 | 0 | 0 | 132414
50 | 0 | 0 | 0 | 0 | 32943
60 | 0 | 0 | 0 | 0 | 62099
72 | 0 | 0 | 0 | 0 | 116855
86 | 0 | 0 | 0 | 0 | 41562
103 | 0 | 0 | 0 | 0 | 42796
124 | 0 | 0 | 0 | 0 | 46719
149 | 0 | 0 | 0 | 0 | 57693
179 | 0 | 0 | 3 | 0 | 27659
215 | 0 | 0 | 18 | 0 | 26941
258 | 0 | 0 | 47 | 0 | 21589
310 | 0 | 0 | 71 | 0 | 19494
372 | 0 | 0 | 141 | 0 | 8681
446 | 0 | 0 | 67 | 0 | 9499
535 | 0 | 0 | 36466 | 1629 | 9360
642 | 0 | 0 | 263829 | 0 | 4349
770 | 0 | 0 | 608488 | 2971 | 4242
924 | 0 | 0 | 209549 | 1468 | 2422
1109 | 0 | 0 | 398845 | 59 | 1685
1331 | 0 | 0 | 625099 | 45105 | 954
1597 | 0 | 0 | 462636 | 5731 | 610
1916 | 0 | 0 | 499920 | 132391 | 366
2299 | 0 | 0 | 380787 | 16265 | 303
2759 | 0 | 0 | 285323 | 20015 | 188
3311 | 0 | 0 | 202417 | 30980 | 106
3973 | 0 | 0 | 148920 | 44973 | 64
4768 | 0 | 0 | 106452 | 38502 | 55
5722 | 0 | 0 | 81533 | 69479 | 23
6866 | 0 | 0 | 55470 | 39218 | 15
8239 | 0 | 0 | 43512 | 23027 | 3
9887 | 0 | 0 | 30810 | 58498 | 2
11864 | 0 | 0 | 22375 | 73629 | 0
14237 | 0 | 0 | 15148 | 33444 | 1
17084 | 0 | 0 | 12047 | 28321 | 0
20501 | 0 | 0 | 11298 | 17021 | 0
24601 | 0 | 0 | 9652 | 13072 | 3
29521 | 0 | 0 | 6715 | 7790 | 0
35425 | 0 | 0 | 13788 | 7764 | 0
42510 | 0 | 0 | 15322 | 5890 | 0
51012 | 0 | 0 | 8585 | 4046 | 0
61214 | 0 | 0 | 5041 | 2973 | 0
73457 | 0 | 0 | 2892 | 1954 | 0
88148 | 0 | 0 | 1543 | 936 | 0
105778 | 0 | 0 | 900 | 661 | 0
126934 | 0 | 0 | 486 | 409 | 0
152321 | 0 | 0 | 285 | 289 | 0

• Compactions behind
• Disk IO problems
• How to optimize?

Offset | SSTables | Write Latency (micros) | Read Latency (micros) | Partition Size (bytes) | Cell Count
1 | 2045656 | 0 | 0 | 0 | 0
2 | 1813961 | 0 | 0 | 0 | 0
3 | 70496 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | 0 | 0 | 0
6 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0
8 | 0 | 0 | 0 | 0 | 0
10 | 0 | 0 | 0 | 0 | 47
12 | 0 | 0 | 0 | 0 | 860
14 | 0 | 0 | 0 | 0 | 393
17 | 0 | 0 | 0 | 0 | 50
20 | 0 | 0 | 0 | 0 | 0
24 | 0 | 0 | 0 | 0 | 21
29 | 0 | 0 | 0 | 0 | 34489
35 | 0 | 0 | 0 | 0 | 32
42 | 0 | 0 | 0 | 0 | 97226
50 | 0 | 0 | 0 | 0 | 24490
60 | 0 | 0 | 0 | 0 | 47077
72 | 0 | 0 | 0 | 0 | 94761
86 | 0 | 0 | 0 | 0 | 32559
103 | 0 | 0 | 0 | 0 | 33885
124 | 0 | 0 | 0 | 0 | 37051
149 | 0 | 0 | 1 | 0 | 48429
179 | 0 | 0 | 17 | 0 | 23272
215 | 0 | 0 | 95 | 0 | 22459
258 | 0 | 0 | 84 | 0 | 17953
310 | 0 | 0 | 174 | 0 | 16178
372 | 0 | 0 | 53082 | 0 | 7123
446 | 0 | 0 | 318074 | 0 | 7836
535 | 0 | 0 | 423140 | 47 | 7904
642 | 0 | 0 | 382926 | 0 | 3552
770 | 0 | 0 | 365670 | 860 | 3525
924 | 0 | 0 | 414824 | 392 | 1998
1109 | 0 | 0 | 442701 | 46 | 1411
1331 | 0 | 0 | 335862 | 30325 | 757
1597 | 0 | 0 | 302920 | 4082 | 518
1916 | 0 | 0 | 236448 | 97224 | 294
2299 | 0 | 0 | 171726 | 11843 | 254
2759 | 0 | 0 | 122880 | 15160 | 162
3311 | 0 | 0 | 90413 | 23484 | 89
3973 | 0 | 0 | 66682 | 34799 | 62
4768 | 0 | 0 | 53385 | 29619 | 54
5722 | 0 | 0 | 39121 | 53155 | 23
6866 | 0 | 0 | 26828 | 30702 | 12
8239 | 0 | 0 | 18930 | 18627 | 3
9887 | 0 | 0 | 12517 | 47739 | 2
11864 | 0 | 0 | 8269 | 61853 | 0
14237 | 0 | 0 | 6049 | 28875 | 1
17084 | 0 | 0 | 4614 | 24391 | 0
20501 | 0 | 0 | 5868 | 14450 | 0
24601 | 0 | 0 | 6167 | 11112 | 0
29521 | 0 | 0 | 2879 | 6609 | 0
35425 | 0 | 0 | 2054 | 6654 | 0
42510 | 0 | 0 | 8913 | 4986 | 0
51012 | 0 | 0 | 4429 | 3352 | 0
61214 | 0 | 0 | 1541 | 2465 | 0
73457 | 0 | 0 | 560 | 1607 | 0
88148 | 0 | 0 | 192 | 809 | 0
105778 | 0 | 0 | 59 | 523 | 0
126934 | 0 | 0 | 19 | 333 | 0
152321 | 0 | 0 | 0 | 262 | 0

Less seeks
2 ms!

• Tuned data disk
• Compactions better
• 1 less seek overall
• Further tuning made it even better!

What about the partition size?
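
To put a number on "1 less seek overall", the SSTables columns of the two histograms can be compared directly. A small sketch, with the bucket counts copied from the before and after histograms; `mean_offset` and `share_at_least` are hypothetical helper names.

```python
# SSTables-per-read buckets from the two histograms: {sstables_read: reads}.
BEFORE = {1: 2016550, 2: 2064495, 3: 434526, 4: 51084}
AFTER = {1: 2045656, 2: 1813961, 3: 70496}


def mean_offset(buckets):
    """Count-weighted average of the bucket offsets."""
    total = sum(buckets.values())
    return sum(offset * count for offset, count in buckets.items()) / total


def share_at_least(buckets, threshold):
    """Fraction of samples that fall in buckets at or above `threshold`."""
    total = sum(buckets.values())
    return sum(count for offset, count in buckets.items() if offset >= threshold) / total


for label, buckets in (("before", BEFORE), ("after", AFTER)):
    print(
        f"{label}: {mean_offset(buckets):.2f} SSTables per read, "
        f"{share_at_least(buckets, 3):.1%} of reads touch 3 or more"
    )
```

The tail matters more than the mean here: the reads that touch three or four SSTables are the ones that blow the SLA.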

Partition Size

•Tuning is an option based on size in bytes
•All about the reads
•index_interval
• How many samples taken
• Lower for faster access but more memory usage
•column_index_size_in_kb
• Add column indexes to a row when the data reaches this size
•Partial row reads? Maybe smaller.
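
A back-of-the-envelope view of what the two settings trade off. This is a simplified sketch, not Cassandra's actual accounting: it assumes one in-memory sample per `index_interval` partition keys and one column index entry per `column_index_size_in_kb` of partition data. The defaults of 128 and 64 used below are assumptions to check against your own cassandra.yaml, and the key count and partition size are made-up inputs.

```python
import math


def index_samples(partition_keys, index_interval=128):
    """Approximate in-memory index samples: one per `index_interval` keys."""
    return math.ceil(partition_keys / index_interval)


def column_index_blocks(partition_bytes, column_index_size_in_kb=64):
    """Approximate column index entries for one partition.

    Partitions smaller than the threshold get no column index at all.
    """
    threshold = column_index_size_in_kb * 1024
    if partition_bytes < threshold:
        return 0
    return math.ceil(partition_bytes / threshold)


keys = 1_000_000  # hypothetical number of partition keys
for interval in (128, 64, 32):
    print(f"index_interval={interval}: ~{index_samples(keys, interval)} samples in memory")

partition = 200_000  # hypothetical partition size in bytes
for kb in (64, 16, 4):
    blocks = column_index_blocks(partition, kb)
    print(f"column_index_size_in_kb={kb}: ~{blocks} index entries for this partition")
```

Halving `index_interval` doubles the samples held in memory, and small partitions never reach the column index threshold, which is why the partition size histogram is the thing to look at first.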

Tuning results
•Spent a lot of time tuning disk
•Played with
• index_interval (Lowered)
• concurrent_reads (Increased)
• column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!



Offset | SSTables | Write Latency | Read Latency | Row Size | Column Count
1 | 27425403 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | 0
3 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 1 | 0 | 0
5 | 0 | 0 | 24 | 0 | 0
6 | 0 | 0 | 56 | 0 | 0
7 | 0 | 0 | 92 | 0 | 0
8 | 0 | 0 | 283 | 0 | 0
10 | 0 | 0 | 2834 | 0 | 0
12 | 0 | 0 | 11954 | 0 | 0
14 | 0 | 0 | 32621 | 0 | 1218345
17 | 0 | 0 | 135311 | 0 | 0
20 | 0 | 0 | 314195 | 0 | 0
24 | 0 | 0 | 610665 | 0 | 0
29 | 0 | 0 | 536736 | 0 | 0
35 | 0 | 0 | 162541 | 0 | 0
42 | 0 | 0 | 25277 | 0 | 0
50 | 0 | 0 | 7847 | 0 | 0
60 | 0 | 0 | 5864 | 0 | 0
72 | 0 | 0 | 9580 | 0 | 0
86 | 0 | 0 | 5517 | 0 | 0
103 | 0 | 0 | 3822 | 0 | 0
124 | 0 | 0 | 1850 | 0 | 0
149 | 0 | 0 | 394 | 0 | 0
179 | 0 | 0 | 253 | 0 | 0
215 | 0 | 0 | 305 | 0 | 0
258 | 0 | 0 | 4657297 | 0 | 0
310 | 0 | 0 | 12748409 | 0 | 0
372 | 0 | 0 | 7475534 | 0 | 0
446 | 0 | 0 | 263549 | 0 | 0
535 | 0 | 0 | 217171 | 0 | 0
642 | 0 | 0 | 41908 | 1218345 | 0
770 | 0 | 0 | 24876 | 0 | 0
924 | 0 | 0 | 13566 | 0 | 0
1109 | 0 | 0 | 10875 | 0 | 0
1331 | 0 | 0 | 9379 | 0 | 0
1597 | 0 | 0 | 7111 | 0 | 0
1916 | 0 | 0 | 5333 | 0 | 0
2299 | 0 | 0 | 5072 | 0 | 0
2759 | 0 | 0 | 3987 | 0 | 0
3311 | 0 | 0 | 5290 | 0 | 0
3973 | 0 | 0 | 5169 | 0 | 0
4768 | 0 | 0 | 2867 | 0 | 0
5722 | 0 | 0 | 2093 | 0 | 0
6866 | 0 | 0 | 3177 | 0 | 0
8239 | 0 | 0 | 2161 | 0 | 0
9887 | 0 | 0 | 1552 | 0 | 0
11864 | 0 | 0 | 1200 | 0 | 0
14237 | 0 | 0 | 834 | 0 | 0
17084 | 0 | 0 | 1380 | 0 | 0
20501 | 0 | 0 | 6219 | 0 | 0
24601 | 0 | 0 | 4977 | 0 | 0
29521 | 0 | 0 | 2114 | 0 | 0
35425 | 0 | 0 | 6479 | 0 | 0
42510 | 0 | 0 | 18417 | 0 | 0
51012 | 0 | 0 | 5532 | 0 | 0

• The two hump problem
• Reads awesome until
• Compaction!

• Solution:
• Throttle down compaction
• Tune disk
• Ignore it
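
The two humps can be picked out of the Read Latency column mechanically. A minimal sketch: `humps` is a hypothetical helper, and the 1% cutoff for ignoring small bumps in the tail is an arbitrary assumption.

```python
# Offset (microseconds) and Read Latency columns from the histogram above.
OFFSETS = [
    1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29, 35,
    42, 50, 60, 72, 86, 103, 124, 149, 179, 215, 258, 310, 372, 446, 535, 642,
    770, 924, 1109, 1331, 1597, 1916, 2299, 2759, 3311, 3973, 4768, 5722,
    6866, 8239, 9887, 11864, 14237, 17084, 20501, 24601, 29521, 35425, 42510, 51012,
]
READS = [
    0, 0, 0, 1, 24, 56, 92, 283, 2834, 11954, 32621, 135311, 314195, 610665, 536736, 162541,
    25277, 7847, 5864, 9580, 5517, 3822, 1850, 394, 253, 305, 4657297, 12748409, 7475534,
    263549, 217171, 41908, 24876, 13566, 10875, 9379, 7111, 5333, 5072, 3987, 5290, 5169,
    2867, 2093, 3177, 2161, 1552, 1200, 834, 1380, 6219, 4977, 2114, 6479, 18417, 5532,
]


def humps(offsets, counts, min_share=0.01):
    """Offsets of local maxima holding at least `min_share` of the tallest bucket."""
    floor = max(counts) * min_share
    peaks = []
    for i in range(1, len(counts) - 1):
        is_local_max = counts[i - 1] < counts[i] > counts[i + 1]
        if is_local_max and counts[i] >= floor:
            peaks.append(offsets[i])
    return peaks


peaks = humps(OFFSETS, READS)
print("read latency humps at (micros):", peaks)
```

On this data the two peaks land in the 24 and 310 microsecond buckets: one population of fast reads, and a second one that shows up when compaction is competing for the disk.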

Disk + Data Model
•Understand the internals
• Size of partition
• Compaction

•Learn how to measure
•Load test



Thank you! Time for questions...

*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model

