Data Model on Fire

Patrick McFadin
Chief Evangelist/Solution Architect - DataStax
@PatrickMcFadin

©2013 DataStax Confidential. Do not distribute without consent.
Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune
Load test baby!
Light Weight Transactions
The race is on
T0-T1

Process 1:

SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

Process 2:

SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

Got nothing! Good to go!

T2

Process 1:

INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');

T3

Process 2 (this one wins):

INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');
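
The race above can be replayed without a cluster. This is a minimal sketch, assuming a plain Python dict as a stand-in for the users table and an unconditional overwrite as a stand-in for last-write-wins; the helper names select_user and insert_user are hypothetical and are not part of any driver API.

```python
# In-memory stand-in for the users table (assumption: last write wins).
users = {}

def select_user(username):
    """Return the stored row, or None when no row exists (the "(0 rows)" case)."""
    return users.get(username)

def insert_user(username, first_name, email):
    """Unconditional upsert: silently replaces whatever row is already there."""
    users[username] = {"firstname": first_name, "email": email}

# T0-T1: both processes check first, and both see nothing.
process1_sees = select_user("pmcfadin")
process2_sees = select_user("pmcfadin")

# T2: process 1 inserts because its check came back empty.
if process1_sees is None:
    insert_user("pmcfadin", "Patrick", "patrick@datastax.com")

# T3: process 2 acts on its stale check and overwrites process 1's row.
if process2_sees is None:
    insert_user("pmcfadin", "Paul", "paul@oracle.com")

print(users["pmcfadin"]["firstname"])
```

Both checks pass, both inserts run, and only the later row survives.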
Solution LWT
Process 1

T0

INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;

T1

 [applied]
-----------
      True

•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success
Solution LWT
Process 2
T2

INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;

T3

 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin

•applied = false: Rejected
•No record stomping!
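
The applied / not-applied behavior of IF NOT EXISTS can be sketched in a few lines. This is an illustration only: the UsersTable class is hypothetical, and a local threading.Lock stands in for the exclusive access that Paxos provides across replicas. It does not model Paxos itself or its latency.

```python
import threading

class UsersTable:
    """Toy table whose insert mimics the [applied] result of IF NOT EXISTS."""

    def __init__(self):
        self._rows = {}
        # Assumption: a process-local lock stands in for Paxos-based exclusivity.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, username, row):
        """Return (applied, existing_row), like the [applied] column plus the current row."""
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                return False, existing  # rejected: the first row is left untouched
            self._rows[username] = row
            return True, None

table = UsersTable()
applied_1, _ = table.insert_if_not_exists(
    "pmcfadin", {"firstname": "Patrick", "lastname": "McFadin"})
applied_2, existing = table.insert_if_not_exists(
    "pmcfadin", {"firstname": "Paul", "lastname": "McFadin"})

print(applied_1, applied_2, existing["firstname"])
```

The check and the write happen as one step, so the second caller gets the existing row back instead of stomping on it.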
LWT Fine Print
•Light Weight Transactions solve edge conditions
•They have a latency cost.
• Be aware
• Load test
• Consider in your data model

•Now go shut down that ZooKeeper mess you have!
Form Versioning: Revisited
Form Versioning Pt 1
•From “The World's Next Top Data Model”
•Great idea, but edge conditions
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form
Form Versioning Pt 1
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});

2. Lock for one user (Danger Zone)
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
Form Versioning Pt 2
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;

Exclusive lock

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

Accepted

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';

Rejected (sorry dude)
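
The accepted and rejected updates above follow a compare-then-set pattern. Here is a minimal single-threaded sketch, assuming a Python dict as a stand-in for one row of working_version; the function name update_if_locked_by is hypothetical, and the atomicity that the real IF clause gets from Paxos is not modeled.

```python
def update_if_locked_by(row, expected_owner, attribute, label):
    """Set one form attribute only when locked_by matches; return True if applied."""
    if row["locked_by"] != expected_owner:
        return False  # condition failed, row left untouched
    row["form_attributes"][attribute] = label
    return True

# Stand-in for version 1 of form 1138 after the IF NOT EXISTS insert.
version_1 = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}

accepted = update_if_locked_by(
    version_1, "pmcfadin", "EmailAddress<text>", "Primary Email Address: ")
rejected = update_if_locked_by(
    version_1, "dude", "EmailAddress<text>", "Email Adx: ")

print(accepted, rejected, version_1["form_attributes"]["EmailAddress<text>"])
```

Only the lock holder's change lands; the other writer learns it was rejected and can retry or back off.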
Form Versioning Pt 2
•Old way: Edge cases with problems
• Use external locking?
• Take your chances?

•New way: Managed expectations (LWT)
• Exclusive by existence check
• Continued with IF clause
• Downside: More latency
Fire: Bring it
Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
Single pass compaction
Hints to reduce SSTable reads
Faster index reads from off-heap
Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!
Avg Access Time*   Rotation Speed
12ms               7200 RPM
7ms                10k RPM
5ms                15k RPM
.04ms              SSD

Shared storage == Great sadness
* Source: www.tomshardware.com
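
The "5 seeks = 60ms" figure is just the access time multiplied by the number of seeks. A small sketch of that arithmetic, using the access times from the table above and assuming (as a simplification) one seek per SSTable read; the function name seek_cost_ms is illustrative.

```python
# Average access times from the slide, in milliseconds.
AVG_ACCESS_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}

def seek_cost_ms(seeks, drive):
    """Disk time spent on seeks alone, assuming each seek costs the average access time."""
    return seeks * AVG_ACCESS_MS[drive]

for drive in AVG_ACCESS_MS:
    print(f"{drive}: {seek_cost_ms(5, drive):.2f} ms for 5 seeks")
```

On a 7200 RPM drive five seeks cost 60 ms before any other work is done, which is why cutting SSTables per read matters so much on spinning disks.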
Quick Diversion
•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
• per read
• per write
• SSTable flush
• Compaction
nodetool cfhistograms <keyspace> <table>
How do I even read this thing!
Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       0         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Offset:
•Unit-less column
•Units are assigned by each column
•Numerical buckets
Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables
1       107
2       2
10      0
250     0
800     0
1250    0

•Per read. How many seeks?
•Offset is number of SSTables read
•Less == lower read latency
•107 reads took 1 seek to satisfy
Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  Write Latency (micros)
1       0
2       0
10      0
250     5
800     10
1250    0

•Per write. How fast?
•Offset is microseconds
Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  Read Latency (micros)
1       0
2       0
10      0
250     0
800     50
1250    300

•Per read. How fast?
•Offset is microseconds
Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  Partition Size (bytes)
1       0
2       0
10      0
250     0
800     0
1250    5

•Per partition (storage row)
•Offset is size in bytes
•5 partitions are 1250 bytes
Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  Cell Count
1       0
2       0
10      5
250     0
800     0
1250    0

•Per partition (storage row)
•Offset is count of cells in partition
•5 partitions have 10 cells
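
Once the columns are read as (offset, count) buckets, percentiles fall out of a running total. The sketch below is not part of nodetool; percentile_offset is a hypothetical helper, and the two bucket lists are copied from the small videodb/users example above.

```python
def percentile_offset(buckets, fraction):
    """Return the smallest offset whose cumulative count reaches `fraction` of all samples.

    `buckets` is a list of (offset, count) pairs sorted by offset.
    Returns None when the histogram is empty.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    threshold = fraction * total
    running = 0
    for offset, count in buckets:
        running += count
        if running >= threshold:
            return offset
    return buckets[-1][0]

# Columns from the example output: (offset, count).
sstables = [(1, 107), (2, 2), (10, 0), (250, 0), (800, 0), (1250, 0)]
read_latency = [(1, 0), (2, 0), (10, 0), (250, 0), (800, 50), (1250, 300)]

print("95% of reads touched at most", percentile_offset(sstables, 0.95), "SSTable(s)")
print("median read latency bucket:", percentile_offset(read_latency, 0.5), "micros")
```

Because offsets are bucket boundaries, the result is an upper bound for the bucket rather than an exact measurement.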
Histograms + Data Model
•Your data model is the key to success
•How do you ensure that?
Test
Measure
Repeat
Real World Example
•Real Customer
•Needed very tight SLA on reads

Problem

•Read response highly variable
•Loading data increases latency
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2016550   0              0             0               0
2       2064495   0              0             0               0
3       434526    0              0             0               0
4       51084     0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               1629
12      0         0              0             0               2971
14      0         0              0             0               1286
17      0         0              0             0               68
20      0         0              0             0               188
24      0         0              0             0               101
29      0         0              0             0               50799
35      0         0              0             0               269
42      0         0              0             0               132414
50      0         0              0             0               32943
60      0         0              0             0               62099
72      0         0              0             0               116855
86      0         0              0             0               41562
103     0         0              0             0               42796
124     0         0              0             0               46719
149     0         0              0             0               57693
179     0         0              3             0               27659
215     0         0              18            0               26941
258     0         0              47            0               21589
310     0         0              71            0               19494
372     0         0              141           0               8681
446     0         0              67            0               9499
535     0         0              36466         1629            9360
642     0         0              263829        0               4349
770     0         0              608488        2971            4242
924     0         0              209549        1468            2422
1109    0         0              398845        59              1685
1331    0         0              625099        45105           954
1597    0         0              462636        5731            610
1916    0         0              499920        132391          366
2299    0         0              380787        16265           303
2759    0         0              285323        20015           188
3311    0         0              202417        30980           106
3973    0         0              148920        44973           64
4768    0         0              106452        38502           55
5722    0         0              81533         69479           23
6866    0         0              55470         39218           15
8239    0         0              43512         23027           3
9887    0         0              30810         58498           2
11864   0         0              22375         73629           0
14237   0         0              15148         33444           1
17084   0         0              12047         28321           0
20501   0         0              11298         17021           0
24601   0         0              9652          13072           3
29521   0         0              6715          7790            0
35425   0         0              13788         7764            0
42510   0         0              15322         5890            0
51012   0         0              8585          4046            0
61214   0         0              5041          2973            0
73457   0         0              2892          1954            0
88148   0         0              1543          936             0
105778  0         0              900           661             0
126934  0         0              486           409             0

• Compactions behind
• Disk IO problems
• How to optimize?
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2045656   0              0             0               0
2       1813961   0              0             0               0
3       70496     0              0             0               0
4       0         0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               47
12      0         0              0             0               860
14      0         0              0             0               393
17      0         0              0             0               50
20      0         0              0             0               0
24      0         0              0             0               21
29      0         0              0             0               34489
35      0         0              0             0               32
42      0         0              0             0               97226
50      0         0              0             0               24490
60      0         0              0             0               47077
72      0         0              0             0               94761
86      0         0              0             0               32559
103     0         0              0             0               33885
124     0         0              0             0               37051
149     0         0              1             0               48429
179     0         0              17            0               23272
215     0         0              95            0               22459
258     0         0              84            0               17953
310     0         0              174           0               16178
372     0         0              53082         0               7123
446     0         0              318074        0               7836
535     0         0              423140        47              7904
642     0         0              382926        0               3552
770     0         0              365670        860             3525
924     0         0              414824        392             1998
1109    0         0              442701        46              1411
1331    0         0              335862        30325           757
1597    0         0              302920        4082            518
1916    0         0              236448        97224           294
2299    0         0              171726        11843           254
2759    0         0              122880        15160           162
3311    0         0              90413         23484           89
3973    0         0              66682         34799           62
4768    0         0              53385         29619           54
5722    0         0              39121         53155           23
6866    0         0              26828         30702           12
8239    0         0              18930         18627           3
9887    0         0              12517         47739           2
11864   0         0              8269          61853           0
14237   0         0              6049          28875           1
17084   0         0              4614          24391           0
20501   0         0              5868          14450           0
24601   0         0              6167          11112           0
29521   0         0              2879          6609            0
35425   0         0              2054          6654            0
42510   0         0              8913          4986            0
51012   0         0              4429          3352            0
61214   0         0              1541          2465            0
73457   0         0              560           1607            0
88148   0         0              192           809             0
105778  0         0              59            523             0
126934  0         0              19            333             0

Fewer seeks
2 ms!

• Tuned data disk
• Compactions better
• 1 less seek overall
• Further tuning made it even better!
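
One way to put a number on "1 less seek overall" is the mean SSTables touched per read, computed from the SSTables column before and after tuning. This is a rough sketch: the dictionaries hold only the non-zero buckets from the two histograms in this example, and mean_sstables_per_read is a hypothetical helper.

```python
def mean_sstables_per_read(histogram):
    """Weighted mean of the SSTables column: sum(offset * reads) / sum(reads)."""
    total_reads = sum(histogram.values())
    weighted = sum(offset * reads for offset, reads in histogram.items())
    return weighted / total_reads

# Non-zero buckets of the SSTables column: {SSTables read: number of reads}.
before_tuning = {1: 2016550, 2: 2064495, 3: 434526, 4: 51084}
after_tuning = {1: 2045656, 2: 1813961, 3: 70496}

print(f"before: {mean_sstables_per_read(before_tuning):.3f} SSTables per read")
print(f"after:  {mean_sstables_per_read(after_tuning):.3f} SSTables per read")
print("worst case before/after:", max(before_tuning), "/", max(after_tuning))
```

The mean moves from roughly 1.68 to roughly 1.50, and the worst case drops from 4 SSTables to 3.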

What about the partition size?
Partition Size
•Tuning is an option based on size in bytes
•All about the reads
•index_interval
• How many samples taken
• Lower for faster access but more memory usage
•column_index_size_in_kb
• Add column indexes to a row when the data reaches this size

•Partial row reads? Maybe smaller.
Tuning results
•Spent a lot of time tuning disk
•Played with
• index_interval (Lowered)
• concurrent_reads (Increased)
• column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!
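
For scale, the daily total can be turned into an average rate and compared with the peak. A quick sketch of the arithmetic, assuming "ops" and "transactions" count the same unit (the slide does not say):

```python
SECONDS_PER_DAY = 24 * 60 * 60

ops_per_day = 220_000_000   # from the slide
peak_tps = 10_000           # from the slide

# Assumption: one "op" equals one "transaction".
average_tps = ops_per_day / SECONDS_PER_DAY
peak_to_average = peak_tps / average_tps

print(f"average: {average_tps:.0f} ops/sec")
print(f"peak is {peak_to_average:.1f}x the average")
```

The average works out to about 2,500 operations per second, so the peak is close to four times the average load.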
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
                  (micros)       (micros)      (bytes)
1       27425403  0              0             0         0
2       0         0              0             0         0
3       0         0              0             0         0
4       0         0              1             0         0
5       0         0              24            0         0
6       0         0              56            0         0
7       0         0              92            0         0
8       0         0              283           0         0
10      0         0              2834          0         0
12      0         0              11954         0         0
14      0         0              32621         0         1218345
17      0         0              135311        0         0
20      0         0              314195        0         0
24      0         0              610665        0         0
29      0         0              536736        0         0
35      0         0              162541        0         0
42      0         0              25277         0         0
50      0         0              7847          0         0
60      0         0              5864          0         0
72      0         0              9580          0         0
86      0         0              5517          0         0
103     0         0              3822          0         0
124     0         0              1850          0         0
149     0         0              394           0         0
179     0         0              253           0         0
215     0         0              305           0         0
258     0         0              4657297       0         0
310     0         0              12748409      0         0
372     0         0              7475534       0         0
446     0         0              263549        0         0
535     0         0              217171        0         0
642     0         0              41908         1218345   0
770     0         0              24876         0         0
924     0         0              13566         0         0
1109    0         0              10875         0         0
1331    0         0              9379          0         0
1597    0         0              7111          0         0
1916    0         0              5333          0         0
2299    0         0              5072          0         0
2759    0         0              3987          0         0
3311    0         0              5290          0         0
3973    0         0              5169          0         0
4768    0         0              2867          0         0
5722    0         0              2093          0         0
6866    0         0              3177          0         0
8239    0         0              2161          0         0
9887    0         0              1552          0         0
11864   0         0              1200          0         0
14237   0         0              834           0         0
17084   0         0              1380          0         0
20501   0         0              6219          0         0
24601   0         0              4977          0         0
29521   0         0              2114          0         0
35425   0         0              6479          0         0
42510   0         0              18417         0         0
51012   0         0              5532          0         0

• The two hump problem
• Reads awesome until…
• Reading from disk

• Solution:
• Throttle down compaction
• Tune disk
• Ignore it
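
The two humps can be picked out of the Read Latency column mechanically. The sketch below is one simple way to do it, not a nodetool feature: find_humps is a hypothetical helper that reports local maxima holding at least 1% of the tallest bucket's count (an arbitrary cutoff chosen for this example). The data is the Read Latency column from the histogram above.

```python
OFFSETS = [1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 24, 29, 35, 42, 50, 60,
           72, 86, 103, 124, 149, 179, 215, 258, 310, 372, 446, 535, 642, 770,
           924, 1109, 1331, 1597, 1916, 2299, 2759, 3311, 3973, 4768, 5722,
           6866, 8239, 9887, 11864, 14237, 17084, 20501, 24601, 29521, 35425,
           42510, 51012]
READS = [0, 0, 0, 1, 24, 56, 92, 283, 2834, 11954, 32621, 135311, 314195,
         610665, 536736, 162541, 25277, 7847, 5864, 9580, 5517, 3822, 1850,
         394, 253, 305, 4657297, 12748409, 7475534, 263549, 217171, 41908,
         24876, 13566, 10875, 9379, 7111, 5333, 5072, 3987, 5290, 5169, 2867,
         2093, 3177, 2161, 1552, 1200, 834, 1380, 6219, 4977, 2114, 6479,
         18417, 5532]

def find_humps(offsets, counts, min_share_of_max=0.01):
    """Return (offset, count) for local maxima that are at least a share of the tallest bucket."""
    floor = max(counts) * min_share_of_max
    humps = []
    for i, count in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i + 1 < len(counts) else 0
        if count >= floor and count > left and count > right:
            humps.append((offsets[i], count))
    return humps

for offset, count in find_humps(OFFSETS, READS):
    print(f"hump at {offset} micros with {count} reads")
```

With this cutoff the helper reports one hump in the tens of microseconds and a second, much larger one in the hundreds of microseconds, matching the "reads awesome until reading from disk" pattern on the slide.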
Disk + Data Model
•Understand the internals
• Size of partition
• Compaction

•Learn how to measure
•Load test
Thank you! Time for questions...

*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model

Cassandra Community Webinar | Data Model on Fire