#CASSANDRAEU

Data Model on Fire

Patrick McFadin | Chief Evangelist DataStax
@PatrickMcFadin

Friday, October 18, 13
Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune
Load test baby!

Light Weight Transactions

The race is on

T0, Process 1:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T1, Process 2:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T2, Process 1 (Got nothing! Good to go!):
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');

T3, Process 2 (This one wins):
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');

Solution LWT

Process 1, T0:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;

T1:
 [applied]
-----------
      True

•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success

Solution LWT

Process 2, T2:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;

T3:
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin

•applied = false: Rejected
•No record stomping!

LWT Fine Print

•Light Weight Transactions solve edge conditions
•They have a latency cost.
• Be aware
• Load test
• Consider in your data model

•Now go shut down that ZooKeeper mess you have!
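
The race and the IF NOT EXISTS fix can be imitated in a few lines of Python. This is a toy, single-process sketch: `UsersTable`, `racy_signup` and `lwt_signup` are hypothetical names, a local lock stands in for Paxos, and nothing here talks to Cassandra.

```python
import threading


class UsersTable:
    """Toy in-memory stand-in for the users table (not Cassandra)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def select(self, username):
        return self._rows.get(username)

    def insert(self, username, row):
        # Plain INSERT: last write wins and silently overwrites.
        self._rows[username] = row

    def insert_if_not_exists(self, username, row):
        # Check and write happen as one step, like IF NOT EXISTS.
        # Returns (applied, existing_row).
        with self._lock:
            if username in self._rows:
                return False, self._rows[username]
            self._rows[username] = row
            return True, None


def racy_signup(table):
    # T0 and T1: both processes read before either one writes.
    seen_by_p1 = table.select("pmcfadin")
    seen_by_p2 = table.select("pmcfadin")
    # T2 and T3: both saw nothing, so both insert.
    if seen_by_p1 is None:
        table.insert("pmcfadin", {"firstname": "Patrick"})
    if seen_by_p2 is None:
        table.insert("pmcfadin", {"firstname": "Paul"})
    return table.select("pmcfadin")


def lwt_signup(table):
    first = table.insert_if_not_exists("pmcfadin", {"firstname": "Patrick"})
    second = table.insert_if_not_exists("pmcfadin", {"firstname": "Paul"})
    return first, second, table.select("pmcfadin")


print(racy_signup(UsersTable()))  # Paul stomped Patrick
print(lwt_signup(UsersTable()))   # second insert is rejected
```

The second call returns the existing row alongside `False`, which mirrors the `[applied]` result shown for Process 2.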

Form Versioning: Revisited

Form Versioning Pt 1
•From “Next top data model”
•Great idea, but edge conditions
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form

Form Versioning Pt 1
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});

2. Lock for one user (Danger Zone)
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});

Form Versioning Pt 2

1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;

Exclusive lock
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

Accepted

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';

Rejected
(sorry dude)

Form Versioning Pt 2
•Old way: Edge cases with problems
• Use external locking?
• Take your chances?

•New way: Managed expectations (LWT)
• Exclusive by existence check
• Continued with IF clause
• Downside: More latency
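
The "continued with IF clause" idea is a compare-and-set on `locked_by`. A minimal sketch, assuming a plain dict as the row and a hypothetical helper `update_if_locked_by`; real LWTs do this check across replicas with Paxos, which is where the extra latency comes from.

```python
def update_if_locked_by(row, expected_owner, key, value):
    """Apply the change only when row['locked_by'] matches expected_owner.

    Returns True when applied and False when rejected, like [applied].
    """
    if row.get("locked_by") != expected_owner:
        return False
    row["form_attributes"][key] = value
    return True


# Version 1 of the form, created already locked by its author.
row = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}

accepted = update_if_locked_by(
    row, "pmcfadin", "EmailAddress<text>", "Primary Email Address: ")
rejected = update_if_locked_by(
    row, "dude", "EmailAddress<text>", "Email Adx: ")

print(accepted, rejected)
print(row["form_attributes"]["EmailAddress<text>"])
```

Only the lock holder's change lands; the other writer gets a rejection instead of silently overwriting.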

Fire: Bring it

Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
• Single pass compaction
• Hints to reduce SSTable reads
• Faster index reads from off-heap

Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!

Avg Access Time*   Rotation Speed
12ms               7200 RPM
7ms                10k RPM
5ms                15k RPM
.04ms              SSD

Shared storage == Great sadness
* Source: www.tomshardware.com
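
The 60ms figure is just seeks multiplied by average access time. A small sketch using the numbers from the table above; `disk_time_ms` is a hypothetical helper, and it ignores caching, queuing and transfer time.

```python
# Average access times from the table above, in milliseconds.
AVG_ACCESS_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}


def disk_time_ms(seeks, drive):
    """Rough time spent on disk access alone for one read."""
    return seeks * AVG_ACCESS_MS[drive]


for drive in AVG_ACCESS_MS:
    print(drive, disk_time_ms(5, drive), "ms for 5 seeks")
```

Cutting one SSTable read on a 7200 RPM drive saves about 12ms per request; on SSD the same change is barely measurable.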

Quick Diversion

•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
• per read
• per write
• SSTable flush
• Compaction
nodetool cfhistograms <keyspace> <table>

How do I even read this thing!

Histograms How to

nodetool cfhistograms videodb users
videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Offset
•Unit-less column
•Units are assigned by each column
•Numerical buckets

SSTables
•Per read. How many seeks?
•Offset is number of SSTables read
•Less == lower read latency
•107 reads took 1 seek to satisfy

Write Latency
•Per write. How fast?
•Offset is microseconds

Read Latency
•Per read. How fast?
•Offset is microseconds

Partition Size
•Per partition (storage row)
•Offset is size in bytes
•5 partitions are 1250 bytes

Cell Count
•Per partition (storage row)
•Offset is count of cells in partition
•5 partitions have 10 cells

Histograms + Data Model
•Your data model is the key to success
•How do you ensure that?
Test
Measure
Repeat
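
One way to "measure" is to turn a histogram column into a percentile bucket. A minimal sketch, assuming the offsets and counts have already been copied out of the cfhistograms output by hand; `percentile_bucket` is a hypothetical helper, and it returns the bucket's offset label, not an exact value.

```python
def percentile_bucket(offsets, counts, pct):
    """Return the offset of the bucket that contains the pct-th percentile."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram column is empty")
    threshold = total * pct / 100.0
    running = 0
    for offset, count in zip(offsets, counts):
        running += count
        # Skip empty buckets so the answer is always a populated one.
        if count and running >= threshold:
            return offset
    return offsets[-1]


# Values from the small videodb/users example in this deck.
offsets = [1, 2, 10, 250, 800, 1250]
sstables = [107, 2, 0, 0, 0, 0]
read_latency = [0, 0, 0, 0, 50, 300]

print(percentile_bucket(offsets, sstables, 95))      # 1 SSTable per read
print(percentile_bucket(offsets, read_latency, 95))  # 1250 micros bucket
```

Run it before and after each change, and compare the same percentile each time.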

Real World Example
•Real Customer
•Needed very tight SLA on reads

Problem

•Read response highly variable
•Loading data increases latency

Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2016550   0              0             0               0
2       2064495   0              0             0               0
3       434526    0              0             0               0
4       51084     0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               1629
12      0         0              0             0               2971
14      0         0              0             0               1286
17      0         0              0             0               68
20      0         0              0             0               188
24      0         0              0             0               101
29      0         0              0             0               50799
35      0         0              0             0               269
42      0         0              0             0               132414
50      0         0              0             0               32943
60      0         0              0             0               62099
72      0         0              0             0               116855
86      0         0              0             0               41562
103     0         0              0             0               42796
124     0         0              0             0               46719
149     0         0              0             0               57693
179     0         0              3             0               27659
215     0         0              18            0               26941
258     0         0              47            0               21589
310     0         0              71            0               19494
372     0         0              141           0               8681
446     0         0              67            0               9499
535     0         0              36466         1629            9360
642     0         0              263829        0               4349
770     0         0              608488        2971            4242
924     0         0              209549        1468            2422
1109    0         0              398845        59              1685
1331    0         0              625099        45105           954
1597    0         0              462636        5731            610
1916    0         0              499920        132391          366
2299    0         0              380787        16265           303
2759    0         0              285323        20015           188
3311    0         0              202417        30980           106
3973    0         0              148920        44973           64
4768    0         0              106452        38502           55
5722    0         0              81533         69479           23
6866    0         0              55470         39218           15
8239    0         0              43512         23027           3
9887    0         0              30810         58498           2
11864   0         0              22375         73629           0
14237   0         0              15148         33444           1
17084   0         0              12047         28321           0
20501   0         0              11298         17021           0
24601   0         0              9652          13072           3
29521   0         0              6715          7790            0
35425   0         0              13788         7764            0
42510   0         0              15322         5890            0
51012   0         0              8585          4046            0
61214   0         0              5041          2973            0
73457   0         0              2892          1954            0
88148   0         0              1543          936             0
105778  0         0              900           661             0
126934  0         0              486           409             0
152321  0         0              285           289             0

• Compactions behind
• Disk IO problems
• How to optimize?

Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2045656   0              0             0               0
2       1813961   0              0             0               0
3       70496     0              0             0               0
4       0         0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               47
12      0         0              0             0               860
14      0         0              0             0               393
17      0         0              0             0               50
20      0         0              0             0               0
24      0         0              0             0               21
29      0         0              0             0               34489
35      0         0              0             0               32
42      0         0              0             0               97226
50      0         0              0             0               24490
60      0         0              0             0               47077
72      0         0              0             0               94761
86      0         0              0             0               32559
103     0         0              0             0               33885
124     0         0              0             0               37051
149     0         0              1             0               48429
179     0         0              17            0               23272
215     0         0              95            0               22459
258     0         0              84            0               17953
310     0         0              174           0               16178
372     0         0              53082         0               7123
446     0         0              318074        0               7836
535     0         0              423140        47              7904
642     0         0              382926        0               3552
770     0         0              365670        860             3525
924     0         0              414824        392             1998
1109    0         0              442701        46              1411
1331    0         0              335862        30325           757
1597    0         0              302920        4082            518
1916    0         0              236448        97224           294
2299    0         0              171726        11843           254
2759    0         0              122880        15160           162
3311    0         0              90413         23484           89
3973    0         0              66682         34799           62
4768    0         0              53385         29619           54
5722    0         0              39121         53155           23
6866    0         0              26828         30702           12
8239    0         0              18930         18627           3
9887    0         0              12517         47739           2
11864   0         0              8269          61853           0
14237   0         0              6049          28875           1
17084   0         0              4614          24391           0
20501   0         0              5868          14450           0
24601   0         0              6167          11112           0
29521   0         0              2879          6609            0
35425   0         0              2054          6654            0
42510   0         0              8913          4986            0
51012   0         0              4429          3352            0
61214   0         0              1541          2465            0
73457   0         0              560           1607            0
88148   0         0              192           809             0
105778  0         0              59            523             0
126934  0         0              19            333             0
152321  0         0              0             262             0

Callouts: Fewer seeks (SSTables column). 2 ms! (Read Latency column).

• Tuned data disk
• Compactions better
• 1 less seek overall
• Further tuning made it even better!

What about the partition size?

Partition Size

•Tuning is an option based on size in bytes
•All about the reads
•index_interval
• How many samples taken
• Lower for faster access but more memory usage
•column_index_size_in_kb
• Add column indexes to a row when the data reaches this size

•Partial row reads? Maybe smaller.

Tuning results
•Spent a lot of time tuning disk
•Played with
• index_interval (Lowered)
• concurrent_reads (Increased)
• column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!
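
For scale, the daily total and the peak rate can be put on the same footing. A quick sketch of the arithmetic, assuming traffic is averaged over a full 24-hour day; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ops_per_day = 220_000_000
peak_per_sec = 10_000

# Average rate if the load were spread evenly across the day.
average_per_sec = ops_per_day / SECONDS_PER_DAY
peak_to_average = peak_per_sec / average_per_sec

print(round(average_per_sec), "ops/sec on average")
print(round(peak_to_average, 1), "x average at peak")
```

The peak is roughly four times the average, so load tests sized to the daily average would miss the interesting part.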

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1       27425403  0              0             0         0
2       0         0              0             0         0
3       0         0              0             0         0
4       0         0              1             0         0
5       0         0              24            0         0
6       0         0              56            0         0
7       0         0              92            0         0
8       0         0              283           0         0
10      0         0              2834          0         0
12      0         0              11954         0         0
14      0         0              32621         0         1218345
17      0         0              135311        0         0
20      0         0              314195        0         0
24      0         0              610665        0         0
29      0         0              536736        0         0
35      0         0              162541        0         0
42      0         0              25277         0         0
50      0         0              7847          0         0
60      0         0              5864          0         0
72      0         0              9580          0         0
86      0         0              5517          0         0
103     0         0              3822          0         0
124     0         0              1850          0         0
149     0         0              394           0         0
179     0         0              253           0         0
215     0         0              305           0         0
258     0         0              4657297       0         0
310     0         0              12748409      0         0
372     0         0              7475534       0         0
446     0         0              263549        0         0
535     0         0              217171        0         0
642     0         0              41908         1218345   0
770     0         0              24876         0         0
924     0         0              13566         0         0
1109    0         0              10875         0         0
1331    0         0              9379          0         0
1597    0         0              7111          0         0
1916    0         0              5333          0         0
2299    0         0              5072          0         0
2759    0         0              3987          0         0
3311    0         0              5290          0         0
3973    0         0              5169          0         0
4768    0         0              2867          0         0
5722    0         0              2093          0         0
6866    0         0              3177          0         0
8239    0         0              2161          0         0
9887    0         0              1552          0         0
11864   0         0              1200          0         0
14237   0         0              834           0         0
17084   0         0              1380          0         0
20501   0         0              6219          0         0
24601   0         0              4977          0         0
29521   0         0              2114          0         0
35425   0         0              6479          0         0
42510   0         0              18417         0         0
51012   0         0              5532          0         0

• The two hump problem
• Reads awesome until
• Compaction!

• Solution:
• Throttle down compaction
• Tune disk
• Ignore it

Disk + Data Model
•Understand the internals
• Size of partition
• Compaction

•Learn how to measure
•Load test


Thank you! Time for questions...

*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model
