Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's walk through some example Cassandra 2.0 models, go through the tuning steps, and understand the tradeoffs. Often, just a basic understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this, and I can help you. Do you feel the need for Cassandra 2.0 speed?
2. Data Model is King
•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune
Load test, baby!
4. The race is on
T0 (Process 1)
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
T1 (Process 2)
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
T2 (Process 1): Got nothing! Good to go!
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');
T3 (Process 2): This one wins
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');
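The race above can be replayed without a cluster. What follows is a minimal Python sketch, not Cassandra code: a plain dict stands in for the users table, and `select_user` and `insert_user` are made-up names for illustration. It only shows why check-then-insert lets the later writer stomp the earlier one.

```python
# In-memory stand-in for the users table; this is NOT a Cassandra client.
users = {}

def select_user(username):
    """Mimic SELECT ... WHERE username = ?; None means (0 rows)."""
    return users.get(username)

def insert_user(username, firstname, lastname):
    """Mimic a plain INSERT: an upsert that silently overwrites."""
    users[username] = {"firstname": firstname, "lastname": lastname}

# T0: process 1 checks and sees nothing.
p1_sees = select_user("pmcfadin")
# T1: process 2 checks and also sees nothing.
p2_sees = select_user("pmcfadin")

# T2: process 1 inserts because its check came back empty.
if p1_sees is None:
    insert_user("pmcfadin", "Patrick", "McFadin")
# T3: process 2 does the same and overwrites the first record.
if p2_sees is None:
    insert_user("pmcfadin", "Paul", "McFadin")

winner = users["pmcfadin"]["firstname"]
print(winner)
```

Both checks pass because they both run before either write, so the check gives no protection.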
5. Solution LWT
Process 1
T0
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;
T1
 [applied]
-----------
      True
•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success
6. Solution LWT
Process 2
T2
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;
T3
 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 |   Patrick |  McFadin
•applied = false: Rejected
•No record stomping!
7. LWT Fine Print
•Lightweight transactions (LWT) solve edge conditions
•They have a latency cost.
• Be aware
• Load test
• Consider in your data model
•Now go shut down that ZooKeeper mess you have!
9. Form Versioning Pt 1
•From “The World's Next Top Data Model”
•Great idea, but it has edge conditions
CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);
•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form
10. Form Versioning Pt 1
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});
2. Lock for one user
Danger Zone
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;
3. Insert new version. Release lock
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
11. Form Versioning Pt 2
1. Insert first version
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'})
IF NOT EXISTS;
Exclusive lock
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';
Accepted
UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
Rejected
(sorry dude)
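The accept/reject behavior of the IF locked_by clause can be sketched the same way. This is an illustrative single-threaded Python model, not Cassandra code: `update_if_locked_by` is a hypothetical helper, and the sketch does not model the atomic check-and-write that a real cluster performs.

```python
# In-memory model of UPDATE ... IF locked_by = ?; NOT a Cassandra client.
row = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}

def update_if_locked_by(row, user, key, value):
    """Apply the change only when `user` holds the lock; return applied."""
    if row["locked_by"] != user:
        return False
    row["form_attributes"][key] = value
    return True

accepted = update_if_locked_by(
    row, "pmcfadin", "EmailAddress<text>", "Primary Email Address: ")
rejected = update_if_locked_by(
    row, "dude", "EmailAddress<text>", "Email Adx: ")
print(accepted, rejected, row["form_attributes"]["EmailAddress<text>"])
```

Only the lock holder's change lands; the second update leaves the row untouched.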
12. Form Versioning Pt 2
•Old way: Edge cases with problems
• Use external locking?
• Take your chances?
•New way: Managed expectations (LWT)
• Exclusive by existence check
• Continued with IF clause
• Downside: More latency
14. Cassandra 2.0 Fire
•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like
 • Single pass compaction
 • Hints to reduce SSTable reads
 • Faster index reads from off-heap
15. Why is this important?
•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!
Avg Access Time* | Rotation Speed
12ms             | 7200 RPM
7ms              | 10k RPM
5ms              | 15k RPM
.04ms            | SSD
Shared storage == Great sadness
* Source: www.tomshardware.com
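The "5 seeks = 60ms" figure is the 7200 RPM access time multiplied out. Here is a small Python sketch that repeats the arithmetic for every row of the table. It assumes each seek costs the average access time and ignores caching and transfer time, so treat the numbers as rough.

```python
# Average access times copied from the table on this slide (milliseconds).
ACCESS_TIME_MS = {
    "7200 RPM": 12.0,
    "10k RPM": 7.0,
    "15k RPM": 5.0,
    "SSD": 0.04,
}

def seek_cost_ms(seeks, access_time_ms):
    """Disk access time alone for a read that needs `seeks` seeks."""
    return seeks * access_time_ms

# Five seeks per read, as in the bullet above.
costs = {media: seek_cost_ms(5, ms) for media, ms in ACCESS_TIME_MS.items()}
for media, cost in costs.items():
    print(f"{media}: {cost:.2f} ms")
```

Under these assumptions the same five seeks cost a fraction of a millisecond on SSD, which is why fewer SSTable reads matter most on spinning disks.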
16. Quick Diversion
•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
• per read
• per write
• SSTable flush
• Compaction
nodetool cfhistograms <keyspace> <table>
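Histograms are only useful if you can read a percentile off them. The sketch below shows one way to do that in Python, assuming you already have the data as (upper bound, count) pairs. It does not parse nodetool output, whose layout varies by version, and the sample buckets are hypothetical numbers rather than real measurements.

```python
def percentile_from_buckets(buckets, pct):
    """Return the upper bound of the bucket holding the pct-th percentile.

    buckets: list of (upper_bound, count), sorted by upper_bound.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    threshold = total * pct / 100.0
    running = 0
    for upper_bound, count in buckets:
        running += count
        if running >= threshold:
            return upper_bound
    return buckets[-1][0]

# Hypothetical read-latency buckets: (microseconds, count). Not real output.
sample = [(1000, 50), (2000, 30), (5000, 15), (9000, 4), (20000, 1)]
p95 = percentile_from_buckets(sample, 95)
print(p95)
```

The result is a bucket boundary, not an exact latency, so its precision is limited by the bucket widths.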
28. Partition Size
•Tuning is an option based on size in bytes
•All about the reads
•index_interval
 • How many samples taken
 • Lower for faster access but more memory usage
•column_index_size_in_kb
 • Add column indexes to a row when the data reaches this size
 • Partial row reads? Maybe smaller.
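To get a feel for the memory side of these two settings, here is a back-of-the-envelope Python sketch. It assumes a simplified model: one in-memory sample per `index_interval` partition keys, and one column index entry per `column_index_size_in_kb` of partition data. The key count, the partition size, and the values 128 and 64 are illustrative inputs, not measurements.

```python
import math

def index_summary_samples(partition_keys, index_interval):
    """Approximate samples kept when every index_interval-th key is sampled."""
    return math.ceil(partition_keys / index_interval)

def column_index_entries(partition_bytes, column_index_size_in_kb):
    """Approximate column index entries: one per block of the given size."""
    return partition_bytes // (column_index_size_in_kb * 1024)

keys = 1_000_000  # hypothetical number of partition keys in one SSTable
samples_at_128 = index_summary_samples(keys, 128)
samples_at_64 = index_summary_samples(keys, 64)

# Hypothetical 10 MB partition with a 64 KB column index block size.
entries = column_index_entries(10 * 1024 * 1024, 64)
print(samples_at_128, samples_at_64, entries)
```

In this model, halving the interval roughly doubles the samples held in memory, which is the tradeoff the bullets describe.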
29. Tuning results
•Spent a lot of time tuning disk
•Played with
• index_interval (Lowered)
• concurrent_reads (Increased)
• column_index_size_in_kb (Lowered)
220 Million Ops/Day
10000 Transactions/Sec Peak
9ms at 95th percentile. Measured at the application!
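For scale, the daily volume can be turned into an average rate and compared with the peak. This is a quick Python calculation using only the two numbers on the slide. It assumes "ops" and "transactions" count the same thing, which the slide does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ops_per_day = 220_000_000  # from the slide
peak_tps = 10_000          # from the slide

# Assumption: ops and transactions are the same unit.
avg_tps = ops_per_day / SECONDS_PER_DAY
peak_to_avg = peak_tps / avg_tps
print(f"average: {avg_tps:.0f} ops/sec, peak is {peak_to_avg:.1f}x average")
```

Under that assumption the peak is close to four times the daily average, so load tests should target the peak rather than the average.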
31. Disk + Data Model
•Understand the internals
• Size of partition
• Compaction
•Learn how to measure
•Load test
32. Thank you! Time for questions...
*More? My data modeling talks:
The Data Model is Dead, Long Live the Data Model
Become a Super Modeler
The World's Next Top Data Model