HRX Meetup Group 8/20/2014: Cassandra and How to Scale your Database

HR5 alum Stephen Portanova will be presenting on the highly scalable database Cassandra, which is used by Reddit, Netflix, CERN, and The Weather Channel. 'nuff said.

Published in: Technology

  1. Cassandra: Pretty Cool
  2. History: Google Bigtable, Amazon Dynamo
  3. Today
  4. Why Should You Care: ● Horizontal scaling (basically auto-sharding) ● Multiple nodes, highly available ● Really fast writes ● Not too shabby at reads either: SLICES!! ● Bright future
  5. The Cluster: ● replication factor (rf) ● read consistency (r) ● write consistency (w) ● clustering: shard on the partition key
  6. The One Ring
  7. Storage - Vnodes
  8. Data Model: ● Wide rows ● Slice queries ● Denormalization ● Index tables
  9. Data Model - Simple Key: CREATE TABLE email_app.emails (user_id text, subject text, to_add text, cc text, body text, PRIMARY KEY (user_id)); Row key: user_id.
  10. Data Model - Simple Inserts: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place'); INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('999', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?');
  11. Data Model Simple Inserts Result: SELECT * FROM email_app.emails; Row 111: subject = party, to_add = cat@, cc = hippo@, body = at my place. Row 999: subject = wat, to_add = horse@, cc = giraffe@, body = is going on?
  12. Mental Model - Nested Hash: row keys (111, 999) map to columns (subject, to_add, cc, body), which map to values.
  13. Data Model - Simple Insert - Again: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place'); Row 111: subject = party, to_add = cat@, cc = hippo@, body = at my place. Row 999: subject = wat, to_add = horse@, cc = giraffe@, body = is going on? IDEMPOTENT: repeating the insert leaves the data unchanged.
  14. Data Model - Composite Key 1: CREATE TABLE email_app.emails (user_id text, subject text, to_add text, cc text, body text, PRIMARY KEY (user_id, subject)); Row key: user_id. Clustering key: subject.
  15. Data Model - Composite Insert 1: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place'); Same as before. Right???
  16. Data Model Composite Insert Result: SELECT * FROM emails WHERE user_id = '111'; Row 111, with each column name prefixed by the subject: party|to_add = cat@, party|cc = hippo@, party|body = at my place.
  17. Mental Model - Nested Hash: row key (user_id: 111) maps to clustering column (subject: party), which maps to columns (to_add, cc, body) and their values.
  18. Data Model - Composite Insert 2: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'swim', 'cat@b.com', 'hippo@b.com', 'in the pool');
  19. Composite Insert 2 Result: SELECT * FROM emails WHERE user_id = '111'; Row 111: party|to_add = cat@, party|cc = hippo@, party|body = at my place, swim|to_add = cat@, swim|cc = hippo@, swim|body = in the pool. Sorted by clustering column "subject".
  20. Mental Model - Nested Sorted Hash: row key (user_id: 111) maps to clustering column values in sorted order (subject: party, swim), each of which maps to columns (to_add, cc, body) and their values.
  21. Why sorted? SLICE QUERIES!! SELECT * FROM emails WHERE user_id = '111' AND (subject) >= ('s') AND (subject) < ('t'); Row 111 holds party|to_add = cat@, party|cc = hippo@, party|body = at my place, swim|to_add = cat@, swim|cc = hippo@, swim|body = in the pool; the slice covers only the swim columns.
  22. DM - Compound Composite Key: CREATE TABLE email_app.emails (user_id text, subject text, to_add text, cc text, body text, PRIMARY KEY ((user_id, subject), to_add)); Row key: (user_id, subject). Clustering key: to_add.
  23. Composite / Compound Inserts: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?'); INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
  24. Composite Insert 2 Result: SELECT * FROM emails WHERE user_id = '111'; SELECT * FROM emails WHERE user_id = '111' AND subject = 'party'; The row key is now (user_id, subject), so a lookup has to supply both, as the second query does. Row 111:party, with each column name prefixed by to_add: cat@|cc = hippo@, cat@|body = at my place.
  25. Data Model - Composite Insert 1: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time'); SELECT * FROM emails WHERE user_id = '111' AND subject = 'party'; Row 111:party: cat@|cc = hippo@, cat@|body = at my place, dog@|cc = hippo@, dog@|body = all the time. Sorting / slice on "to_add".
  26. DM - Compound Composite Key 2: CREATE TABLE email_app.emails (user_id text, subject text, to_add text, cc text, body text, PRIMARY KEY ((user_id, subject), to_add, cc)); Row key: (user_id, subject). Clustering keys: to_add, cc.
  27. Composite / Clustered Inserts: INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time'); INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'At my place'); INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', 'cat@b.com', 'mouse@b.com', 'At my place');
  28. DM - Composite / Clustered Inserts: SELECT * FROM emails WHERE user_id = '111' AND subject = 'party'; Row 111|party: cat@|hippo@|body = At my place, cat@|mouse@|body = At my place, dog@|hippo@|body = all the time. Slice on (to_add) OR (to_add, cc).
  29. Mental Model - Nested Sorted Hash: row key (user_id + subject: 111|party) maps to clustering columns in sorted order (to_add: cat, dog; then cc: hippo and mouse under cat, hippo under dog), each ending in the body column and its value.
  30. Part 2 / 8 of this 7 hour talk: ● Denormalization ● Index Column Families ● Cassandra Internals (memtables, SSTables, compaction, repair)
  31. Part 8 / 8: The Future: ● Continually improving ● More and more adoption ● Awesome projects ● http://www.datastax.com/documentation/cassandra/2.0/pdf/cassandra20.pdf ● http://planetcassandra.org/
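
The "nested sorted hash" mental model from slides 12 to 29 can be sketched in plain Python. This is a toy illustration, not how Cassandra actually stores data: the `Table` class, its method names, and the use of `bisect` over a sorted key list are all assumptions made for this example. It only shows why keeping clustering keys sorted makes slice queries cheap, and why repeating an insert is idempotent.

```python
import bisect


class Table:
    """Toy model (hypothetical): partition key -> sorted clustering keys -> columns."""

    def __init__(self):
        # partition key -> (sorted list of clustering keys, {clustering key: columns})
        self._partitions = {}

    def insert(self, partition_key, clustering_key, **columns):
        """Upsert: writing the same keys again overwrites, so repeats change nothing."""
        keys, rows = self._partitions.setdefault(partition_key, ([], {}))
        if clustering_key not in rows:
            bisect.insort(keys, clustering_key)  # keep clustering keys sorted
            rows[clustering_key] = {}
        rows[clustering_key].update(columns)

    def select(self, partition_key):
        """All rows of one partition, in clustering-key order."""
        keys, rows = self._partitions.get(partition_key, ([], {}))
        return [(key, rows[key]) for key in keys]

    def slice_query(self, partition_key, start, end):
        """Rows with start <= clustering key < end, found by binary search."""
        keys, rows = self._partitions.get(partition_key, ([], {}))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(key, rows[key]) for key in keys[lo:hi]]


emails = Table()
emails.insert("111", "swim", to_add="cat@b.com", cc="hippo@b.com", body="in the pool")
emails.insert("111", "party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")
# Repeating the same insert leaves the partition unchanged.
emails.insert("111", "party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")

subjects = [subject for subject, _ in emails.select("111")]
sliced = [subject for subject, _ in emails.slice_query("111", "s", "t")]
print(subjects)
print(sliced)
```

For the compound keys on slides 22 to 29, the same sketch applies if the partition key is the tuple `("111", "party")` and the clustering key is the `to_add` value or a `(to_add, cc)` tuple, since Python compares tuples element by element.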