HRX Meetup Group 8/20/2014: Cassandra and How to Scale your Database

HR5 alum Stephen Portanova will be presenting on the highly scalable database Cassandra, which is used by Reddit, Netflix, CERN, and The Weather Channel. 'nuff said.

Transcript

  • 1. Cassandra Pretty Cool
  • 2. History: Google Bigtable, Amazon Dynamo
  • 3. Today
  • 4. Why Should You Care
    ● Horizontal scaling (basically auto-sharding)
    ● Multiple nodes: highly available
    ● Really fast writes
    ● Not too shabby at reads either: SLICES!!
    ● Bright future
  • 5. The Cluster
    ● Replication factor (rf)
    ● Read consistency (r)
    ● Write consistency (w)
    ● Clustering: shard on partition key
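How rf, r, and w interact can be sketched in a few lines of Python. The overlap rule used here (a read is guaranteed to touch at least one replica holding the latest acknowledged write when r + w > rf) is standard quorum arithmetic rather than something stated on the slide, and the function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Number of replicas that form a majority for a given replication factor."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, r: int, w: int) -> bool:
    """True when every set of r read replicas must overlap every set of w written replicas."""
    if not (1 <= r <= rf and 1 <= w <= rf):
        raise ValueError("r and w must be between 1 and rf")
    return r + w > rf
```

With rf = 3, reading and writing at quorum (r = w = 2) overlaps, while r = w = 1 trades that guarantee for lower latency.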
  • 6. The One Ring
  • 7. Storage - Vnodes
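The ring and vnode slides carry no text in this transcript, so the following is only a rough sketch of the general idea: each node owns several tokens on a hash ring, and a partition key is stored on the next rf distinct nodes clockwise from its token. The `Ring` class, the md5 hash (a stand-in for Cassandra's real partitioner), and the default of 8 vnodes per node are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (md5 is a stand-in for the real partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        nodes = list(nodes)
        # Each physical node owns several tokens (vnodes) scattered around the ring.
        self._entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._entries]
        self._node_count = len(set(nodes))

    def replicas(self, partition_key: str, rf: int = 3):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        rf = min(rf, self._node_count)
        i = bisect.bisect_right(self._tokens, token(partition_key))
        owners = []
        while len(owners) < rf:
            node = self._entries[i % len(self._entries)][1]
            if node not in owners:
                owners.append(node)
            i += 1
        return owners
```

For example, `Ring(["a", "b", "c", "d"]).replicas("111", rf=3)` returns three distinct node names, and the same key always maps to the same nodes.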
  • 8. Data Model
    ● Wide rows
    ● Slice queries
    ● Denormalization
    ● Index tables
  • 9. Data Model - Simple Key
    CREATE TABLE email_app.emails (
      user_id text,
      subject text,
      to_add text,
      cc text,
      body text,
      PRIMARY KEY (user_id));
    Row key: user_id
  • 10. Data Model - Simple Inserts
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('999', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?');
  • 11. Data Model - Simple Inserts Result
    SELECT * FROM email_app.emails;

    row key | subject | to_add | cc       | body
    111     | party   | cat@   | hippo@   | at my place
    999     | wat     | horse@ | giraffe@ | is going on?
  • 12. Mental Model - Nested Hash
    Row keys:      111, 999
    Column values: each row key maps to its own subject, to, cc, body
  • 13. Data Model - Simple Insert - Again
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');

    row key | subject | to_add | cc       | body
    111     | party   | cat@   | hippo@   | at my place
    999     | wat     | horse@ | giraffe@ | is going on?

    The result is unchanged: IDEMPOTENT
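The nested-hash mental model and the idempotent insert from slides 12 and 13 can be sketched with plain Python dictionaries. This is only an illustration of the visible result, assuming a hypothetical `insert` helper; a real Cassandra write also records a new timestamp, which the sketch ignores.

```python
table = {}  # row key -> {column name -> value}


def insert(row_key, **columns):
    """Upsert: create the row if it is missing, otherwise overwrite the named columns."""
    table.setdefault(row_key, {}).update(columns)


insert("111", subject="party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")
insert("999", subject="wat", to_add="horse@b.com", cc="giraffe@b.com", body="is going on?")
snapshot = {key: dict(cols) for key, cols in table.items()}

# Repeating the first insert leaves the table exactly as it was.
insert("111", subject="party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")
```

After the repeated insert, `table` still equals `snapshot`: two row keys, each with the same four columns.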
  • 14. Data Model - Composite Key 1
    CREATE TABLE email_app.emails (
      user_id text,
      subject text,
      to_add text,
      cc text,
      body text,
      PRIMARY KEY (user_id, subject));
    Row key: user_id
    Clustering key: subject
  • 15. Data Model - Composite Insert 1
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
    Same as before. Right???
  • 16. Data Model - Composite Insert Result
    SELECT * FROM emails WHERE user_id = '111';

    Row key 111, clustering column subject:
    party|to_add | party|cc | party|body
    cat@         | hippo@   | at my place
  • 17. Mental Model - Nested Hash
    Row key (user_id):           111
    Clustering column (subject): party
    Column values:               to_add, cc, body
  • 18. Data Model - Composite Insert 2
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'swim', 'cat@b.com', 'hippo@b.com', 'in the pool');
  • 19. Composite Insert 2 Result
    SELECT * FROM emails WHERE user_id = '111';

    Row key 111, clustering column subject:
    party|to_add | party|cc | party|body  | swim|to_add | swim|cc | swim|body
    cat@         | hippo@   | at my place | cat@        | hippo@  | in the pool

    Sorted by clustering column "subject"
  • 20. Mental Model - Nested Sorted Hash
    Row key (user_id):           111
    Clustering column (subject): party, swim (sorted)
    Column values:               each subject maps to its own to, cc, body
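  • 21. Why sorted? SLICE QUERIES!!
    SELECT * FROM emails
    WHERE user_id = '111' AND (subject) >= ('s') AND (subject) < ('t');

    Row key 111, clustering column subject:
    party|to_add | party|cc | party|body  | swim|to_add | swim|cc | swim|body
    cat@         | hippo@   | at my place | cat@        | hippo@  | in the pool

    The slice covers only the contiguous swim columns, since 'swim' falls between 's' and 't'.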
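Why sorted storage makes slices cheap can be sketched with Python's `bisect` module: because the clustering values are kept in order, a range query becomes two binary searches and one contiguous read. The `Partition` class is a hypothetical in-memory model, not Cassandra's storage engine.

```python
import bisect


class Partition:
    """One row key's data: clustering values kept sorted, each mapping to its columns."""

    def __init__(self):
        self._keys = []   # sorted clustering values (here: subject)
        self._cells = {}  # clustering value -> {column name -> value}

    def upsert(self, clustering, **columns):
        if clustering not in self._cells:
            bisect.insort(self._keys, clustering)
            self._cells[clustering] = {}
        self._cells[clustering].update(columns)

    def slice(self, start, end):
        """Return entries with start <= clustering < end, in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._cells[k]) for k in self._keys[lo:hi]]


emails = {"111": Partition()}
emails["111"].upsert("party", to_add="cat@b.com", cc="hippo@b.com", body="at my place")
emails["111"].upsert("swim", to_add="cat@b.com", cc="hippo@b.com", body="in the pool")

# Mirrors the slide's query: subject >= 's' AND subject < 't'
result = emails["111"].slice("s", "t")
```

Here `result` holds only the swim entry, while a wider slice such as `slice("a", "z")` returns party and then swim.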
  • 22. DM - Compound Composite Key
    CREATE TABLE email_app.emails (
      user_id text,
      subject text,
      to_add text,
      cc text,
      body text,
      PRIMARY KEY ((user_id, subject), to_add));
    Row key: (user_id, subject)
    Clustering key: to_add
  • 23. Composite / Compound Inserts
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'wat', 'horse@b.com', 'giraffe@b.com', 'is going on?');
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'at my place');
  • 24. Composite Insert 2 Result
    SELECT * FROM emails WHERE user_id = '111';
      (rejected: user_id is only part of the row key (user_id, subject))
    SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';

    Row key 111:party, clustering column to_add:
    cat@|cc | cat@|body
    hippo@  | at my place
  • 25. Data Model - Composite Insert 1
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time');
    SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';

    Row key 111:party, clustering column to_add:
    cat@|cc | cat@|body   | dog@|cc | dog@|body
    hippo@  | at my place | hippo@  | all the time

    Sorting / slice on "to_add"
  • 26. DM - Compound Composite Key 2
    CREATE TABLE email_app.emails (
      user_id text,
      subject text,
      to_add text,
      cc text,
      body text,
      PRIMARY KEY ((user_id, subject), to_add, cc));
    Row key: (user_id, subject)
    Clustering keys: to_add, cc
  • 27. Composite / Clustered Inserts
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'dog@b.com', 'hippo@b.com', 'all the time');
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'cat@b.com', 'hippo@b.com', 'At my place');
    INSERT INTO email_app.emails (user_id, subject, to_add, cc, body)
    VALUES ('111', 'party', 'cat@b.com', 'mouse@b.com', 'At my place');
  • 28. DM - Composite / Clustered Inserts
    SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';

    Row key 111|party, clustering columns (to_add, cc):
    cat@|hippo@|body | cat@|mouse@|body | dog@|hippo@|body
    At my place      | At my place      | all the time

    Slice on (to_add) OR (to_add, cc)
  • 29. Mental Model - Nested Sorted Hash
    Row key (user_id + subject): 111|party
    Clustering columns (to_add, then cc) and column values:
      cat -> hippo -> body
      cat -> mouse -> body
      dog -> hippo -> body
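The compound-key model from slides 26 to 29 can be sketched the same way: the row key is the pair (user_id, subject), and within it rows are ordered by the tuple (to_add, cc), so a query may narrow by to_add alone or by to_add and cc together. The `insert` and `select` helpers are hypothetical, and this toy version sorts at read time, whereas Cassandra keeps the data sorted on disk.

```python
partitions = {}  # (user_id, subject) -> {(to_add, cc): body}


def insert(user_id, subject, to_add, cc, body):
    """Upsert one row; the same (to_add, cc) in the same partition is overwritten."""
    partitions.setdefault((user_id, subject), {})[(to_add, cc)] = body


def select(user_id, subject, to_add=None, cc=None):
    """Rows of one partition in clustering order, optionally narrowed by a
    clustering prefix: (to_add) or (to_add, cc)."""
    if cc is not None and to_add is None:
        raise ValueError("cc can only be restricted together with to_add")
    prefix = tuple(v for v in (to_add, cc) if v is not None)
    cells = partitions.get((user_id, subject), {})
    return [(key, body) for key, body in sorted(cells.items())
            if key[:len(prefix)] == prefix]


insert("111", "party", "dog@b.com", "hippo@b.com", "all the time")
insert("111", "party", "cat@b.com", "hippo@b.com", "At my place")
insert("111", "party", "cat@b.com", "mouse@b.com", "At my place")
```

`select("111", "party")` returns the three rows in the order cat/hippo, cat/mouse, dog/hippo, matching slide 28, and `select("111", "party", to_add="cat@b.com")` returns only the two cat rows.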
  • 30. Part 2 / 8 of this 7 hour talk
    ● Denormalization
    ● Index column families
    ● Cassandra internals (memtables, SSTables, compaction, repair)
  • 31. Part 8 / 8: The Future
    ● Continually improving
    ● More and more adoption
    ● Awesome projects
    ● http://www.datastax.com/documentation/cassandra/2.0/pdf/cassandra20.pdf
    ● http://planetcassandra.org/