Cassandra By Example: Data Modelling with CQL3

35,080 views

Published on

Data Day Texas 2013 presentation on Cassandra data modeling with CQL.

Published in: Technology
10 Comments
53 Likes
Statistics
Notes
  • Njce! Thanks for sharing.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • hi, I also have a problem like this. Is someone able to solve the problem?
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • @Eric Evans But partitioning will not solve the core problem with NOSQL DB which is too many insertions for single user action.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • @khan7290 Another solution would be to partition the followers across some number of rows.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • On the solutions: 1) Don't use cassandra, use other database where we can just store the mapping of 2 users only. But still 1 million mappings! 2) Rather inserting right away, do lazy insertion. When one of the follower loads his page, only then pull tweets for him from each person he is following. Again he can follow 1 million others. This is a massive problem.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
35,080
On SlideShare
0
From Embeds
0
Number of Embeds
1,824
Actions
Shares
0
Downloads
1,073
Comments
10
Likes
53
Embeds 0
No embeds

No notes for slide

Cassandra By Example: Data Modelling with CQL3

  1. 1. Cassandra By Example:Data Modelling with CQL3Eric Evanseevans@opennms.com@jericevans
  2. 2. CQL is...● Query language for Apache Cassandra● Almost SQL (almost)● Alternative query interface First class citizen● More performant!● Available since Cassandra 0.8.0 (almost 2 years!)
  3. 3. Bad Old Days: Thrift RPC
  4. 4. Bad Old Days: Thrift RPC// Your ColumnColumn col = new Column(ByteBuffer.wrap("name".getBytes()));col.setValue(ByteBuffer.wrap("value".getBytes()));col.setTimestamp(System.currentTimeMillis());// Dont askColumnOrSuperColumn cosc = new ColumnOrSuperColumn();cosc.setColumn(col);// Prepare to be amazedMutation mutation = new Mutation();mutation.setColumnOrSuperColumn(cosc);List<Mutation> mutations = new ArrayList<Mutation>();mutations.add(mutation);Map mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();Map cf_map = new HashMap<String, List<Mutation>>();cf_map.set("Standard1", mutations);mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);cassandra.batch_mutate(mutations_map, consistency_level);
  5. 5. Better, no?INSERT INTO (id, name) VALUES (key, value);
  6. 6. But before we begin...
  7. 7. Partitioning Z A Q E M I
  8. 8. Partitioning Z A Q Cat E M I
  9. 9. Partitioning Z A Q Cat E M I
  10. 10. Partitioning A Pets Animal Type Size Youtub-able E Cat mammal small true ... I
  11. 11. Twissandra● Twitter-inspired sample application● Originally by Eric Florenzano, June 2009● Python (Django)● DBAPI-2 driver for CQL● Favors simplicity over correctness!● https://github.com/eevans/twissandra ○ See: cass.py
  12. 12. Twissandra
  13. 13. Twissandra
  14. 14. Twissandra
  15. 15. Twissandra
  16. 16. Twissandra
  17. 17. Twissandra Explained
  18. 18. users
  19. 19. users-- User storageCREATE TABLE users ( username text PRIMARY KEY, password text);
  20. 20. users-- Adding users (signup)INSERT INTO users (username, password) VALUES (meg, s3kr3t)
  21. 21. users
  22. 22. users-- Lookup password (login)SELECT password FROM users WHERE username = meg
  23. 23. following / followers
  24. 24. following-- Users a user is followingCREATE TABLE following ( username text, followed text, PRIMARY KEY(username, followed));
  25. 25. following-- Meg follows StewieINSERT INTO following (username, followed) VALUES (meg, stewie)-- Get a list of who Meg followsSELECT followed FROM following WHERE username = meg
  26. 26. users @meg is following followed---------- brian chris lois peter stewie quagmire ...
  27. 27. followers-- The users who follow usernameCREATE TABLE followers ( username text, following text, PRIMARY KEY(username, following));
  28. 28. followers-- Meg follows StewieINSERT INTO followers (username, followed) VALUES (stewie, meg)-- Get a list of who follows StewieSELECT followers FROM following WHERE username = stewie
  29. 29. redux: following / followers-- @meg follows @stewieBEGIN BATCH INSERT INTO following (username, followed) VALUES (meg, stewie) INSERT INTO followers (username, followed) VALUES (stewie, meg)APPLY BATCH
  30. 30. tweets
  31. 31. Denormalization Ahead!
  32. 32. tweets-- Tweet storage (think: permalink)CREATE TABLE tweets ( tweetid uuid PRIMARY KEY, username text, body text);
  33. 33. tweets-- Store a tweetINSERT INTO tweets ( tweetid, username, body) VALUES ( 60780342-90fe-11e2-8823-0026c650d722, stewie, victory is mine!)
  34. 34. Query tweets by ... ?● author, time descending● followed authors, time descending● date starting / date ending
  35. 35. userlinetweets, by user
  36. 36. userline-- Materialized view of the tweets-- created by user.CREATE TABLE userline ( username text, tweetid timeuuid, body text, PRIMARY KEY(username, tweetid));
  37. 37. Wait, WTF is a timeuuid?● Aka "Type 1 UUID" (http://goo.gl/SWuCb)● 100 nano second units since Oct. 15, 1582● Timestamp is first 60 bits (sorts temporally!)● Used like timestamp, but: ○ more granular ○ globally unique
  38. 38. userline-- Range of tweets for a userSELECT dateOf(tweetid), bodyFROM userlineWHERE username = stewie AND tweetid > minTimeuuid(2013-03-01 12:10:09)ORDER BY tweetid DESCLIMIT 40
  39. 39. @stewies most recent tweets dateOf(posted_at) | body--------------------------+------------------------------- 2013-03-19 14:43:15-0500 | victory is mine! 2013-03-19 13:23:24-0500 | generate killer bandwidth 2013-03-19 13:23:24-0500 | grow B2B e-business 2013-03-19 13:23:24-0500 | innovate vertical e-services 2013-03-19 13:23:24-0500 | deploy e-business experiences 2013-03-19 13:23:24-0500 | grow intuitive infrastructures ...
  40. 40. timelinetweets from those a user follows
  41. 41. timeline-- Materialized view of tweets from-- the users username follows.CREATE TABLE timeline ( username text, tweetid timeuuid, posted_by text, body text, PRIMARY KEY(username, tweetid));
  42. 42. timeline-- Range of tweets for a userSELECT dateOf(tweetid), posted_by, bodyFROM timelineWHERE username = stewie AND tweetid > 2013-03-01 12:10:09ORDER BY tweetid DESCLIMIT 40
  43. 43. most recent tweets for @meg dateOf(posted_at) | posted_by | body--------------------------+-----------+------------------- 2013-03-19 14:43:15-0500 | stewie | victory is mine! 2013-03-19 13:23:25-0500 | meg | evolve intuit... 2013-03-19 13:23:25-0500 | meg | whiteboard bric... 2013-03-19 13:23:25-0500 | stewie | brand clic... 2013-03-19 13:23:25-0500 | brian | synergize gran... 2013-03-19 13:23:24-0500 | brian | expedite real-t... 2013-03-19 13:23:24-0500 | stewie | generate kil... 2013-03-19 13:23:24-0500 | stewie | grow B2B ... 2013-03-19 13:23:24-0500 | meg | generate intera... ...
  44. 44. redux: tweets-- @stewie tweetsBEGIN BATCH INSERT INTO tweets ... INSERT INTO userline ... INSERT INTO timeline ... INSERT INTO timeline ... INSERT INTO timeline ... ...APPLY BATCH
  45. 45. In Conclusion:● Think in terms of your queries, store that● Dont fear duplication; Space is cheap to scale● Go wide; Rows can have 2 billion columns!● The only thing better than NoSQL, is MoSQL● Python hater? Java ❤r? ○ https://github.com/eevans/twissandra-j● http://goo.gl/zPOD
  46. 46. The End

×