C*ollege Credit: Data Modeling for Apache Cassandra

Cassandra stores data differently from traditional RDBMSs. These differences allow for improvements in performance, availability, and scalability. Aaron Morton, DataStax MVP for Apache Cassandra, presents the basics of the data model and outlines the differences clearly. This webinar is 101 level and is suitable for people who are coming from a relational background and are just starting to get into Apache Cassandra.

  1. DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA. Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra. @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  2. General Guidelines / API Choice / Example
  3. Cassandra is good at reading data from a row in the order it is stored.
  4. Typically an efficient data model will denormalize data and use the storage engine order.
  5. To create a good data model, understand the queries your application requires.
  6. General Guidelines / API Choice / Example
  7. Multiple APIs? Initially there was only a Thrift / RPC API, used by language-specific clients.
  8. Multiple APIs... Cassandra Query Language (CQL) started as a higher-level, declarative alternative.
  9. Multiple APIs... CQL 3 brings many changes. Currently in beta in Cassandra v1.1.
  10. CQL 3 uses a Table Orientated, Schema Driven, Data Model. (I said it had many changes.)
  11. General Guidelines / API Choice / Example
  12. Twitter Clone. Previously done with Thrift at WDCNZ: “Hello @World #Cassandra - Apache Cassandra in action” http://vimeo.com/49762233
  13. Twitter clone... using CQL 3 via the cqlsh tool: bin/cqlsh -3
  14. Queries?
      * Post Tweet to Followers
      * Get Tweet by ID
      * List Tweets by User
      * List Tweets in User Timeline
      * List Followers
  15. Keyspace is a namespace container.
  16. Our Keyspace
      CREATE KEYSPACE cass_college
      WITH strategy_class = 'NetworkTopologyStrategy'
      AND strategy_options:datacenter1 = 1;
  17. Table is a sparse collection of well known, ordered columns.
  18. First Table
      CREATE TABLE User (
          user_name text,
          password text,
          real_name text,
          PRIMARY KEY (user_name)
      );
  19. Some users...
      cqlsh:cass_college> INSERT INTO User
                      ... (user_name, password, real_name)
                      ... VALUES
                      ... ('fred', 'sekr8t', 'Mr Foo');
      cqlsh:cass_college> select * from User;
       user_name | password | real_name
      -----------+----------+-----------
       fred      | sekr8t   | Mr Foo
  20. Some users...
      cqlsh:cass_college> INSERT INTO User
                      ... (user_name, password)
                      ... VALUES
                      ... ('bob', 'pwd');
      cqlsh:cass_college> select * from User where user_name = 'bob';
       user_name | password | real_name
      -----------+----------+-----------
       bob       | pwd      | null
  21. Data Model (so far): User
  22. Data Model (so far)
       CF / Value | User
      ------------+-------------
       user_name  | Primary Key
  23. Tweet Table
      CREATE TABLE Tweet (
          tweet_id bigint,
          body text,
          user_name text,
          timestamp timestamp,
          PRIMARY KEY (tweet_id)
      );
  24. Tweet Table...
      cqlsh:cass_college> INSERT INTO Tweet
                      ... (tweet_id, body, user_name, timestamp)
                      ... VALUES
                      ... (1, 'The Tweet', 'fred', 1352150816917);
      cqlsh:cass_college> select * from Tweet where tweet_id = 1;
       tweet_id | body      | timestamp                | user_name
      ----------+-----------+--------------------------+-----------
       1        | The Tweet | 2012-11-06 10:26:56+1300 | fred
  25. Data Model (so far)
       CF / Value | User        | Tweet
      ------------+-------------+-------------
       user_name  | Primary Key | Field
       tweet_id   |             | Primary Key
  26. UserTweets Table
      CREATE TABLE UserTweets (
          tweet_id bigint,
          user_name text,
          body text,
          timestamp timestamp,
          PRIMARY KEY (user_name, tweet_id)
      );
  27. UserTweets Table...
      cqlsh:cass_college> INSERT INTO UserTweets
                      ... (tweet_id, body, user_name, timestamp)
                      ... VALUES
                      ... (1, 'The Tweet', 'fred', 1352150816917);
      cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
       user_name | tweet_id | body      | timestamp
      -----------+----------+-----------+--------------------------
       fred      | 1        | The Tweet | 2012-11-06 10:26:56+1300
  28. UserTweets Table...
      cqlsh:cass_college> select * from UserTweets where user_name = 'fred' and tweet_id = 1;
       user_name | tweet_id | body      | timestamp
      -----------+----------+-----------+--------------------------
       fred      | 1        | The Tweet | 2012-11-06 10:26:56+1300
  29. UserTweets Table...
      cqlsh:cass_college> INSERT INTO UserTweets
                      ... (tweet_id, body, user_name, timestamp)
                      ... VALUES
                      ... (2, 'Second Tweet', 'fred', 1352150816918);
      cqlsh:cass_college> select * from UserTweets where user_name = 'fred';
       user_name | tweet_id | body         | timestamp
      -----------+----------+--------------+--------------------------
       fred      | 1        | The Tweet    | 2012-11-06 10:26:56+1300
       fred      | 2        | Second Tweet | 2012-11-06 10:26:56+1300
  30. UserTweets Table...
      cqlsh:cass_college> select * from UserTweets where user_name = 'fred' order by tweet_id desc;
       user_name | tweet_id | body         | timestamp
      -----------+----------+--------------+--------------------------
       fred      | 2        | Second Tweet | 2012-11-06 10:26:56+1300
       fred      | 1        | The Tweet    | 2012-11-06 10:26:56+1300
  31. UserTimeline
      CREATE TABLE UserTimeline (
          tweet_id bigint,
          user_name text,
          body text,
          timestamp timestamp,
          PRIMARY KEY (user_name, tweet_id)
      );
  32. Data Model (so far)
       CF / Value | User        | Tweet       | User Tweets           | User Timeline
      ------------+-------------+-------------+-----------------------+-----------------------
       user_name  | Primary Key | Field       | Primary Key           | Primary Key
       tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component
  33. UserMetrics Table
      CREATE TABLE UserMetrics (
          user_name text,
          tweets counter,
          followers counter,
          following counter,
          PRIMARY KEY (user_name)
      );
  34. UserMetrics Table...
      cqlsh:cass_college> UPDATE
                      ... UserMetrics
                      ... SET
                      ... tweets = tweets + 1
                      ... WHERE
                      ... user_name = 'fred';
      cqlsh:cass_college> select * from UserMetrics where user_name = 'fred';
       user_name | followers | following | tweets
      -----------+-----------+-----------+--------
       fred      | null      | null      | 1
  35. Data Model (so far)
       CF / Value | User        | Tweet       | User Tweets           | User Timeline         | User Metrics
      ------------+-------------+-------------+-----------------------+-----------------------+--------------
       user_name  | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
       tweet_id   |             | Primary Key | Primary Key Component | Primary Key Component |
  36. Relationships
      CREATE TABLE Followers (
          user_name text,
          follower text,
          timestamp timestamp,
          PRIMARY KEY (user_name, follower)
      );
      CREATE TABLE Following (
          user_name text,
          following text,
          timestamp timestamp,
          PRIMARY KEY (user_name, following)
      );
  37. Relationships
      INSERT INTO Following (user_name, following, timestamp)
      VALUES ('bob', 'fred', 1352247749161);
      INSERT INTO Followers (user_name, follower, timestamp)
      VALUES ('fred', 'bob', 1352247749161);
  38. Relationships
      cqlsh:cass_college> select * from Following;
       user_name | following | timestamp
      -----------+-----------+--------------------------
       bob       | fred      | 2012-11-07 13:22:29+1300
      cqlsh:cass_college> select * from Followers;
       user_name | follower | timestamp
      -----------+----------+--------------------------
       fred      | bob      | 2012-11-07 13:22:29+1300
  39. Data Model CF / User User User Follows User Tweet Value Tweets Timeline Metrics Followers Primary Primary Primary Primary Primary user_name Field Key Key Key Key Key Field Primary Primary Key Primary Key tweet_id Key Component Component
  40. Thanks.
  41. Aaron Morton @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
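
The slides list "Post Tweet to Followers" among the required queries but do not show the write path. As a rough sketch only, the Python below models the slide tables as in-memory dictionaries to illustrate how one post might fan out into the denormalized tables, and how reads then follow the primary key order. It is not Cassandra client code. The class name TwitterCloneModel, its method names, the decision to bump the follower counters inside follow(), and the decision to write only to followers' timelines (not the author's own) are all assumptions made for illustration.

```python
from collections import defaultdict


class TwitterCloneModel:
    """Hypothetical in-memory stand-in for the tables in the slides."""

    def __init__(self):
        self.tweet = {}                         # Tweet: tweet_id -> row
        self.user_tweets = defaultdict(dict)    # UserTweets: user_name -> {tweet_id: row}
        self.user_timeline = defaultdict(dict)  # UserTimeline: user_name -> {tweet_id: row}
        self.followers = defaultdict(dict)      # Followers: user_name -> {follower: timestamp}
        self.following = defaultdict(dict)      # Following: user_name -> {following: timestamp}
        # UserMetrics: one counter row per user (assumed to start at zero here).
        self.user_metrics = defaultdict(
            lambda: {"tweets": 0, "followers": 0, "following": 0}
        )

    def follow(self, follower, followee, timestamp):
        # The relationship is written twice, once per direction, as on slide 37.
        self.following[follower][followee] = timestamp
        self.followers[followee][follower] = timestamp
        # Assumption: counters are updated at the same time.
        self.user_metrics[followee]["followers"] += 1
        self.user_metrics[follower]["following"] += 1

    def post_tweet(self, tweet_id, user_name, body, timestamp):
        row = {"tweet_id": tweet_id, "user_name": user_name,
               "body": body, "timestamp": timestamp}
        self.tweet[tweet_id] = dict(row)                   # Get Tweet by ID
        self.user_tweets[user_name][tweet_id] = dict(row)  # List Tweets by User
        # Fan out: one denormalized copy per follower timeline.
        for follower in self.followers.get(user_name, {}):
            self.user_timeline[follower][tweet_id] = dict(row)
        self.user_metrics[user_name]["tweets"] += 1

    @staticmethod
    def _in_key_order(rows, descending=False):
        # Cassandra keeps rows ordered by tweet_id on disk; this sketch sorts on read.
        return [rows[key] for key in sorted(rows, reverse=descending)]

    def get_tweet(self, tweet_id):
        return self.tweet.get(tweet_id)

    def list_user_tweets(self, user_name, descending=False):
        return self._in_key_order(self.user_tweets.get(user_name, {}), descending)

    def list_timeline(self, user_name, descending=False):
        return self._in_key_order(self.user_timeline.get(user_name, {}), descending)

    def list_followers(self, user_name):
        return sorted(self.followers.get(user_name, {}))
```

Each query from slide 14 is answered from a single table keyed for that query, which is the point of the denormalized model: the cost moves to write time, where one post produces several writes.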
