Cassandra data modeling talk

This is similar to the talk I did for the Cassandra Summit but all examples are in CQL 3.

  1. Building a Cassandra Based Application: From 0 to Deploy
     Patrick McFadin, Solution Architect at DataStax
     Wednesday, November 7, 12
  2. Me
     • Solution Architect at DataStax, THE Cassandra company
     • Cassandra user since 0.7
     • Follow me here: @PatrickMcFadin
  3. Goals
     • Take a new application concept
     • What is the data model?
     • Express that in CQL 3
     • Some sample code
  4. The Plan
     • Conceptualize a new application
     • Identify the entity tables
     • Identify query tables
     • Code. Rinse. Repeat.
     • Deploy
  5. www.killrvideos.com
     Start with a concept
     • Video sharing website
     [Mockup: a video page showing Video Title, Username, Description ("Meow"), Rating, Tags (Foo, Bar), Comments, a Recommended panel, an "Upload New!" link and an "Ads by Google" text block]
     *Cat drawing by goodrob13 on Flickr
  6. Break down the features
     • Post a video
     • View a video
     • Add a comment
     • Rate a video
     • Tag a video
  7. Create Entity Tables
     Basic storage unit
  8. Users
     [Diagram: row key Username with columns firstname, lastname, email, password, created_date]
     • Similar to an RDBMS table: fairly fixed columns
     • Username is unique
     • Use secondary indexes on firstname and lastname for lookup
     • Adding columns with Cassandra is super easy

       CREATE TABLE users (
         username varchar,
         firstname varchar,
         lastname varchar,
         email varchar,
         password varchar,
         created_date timestamp,
         PRIMARY KEY (username)
       );
  9. Users: The insert code

       import me.prettyprint.cassandra.serializers.StringSerializer;
       import me.prettyprint.hector.api.Keyspace;
       import me.prettyprint.hector.api.exceptions.HectorException;
       import me.prettyprint.hector.api.factory.HFactory;
       import me.prettyprint.hector.api.mutation.Mutator;

       static final StringSerializer stringSerializer = StringSerializer.get();

       static void setUser(User user, Keyspace keyspace) {
         // Create a mutator that allows you to talk to Cassandra
         Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
         try {
           // Queue one insertion per column; the row key is the username
           mutator.addInsertion(user.getUsername(), "users",
               HFactory.createStringColumn("firstname", user.getFirstname()));
           mutator.addInsertion(user.getUsername(), "users",
               HFactory.createStringColumn("lastname", user.getLastname()));
           mutator.addInsertion(user.getUsername(), "users",
               HFactory.createStringColumn("password", user.getPassword()));
           // Once the mutator is ready, execute the batch on Cassandra
           mutator.execute();
         } catch (HectorException he) {
           he.printStackTrace();
         }
       }
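Slide 8 says the username is unique, but a Cassandra write with an existing primary key does not fail: it overwrites the columns it sets (an upsert), so a second setUser() with the same username silently replaces the first user's values. The sketch below models that behavior with plain Java maps so it runs without a cluster. `UserStore`, `upsert` and `createIfAbsent` are hypothetical names, not Hector APIs, and the read-before-write check is only an illustration; it is not atomic across concurrent clients.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the users table: one row per username.
class UserStore {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Models a Cassandra write: columns are merged into the row, overwriting
    // existing values, and no error is raised if the key already exists.
    void upsert(String username, Map<String, String> columns) {
        rows.computeIfAbsent(username, k -> new HashMap<>()).putAll(columns);
    }

    // Application-side uniqueness check (read before write). Not atomic:
    // two clients can both see "absent" and both write.
    boolean createIfAbsent(String username, Map<String, String> columns) {
        if (rows.containsKey(username)) {
            return false;
        }
        upsert(username, columns);
        return true;
    }

    String get(String username, String column) {
        Map<String, String> row = rows.get(username);
        return row == null ? null : row.get(column);
    }
}
```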
  10. Videos (one-to-many)
      [Diagram: row key VideoId <UUID> with columns videoname, username, description, tags, upload_date]
      • Use a UUID as a row key for uniqueness
      • Allows for same video names
      • Tags should be stored in some sort of delimited format
      • Index on username may not be the best plan

        CREATE TABLE videos (
          videoid uuid,
          videoname varchar,
          username varchar,
          description varchar,
          tags varchar,
          upload_date timestamp,
          PRIMARY KEY (videoid, videoname)
        );
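  11. Videos: The get code

        import java.util.UUID;
        import me.prettyprint.cassandra.serializers.StringSerializer;
        import me.prettyprint.cassandra.serializers.UUIDSerializer;
        import me.prettyprint.hector.api.Keyspace;
        import me.prettyprint.hector.api.beans.ColumnSlice;
        import me.prettyprint.hector.api.factory.HFactory;
        import me.prettyprint.hector.api.query.SliceQuery;

        static final StringSerializer stringSerializer = StringSerializer.get();
        static final UUIDSerializer uuidSerializer = UUIDSerializer.get();

        static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
          Video video = new Video();

          // Create a slice query. We'll be getting specific column names
          SliceQuery<UUID, String, String> sliceQuery = HFactory.createSliceQuery(
              keyspace, uuidSerializer, stringSerializer, stringSerializer);
          sliceQuery.setColumnFamily("videos");
          sliceQuery.setKey(videoId);
          sliceQuery.setColumnNames("videoname", "username", "description", "tags");

          // Execute the query and get the list of columns
          ColumnSlice<String, String> result = sliceQuery.execute().get();

          // Get each column by name and add them to our video object
          video.setVideoName(result.getColumnByName("videoname").getValue());
          video.setUsername(result.getColumnByName("username").getValue());
          video.setDescription(result.getColumnByName("description").getValue());
          // Tags are stored comma-delimited; split them back into an array
          video.setTags(result.getColumnByName("tags").getValue().split(","));
          return video;
        }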
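The get code splits the tags column on commas, so whatever writes the video has to produce that same format. A minimal sketch of both directions, assuming a comma is the delimiter (as the `split(",")` above implies) and that tags themselves never contain commas; `TagCodec`, `encode` and `decode` are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for storing a video's tags in the single varchar
// "tags" column as a comma-delimited string.
class TagCodec {
    // Trims each tag, drops blanks, and joins with commas. A tag containing
    // the delimiter is rejected, since it could not be split back apart.
    static String encode(List<String> tags) {
        List<String> cleaned = new ArrayList<>();
        for (String tag : tags) {
            String t = tag.trim();
            if (t.contains(",")) {
                throw new IllegalArgumentException("tag contains delimiter: " + tag);
            }
            if (!t.isEmpty()) {
                cleaned.add(t);
            }
        }
        return String.join(",", cleaned);
    }

    // Splits the stored value back into tags. An empty or missing column
    // yields no tags (a bare "".split(",") would return one empty tag).
    static List<String> decode(String stored) {
        List<String> tags = new ArrayList<>();
        if (stored == null || stored.isEmpty()) {
            return tags;
        }
        for (String t : stored.split(",")) {
            tags.add(t);
        }
        return tags;
    }
}
```

Validating on write keeps the read side as simple as the `split(",")` in the get code.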
  12. Comments (many-to-many)
      [Diagram: row key VideoId <UUID> with columns username, comment_ts, comment]
      • Videos have many comments
      • Users have many comments
      • Order follows the clustering columns: username first, then comment_ts
      • Use getSlice() to pull some or all of the comments

        CREATE TABLE comments (
          videoid uuid,
          username varchar,
          comment_ts timestamp,
          comment varchar,
          PRIMARY KEY (videoid, username, comment_ts)
        );
  13. Comments... pt 2
      [Diagram: wide row keyed by VideoId <UUID>; column names username:comment_ts .. username:comment_ts, each holding a comment]
      • This is what’s really going on
      • VideoId is the row key
      • The composite of username and comment_ts is the column name
      • 1 column per comment
      • Columns are sorted by username, then by time within each username
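Cassandra keeps the cells of a row sorted by column name, so with PRIMARY KEY (videoid, username, comment_ts) one video's comments come back ordered by username and then by timestamp. The sketch below models a single comments row with a TreeMap whose comparator mimics that composite ordering; `CommentRow`, `add`, `all` and `byUser` are hypothetical names, not Hector or driver APIs. If the page should list comments purely by time, putting comment_ts ahead of username in the key is the change to consider.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one comments row (one videoid): each cell's
// name is the composite (username, comment_ts) and its value is the comment.
class CommentRow {
    static final class Name {
        final String username;
        final long commentTs;

        Name(String username, long commentTs) {
            this.username = username;
            this.commentTs = commentTs;
        }
    }

    // Cells sort like the clustering columns: username first, then comment_ts.
    private static final Comparator<Name> CLUSTERING_ORDER =
            Comparator.comparing((Name n) -> n.username)
                      .thenComparingLong(n -> n.commentTs);

    private final NavigableMap<Name, String> cells = new TreeMap<>(CLUSTERING_ORDER);

    void add(String username, long commentTs, String comment) {
        cells.put(new Name(username, commentTs), comment);
    }

    // Full-row read: grouped by username, then ordered by time.
    NavigableMap<Name, String> all() {
        return cells;
    }

    // A prefix of the composite name bounds a contiguous range of cells,
    // so one user's comments on this video are a single slice.
    NavigableMap<Name, String> byUser(String username) {
        return cells.subMap(new Name(username, Long.MIN_VALUE), true,
                            new Name(username, Long.MAX_VALUE), true);
    }
}
```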
  14. Ratings
      [Diagram: row key VideoId <UUID> with counter columns rating_count and rating_total]
      • Use counters for single-call updates
      • rating_count is how many ratings were given
      • rating_total is the sum of all ratings
      • Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6

        CREATE TABLE video_rating (
          videoid uuid,
          rating_count counter,
          rating_total counter,
          PRIMARY KEY (videoid)
        );
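Storing the count and the total, rather than the average, is what lets each rating be a pair of counter increments with no read first; the average is derived when the row is read. A minimal in-memory sketch of that arithmetic, reproducing the slide's 23/5 example; `VideoRating`, `rate` and `average` are hypothetical names, and plain long fields stand in for the two counter columns.

```java
// Hypothetical stand-in for one video_rating row. In Cassandra both columns
// would be counters changed only by increments, never read-modify-write.
class VideoRating {
    private long ratingCount;
    private long ratingTotal;

    // One rating = two increments: count by 1, total by the score given.
    void rate(int stars) {
        ratingCount += 1;
        ratingTotal += stars;
    }

    long count() { return ratingCount; }

    long total() { return ratingTotal; }

    // Average computed at read time; 0.0 when nobody has rated yet.
    double average() {
        return ratingCount == 0 ? 0.0 : (double) ratingTotal / ratingCount;
    }
}
```

For example, ratings of 5, 5, 4, 5 and 4 leave rating_count = 5 and rating_total = 23, for an average of 4.6.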
  15. Video Events
      [Diagram: row key VideoId:Username; columns start_<timestamp>, stop_<timestamp>, start_<timestamp>, each holding video_<timestamp>, ordered Latest .. Oldest]
      • Track viewing events
      • Combine Video ID and Username for a unique row
      • Stop time can be used to pick up where they left off
      • Great for usage analytics later
      • Reverse comparator!

        CREATE TABLE video_event (
          videoid_username varchar,
          event varchar,
          event_timestamp timestamp,
          video_timestamp bigint,
          PRIMARY KEY (videoid_username, event_timestamp, event)
        ) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
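With the row clustered newest-first, "pick up where they left off" becomes a read from the head of the row that stops at the first stop event. The sketch below mimics that with a TreeMap built on a reversed comparator. `VideoEventRow`, `record` and `resumePosition` are hypothetical names; the event types are assumed to be the strings "start" and "stop", and the unit of the playback position is an assumption.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one video_event row (one "videoid:username"
// key). For brevity it is keyed by event_timestamp alone; the real table
// also clusters on event.
class VideoEventRow {
    static final class Event {
        final String type;          // assumed to be "start" or "stop"
        final long videoTimestamp;  // playback position (unit assumed)

        Event(String type, long videoTimestamp) {
            this.type = type;
            this.videoTimestamp = videoTimestamp;
        }
    }

    // Newest first, like CLUSTERING ORDER BY (event_timestamp DESC).
    private final NavigableMap<Long, Event> events =
            new TreeMap<Long, Event>(Comparator.<Long>reverseOrder());

    void record(long eventTimestamp, String type, long videoTimestamp) {
        events.put(eventTimestamp, new Event(type, videoTimestamp));
    }

    // Walks from the head (latest event) and returns at the first "stop",
    // so older history is never scanned. Returns 0 if there is no stop yet.
    long resumePosition() {
        for (Event e : events.values()) {
            if (e.type.equals("stop")) {
                return e.videoTimestamp;
            }
        }
        return 0L;
    }
}
```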
  16. Create Query Tables
      Indexes to support fast lookups
  17. Index table principles
      [Diagram: looking up RowKey5 among row keys RowKey1 through RowKey12]
      • Lookup by rowkey
      • Indexed
      • Cached (most times)
  18. Index table principles
      [Diagram: row RowKey5 holds columns Col1 through Col8; a GetSlice from Col3 to Col6 reads Col3, Col4, Col5 and Col6 in one sequential read]
      • Get row by the key
      • Slice. Get data in one pass
      • Cached (sometimes)
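The two "Index table principles" slides describe a read in two steps: find the row by its key, then take a contiguous, already-sorted run of its columns in one pass. A minimal in-memory sketch of that access pattern, with a HashMap standing in for the key lookup and a TreeMap for the sorted columns; `ColumnFamilyModel` and `getSlice` are hypothetical names (the latter only echoes the Thrift get_slice call, it does not reproduce it).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a column family: row key -> columns sorted by name.
class ColumnFamilyModel {
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Step 1: locate the row by key. Step 2: return the contiguous range of
    // columns from start to end (inclusive) in column-name order.
    NavigableMap<String, String> getSlice(String rowKey, String start, String end) {
        NavigableMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(start, true, end, true);
    }
}
```

Because the columns are kept sorted, the slice never has to look at Col1, Col2, Col7 or Col8.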
  19. Video by Username
      [Diagram: wide row keyed by Username; one column per uploaded video, named <timestamp>:VideoId]
      • Username is unique
      • One column for each new video uploaded
      • Column slice for time span. From x to y
      • The VideoId is added at the same time as the Video record

        CREATE TABLE username_video_index (
          username varchar,
          videoid uuid,
          upload_date timestamp,
          video_name varchar,
          PRIMARY KEY (username, upload_date, videoid)
        );
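Two points on this slide, writing the index entry together with the video and slicing a time span, can be sketched in memory. For a from-x-to-y slice the index cells must sort by upload time, so this sketch keys each user's entries by upload_date first and keeps the video IDs underneath. `VideoCatalog`, `addVideo`, `videoName` and `videosBetween` are hypothetical names; the two maps stand in for the videos table and username_video_index, and nothing here makes the two writes atomic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory model of the videos table plus username_video_index.
class VideoCatalog {
    // Stand-in for the videos table: videoid -> videoname.
    private final Map<UUID, String> videos = new HashMap<>();
    // Stand-in for the index: username -> (upload_date -> videoids).
    private final Map<String, NavigableMap<Long, List<UUID>>> byUsername = new HashMap<>();

    // Writes the video record and its index entry in the same call.
    void addVideo(UUID videoId, String username, String videoName, long uploadDate) {
        videos.put(videoId, videoName);
        byUsername.computeIfAbsent(username, k -> new TreeMap<>())
                  .computeIfAbsent(uploadDate, k -> new ArrayList<>())
                  .add(videoId);
    }

    String videoName(UUID videoId) {
        return videos.get(videoId);
    }

    // Column slice for a time span: the user's videos uploaded in [from, to],
    // oldest first.
    List<UUID> videosBetween(String username, long from, long to) {
        List<UUID> result = new ArrayList<>();
        NavigableMap<Long, List<UUID>> row = byUsername.get(username);
        if (row == null) {
            return result;
        }
        for (List<UUID> ids : row.subMap(from, true, to, true).values()) {
            result.addAll(ids);
        }
        return result;
    }
}
```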
  20. Video by Tag
      [Diagram: row keyed by tag; one column per VideoId, each holding a timestamp]
      • Tag is unique regardless of video
      • Great for “List videos with X tag”
      • Tags have to be updated in both videos and tag_index at the same time
      • Index integrity is maintained in app logic

        CREATE TABLE tag_index (
          tag varchar,
          videoid uuid,
          timestamp timestamp,
          PRIMARY KEY (tag, videoid)
        );
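Because index integrity lives in app logic, changing a video's tags means working out which tag_index rows to write and which to delete, then applying those alongside the update to the video's own tags column. A minimal sketch of that bookkeeping as two set differences; `TagIndexDiff`, `toInsert` and `toDelete` are hypothetical names, and issuing the actual writes is left out.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: given a video's old and new tags, compute which
// (tag, videoid) index rows the application must insert and delete.
class TagIndexDiff {
    final Set<String> toInsert;  // tags newly added to the video
    final Set<String> toDelete;  // tags removed from the video

    TagIndexDiff(Set<String> oldTags, Set<String> newTags) {
        toInsert = new TreeSet<>(newTags);
        toInsert.removeAll(oldTags);
        toDelete = new TreeSet<>(oldTags);
        toDelete.removeAll(newTags);
    }
}
```

Tags present in both sets need no index write at all, which keeps the update small.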
  21. Deployment
      • Replication factor?
      • Multi-datacenter?
      • Cost?
  22. Deployment
      • Today != tomorrow
      • Scale when needed
      • Have expansion plan ready
  23. DataStax Enterprise
      • Analytics - Hadoop
      • Search - Solr
  24. Hadoop
      • Embedded with Cassandra
      • No single point of failure
      • Use native C* (Cassandra) data
      • Hive, Pig, Mahout
  25. Solr
      • Embedded with Cassandra
      • Fast reverse index
      • Shards Solr by key range
  26. OpsCenter
  27. Thank you!
      Connect with me at @PatrickMcFadin or on LinkedIn
