Cassandra data modeling talk

This is similar to the talk I did for the Cassandra Summit but all examples are in CQL 3.

Cassandra data modeling talk Presentation Transcript

  • 1. Building a Cassandra Based Application From 0 to Deploy
      Patrick McFadin, Solution Architect at DataStax
      Wednesday, November 7, 12
  • 2. Me
      • Solution Architect at DataStax, THE Cassandra company
      • Cassandra user since 0.7
      • Follow me here: @PatrickMcFadin
  • 3. Goals
      • Take a new application concept
      • What is the data model?
      • Express that in CQL 3
      • Some sample code
  • 4. The Plan
      • Conceptualize a new application
      • Identify the entity tables
      • Identify query tables
      • Code. Rinse. Repeat.
      • Deploy
  • 5. Start with a concept
      • Video sharing website
      [Page mockup of www.killrvideos.com: Video Title, Username, Description, Rating, Tags (Foo, Bar), Comments, Recommended, Upload New!, Ads by Google]
      *Cat drawing by goodrob13 on Flickr
  • 6. Break down the features
      • Post a video
      • View a video
      • Add a comment
      • Rate a video
      • Tag a video
  • 7. Create Entity Tables
      Basic storage unit
  • 8. Users
      Row key: Username. Columns: firstname, lastname, email, password, created_date.
      • Similar to an RDBMS table. Fairly fixed columns
      • Username is unique
      • Use secondary indexes on firstname and lastname for lookup
      • Adding columns with Cassandra is super easy

        CREATE TABLE users (
            username varchar,
            firstname varchar,
            lastname varchar,
            email varchar,
            password varchar,
            created_date timestamp,
            PRIMARY KEY (username)
        );
  • 9. Users: The insert code

        static void setUser(User user, Keyspace keyspace) {
            // Create a mutator that allows you to talk to Cassandra
            // (stringSerializer is assumed to be defined elsewhere in the class)
            Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
            try {
                // Use the mutator to insert data into our table
                mutator.addInsertion(user.getUsername(), "users",
                    HFactory.createStringColumn("firstname", user.getFirstname()));
                mutator.addInsertion(user.getUsername(), "users",
                    HFactory.createStringColumn("lastname", user.getLastname()));
                mutator.addInsertion(user.getUsername(), "users",
                    HFactory.createStringColumn("password", user.getPassword()));
                // Once the mutator is ready, execute on Cassandra
                mutator.execute();
            } catch (HectorException he) {
                he.printStackTrace();
            }
        }
  • 10. Videos (one-to-many)
      Row key: VideoId <UUID>. Columns: videoname, username, description, tags, upload_date.
      • Use a UUID as a row key for uniqueness
      • Allows for same video names
      • Tags should be stored in some sort of delimited format
      • Index on username may not be the best plan

        CREATE TABLE videos (
            videoid uuid,
            videoname varchar,
            username varchar,
            description varchar,
            tags varchar,
            upload_date timestamp,
            PRIMARY KEY (videoid, videoname)
        );
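The "delimited format" bullet for tags can be sketched in plain Java. This is only an illustration, not code from the talk: the class name `TagCodecSketch` and its methods are hypothetical, and the comma delimiter is an assumption taken from the `split(",")` call in the get code on the next slide. It also assumes tags never contain the delimiter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: packs a tag list into the single varchar "tags" column.
class TagCodecSketch {
    static final String DELIMITER = ","; // assumption: tags never contain a comma

    // Join tags into one delimited string, rejecting tags that would break decoding.
    static String encode(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(DELIMITER)) {
                throw new IllegalArgumentException("tag contains delimiter: " + tag);
            }
        }
        return String.join(DELIMITER, tags);
    }

    // Split the stored string back into tags; an empty column means no tags.
    static List<String> decode(String stored) {
        if (stored == null || stored.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(stored.split(DELIMITER));
    }

    public static void main(String[] args) {
        String stored = encode(Arrays.asList("Foo", "Bar"));
        System.out.println(stored);         // Foo,Bar
        System.out.println(decode(stored)); // [Foo, Bar]
    }
}
```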
  • 11. Videos: The get code

        static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
            Video video = new Video();
            // Create a slice query. We'll be getting specific column names
            SliceQuery<UUID, String, String> sliceQuery =
                HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);
            sliceQuery.setColumnFamily("videos");
            sliceQuery.setKey(videoId);
            sliceQuery.setColumnNames("videoname", "username", "description", "tags");
            // Execute the query and get the list of columns
            ColumnSlice<String, String> result = sliceQuery.execute().get();
            // Get each column by name and add them to our video object
            video.setVideoName(result.getColumnByName("videoname").getValue());
            video.setUsername(result.getColumnByName("username").getValue());
            video.setDescription(result.getColumnByName("description").getValue());
            video.setTags(result.getColumnByName("tags").getValue().split(","));
            return video;
        }
  • 12. Comments (many-to-many)
      Row key: VideoId <UUID>. Columns: username, comment_ts, comment.
      • Videos have many comments
      • Comments have many users
      • Order is as inserted
      • Use getSlice() to pull some or all of the comments

        CREATE TABLE comments (
            videoid uuid,
            username varchar,
            comment_ts timestamp,
            comment varchar,
            PRIMARY KEY (videoid, username, comment_ts)
        );
  • 13. Comments... pt 2
      Row key: VideoId <UUID>. Column names: username:comment_ts .. username:comment_ts. Column values: comment .. comment. (Wide row, time ordered.)
      • This is what’s really going on
      • VideoID is the key
      • Composite of username and comment_ts is the column name
      • 1 column per comment
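The wide-row layout above can be mimicked in memory with a sorted map. This is a minimal sketch, not how Cassandra is implemented: `CommentRowSketch` and `ColumnName` are hypothetical names, and one `TreeMap` stands in for the row of a single videoid. Its comparator follows the `PRIMARY KEY (videoid, username, comment_ts)` declaration, so columns sort by username first and by comment_ts within each username.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one wide row of the comments table.
class CommentRowSketch {
    // Composite column name: (username, comment_ts).
    static final class ColumnName {
        final String username;
        final long commentTs;

        ColumnName(String username, long commentTs) {
            this.username = username;
            this.commentTs = commentTs;
        }

        @Override
        public String toString() {
            return username + ":" + commentTs;
        }
    }

    // Sort order of the composite: username, then comment_ts.
    static final Comparator<ColumnName> ORDER =
        Comparator.comparing((ColumnName c) -> c.username)
                  .thenComparingLong(c -> c.commentTs);

    // One column per comment, kept sorted by composite name.
    private final TreeMap<ColumnName, String> row = new TreeMap<>(ORDER);

    void addComment(String username, long commentTs, String comment) {
        row.put(new ColumnName(username, commentTs), comment);
    }

    // Column names in stored order.
    List<String> columnNames() {
        List<String> names = new ArrayList<>();
        for (ColumnName name : row.keySet()) {
            names.add(name.toString());
        }
        return names;
    }

    // Slice of the row: every comment by one user, oldest first.
    NavigableMap<ColumnName, String> commentsBy(String username) {
        return row.subMap(new ColumnName(username, Long.MIN_VALUE), true,
                          new ColumnName(username, Long.MAX_VALUE), true);
    }

    public static void main(String[] args) {
        CommentRowSketch row = new CommentRowSketch();
        row.addComment("bob", 200L, "Nice cat");
        row.addComment("alice", 300L, "Again!");
        row.addComment("alice", 100L, "First");
        System.out.println(row.columnNames()); // [alice:100, alice:300, bob:200]
        System.out.println(row.commentsBy("alice").values()); // [First, Again!]
    }
}
```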
  • 14. Ratings
      Row key: VideoId <UUID>. Columns: rating_count <counter>, rating_total <counter>.
      • Use counter for single call update
      • rating_count is how many ratings were given
      • rating_total is the sum of the ratings
      • Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6

        CREATE TABLE video_rating (
            videoid uuid,
            rating_count counter,
            rating_total counter,
            PRIMARY KEY (videoid)
        );
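The rating arithmetic above can be sketched in plain Java. This is an in-memory illustration only, with no Cassandra driver involved: `RatingSketch`, `rate`, and `average` are hypothetical names, and the two `long` slots stand in for the two counter columns. Each rating bumps both counters, and the average is computed at read time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the video_rating table.
class RatingSketch {
    // videoid -> { rating_count, rating_total }
    private final Map<UUID, long[]> counters = new HashMap<>();

    // One rating increments both counters: count by 1, total by the rating value.
    void rate(UUID videoId, int rating) {
        long[] c = counters.computeIfAbsent(videoId, k -> new long[2]);
        c[0] += 1;
        c[1] += rating;
    }

    // The average is not stored; it is rating_total / rating_count at read time.
    double average(UUID videoId) {
        long[] c = counters.get(videoId);
        if (c == null || c[0] == 0) {
            return 0.0; // no ratings yet
        }
        return (double) c[1] / c[0];
    }

    public static void main(String[] args) {
        RatingSketch ratings = new RatingSketch();
        UUID video = UUID.randomUUID();
        for (int r : new int[] {5, 4, 5, 4, 5}) { // 5 ratings, total 23
            ratings.rate(video, r);
        }
        System.out.println(ratings.average(video)); // 4.6
    }
}
```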
  • 15. Video Events
      Row key: VideoId:Username. Column names, latest .. oldest: start_<timestamp>, stop_<timestamp>, start_<timestamp>. Column value: video_<timestamp>.
      • Track viewing events
      • Combine Video ID and Username for a unique row
      • Stop time can be used to pick up where they left off
      • Great for usage analytics later
      • Reverse comparator!

        CREATE TABLE video_event (
            videoid_username varchar,
            event varchar,
            event_timestamp timestamp,
            video_timestamp bigint,
            PRIMARY KEY (videoid_username, event_timestamp, event)
        ) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
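The "reverse comparator" and "pick up where they left off" bullets can be sketched with a map sorted newest first. This is a simplified in-memory model, not the talk's code: `VideoEventSketch` and its methods are hypothetical, one map stands in for a single videoid_username row, and it assumes at most one event per event_timestamp (the real table also clusters on `event`).

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical in-memory model of one video_event row (one videoid_username).
class VideoEventSketch {
    static final class Event {
        final String event;        // "start" or "stop"
        final long videoTimestamp; // position in the video

        Event(String event, long videoTimestamp) {
            this.event = event;
            this.videoTimestamp = videoTimestamp;
        }
    }

    // event_timestamp -> event, sorted descending so the latest event comes first.
    private final TreeMap<Long, Event> row = new TreeMap<>(Comparator.<Long>reverseOrder());

    void record(long eventTimestamp, String event, long videoTimestamp) {
        row.put(eventTimestamp, new Event(event, videoTimestamp));
    }

    // Name of the most recent event, or null if nothing was recorded.
    String latestEvent() {
        return row.isEmpty() ? null : row.firstEntry().getValue().event;
    }

    // Walk from newest to oldest and return the position of the latest stop.
    long resumePosition() {
        for (Event e : row.values()) {
            if ("stop".equals(e.event)) {
                return e.videoTimestamp;
            }
        }
        return 0L; // never stopped: start from the beginning
    }

    public static void main(String[] args) {
        VideoEventSketch events = new VideoEventSketch();
        events.record(1000L, "start", 0L);
        events.record(2000L, "stop", 45000L);
        events.record(3000L, "start", 45000L);
        System.out.println(events.latestEvent());    // start
        System.out.println(events.resumePosition()); // 45000
    }
}
```

Because the row is already stored newest first, the resume lookup usually stops after reading one or two columns.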
  • 16. Create Query Tables
      Indexes to support fast lookups
  • 17. Index table principles
      Diagram: a lookup of RowKey5 among rows RowKey1 .. RowKey12.
      • Lookup by rowkey
      • Indexed
      • Cached (most times)
  • 18. Index table principles
      Diagram: GetSlice of Col3 to Col6 on row RowKey5 (columns Col1 .. Col8) returns Col3, Col4, Col5, Col6 in one sequential read.
      • Get row by the key
      • Slice. Get data in one pass
      • Cached (sometimes)
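The two principles above (find the row by key, then slice a contiguous column range) can be sketched with nested maps. This is only an in-memory analogy under stated assumptions: `SliceSketch`, `put`, and `getSlice` are hypothetical names, a `HashMap` plays the row-key lookup, and a `TreeMap` per row keeps columns sorted by name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory analogy for row lookup plus column slice.
class SliceSketch {
    // row key -> columns sorted by column name
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Step 1: get the row by its key. Step 2: read one contiguous range of columns.
    NavigableMap<String, String> getSlice(String rowKey, String from, String to) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        SliceSketch table = new SliceSketch();
        for (int i = 1; i <= 8; i++) {
            table.put("RowKey5", "Col" + i, "value" + i);
        }
        System.out.println(table.getSlice("RowKey5", "Col3", "Col6").keySet());
        // [Col3, Col4, Col5, Col6]
    }
}
```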
  • 19. Video by Username
      Row key: Username. Column names: VideoId:<timestamp> .. VideoId:<timestamp>. (Wide row.)
      • Username is unique
      • One column for each new video uploaded
      • Column slice for time span. From x to y
      • VideoId is added at the same time a Video record is added

        CREATE TABLE username_video_index (
            username varchar,
            videoid uuid,
            upload_date timestamp,
            video_name varchar,
            PRIMARY KEY (username, videoid, upload_date)
        );
  • 20. Video by Tag
      Row key: tag. Column names: VideoId .. VideoId. Column values: timestamp.
      • Tag is unique regardless of video
      • Great for “List videos with X tag”
      • Tags have to be updated in Video and Tag at the same time
      • Index integrity is maintained in app logic

        CREATE TABLE tag_index (
            tag varchar,
            videoid varchar,
            timestamp timestamp,
            PRIMARY KEY (tag, videoid)
        );
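"Index integrity is maintained in app logic" means the application writes to both tables itself. A minimal in-memory sketch of that idea follows. It is not the talk's code: `TagIndexSketch` and its methods are hypothetical, two maps stand in for the videos and tag_index tables, and it ignores partial failures between the two writes, which a real application would have to handle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

// Hypothetical in-memory model: the app keeps videos and tag_index in step.
class TagIndexSketch {
    private final Map<UUID, Set<String>> videoTags = new HashMap<>(); // "videos" side
    private final Map<String, Set<UUID>> tagIndex = new HashMap<>();  // "tag_index" side

    // Every tag change touches both structures in the same call.
    void setTags(UUID videoId, Set<String> newTags) {
        Set<String> oldTags = videoTags.getOrDefault(videoId, Collections.emptySet());
        // Remove index entries for tags the video no longer has.
        for (String tag : oldTags) {
            if (!newTags.contains(tag)) {
                Set<UUID> ids = tagIndex.get(tag);
                if (ids != null) {
                    ids.remove(videoId);
                    if (ids.isEmpty()) {
                        tagIndex.remove(tag);
                    }
                }
            }
        }
        // Add index entries for the current tags.
        for (String tag : newTags) {
            tagIndex.computeIfAbsent(tag, k -> new TreeSet<>()).add(videoId);
        }
        videoTags.put(videoId, new HashSet<>(newTags));
    }

    // "List videos with X tag": a single lookup by tag.
    Set<UUID> videosWithTag(String tag) {
        return Collections.unmodifiableSet(tagIndex.getOrDefault(tag, Collections.emptySet()));
    }

    public static void main(String[] args) {
        TagIndexSketch index = new TagIndexSketch();
        UUID video = UUID.randomUUID();
        index.setTags(video, new HashSet<>(Arrays.asList("cats", "funny")));
        index.setTags(video, new HashSet<>(Arrays.asList("cats")));
        System.out.println(index.videosWithTag("cats").contains(video)); // true
        System.out.println(index.videosWithTag("funny").isEmpty());      // true
    }
}
```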
  • 21. Deployment
      • Replication factor?
      • Multi-datacenter?
      • Cost?
  • 22. Deployment
      • Today != tomorrow
      • Scale when needed
      • Have expansion plan ready
  • 23. DataStax Enterprise
      • Analytics - Hadoop
      • Search - Solr
  • 24. Hadoop
      • Embedded with Cassandra
      • No single point of failure
      • Use native C* data
      • Hive, Pig, Mahout
  • 25. Solr
      • Embedded with Cassandra
      • Fast reverse-index
      • Shards Solr by key range
  • 26. OpsCenter
  • 27. Thank you!
      Connect with me at @PatrickMcFadin or on LinkedIn