Cassandra data modeling talk



This is similar to the talk I did for the Cassandra Summit, but all examples are in CQL 3.


  • Building a Cassandra-Based Application From 0 to Deploy
    Patrick McFadin, Solution Architect at DataStax
    Wednesday, November 7, 2012
  • Me
    • Solution Architect at DataStax, THE Cassandra company
    • Cassandra user since 0.7
    • Follow me here: @PatrickMcFadin
  • Goals
    • Take a new application concept
    • What is the data model?
    • Express that in CQL 3
    • Some sample code
  • The Plan
    • Conceptualize a new application
    • Identify the entity tables
    • Identify query tables
    • Code. Rinse. Repeat.
    • Deploy
  • Start with a concept
    • Video sharing website
    [Mock-up of a video page: title, username, description, rating, tags (Foo, Bar), comments, recommended videos, an "Upload New!" button, "Ads by Google" text, and a cat drawing saying "Meow"]
    *Cat drawing by goodrob13 on Flickr
  • Break down the features
    • Post a video
    • View a video
    • Add a comment
    • Rate a video
    • Tag a video
  • Create Entity Tables: the basic storage unit
  • Users
    [Diagram: row key Username; columns firstname, lastname, email, password, created_date]
    • Similar to an RDBMS table, with fairly fixed columns
    • Username is unique (it is the primary key)
    • Use secondary indexes on firstname and lastname for lookup
    • Adding columns with Cassandra is super easy

      CREATE TABLE users (
        username varchar,
        firstname varchar,
        lastname varchar,
        email varchar,
        password varchar,
        created_date timestamp,
        PRIMARY KEY (username)
      );
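Two storage behaviors sit behind the Users bullets: a write to an existing username does not fail but merges into (upserts) the same row, so "unique" means one row per username rather than a rejected duplicate; and rows are sparse, storing only the columns actually written. In CQL 3 a new column still has to be declared with ALTER TABLE, but existing rows are not rewritten. The sketch below is a hypothetical in-memory stand-in (class and method names are made up for illustration), not a Cassandra client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the users table (row key -> column -> value).
// It is not a Cassandra client; it only mimics upsert writes and sparse rows.
class UserRowsSketch {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Writes are upserts: a second write for the same username merges into the
    // existing row and overwrites the columns it names; it does not fail.
    void insert(String username, Map<String, String> columns) {
        rows.computeIfAbsent(username, k -> new TreeMap<>()).putAll(columns);
    }

    // Rows are sparse: only columns that were written are stored.
    Map<String, String> get(String username) {
        return rows.getOrDefault(username, Map.of());
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        UserRowsSketch users = new UserRowsSketch();
        users.insert("jdoe", Map.of("firstname", "Jane", "lastname", "Doe"));
        // Same username again: still one row, lastname overwritten, email added
        users.insert("jdoe", Map.of("lastname", "Smith", "email", "jdoe@example.com"));
        System.out.println(users.rowCount());   // 1
        System.out.println(users.get("jdoe"));  // {email=jdoe@example.com, firstname=Jane, lastname=Smith}
    }
}
```

If the application must refuse a username that is already taken, it has to check before writing; the plain insert will not do it.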
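  • Users: The insert code (Hector client)

      import me.prettyprint.cassandra.serializers.StringSerializer;
      import me.prettyprint.hector.api.Keyspace;
      import me.prettyprint.hector.api.exceptions.HectorException;
      import me.prettyprint.hector.api.factory.HFactory;
      import me.prettyprint.hector.api.mutation.Mutator;

      static final StringSerializer stringSerializer = StringSerializer.get();

      static void setUser(User user, Keyspace keyspace) {
        // Create a mutator that allows you to talk to Cassandra
        Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
        try {
          // Use the mutator to queue inserts into the users table, keyed by username
          mutator.addInsertion(user.getUsername(), "users",
              HFactory.createStringColumn("firstname", user.getFirstname()));
          mutator.addInsertion(user.getUsername(), "users",
              HFactory.createStringColumn("lastname", user.getLastname()));
          mutator.addInsertion(user.getUsername(), "users",
              HFactory.createStringColumn("password", user.getPassword()));
          // Once the mutator is ready, send the queued inserts to Cassandra in one batch
          mutator.execute();
        } catch (HectorException he) {
          he.printStackTrace();
        }
      }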
  • Videos (one-to-many)
    [Diagram: row key VideoId <UUID>; columns videoname, username, description, tags, upload_date]
    • Use a UUID as the row key for uniqueness
    • Allows for duplicate video names
    • Tags should be stored in some sort of delimited format
    • An index on username may not be the best plan

      CREATE TABLE videos (
        videoid uuid,
        videoname varchar,
        username varchar,
        description varchar,
        tags varchar,
        upload_date timestamp,
        PRIMARY KEY (videoid)
      );
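A small sketch of two ideas from the Videos slide: a fresh UUID per upload, so two videos can share a name, and tags packed into the single delimited tags column. The comma delimiter is an assumption chosen to match the `split(",")` in the read code; the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical helpers for the videos table: a random UUID per upload, and the
// tags list packed into one delimited varchar column.
class VideoTagsSketch {
    // Assumption: a comma, matching the split(",") in the read code
    static final String DELIMITER = ",";

    // A fresh UUID per upload, so two videos can share a videoname.
    static UUID newVideoId() {
        return UUID.randomUUID();
    }

    // Pack tags into one column; a tag that contains the delimiter could not be
    // split back out, so reject it.
    static String encodeTags(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(DELIMITER)) {
                throw new IllegalArgumentException("tag contains delimiter: " + tag);
            }
        }
        return String.join(DELIMITER, tags);
    }

    // Unpack, trimming whitespace and skipping empty entries; null means no tags.
    static List<String> decodeTags(String column) {
        List<String> tags = new ArrayList<>();
        if (column == null) {
            return tags;
        }
        for (String part : column.split(DELIMITER)) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String column = encodeTags(List.of("Foo", "Bar"));
        System.out.println(column);              // Foo,Bar
        System.out.println(decodeTags(column));  // [Foo, Bar]
        System.out.println(newVideoId() + " / " + newVideoId());
    }
}
```

Rejecting delimiter characters at write time is the price of a delimited format: the column has no escaping, so anything containing the delimiter would come back as two tags.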
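  • Videos: The get code (Hector client)

      import java.util.UUID;
      import me.prettyprint.cassandra.serializers.StringSerializer;
      import me.prettyprint.cassandra.serializers.UUIDSerializer;
      import me.prettyprint.hector.api.Keyspace;
      import me.prettyprint.hector.api.beans.ColumnSlice;
      import me.prettyprint.hector.api.beans.HColumn;
      import me.prettyprint.hector.api.factory.HFactory;
      import me.prettyprint.hector.api.query.SliceQuery;

      static final StringSerializer stringSerializer = StringSerializer.get();
      static final UUIDSerializer uuidSerializer = UUIDSerializer.get();

      static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
        Video video = new Video();
        // Create a slice query. We'll be getting specific column names
        SliceQuery<UUID, String, String> sliceQuery =
            HFactory.createSliceQuery(keyspace, uuidSerializer, stringSerializer, stringSerializer);
        sliceQuery.setColumnFamily("videos");
        sliceQuery.setKey(videoId);
        sliceQuery.setColumnNames("videoname", "username", "description", "tags");
        // Execute the query and get the list of columns
        ColumnSlice<String, String> result = sliceQuery.execute().get();
        // Get each column by name and add it to our video object
        video.setVideoName(valueOf(result, "videoname"));
        video.setUsername(valueOf(result, "username"));
        video.setDescription(valueOf(result, "description"));
        String tags = valueOf(result, "tags");
        video.setTags(tags == null ? new String[0] : tags.split(","));
        return video;
      }

      // getColumnByName returns null for a column that was never written, so guard before getValue()
      static String valueOf(ColumnSlice<String, String> slice, String name) {
        HColumn<String, String> column = slice.getColumnByName(name);
        return column == null ? null : column.getValue();
      }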
  • Comments (many-to-many)
    [Diagram: row key VideoId <UUID>; columns username, comment_ts, comment]
    • Videos have many comments
    • Users comment on many videos
    • Order is by username, then by comment_ts (not overall insertion order)
    • Use getSlice() to pull some or all of the comments

      CREATE TABLE comments (
        videoid uuid,
        username varchar,
        comment_ts timestamp,
        comment varchar,
        PRIMARY KEY (videoid, username, comment_ts)
      );
  • Comments... pt 2
    [Diagram: row key VideoId <UUID>; a wide row of username:comment_ts columns, each holding a comment]
    • This is what's really going on
    • VideoId is the row key
    • The composite of username and comment_ts is the column name
    • One column per comment
    • Columns are sorted by username first, so comments are time ordered only within each user
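The clustering columns decide the order in which a slice returns a video's comments. The sketch below models one comments row as a sorted map under two comparators: the slide's (username, comment_ts) order, and a hypothetical alternative (comment_ts, username) that would give a time-ordered comment feed. The record and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of one comments row (one videoid) as a sorted map, ordered the way the
// clustering columns order it. Names are hypothetical.
class CommentOrderSketch {
    record CommentKey(String username, long commentTs) {}

    // PRIMARY KEY (videoid, username, comment_ts): sort by username, then time
    static final Comparator<CommentKey> BY_USER_THEN_TIME =
        Comparator.comparing(CommentKey::username).thenComparingLong(CommentKey::commentTs);

    // Alternative PRIMARY KEY (videoid, comment_ts, username): sort by time first
    static final Comparator<CommentKey> BY_TIME_THEN_USER =
        Comparator.comparingLong(CommentKey::commentTs).thenComparing(CommentKey::username);

    // A full slice returns the row's columns in clustering order
    static List<String> readRow(Comparator<CommentKey> clustering, Map<CommentKey, String> comments) {
        TreeMap<CommentKey, String> row = new TreeMap<>(clustering);
        row.putAll(comments);
        return new ArrayList<>(row.values());
    }

    public static void main(String[] args) {
        Map<CommentKey, String> comments = Map.of(
            new CommentKey("zoe", 1000L), "first!",
            new CommentKey("adam", 2000L), "nice cat",
            new CommentKey("zoe", 3000L), "meow");
        System.out.println(readRow(BY_USER_THEN_TIME, comments));  // [nice cat, first!, meow]
        System.out.println(readRow(BY_TIME_THEN_USER, comments));  // [first!, nice cat, meow]
    }
}
```

With username first, "adam" sorts ahead of "zoe" even though his comment came later; putting comment_ts first is what a "newest comments" view would need.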
  • Ratings
    [Diagram: row key VideoId <UUID>; counter columns rating_count and rating_total]
    • Use counters so each rating is a single-call update, with no read first
    • rating_count is how many ratings were given
    • rating_total is the sum of the ratings
    • Ex: rating_count = 5, rating_total = 23, average rating = 23/5 = 4.6

      CREATE TABLE video_rating (
        videoid uuid,
        rating_count counter,
        rating_total counter,
        PRIMARY KEY (videoid)
      );
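A sketch of the arithmetic around the two counters, with the counters modeled as plain `long` fields (a CQL counter holds a 64-bit value). Each rating is two increments, and the average is derived at read time. The class and method names, and returning 0 for an unrated video, are assumptions for illustration.

```java
// Sketch of the video_rating counters and the average derived from them.
// Names are hypothetical; the fields stand in for the two counter columns.
class RatingAverageSketch {
    long ratingCount;
    long ratingTotal;

    // One rating = two blind increments; no read-before-write is needed
    void rate(int stars) {
        ratingCount += 1;
        ratingTotal += stars;
    }

    // Cast before dividing: with two longs, 23 / 5 would truncate to 4.
    static double average(long ratingTotal, long ratingCount) {
        if (ratingCount == 0) {
            return 0.0;  // assumption: show 0 for an unrated video rather than NaN
        }
        return (double) ratingTotal / ratingCount;
    }

    public static void main(String[] args) {
        RatingAverageSketch video = new RatingAverageSketch();
        for (int stars : new int[] {5, 4, 5, 4, 5}) {
            video.rate(stars);
        }
        System.out.println(video.ratingCount + " " + video.ratingTotal);  // 5 23
        System.out.println(23 / 5);                                        // 4 (integer division)
        System.out.println(average(video.ratingTotal, video.ratingCount)); // 4.6
    }
}
```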
  • Video Events
    [Diagram: row key VideoId:Username; columns start_<timestamp>, stop_<timestamp>, start_<timestamp>, each holding a video_<timestamp>, ordered latest to oldest]
    • Track viewing events
    • Combine video ID and username for a unique row
    • The stop time can be used to pick up where the viewer left off
    • Great for usage analytics later
    • Reverse comparator! Newest events come first (event_timestamp DESC)

      CREATE TABLE video_event (
        videoid_username varchar,
        event varchar,
        event_timestamp timestamp,
        video_timestamp bigint,
        PRIMARY KEY (videoid_username, event_timestamp, event)
      ) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
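A sketch of how "pick up where they left off" can fall out of the reverse ordering: with events sorted newest first, the first "stop" event is the latest one, and its video_timestamp is the resume position. It assumes event values "start" and "stop", times in milliseconds, and a ":" separator in the combined key; the record and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of one video_event row. Names, event values, and the ":" separator are assumptions.
class VideoEventSketch {
    record Event(String event, long eventTimestamp, long videoTimestamp) {}

    // Combined row key, as in the videoid_username column
    static String rowKey(String videoId, String username) {
        return videoId + ":" + username;
    }

    // Mirrors CLUSTERING ORDER BY (event_timestamp DESC, event ASC)
    static List<Event> newestFirst(List<Event> events) {
        List<Event> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparingLong(Event::eventTimestamp).reversed()
                .thenComparing(Event::event));
        return sorted;
    }

    // The first "stop" in newest-first order is the latest; its video position is the resume point
    static Optional<Long> resumePosition(List<Event> events) {
        return newestFirst(events).stream()
                .filter(e -> e.event().equals("stop"))
                .map(Event::videoTimestamp)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("start", 1000L, 0L),
            new Event("stop", 5000L, 4000L),
            new Event("start", 9000L, 4000L),
            new Event("stop", 12000L, 7000L));
        System.out.println(rowKey("video-1", "jdoe"));  // video-1:jdoe
        System.out.println(resumePosition(events));     // Optional[7000]
    }
}
```

Because the table already stores events newest first, a real query only needs the first matching row of the slice instead of sorting in the application.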
  • Create Query Tables: indexes to support fast lookups
  • Index table principles
    [Diagram: a lookup for RowKey5 goes directly to that row among RowKey1 through RowKey12]
    • Lookup by row key
    • Indexed
    • Cached (most times)
  • Index table principles
    [Diagram: in row RowKey5 (columns Col1 through Col8), GetSlice Col3 to Col6 reads Col3, Col4, Col5, Col6 as one sequential read]
    • Get the row by its key
    • Slice: get the data in one pass
    • Cached (sometimes)
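The two index-table principles can be modeled with standard collections: a hash map stands in for locating a row by key, and a `TreeMap` per row keeps columns sorted so a slice is one contiguous range. This is a toy model under those assumptions; `getSlice` here is a hypothetical method, not a driver API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the index-table principles (names are hypothetical):
// 1) find a row directly by its key, 2) read a contiguous, sorted run of its columns.
class SliceSketch {
    // Hash lookup stands in for locating a row by key; each row keeps its columns sorted.
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Columns from..to (inclusive) come back already in order, in one pass.
    NavigableMap<String, String> getSlice(String rowKey, String from, String to) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return new TreeMap<>();
        }
        return row.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        SliceSketch table = new SliceSketch();
        for (int r = 1; r <= 12; r++) {
            for (int c = 1; c <= 8; c++) {
                table.put("RowKey" + r, "Col" + c, "v" + r + "." + c);
            }
        }
        System.out.println(table.getSlice("RowKey5", "Col3", "Col6").keySet());  // [Col3, Col4, Col5, Col6]
    }
}
```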
  • Video by Username
    [Diagram: row key Username; a wide row with one column per uploaded video, named by VideoId and <timestamp>]
    • Username is unique
    • One column for each new video uploaded
    • Column slice for a time span, from x to y, which needs upload_date as the first clustering column
    • The index entry is written at the same time as the video record

      CREATE TABLE username_video_index (
        username varchar,
        videoid uuid,
        upload_date timestamp,
        video_name varchar,
        PRIMARY KEY (username, upload_date, videoid)
      );
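A sketch of the "from x to y" slice on the username index, assuming clustering on (upload_date, videoid): entries sort by upload time, and videoid only breaks ties, so two uploads with the same timestamp do not overwrite each other. Nested sorted maps stand in for the composite clustering key; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical model of username_video_index with clustering (upload_date, videoid):
// per username, entries sorted by upload time, then by videoid for ties.
class UserVideoIndexSketch {
    private final Map<String, TreeMap<Long, TreeMap<UUID, String>>> index = new HashMap<>();

    // Called alongside the write to the videos table (app-level dual write).
    void addVideo(String username, long uploadDate, UUID videoId, String videoName) {
        index.computeIfAbsent(username, k -> new TreeMap<>())
             .computeIfAbsent(uploadDate, k -> new TreeMap<>())
             .put(videoId, videoName);
    }

    // Videos uploaded by username with from <= upload_date <= to: one contiguous slice.
    List<String> videosBetween(String username, long from, long to) {
        List<String> names = new ArrayList<>();
        TreeMap<Long, TreeMap<UUID, String>> row = index.get(username);
        if (row == null) {
            return names;
        }
        for (TreeMap<UUID, String> sameTime : row.subMap(from, true, to, true).values()) {
            names.addAll(sameTime.values());
        }
        return names;
    }

    public static void main(String[] args) {
        UserVideoIndexSketch idx = new UserVideoIndexSketch();
        idx.addVideo("jdoe", 100L, UUID.randomUUID(), "cat naps");
        idx.addVideo("jdoe", 200L, UUID.randomUUID(), "cat jumps");
        idx.addVideo("jdoe", 300L, UUID.randomUUID(), "cat eats");
        System.out.println(idx.videosBetween("jdoe", 150L, 300L));  // [cat jumps, cat eats]
    }
}
```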
  • Video by Tag
    [Diagram: row key tag; columns VideoId through VideoId, each holding a timestamp]
    • One row per tag, regardless of video
    • Great for “List videos with X tag”
    • Tags have to be updated in Video and Tag at the same time
    • Index integrity is maintained in app logic

      CREATE TABLE tag_index (
        tag varchar,
        videoid uuid,
        timestamp timestamp,
        PRIMARY KEY (tag, videoid)
      );
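"Index integrity is maintained in app logic" means that every tag change on a video must also add and remove the matching tag_index entries. The sketch below shows that bookkeeping with two in-memory maps standing in for the videos tags and the tag_index table; it does not model failures between the two writes, which a real application would have to handle. Video IDs are plain strings for brevity, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of keeping a video's tags and tag_index in step in application code.
// In-memory maps stand in for the two tables; all names are hypothetical.
class TagIndexSketch {
    private final Map<String, Set<String>> videoTags = new HashMap<>();  // videoid -> tags
    private final Map<String, Set<String>> tagIndex = new HashMap<>();   // tag -> videoids

    // Replace a video's tags: drop index entries for removed tags, add entries
    // for current tags, then store the new tag set on the video.
    void setTags(String videoId, Set<String> newTags) {
        Set<String> oldTags = videoTags.getOrDefault(videoId, Set.of());
        for (String tag : oldTags) {
            if (!newTags.contains(tag)) {
                Set<String> ids = tagIndex.get(tag);
                if (ids != null) {
                    ids.remove(videoId);
                    if (ids.isEmpty()) {
                        tagIndex.remove(tag);
                    }
                }
            }
        }
        for (String tag : newTags) {
            tagIndex.computeIfAbsent(tag, k -> new TreeSet<>()).add(videoId);
        }
        videoTags.put(videoId, new HashSet<>(newTags));
    }

    // "List videos with X tag" is a single row read
    Set<String> videosWithTag(String tag) {
        return tagIndex.getOrDefault(tag, Set.of());
    }

    public static void main(String[] args) {
        TagIndexSketch tags = new TagIndexSketch();
        tags.setTags("video-1", Set.of("Foo", "Bar"));
        tags.setTags("video-2", Set.of("Foo"));
        tags.setTags("video-1", Set.of("Bar"));  // Foo removed from video-1
        System.out.println(tags.videosWithTag("Foo"));  // [video-2]
        System.out.println(tags.videosWithTag("Bar"));  // [video-1]
    }
}
```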
  • Deployment
    • Replication factor?
    • Multi-datacenter?
    • Cost?
  • Deployment
    • Today != tomorrow
    • Scale when needed
    • Have an expansion plan ready
  • DataStax Enterprise
    • Analytics: Hadoop
    • Search: Solr
  • Hadoop
    • Embedded with Cassandra
    • No single point of failure
    • Uses native Cassandra (C*) data
    • Hive, Pig, Mahout
  • Solr
    • Embedded with Cassandra
    • Fast reverse index
    • Shards Solr by key range
  • OpsCenter
  • Thank you! Connect with me at @PatrickMcFadin or on LinkedIn