The world's next top data model




Published in: Technology, Business


  1. #CASSANDRA13
     The World's Next Top Data Model
     Patrick McFadin | Solution Architect, DataStax
     Monday, June 24, 13
  2. The saga continues!
     ★ Data model is dead, long live the data model.
     ★ Bridging from Relational to Cassandra
     ★ Become a Super Modeler
     ★ Core data modeling techniques using CQL
  3. Because I love talking about this. Just to recap...
  4. Why does this matter?
     * Cassandra lives closer to your users or applications
     * Not a hammer for all use case nails
     * Proper use case, proper model...
     * Get it wrong and...
  5. When to use Cassandra*
     * Need to be in more than one data center (active-active)
     * Scaling from 0 to, uh, well... we're not really sure
     * Need as close to 100% uptime as possible
     * Getting these from any other solution would just be mega $
     and...
     *Nutshell version. These are all ORs, not ANDs.
  6. You get the data model right!
  7. So let's do that
     * Four real-world examples
     * Use case, what they were avoiding, and the model to accomplish it
     * You may think this is you, but it isn't. I hear these all the time.
     * All examples are in CQL3
  8. But wait, you say: CQL doesn't do dynamic wide rows!
  9. Yes, it can!
     * CQL does wide rows the same way you did them in Thrift
     * No, really
     * Read this blog post, or just trust me and I'll show you how
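To make the wide-row claim concrete, here is a toy Python model (an illustration, not Cassandra's storage engine and not the author's code) of how a CQL table with a compound primary key lays out data: every CQL row that shares a partition key lands in one partition, kept sorted by the clustering column, which is the shape of a Thrift wide row. `WideRowTable` and its methods are hypothetical names, and the sketch ignores replication, SSTables, and tombstones.

```python
import bisect
from typing import Any, Dict, List, Tuple


class WideRowTable:
    """Hypothetical toy model of a CQL table: partition key -> rows sorted by clustering key.

    Illustration only; real Cassandra storage (memtables, SSTables,
    compaction, tombstones) is far more involved.
    """

    def __init__(self) -> None:
        # partition key -> (sorted clustering keys, cells for each clustering key)
        self._partitions: Dict[Any, Tuple[List[Any], List[Dict[str, Any]]]] = {}

    def insert(self, partition_key: Any, clustering_key: Any, **cells: Any) -> None:
        """Like a CQL INSERT: an upsert into the partition, kept in clustering order."""
        keys, rows = self._partitions.setdefault(partition_key, ([], []))
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            rows[i].update(cells)  # same primary key: overwrite the given cells
        else:
            keys.insert(i, clustering_key)
            rows.insert(i, dict(cells))

    def read_partition(self, partition_key: Any) -> List[Tuple[Any, Dict[str, Any]]]:
        """Everything stored under one partition key, i.e. one 'wide row'."""
        keys, rows = self._partitions.get(partition_key, ([], []))
        return list(zip(keys, rows))

    def partition_count(self) -> int:
        return len(self._partitions)


table = WideRowTable()
# Items arrive out of order; the partition keeps them sorted by item_id.
for item_id in (9748575, 8675309):
    table.insert(("pmcfadin", "Gadgets I want"), item_id, note=f"item {item_id}")
# A different cart name is a different partition (item_id 1 is a made-up value).
table.insert(("pmcfadin", "Anniversary gifts"), 1, note="flowers")

print(table.partition_count())  # 2
print([k for k, _ in table.read_partition(("pmcfadin", "Gadgets I want"))])  # [8675309, 9748575]
```

Re-inserting an existing primary key merges cells instead of adding a row, mirroring CQL's upsert behavior.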
  10. Shopping Cart Data Model
      Customers giving you money is a good reason for uptime
  11. Shopping cart use case
      * Store shopping cart data reliably
      * Minimize (or eliminate) downtime. Multi-DC
      * Scale for the "Cyber Monday" problem
      The bad:
      * Every minute offline is lost $$
      * Online shoppers want speed!
  12. Shopping cart data model
      * Each customer can have one or more shopping carts
      * De-normalize data for fast access
      * Shopping cart == one partition (row-level isolation)
      * Each item a new column
  13. Shopping cart data model
      CREATE TABLE user (
        username varchar,
        firstname varchar,
        lastname varchar,
        shopping_carts set<varchar>,
        PRIMARY KEY (username)
      );
      CREATE TABLE shopping_cart (
        username varchar,
        cart_name text,
        item_id int,
        item_name varchar,
        description varchar,
        price float,
        item_detail map<varchar,varchar>,  -- Item details. Flexible for whatever
        PRIMARY KEY ((username, cart_name), item_id)  -- (username, cart_name) creates the partition row key for one user's cart
      );
      -- Both inserts land in one partition (storage row) of data
      INSERT INTO shopping_cart (username, cart_name, item_id, item_name, description, price, item_detail)
      VALUES ('pmcfadin', 'Gadgets I want', 8675309, 'Garmin 910XT', 'Multisport training watch', 349.99,
              {'Related':'Timex sports watch', 'Volume Discount':'10'});
      INSERT INTO shopping_cart (username, cart_name, item_id, item_name, description, price, item_detail)
      VALUES ('pmcfadin', 'Gadgets I want', 9748575, 'Polaris FootPod', 'Bluetooth Smart foot pod', 64.00,
              {'Related':'Timex foot pod', 'Volume Discount':'25'});
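A minimal sketch, assuming a plain Python dict can stand in for the `shopping_cart` table: the partition key is the `(username, cart_name)` tuple and each `item_id` is one clustering row inside it, so pricing a whole cart touches a single partition. `add_item` and `cart_total` are hypothetical helpers. The sketch uses `Decimal` for prices rather than the slide's `float` to avoid binary rounding (CQL also has a `decimal` type).

```python
from collections import defaultdict
from decimal import Decimal
from typing import DefaultDict, Dict, Tuple

# shopping_cart: partition key (username, cart_name), clustering key item_id.
CartKey = Tuple[str, str]

# Hypothetical in-memory stand-in for the shopping_cart table.
shopping_cart: DefaultDict[CartKey, Dict[int, dict]] = defaultdict(dict)


def add_item(username: str, cart_name: str, item_id: int, item_name: str,
             description: str, price: Decimal, item_detail: Dict[str, str]) -> None:
    """Mimic one INSERT: the item becomes one row inside the cart's partition."""
    shopping_cart[(username, cart_name)][item_id] = {
        "item_name": item_name,
        "description": description,
        "price": price,
        "item_detail": dict(item_detail),  # the flexible map<varchar,varchar>
    }


def cart_total(username: str, cart_name: str) -> Decimal:
    """Pricing a whole cart reads exactly one partition; no joins involved."""
    items = shopping_cart.get((username, cart_name), {})
    return sum((row["price"] for row in items.values()), Decimal("0"))


add_item("pmcfadin", "Gadgets I want", 8675309, "Garmin 910XT",
         "Multisport training watch", Decimal("349.99"),
         {"Related": "Timex sports watch", "Volume Discount": "10"})
add_item("pmcfadin", "Gadgets I want", 9748575, "Polaris FootPod",
         "Bluetooth Smart foot pod", Decimal("64.00"),
         {"Related": "Timex foot pod", "Volume Discount": "25"})

print(len(shopping_cart))                        # 1
print(cart_total("pmcfadin", "Gadgets I want"))  # 413.99
```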
  14. User Activity Tracking
      Watching users, making decisions. Freaky, but cool.
  15. User activity use case
      * React to user input in real time
      * Support for multiple application pods
      * Scale for speed
      The bad:
      * Losing interactions is costly
      * Waiting for batch (Hadoop) is too long
  16. User activity data model
      * Interaction points stored per user in a short-term table
      * Long-term interactions stored in a similar table with a date partition
      * Process long-term data later using batch
      * Reverse time series to get the last N items
  17. User activity data model
      CREATE TABLE user_activity (
        username varchar,
        interaction_time timeuuid,
        activity_code varchar,
        detail varchar,
        PRIMARY KEY (username, interaction_time)
      ) WITH CLUSTERING ORDER BY (interaction_time DESC);  -- Reverse order based on timestamp
      CREATE TABLE user_activity_history (
        username varchar,
        interaction_date varchar,
        interaction_time timeuuid,
        activity_code varchar,
        detail varchar,
        PRIMARY KEY ((username, interaction_date), interaction_time)
      );
      INSERT INTO user_activity (username, interaction_time, activity_code, detail)
      VALUES ('pmcfadin', 0D1454E0-D202-11E2-8B8B-0800200C9A66, '100', 'Normal login')
      USING TTL 2592000;  -- Expire after 30 days
      INSERT INTO user_activity_history (username, interaction_date, interaction_time, activity_code, detail)
      VALUES ('pmcfadin', '20130605', 0D1454E0-D202-11E2-8B8B-0800200C9A66, '100', 'Normal login');
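The TTL and the day bucket in these statements come down to simple arithmetic, sketched here in Python: `USING TTL` takes seconds, and 30 × 86,400 = 2,592,000. The helper names, the example timestamp, and the choice of UTC for the `YYYYMMDD` bucket are assumptions for illustration, and `still_visible` is only a rough model of TTL expiry.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# USING TTL takes a number of seconds: 30 days * 86,400 s/day.
THIRTY_DAYS_SECONDS = int(timedelta(days=30).total_seconds())


def interaction_date(ts: datetime) -> str:
    """Day bucket in the slide's 'YYYYMMDD' form; bucketing in UTC is an assumption."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def history_partition_key(username: str, ts: datetime) -> Tuple[str, str]:
    """user_activity_history gets one partition per user per day."""
    return (username, interaction_date(ts))


def still_visible(written_at: datetime, now: datetime,
                  ttl_seconds: int = THIRTY_DAYS_SECONDS) -> bool:
    """Rough model of TTL: the row stops being returned once the TTL has elapsed."""
    return (now - written_at).total_seconds() < ttl_seconds


# Hypothetical login time, chosen to fall on the slide's 20130605 bucket.
login = datetime(2013, 6, 5, 14, 30, tzinfo=timezone.utc)
print(THIRTY_DAYS_SECONDS)                               # 2592000
print(history_partition_key("pmcfadin", login))          # ('pmcfadin', '20130605')
print(still_visible(login, login + timedelta(days=29)))  # True
print(still_visible(login, login + timedelta(days=31)))  # False
```

Because the short-term table expires rows on its own, it stays small, while the history table grows one bounded partition per user per day.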
  18. Data model usage
      select * from user_activity limit 5;
       username | interaction_time                     | detail                                   | activity_code
      ----------+--------------------------------------+------------------------------------------+---------------
       pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry           | 301
       pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
       pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want    | 205
       pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want     | 201
       pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login                             | 100
      Maybe put a sale item for flowers too?
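Why `limit 5` returns the newest interactions first: a `timeuuid` is a version 1 UUID with a 60-bit timestamp embedded in it, and `CLUSTERING ORDER BY (interaction_time DESC)` keeps each user's partition sorted newest-first by that timestamp. This Python sketch uses only the standard `uuid` module (`UUID.time`). The helper names are hypothetical. It decodes the five `interaction_time` values from the query output above and sorts them, reproducing the order shown.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# 100-ns intervals between the UUID v1 epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Decode the 60-bit timestamp embedded in a version 1 (time-based) UUID."""
    return UNIX_EPOCH + timedelta(microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)


def last_n(timeuuids: Iterable[str], n: int) -> List[uuid.UUID]:
    """Newest first by embedded timestamp, mimicking ORDER ... DESC plus LIMIT n."""
    parsed = [uuid.UUID(s) for s in timeuuids]
    return sorted(parsed, key=lambda u: u.time, reverse=True)[:n]


# The five interaction_time values from the query output, deliberately shuffled.
shuffled = [
    "1b0be960-d076-11e2-923e-5d8390e664ec",  # Normal login
    "9ccc9df0-d076-11e2-923e-5d8390e664ec",  # Entered shopping area: Jewelry
    "1b5cef90-d076-11e2-923e-5d8390e664ec",  # Deleted shopping cart
    "9c652990-d076-11e2-923e-5d8390e664ec",  # Created shopping cart
    "1b0e5a60-d076-11e2-923e-5d8390e664ec",  # Opened shopping cart
]

for u in last_n(shuffled, 5):
    print(u, timeuuid_to_datetime(u).isoformat())
```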
  19. Log collection/aggregation
      Machines generate logs at a furious pace. Be ready.
  20. Log collection use case
      * Collect log data at high speed
      * Cassandra near where logs are generated. Multi-datacenter
      * Dice data for various uses: dashboard, lookup, etc.
      The bad:
      * The scale needed for an RDBMS is cost-prohibitive
      * Batch analysis of logs is too late for some use cases
  21. Log collection data model
      * Use Flume to collect and fan out data to various tables
      * Tables for lookup based on source and time
      * Tables for dashboard with aggregation and summation
  22. Log collection data model
      CREATE TABLE log_lookup (
        source varchar,
        date_to_minute varchar,
        timestamp timeuuid,
        raw_log blob,  -- Consider storing raw logs as GZIP
        PRIMARY KEY ((source, date_to_minute), timestamp)
      );
      CREATE TABLE login_success (
        source varchar,
        date_to_minute varchar,
        successful_logins counter,
        PRIMARY KEY (source, date_to_minute)
      ) WITH CLUSTERING ORDER BY (date_to_minute DESC);
      CREATE TABLE login_failure (
        source varchar,
        date_to_minute varchar,
        failed_logins counter,
        PRIMARY KEY (source, date_to_minute)
      ) WITH CLUSTERING ORDER BY (date_to_minute DESC);
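A rough Python sketch of the fan-out step: one incoming log event is written to `log_lookup` and, for login events, counted in `login_success` or `login_failure`. It stands in for whatever the Flume pipeline does and uses no Flume API. The source name `web01`, the `ingest` helper, the `login_ok` flag, and the `YYYYMMDDHHMM` minute format are all assumptions. In CQL, counter columns are incremented with `UPDATE ... SET c = c + 1` rather than `INSERT`, as the comments note.

```python
import gzip
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import DefaultDict, List, Optional, Tuple

Key = Tuple[str, str]  # (source, date_to_minute)

# Hypothetical in-memory stand-ins for the three tables.
log_lookup: DefaultDict[Key, List[bytes]] = defaultdict(list)
login_success: Counter = Counter()
login_failure: Counter = Counter()


def date_to_minute(ts: datetime) -> str:
    """Minute bucket. The 'YYYYMMDDHHMM' format is an assumption; the slides don't fix one."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


def ingest(source: str, ts: datetime, raw_log: str, login_ok: Optional[bool] = None) -> None:
    """Fan one log event out to every table that needs it, as a collector would."""
    key = (source, date_to_minute(ts))
    # Raw log kept gzip-compressed, following the slide's note on the blob column.
    log_lookup[key].append(gzip.compress(raw_log.encode("utf-8")))
    if login_ok is True:
        login_success[key] += 1  # ~ UPDATE ... SET successful_logins = successful_logins + 1
    elif login_ok is False:
        login_failure[key] += 1  # ~ UPDATE ... SET failed_logins = failed_logins + 1


t = datetime(2013, 6, 24, 10, 1, 15, tzinfo=timezone.utc)
ingest("web01", t, "login ok user=a", login_ok=True)
ingest("web01", t, "login ok user=b", login_ok=True)
ingest("web01", t, "login failed user=c", login_ok=False)
ingest("web01", t, "GET /index.html")  # not a login: lookup table only

key = ("web01", "201306241001")
print(len(log_lookup[key]), login_success[key], login_failure[key])  # 4 2 1
print(gzip.decompress(log_lookup[key][0]).decode("utf-8"))           # login ok user=a
```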
  23. Log dashboard
      [Chart: Successful Logins and Failed Logins per minute, 10:01 to 10:19]
      SELECT date_to_minute, successful_logins
      FROM login_success
      LIMIT 20;
      SELECT date_to_minute, failed_logins
      FROM login_failure
      LIMIT 20;
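The dashboard queries rely on `CLUSTERING ORDER BY (date_to_minute DESC)`, which sorts rows within each `source` partition. Without a `WHERE source = ...` restriction, rows come back partition by partition, so `LIMIT 20` gives "the latest 20 minutes" only when there is a single source. Assuming the dashboard charts one source at a time, this Python sketch shows the read it performs: newest minutes first, with successes and failures lined up per minute. The function name, source names, and counter values are hypothetical, and fixed-width `YYYYMMDDHHMM` strings are assumed so that string order matches time order.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (source, date_to_minute)


def dashboard_rows(success: Dict[Key, int], failure: Dict[Key, int],
                   source: str, limit: int = 20) -> List[Tuple[str, int, int]]:
    """Newest `limit` minutes for one source as (minute, successes, failures).

    Fixed-width 'YYYYMMDDHHMM' strings sort chronologically, so a reverse
    string sort mirrors CLUSTERING ORDER BY (date_to_minute DESC).
    Minutes with no counter row are reported as 0.
    """
    minutes = {m for (s, m) in success if s == source}
    minutes |= {m for (s, m) in failure if s == source}
    newest = sorted(minutes, reverse=True)[:limit]
    return [(m, success.get((source, m), 0), failure.get((source, m), 0)) for m in newest]


# Hypothetical counter values for two sources.
success = {("web01", "201306241001"): 40, ("web01", "201306241002"): 55,
           ("web02", "201306241002"): 7}
failure = {("web01", "201306241002"): 3, ("web01", "201306241003"): 1}

for row in dashboard_rows(success, failure, "web01", limit=2):
    print(row)
# ('201306241003', 0, 1)
# ('201306241002', 55, 3)
```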
  24. User Form Versioning
      Because mistaks, er, mistakes happen
  25. Form versioning use case
      * Store every possible version efficiently
      * Scale to any number of users
      * Commit/rollback functionality on a form
      The bad:
      * In an RDBMS, many relations that need complicated joins
      * Needs to be in the cloud and a local data center
  26. Form version data model
      * Each user has a form
      * Each form needs versioning
      * Separate table to store the live version
      * Exclusive lock on a form
  27. Form version data model
      CREATE TABLE working_version (
        username varchar,
        form_id int,
        version_number int,
        locked_by varchar,
        form_attributes map<varchar,varchar>,
        PRIMARY KEY ((username, form_id), version_number)
      ) WITH CLUSTERING ORDER BY (version_number DESC);
      -- 1. Insert first version
      INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
      VALUES ('pmcfadin', 1138, 1, '',
              {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ',
               'EmailAddress<text>':'Email Address: ', 'Newsletter<radio>':'Y,N'});
      -- 2. Lock for one user
      UPDATE working_version
      SET locked_by = 'pmcfadin'
      WHERE username = 'pmcfadin'
        AND form_id = 1138
        AND version_number = 1;
      -- 3. Insert new version. Release lock
      INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
      VALUES ('pmcfadin', 1138, 2, null,
              {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ',
               'EmailAddress<text>':'Email Address: ', 'Newsletter<checkbox>':'Y'});
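A minimal Python sketch of the three numbered steps on one `working_version` partition: commit a version, lock it for one user, then commit a new version that starts unlocked. `WorkingVersions`, its methods, the user `someone_else`, and the copy-forward `rollback` are hypothetical; the slide does not show how rollback is done. The lock check here is single-threaded. A plain CQL `UPDATE` does not compare the current `locked_by` before writing, so two users could both believe they hold the lock. Cassandra 2.0 and later offer lightweight transactions (`UPDATE ... IF ...`) for that compare-and-set.

```python
from typing import Dict, Optional


class WorkingVersions:
    """Hypothetical toy model of one working_version partition, keyed by (username, form_id).

    Versions are kept by number; the newest one carries the lock,
    mirroring the three statements on the slide.
    """

    def __init__(self) -> None:
        self.versions: Dict[int, dict] = {}

    def latest(self) -> Optional[int]:
        # Same row you'd read first under CLUSTERING ORDER BY (version_number DESC).
        return max(self.versions) if self.versions else None

    def lock(self, user: str) -> bool:
        """Step 2: take the lock on the newest version if nobody else holds it."""
        current = self.latest()
        if current is None:
            return False  # nothing to lock yet
        v = self.versions[current]
        if v["locked_by"] not in (None, "", user):
            return False
        v["locked_by"] = user
        return True

    def commit(self, user: str, attrs: Dict[str, str]) -> int:
        """Steps 1 and 3: write a new version, which starts unlocked (lock released)."""
        current = self.latest()
        if current is not None and self.versions[current]["locked_by"] != user:
            raise PermissionError(f"{user} does not hold the lock")
        number = (current or 0) + 1
        self.versions[number] = {"locked_by": None, "form_attributes": dict(attrs)}
        return number

    def rollback(self, user: str, to_version: int) -> int:
        """Restore an older version by committing a copy of it as the newest version."""
        return self.commit(user, self.versions[to_version]["form_attributes"])


form = WorkingVersions()
v1 = form.commit("pmcfadin", {"FirstName<text>": "First Name: ", "Newsletter<radio>": "Y,N"})
print(form.lock("pmcfadin"), form.lock("someone_else"))  # True False
v2 = form.commit("pmcfadin", {"FirstName<text>": "First Name: ", "Newsletter<checkbox>": "Y"})
print(v1, v2, form.versions[v2]["locked_by"])            # 1 2 None
```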
  28. That's it!
      "Mind what you have learned. Save you it can."
      - Yoda, Master Data Modeler
  29. Your data model is next!
      * Try out a few things
      * See what works
      * If all else fails, engage an expert in the community
      * Want more? Follow me on Twitter: @PatrickMcFadin