The world's next top data model
Published in: Technology, Business
Transcript

  • 1. #CASSANDRA13
       The World's Next Top Data Model
       Patrick McFadin | Solution Architect, DataStax
       Monday, June 24, 13
  • 2. The saga continues!
       ★ Data model is dead, long live the data model.
       ★ Bridging from Relational to Cassandra
       ★ Become a Super Modeler
       ★ Core data modeling techniques using CQL
  • 3. Just to recap...
       Because I love talking about this
  • 4. Why does this matter?
       * Cassandra lives closer to your users or applications
       * Not a hammer for all use case nails
       * Proper use case, proper model...
       * Get it wrong and...
  • 5. When to use Cassandra*
       * Need to be in more than one datacenter, active-active
       * Scaling from 0 to, uh, well... we're not really sure.
       * Need as close to 100% uptime as possible.
       * Getting these from any other solution would just be mega $
       and...
       *Nutshell version. These are all ORs, not ANDs.
  • 6. You get the data model right!
  • 7. So let's do that
       * Four real world examples
       * Use case, what they were avoiding, and model to accomplish
       * You may think this is you, but it isn't. I hear these all the time.
       * All examples are in CQL3
  • 8. But wait, you say
       CQL doesn't do dynamic wide rows!
  • 9. Yes it can!
       * CQL does wide rows the same way you did them in Thrift
       * No really
       * Read this blog post:
         http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
       ...or just trust me and I'll show you how
  • 10. Shopping Cart Data Model
       Customers giving you money is a good reason for uptime
  • 11. Shopping cart use case
       * Store shopping cart data reliably
       * Minimize (or eliminate) downtime. Multi-DC
       * Scale for the "Cyber Monday" problem
       * Every minute off-line is lost $$
       * Online shoppers want speed!
       The bad
  • 12. Shopping cart data model
       * Each customer can have one or more shopping carts
       * De-normalize data for fast access
       * Shopping cart == One partition (Row Level Isolation)
       * Each item a new column
  • 13. Shopping cart data model

       CREATE TABLE user (
           username varchar,
           firstname varchar,
           lastname varchar,
           shopping_carts set<varchar>,
           PRIMARY KEY (username)
       );

       CREATE TABLE shopping_cart (
           username varchar,
           cart_name text,
           item_id int,
           item_name varchar,
           description varchar,
           price float,
           item_detail map<varchar,varchar>,
           PRIMARY KEY ((username, cart_name), item_id)
       );

       INSERT INTO shopping_cart (username, cart_name, item_id, item_name, description, price, item_detail)
       VALUES ('pmcfadin', 'Gadgets I want', 8675309, 'Garmin910XT', 'Multisport training watch', 349.99,
               {'Related': 'Timex sports watch', 'Volume Discount': '10'});

       INSERT INTO shopping_cart (username, cart_name, item_id, item_name, description, price, item_detail)
       VALUES ('pmcfadin', 'Gadgets I want', 9748575, 'Polaris FootPod', 'Bluetooth Smart foot pod', 64.00,
               {'Related': 'Timex foot pod', 'Volume Discount': '25'});

       Slide callouts:
       * Creates partition row key
       * Partition row key for one user's cart
       * Item details. Flexible for whatev
       * One partition (storage row) of data
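The slide shows only the schema and the inserts. As a rough sketch of how the `shopping_cart` and `user` tables from slide 13 might be read and maintained, the statements below reuse the sample values from the slide; the statements themselves are not from the deck and are illustrative assumptions.

```cql
-- Read every item in one cart. Both partition key columns
-- (username, cart_name) are supplied, so this reads a single partition.
SELECT item_id, item_name, price
FROM shopping_cart
WHERE username = 'pmcfadin' AND cart_name = 'Gadgets I want';

-- Record the cart name on the user row by adding to the set column.
UPDATE user
SET shopping_carts = shopping_carts + {'Gadgets I want'}
WHERE username = 'pmcfadin';

-- Remove one item: the full primary key identifies a single CQL row.
DELETE FROM shopping_cart
WHERE username = 'pmcfadin'
  AND cart_name = 'Gadgets I want'
  AND item_id = 9748575;
```

Because the cart is one partition, the read and the delete each go to the same set of replicas, which is the "Shopping cart == One partition" point from slide 12.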
  • 14. User Activity Tracking
       Watching users, making decisions. Freaky, but cool.
  • 15. User activity use case
       * React to user input in real time
       * Support for multiple application pods
       * Scale for speed
       * Losing interactions is costly
       * Waiting for batch (Hadoop) is too long
       The bad
  • 16. User activity data model
       * Interaction points stored per user in short table
       * Long term interaction stored in similar table with date partition
       * Process long term later using batch
       * Reverse time series to get last N items
  • 17. User activity data model

       -- Reverse order based on timestamp
       CREATE TABLE user_activity (
           username varchar,
           interaction_time timeuuid,
           activity_code varchar,
           detail varchar,
           PRIMARY KEY (username, interaction_time)
       ) WITH CLUSTERING ORDER BY (interaction_time DESC);

       CREATE TABLE user_activity_history (
           username varchar,
           interaction_date varchar,
           interaction_time timeuuid,
           activity_code varchar,
           detail varchar,
           PRIMARY KEY ((username, interaction_date), interaction_time)
       );

       -- Expire after 30 days (2592000 seconds)
       INSERT INTO user_activity (username, interaction_time, activity_code, detail)
       VALUES ('pmcfadin', 0D1454E0-D202-11E2-8B8B-0800200C9A66, '100', 'Normal login')
       USING TTL 2592000;

       INSERT INTO user_activity_history (username, interaction_date, interaction_time, activity_code, detail)
       VALUES ('pmcfadin', '20130605', 0D1454E0-D202-11E2-8B8B-0800200C9A66, '100', 'Normal login');
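To make the "last N items" and "date partition" ideas from slides 16 and 17 concrete, here is a minimal sketch of reads against `user_activity` and `user_activity_history`. The queries are illustrative assumptions built on the slide's schema and sample values, not statements taken from the talk.

```cql
-- Latest five interactions for one user. The table is clustered by
-- interaction_time DESC, so the newest rows come back first with no ORDER BY.
SELECT interaction_time, activity_code, detail
FROM user_activity
WHERE username = 'pmcfadin'
LIMIT 5;

-- One day of long-term history. interaction_date is part of the
-- partition key, so each user/day pair is its own partition.
SELECT interaction_time, activity_code, detail
FROM user_activity_history
WHERE username = 'pmcfadin' AND interaction_date = '20130605';

-- Check how many seconds remain before a short-term row expires.
SELECT detail, TTL(detail)
FROM user_activity
WHERE username = 'pmcfadin'
LIMIT 1;
```

Splitting history by date keeps any single partition from growing without bound, while the TTL keeps the short table small enough for real-time reads.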
  • 18. Data model usage

       select * from user_activity limit 5;

        username | interaction_time                     | detail                                   | activity_code
       ----------+--------------------------------------+------------------------------------------+---------------
        pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry           | 301
        pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
        pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want    | 205
        pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want     | 201
        pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login                             | 100

       Maybe put a sale item for flowers too?
  • 19. Log collection/aggregation
       Machines generate logs at a furious pace. Be ready.
  • 20. Log collection use case
       * Collect log data at high speed
       * Cassandra near where logs are generated. Multi-datacenter
       * Dice data for various uses. Dashboard. Lookup. Etc.
       * The scale needed for RDBMS is cost prohibitive
       * Batch analysis of logs too late for some use cases
       The bad
  • 21. Log collection data model
       * Use Flume to collect and fan out data to various tables
       * Tables for lookup based on source and time
       * Tables for dashboard with aggregation and summation
  • 22. Log collection data model

       CREATE TABLE log_lookup (
           source varchar,
           date_to_minute varchar,
           timestamp timeuuid,
           raw_log blob,
           PRIMARY KEY ((source, date_to_minute), timestamp)
       );

       CREATE TABLE login_success (
           source varchar,
           date_to_minute varchar,
           successful_logins counter,
           PRIMARY KEY (source, date_to_minute)
       ) WITH CLUSTERING ORDER BY (date_to_minute DESC);

       CREATE TABLE login_failure (
           source varchar,
           date_to_minute varchar,
           failed_logins counter,
           PRIMARY KEY (source, date_to_minute)
       ) WITH CLUSTERING ORDER BY (date_to_minute DESC);

       Consider storing raw logs as GZIP
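The slide defines the counter tables but does not show a write. A minimal sketch of how the tables on slide 22 might be used follows; the source name `'web01'` and the `'201306241001'` minute format are hypothetical values chosen for illustration, since the deck does not specify either.

```cql
-- Counter columns are modified with UPDATE rather than INSERT.
-- The row comes into existence on the first increment.
UPDATE login_success
SET successful_logins = successful_logins + 1
WHERE source = 'web01'                -- hypothetical source name
  AND date_to_minute = '201306241001'; -- hypothetical YYYYMMDDHHMM bucket

UPDATE login_failure
SET failed_logins = failed_logins + 1
WHERE source = 'web01'
  AND date_to_minute = '201306241001';

-- Look up the raw log entries for one source and one minute.
-- Both partition key columns are supplied, so this reads one partition.
SELECT raw_log
FROM log_lookup
WHERE source = 'web01'
  AND date_to_minute = '201306241001';
```

Whatever string format is chosen for `date_to_minute`, it needs to sort chronologically as text for the DESC clustering order on the counter tables to return the most recent minutes first.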
  • 23. Log dashboard

       [Chart: Successful Logins and Failed Logins per minute, 10:01 to 10:19, y-axis 0 to 100]

       SELECT date_to_minute, successful_logins
       FROM login_success
       LIMIT 20;

       SELECT date_to_minute, failed_logins
       FROM login_failure
       LIMIT 20;
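The dashboard queries on slide 23 have no WHERE clause, so they read across all sources. If the dashboard is meant to chart one source at a time, a variant like the following sketch restricts the read to a single partition; `'web01'` is a hypothetical source name, not one from the deck.

```cql
-- Last 20 minutes of successful logins for one source.
-- Within the partition, rows are clustered by date_to_minute DESC,
-- so LIMIT 20 returns the 20 most recent minute buckets.
SELECT date_to_minute, successful_logins
FROM login_success
WHERE source = 'web01'   -- hypothetical source name
LIMIT 20;

-- Same window for failed logins.
SELECT date_to_minute, failed_logins
FROM login_failure
WHERE source = 'web01'
LIMIT 20;
```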
  • 24. User Form Versioning
       Because mistaks mistakes happen
  • 25. Form versioning use case
       * Store every possible version efficiently
       * Scale to any number of users
       * Commit/Rollback functionality on a form
       * In RDBMS, many relations that need complicated join
       * Needs to be in cloud and local data center
       The bad
  • 26. Form version data model
       * Each user has a form
       * Each form needs versioning
       * Separate table to store live version
       * Exclusive lock on a form
  • 27. Form version data model

       CREATE TABLE working_version (
           username varchar,
           form_id int,
           version_number int,
           locked_by varchar,
           form_attributes map<varchar,varchar>,
           PRIMARY KEY ((username, form_id), version_number)
       ) WITH CLUSTERING ORDER BY (version_number DESC);

       -- 1. Insert first version
       INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
       VALUES ('pmcfadin', 1138, 1, '',
               {'FirstName<text>': 'First Name: ',
                'LastName<text>': 'Last Name: ',
                'EmailAddress<text>': 'Email Address: ',
                'Newsletter<radio>': 'Y,N'});

       -- 2. Lock for one user
       UPDATE working_version
       SET locked_by = 'pmcfadin'
       WHERE username = 'pmcfadin'
         AND form_id = 1138
         AND version_number = 1;

       -- 3. Insert new version. Release lock
       INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
       VALUES ('pmcfadin', 1138, 2, null,
               {'FirstName<text>': 'First Name: ',
                'LastName<text>': 'Last Name: ',
                'EmailAddress<text>': 'Email Address: ',
                'Newsletter<checkbox>': 'Y'});
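Slide 25 lists commit/rollback as a requirement, but the deck shows only writes. As a hedged sketch of the read side, the queries below fetch the newest and an earlier version from `working_version`; they are illustrative assumptions built on the slide's schema and sample values.

```cql
-- Newest version of a form. version_number is clustered DESC,
-- so the first row in the partition is the latest version.
SELECT version_number, locked_by, form_attributes
FROM working_version
WHERE username = 'pmcfadin' AND form_id = 1138
LIMIT 1;

-- Fetch an earlier version by number, for example to roll back to it.
SELECT form_attributes
FROM working_version
WHERE username = 'pmcfadin'
  AND form_id = 1138
  AND version_number = 1;

-- List the full version history of the form, newest first.
SELECT version_number, locked_by
FROM working_version
WHERE username = 'pmcfadin' AND form_id = 1138;
```

Since every version is its own row and old rows are never overwritten, a rollback can be expressed as inserting a new version whose attributes are copied from an earlier one.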
  • 28. That's it!
       "Mind what you have learned. Save you it can."
       - Yoda. Master Data Modeler
  • 29. Your data model is next!
       * Try out a few things
       * See what works
       * All else fails, engage an expert in the community
       * Want more? Follow me on twitter: @PatrickMcFadin
