Patrick McFadin | Solution Architect, DataStax
The World's Next Top Data Model
Thursday, August 1, 13
The saga continues!
★ Data model is dead, long live the data
model.
★ Bridging from Relational to Cassandra
★ Become a Sup...
Because I love talking about this
Just to recap...
Thursday, August 1, 13
Why does this matter?
* Cassandra lives closer to your users or applications
* Not a hammer for all use case nails
* Prope...
When to use Cassandra*
* Need to be in more than one datacenter. active-active
* Scaling from 0 to, uh, well... we’re not ...
You get the data
model right!
Thursday, August 1, 13
So let’s do that
* Four real world examples
* Use case, what they were avoiding and model to accomplish
* You may think th...
But wait you say
CQL doesn’t do dynamic wide rows!
Thursday, August 1, 13
Yes it can!
* CQL does wide rows the same way you did them in Thrift
* No really
* Read this blog post
http://www.datastax...
Customers giving you money is a good reason for uptime
Shopping Cart Data Model
Thursday, August 1, 13
Shopping cart use case
* Store shopping cart data reliably
* Minimize (or eliminate) downtime. Multi-dc
* Scale for the “C...
Shopping cart data model
* Each customer can have
one or more shopping carts
* De-normalize data for fast
access
* Shoppin...
Shopping cart data model
CREATE TABLE user (
! username varchar,
! firstname varchar,
! lastname varchar,
! shopping_carts...
Watching users, making decisions. Freaky, but cool.
User Activity Tracking
Thursday, August 1, 13
User activity use case
* React to user input in real time
* Support for multiple application pods
* Scale for speed
* Losi...
User activity data model
* Interaction points stored per
user in short table
* Long term interaction stored
in similar tab...
#CASSANDRA13
User activity data model
CREATE TABLE user_activity (
! username varchar,
! interaction_time timeuuid,
! acti...
#CASSANDRA13
Data model usage
username | interaction_time | detail | activity_code
----------+----------------------------...
Machines generate logs at a furious pace. Be ready.
Log collection/aggregation
Thursday, August 1, 13
Log collection use case
* Collect log data at high speed
* Cassandra near where logs are generated. Multi-datacenter
* Dic...
Log collection data model
* Use Flume to collect and fan out
data to various tables
* Tables for lookup based on
source an...
#CASSANDRA13
Log collection data model
CREATE TABLE log_lookup (
! source varchar,
! date_to_minute varchar,
! timestamp t...
#CASSANDRA13
Log dashboard
0
25
50
75
100
10:01 10:03 10:05 10:07 10:09 10:11 10:13 10:15 10:17 10:19
Sucessful Logins
Fai...
Because mistaks mistakes happen
User Form Versioning
Thursday, August 1, 13
Form versioning use case
* Store every possible version efficiently
* Scale to any number of users
* Commit/Rollback funct...
Form version data model
* Each user has a form
* Each form needs versioning
* Separate table to store live
version
* Exclu...
Form version data model
CREATE TABLE working_version (
! username varchar,
! form_id int,
! version_number int,
! locked_b...
#CASSANDRA13
That’s it!
“Mind what you have learned. Save you it can.”
- Yoda. Master Data Modeler
Thursday, August 1, 13
#CASSANDRA13
Your data model is next!
* Try out a few things
* See what works
* All else fails, engage an expert in the co...
Upcoming SlideShare
Loading in...5
×

Cassandra Community Webinar | The World's Next Top Data Model

2,924

Published on

You know you need Cassandra for it's uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!

Published in: Technology, Business
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,924
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
30
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Cassandra Community Webinar | The World's Next Top Data Model

  1. 1. Patrick McFadin | Solution Architect, DataStax The World's Next Top Data Model Thursday, August 1, 13
  2. 2. The saga continues! ★ Data model is dead, long live the data model. ★ Bridging from Relational to Cassandra ★ Become a Super Modeler ★ Core data modeling techniques using CQL Thursday, August 1, 13
  3. 3. Because I love talking about this Just to recap... Thursday, August 1, 13
  4. 4. Why does this matter? * Cassandra lives closer to your users or applications * Not a hammer for all use case nails * Proper use case, proper model... * Get it wrong and... Thursday, August 1, 13
  5. 5. When to use Cassandra* * Need to be in more than one datacenter. active-active * Scaling from 0 to, uh, well... we’re not really sure. * Need as close to 100% uptime as possible. * Getting these from any other solution would just be mega $ and... *nutshell version. These are all ORs not ANDs Thursday, August 1, 13
  6. 6. You get the data model right! Thursday, August 1, 13
  7. 7. So let’s do that * Four real world examples * Use case, what they were avoiding and model to accomplish * You may think this is you, but it isn’t. I hear these all the time. * All examples are in CQL3 Thursday, August 1, 13
  8. 8. But wait you say CQL doesn’t do dynamic wide rows! Thursday, August 1, 13
  9. 9. Yes it can! * CQL does wide rows the same way you did them in Thrift * No really * Read this blog post http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows ...or just trust me and I’ll show you how Thursday, August 1, 13
  10. 10. Customers giving you money is a good reason for uptime Shopping Cart Data Model Thursday, August 1, 13
  11. 11. Shopping cart use case * Store shopping cart data reliably * Minimize (or eliminate) downtime. Multi-dc * Scale for the “Cyber Monday” problem * Every minute off-line is lost $$ * Online shoppers want speed! The bad Thursday, August 1, 13
  12. 12. Shopping cart data model * Each customer can have one or more shopping carts * De-normalize data for fast access * Shopping cart == One partition (Row Level Isolation) * Each item a new column Thursday, August 1, 13
  13. 13. Shopping cart data model CREATE TABLE user ( ! username varchar, ! firstname varchar, ! lastname varchar, ! shopping_carts set<varchar>, ! PRIMARY KEY (username) ); CREATE TABLE shopping_cart ( ! username varchar, ! cart_name text ! item_id int, ! item_name varchar, description varchar, ! price float, ! item_detail map<varchar,varchar> ! PRIMARY KEY ((username,cart_name),item_id) ); INSERT INTO shopping_cart (username,cart_name,item_id,item_name,description,price,item_detail) VALUES ('pmcfadin','Gadgets I want',8675309,'Garmin 910XT','Multisport training watch',349.99, {'Related':'Timex sports watch', 'Volume Discount':'10'}); INSERT INTO shopping_cart (username,cart_name,item_id,item_name,description,price,item_detail) VALUES ('pmcfadin','Gadgets I want',9748575,'Polaris Foot Pod','Bluetooth Smart foot pod',64.00 {'Related':'Timex foot pod', 'Volume Discount':'25'}); One partition (storage row) of data Item details. Flexible for whatev Partition row key for one users cart Creates partition row key Thursday, August 1, 13
  14. 14. Watching users, making decisions. Freaky, but cool. User Activity Tracking Thursday, August 1, 13
  15. 15. User activity use case * React to user input in real time * Support for multiple application pods * Scale for speed * Losing interactions is costly * Waiting for batch(hadoop) is to long The bad Thursday, August 1, 13
  16. 16. User activity data model * Interaction points stored per user in short table * Long term interaction stored in similar table with date partition * Process long term later using batch * Reverse time series to get last N items Thursday, August 1, 13
  17. 17. #CASSANDRA13 User activity data model CREATE TABLE user_activity ( ! username varchar, ! interaction_time timeuuid, ! activity_code varchar, ! detail varchar, ! PRIMARY KEY (username, interaction_time) ) WITH CLUSTERING ORDER BY (interaction_time DESC); CREATE TABLE user_activity_history ( ! username varchar, ! interaction_date varchar, ! interaction_time timeuuid, ! activity_code varchar, ! detail varchar, ! PRIMARY KEY ((username,interaction_date),interaction_time) ); INSERT INTO user_activity (username,interaction_time,activity_code,detail) VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login') USING TTL 2592000; INSERT INTO user_activity_history (username,interaction_date,interaction_time,activity_code,detail) VALUES ('pmcfadin','20130605',0D1454E0- D202-11E2-8B8B-0800200C9A66,'100','Normal login'); Reverse order based on timestamp Expire after 30 days Thursday, August 1, 13
  18. 18. #CASSANDRA13 Data model usage username | interaction_time | detail | activity_code ----------+--------------------------------------+------------------------------------------+------------------ pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry | 301 pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202 pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want | 205 pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want | 201 pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login | 100 select * from user_activity limit 5; Maybe put a sale item for flowers too? Thursday, August 1, 13
  19. 19. Machines generate logs at a furious pace. Be ready. Log collection/aggregation Thursday, August 1, 13
  20. 20. Log collection use case * Collect log data at high speed * Cassandra near where logs are generated. Multi-datacenter * Dice data for various uses. Dashboard. Lookup. Etc. * The scale needed for RDBMS is cost prohibitive * Batch analysis of logs too late for some use cases The bad Thursday, August 1, 13
  21. 21. Log collection data model * Use Flume to collect and fan out data to various tables * Tables for lookup based on source and time * Tables for dashboard with aggregation and summation Thursday, August 1, 13
  22. 22. #CASSANDRA13 Log collection data model CREATE TABLE log_lookup ( ! source varchar, ! date_to_minute varchar, ! timestamp timeuuid, ! raw_log blob, ! PRIMARY KEY ((source,date_to_minute),timestamp) ); CREATE TABLE login_success ( ! source varchar, ! date_to_minute varchar, ! successful_logins counter, ! PRIMARY KEY (source,date_to_minute) ) WITH CLUSTERING ORDER BY (date_to_minute DESC); CREATE TABLE login_failure ( ! source varchar, ! date_to_minute varchar, ! failed_logins counter, ! PRIMARY KEY (source,date_to_minute) ) WITH CLUSTERING ORDER BY (date_to_minute DESC); Consider storing raw logs as GZIP Thursday, August 1, 13
  23. 23. #CASSANDRA13 Log dashboard 0 25 50 75 100 10:01 10:03 10:05 10:07 10:09 10:11 10:13 10:15 10:17 10:19 Sucessful Logins Failed Logins SELECT date_to_minute,successful_logins FROM login_success LIMIT 20; SELECT date_to_minute,failed_logins FROM login_failure LIMIT 20; Thursday, August 1, 13
  24. 24. Because mistaks mistakes happen User Form Versioning Thursday, August 1, 13
  25. 25. Form versioning use case * Store every possible version efficiently * Scale to any number of users * Commit/Rollback functionality on a form * In RDBMS, many relations that need complicated join * Needs to be in cloud and local data center The bad Thursday, August 1, 13
  26. 26. Form version data model * Each user has a form * Each form needs versioning * Separate table to store live version * Exclusive lock on a form Thursday, August 1, 13
  27. 27. Form version data model CREATE TABLE working_version ( ! username varchar, ! form_id int, ! version_number int, ! locked_by varchar, ! form_attributes map<varchar,varchar> ! PRIMARY KEY ((username, form_id), version_number) ) WITH CLUSTERING ORDER BY (version_number DESC); INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes) VALUES ('pmcfadin',1138,1,'', {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ', 'EmailAddress<text>':'Email Address: ', 'Newsletter<radio>':'Y,N'}); UPDATE working_version SET locked_by = 'pmcfadin' WHERE username = 'pmcfadin' AND form_id = 1138 AND version_number = 1; INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes) VALUES ('pmcfadin',1138,2,null, {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ', 'EmailAddress<text>':'Email Address: ', 'Newsletter<checkbox>':'Y'}); 1. Insert first version 2. Lock for one user 3. Insert new version. Release lock Thursday, August 1, 13
  28. 28. #CASSANDRA13 That’s it! “Mind what you have learned. Save you it can.” - Yoda. Master Data Modeler Thursday, August 1, 13
  29. 29. #CASSANDRA13 Your data model is next! * Try out a few things * See what works * All else fails, engage an expert in the community * Want more? Follow me on twitter: @PatrickMcFadin Thursday, August 1, 13
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×