The data model is dead,
long live the data model!!
Patrick McFadin
Senior Solutions Architect
DataStax
Thursday, May 2, 13
Bridging the divide
The era of relational everything is over
The era of Polyglot Persistence* has begun
* http://www.martinfowler.com/bliki/PolyglotPersistence.html
Coming from a relational world
Tradeoffs are hard
Feature comparison, RDBMS vs. Cassandra:
• Single Point of Failure
• Cross Datacenter
• Linear Scaling
• Data modeling
Background - The data model
• The data model is alive and well
• Models define the business requirements
• Models define the structure of your data
• Relational is just one type (Network model anyone?)
Wait? I thought NoSQL meant no model?
Background - ACID vs CAP
ACID
• Atomic - All or none
• Consistency - Only valid data is written
• Isolation - One operation at a time
• Durability - Once committed, it stays that way
CAP - Pick two
• Consistency - All nodes in the cluster see the same data
• Availability - Cluster always accepts writes
• Partition tolerance - Cluster keeps working when nodes can’t talk to each other
Cassandra lets you tune this
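The tuning mentioned here is done through per-request consistency levels. A minimal sketch of the usual rule of thumb: a read is guaranteed to overlap the latest write when read replicas + write replicas > replication factor. The function names are hypothetical and only do the arithmetic; they do not talk to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read must touch at least one replica that took the write."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor for the example

# QUORUM writes with QUORUM reads: 2 + 2 > 3, so reads overlap the write
print(is_strongly_consistent(RF, quorum(RF), quorum(RF)))

# ONE write with ONE read: 1 + 1 is not > 3, so a read may miss the write
print(is_strongly_consistent(RF, 1, 1))
```

Lower levels favor availability and latency, higher levels favor consistency; the choice can differ per query.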
Relational Background - Normal forms
• This IS the relational model
• 5 normal forms
• Need foreign keys
• Need joins
Employees
id  First    Last
1   Edgar    Codd
2   Raymond  Boyce
Department
id  Dept
1   Engineering
2   Math
Background - How Cassandra Stores Data
• Model brought from Bigtable*
• Row Key and a lot of columns
• Column names sorted (UTF8, Int, Timestamp, etc.)
[Diagram: one Row Key holding columns 1 through 2 billion; each column has a Column Name, Column Value, Timestamp, and TTL]
* http://research.google.com/archive/bigtable.html
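To make the row layout concrete, here is a rough in-memory sketch: one row key, many columns, each column carrying a value, a write timestamp, and an optional TTL, read back in column-name order. The Column and Row classes are hypothetical illustrations, not Cassandra's storage code.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Column:
    value: str
    timestamp: float            # write time, used to decide which value is newest
    ttl: Optional[int] = None   # seconds to live; None means the column never expires


class Row:
    """One row key holding many named columns."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.columns: Dict[str, Column] = {}

    def put(self, name: str, value: str, ttl: Optional[int] = None) -> None:
        # Writing an existing column name replaces the older value
        self.columns[name] = Column(value, time.time(), ttl)

    def sorted_columns(self) -> List[Tuple[str, str]]:
        # Columns are returned ordered by column name, not by insertion order
        return [(name, self.columns[name].value) for name in sorted(self.columns)]


row = Row("tcodd")
row.put("lastname", "Codd")
row.put("firstname", "Edgar")
print(row.sorted_columns())
```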
Background - How Cassandra Stores Data
• Rows belong to a node and are replicated
• Row lookups are fast
• Randomly distributed in cluster
[Diagram: RowKey1 through RowKey12 spread across the nodes of the cluster; a lookup of RowKey5 goes to the node that holds RowKey5]
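A simplified sketch of why lookups are fast and placement looks random: the row key is hashed, and the hash decides which node owns the row. The node names are hypothetical, and the modulo step is a deliberate simplification; a real cluster assigns token ranges to nodes and also replicates each row.

```python
import hashlib
from typing import Dict, List

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def owner(row_key: str, nodes: List[str] = NODES) -> str:
    """Pick the owning node from a hash of the row key (simplified placement)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    return nodes[token % len(nodes)]


# Place RowKey1..RowKey12 and show which node ends up with which keys
placement: Dict[str, List[str]] = {node: [] for node in NODES}
for i in range(1, 13):
    key = f"RowKey{i}"
    placement[owner(key)].append(key)

for node, keys in placement.items():
    print(node, keys)

# A lookup repeats the same hash, so it goes straight to the owning node
print("RowKey5 lives on", owner("RowKey5"))
```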
Relational Concept - Sequences
• Handy feature for auto-creation of Ids
• Guaranteed unique
• Depends on a single source of truth (one server)
INSERT INTO user (id, firstName, lastName)
VALUES (seq.NEXTVAL, 'Ted', 'Codd');
Cassandra Concept - No sequences
• Difficult in a distributed system
• Requires a lock (perf killer)
• What to do?
- Use part of the data to create a unique index, or...
- UUID to the rescue!
Concept - UUID
• Universally Unique Identifier
• 128-bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks
99051fe9-6a9c-46c2-b949-38ef78858dd0
RFC 4122 if you want a reference
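Generating one on the client takes a single call in most languages. A small sketch using Python's standard uuid module; the variable name video_id is just for illustration.

```python
import uuid

# Random (version 4) UUID created locally: no lock, no round trip to a server
video_id = uuid.uuid4()
print(video_id)            # 36-character text form, different on every run
print(video_id.version)    # version number encoded inside the UUID

# The text form parses back into the same 128-bit value
same_id = uuid.UUID(str(video_id))
print(same_id == video_id)
```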
Cassandra Concept - Entity model
• User table (!!)
• Username is the unique key
• Schema is static but can be altered without downtime
CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email varchar,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);
ALTER TABLE users ADD city text;
Relational Concept - De-normalization
• To combine relations into a single row
• Used in relational modeling to avoid complex joins
Employees
id  First    Last
1   Edgar    Codd
2   Raymond  Boyce
Department
id  Dept
1   Engineering
2   Math
SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE e.id = 1
AND e.id = d.id;
Take this and then...
Relational Concept - De-normalization
• Combine table columns into a single view
• No joins
• It is all in how you lay out the data for fast reads
SELECT First, Last, Dept
FROM employees
WHERE id = 1;
Employees
id  First    Last   Dept
1   Edgar    Codd   Engineering
2   Raymond  Boyce  Math
Cassandra Concept - One-to-Many
• Relationship without being relational
• Users have many videos
• Wait? Where is the foreign key?
Users
username  firstname  lastname  email
tcodd     Edgar      Codd      tcodd@relational.com
rboyce    Raymond    Boyce     rboyce@relational.com
Videos
videoid   videoname     username  description             tags
99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol
Cassandra Concept - One-to-many
• Static table to store videos
• UUID for unique video id
• Add username to denormalize
CREATE TABLE videos (
    videoid uuid,
    videoname varchar,
    username varchar,
    description varchar,
    tags varchar,
    upload_date timestamp,
    PRIMARY KEY (videoid)
);
Cassandra Concept - One-to-Many
• Lookup video by username
• Write in two tables at once for fast lookups
CREATE TABLE username_video_index (
    username varchar,
    videoid uuid,
    upload_date timestamp,
    video_name varchar,
    PRIMARY KEY (username, videoid)
);
SELECT video_name
FROM username_video_index
WHERE username = 'tcodd'
AND videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
Creates a wide row!
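The "write in two tables at once" idea can be sketched with two plain dictionaries standing in for the videos and username_video_index tables. The helper names add_video and videos_for are hypothetical; the point is that the second structure keeps all of a user's videos under one key (the wide row), so no join is needed to list them.

```python
from collections import defaultdict
from typing import Dict, List

# Stand-ins for the two tables: videos keyed by videoid,
# and one "wide row" per username holding videoid -> video_name
videos: Dict[str, Dict[str, str]] = {}
username_video_index: Dict[str, Dict[str, str]] = defaultdict(dict)


def add_video(videoid: str, username: str, video_name: str) -> None:
    """Write the video and its index entry together."""
    videos[videoid] = {"username": username, "videoname": video_name}
    username_video_index[username][videoid] = video_name


def videos_for(username: str) -> List[str]:
    """Read one wide row; entries come back ordered by videoid."""
    row = username_video_index.get(username, {})
    return [row[videoid] for videoid in sorted(row)]


add_video("99051fe9", "tcodd", "My funny cat")
add_video("b3a76c6b", "tcodd", "Math")

print(videos_for("tcodd"))
print(videos_for("rboyce"))   # user with no videos yet
```

The cost is a second write per video, which is the trade the deck argues for: cheap writes in exchange for single-lookup reads.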
Cassandra concept - Many-to-many
• Users and videos have many comments
Users
username  firstname  lastname  email
tcodd     Edgar      Codd      tcodd@relational.com
rboyce    Raymond    Boyce     rboyce@relational.com
Videos
videoid   videoname     username  description             tags
99051fe9  My funny cat  tcodd     My cat plays the piano  cats,piano,lol
b3a76c6b  Math          tcodd     Now my dog plays        dogs,piano,lol
Comments
username  videoid   comment
tcodd     99051fe9  Sweet!
rboyce    b3a76c6b  Boring :(
Cassandra concept - Many-to-many
• Model both sides of the view
• Insert both when comment is created
• View from either side
CREATE TABLE comments_by_video (
    videoid uuid,
    username varchar,
    comment_ts timestamp,
    comment varchar,
    PRIMARY KEY (videoid, username)
);
CREATE TABLE comments_by_user (
    username varchar,
    videoid uuid,
    comment_ts timestamp,
    comment varchar,
    PRIMARY KEY (username, videoid)
);
Don’t be afraid of writes. Bring it!
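A rough sketch of "insert both when comment is created", with two dictionaries keyed the same way as the primary keys of comments_by_video and comments_by_user. The function names are hypothetical. One consequence worth noticing: because the key is only (videoid, username), a second comment by the same user on the same video replaces the first, which mirrors how an insert on an existing primary key behaves.

```python
from typing import Dict, List, Tuple

# Keys mirror the two primary keys: (videoid, username) and (username, videoid)
comments_by_video: Dict[Tuple[str, str], str] = {}
comments_by_user: Dict[Tuple[str, str], str] = {}


def add_comment(username: str, videoid: str, comment: str) -> None:
    """Write the same comment to both views."""
    comments_by_video[(videoid, username)] = comment
    comments_by_user[(username, videoid)] = comment


def comments_on(videoid: str) -> List[str]:
    return [c for (vid, _), c in sorted(comments_by_video.items()) if vid == videoid]


def comments_from(username: str) -> List[str]:
    return [c for (user, _), c in sorted(comments_by_user.items()) if user == username]


add_comment("tcodd", "99051fe9", "Sweet!")
add_comment("rboyce", "b3a76c6b", "Boring :(")
add_comment("tcodd", "99051fe9", "Still sweet!")   # same key: replaces "Sweet!"

print(comments_on("99051fe9"))
print(comments_from("rboyce"))
```

If several comments per user per video are needed, the clustering key would have to include something like comment_ts; that is a design change beyond what the slide shows.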
Relational Concept - Transactions
• Built in and easy to use
• Can be slow and heavy so don’t use them all the time
• Normal forms force ACID writes into many tables
lock
 - change table one
 - change table two
 - change table three
commit
-or-
lock
 - change table one
 - change table two
 - change table three
rollback
Crazy Concept - Do you need a transaction?
• Since they were easy in an RDBMS, were they just the default?
• Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
• In a nutshell, an asynchronous transaction:
- Cashier takes your money
- Barista makes your coffee
- Error? Barista deals with it
Cassandra Concept - Transaction quality
• A true transaction requires a lock, which is costly in distributed systems
• Cassandra features can be used to your advantage
- Row level isolation
- Atomic batches
Cassandra Concept - Transaction
• Track that something happened
• Use time stamps to preserve order
• Rectify when in any doubt (just like banks do)
Create this table:
CREATE TABLE credit_transaction (
    username varchar,
    type varchar,
    datetime timestamp,
    credits int,
    PRIMARY KEY (username, datetime, type)
) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);
Sort the columns in reverse order so the last action is first on the list
Cassandra Concept - Transaction
• All transactions are stored
• Think RPN calculator, latest first
Row key: tcodd
Column (type:datetime)          Credits
ADD:2013-04-25 21:10:32.745     $20
REMOVE:2013-04-25 15:45:22.813  $5
ADD:2013-04-25 07:15:12.542     $100
Rectify account:
  + $100
  - $5
  + $20
---------
= $115 Current balance
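The rectify step is just a replay of the stored transactions. A minimal sketch using the three rows from the tcodd example; the rectify function and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

# (datetime, type, credits), stored latest first as in the table above
transactions: List[Tuple[str, str, int]] = [
    ("2013-04-25 21:10:32.745", "ADD", 20),
    ("2013-04-25 15:45:22.813", "REMOVE", 5),
    ("2013-04-25 07:15:12.542", "ADD", 100),
]


def rectify(rows: List[Tuple[str, str, int]]) -> int:
    """Replay every stored transaction, oldest first, to rebuild the balance."""
    balance = 0
    for _, kind, credits in sorted(rows):   # timestamps in this format sort chronologically
        balance += credits if kind == "ADD" else -credits
    return balance


print(rectify(transactions))   # 115
```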
Cassandra Concept - Transaction
Add Credit
• Create credit_transaction record with ADD + Timestamp
• Read user record total_credits and credit_timestamp
• user credit_timestamp < credit_transaction timestamp?
- Yes: set back in user record credit_timestamp and incremented total_credits (Success)
- No: fail transaction and rectify
Remove credit
• Create credit_transaction record with REMOVE + Timestamp
• Read user record total_credits and credit_timestamp
• user credit_timestamp < credit_transaction timestamp?
- Yes: set back in user record credit_timestamp and decremented total_credits (Success)
- No: fail transaction and rectify
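One way to read this flow, as an in-memory sketch: the transaction is always recorded first, the running total is updated only if the transaction is newer than the last one applied, and otherwise the total is rebuilt from the stored log. The User class and apply_credit function are hypothetical, and treating "rectify" as a full replay of the log is an assumption based on the earlier rectify example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class User:
    total_credits: int = 0
    credit_timestamp: float = 0.0   # timestamp of the last transaction applied
    log: List[Tuple[float, str, int]] = field(default_factory=list)


def apply_credit(user: User, kind: str, credits: int, ts: float) -> bool:
    """Return True on the fast path, False when the total had to be rectified."""
    # Always record that something happened
    user.log.append((ts, kind, credits))

    # Newer than the last applied transaction: update the running total
    if user.credit_timestamp < ts:
        user.total_credits += credits if kind == "ADD" else -credits
        user.credit_timestamp = ts
        return True

    # Out of order: rebuild the total from the full log (assumed meaning of "rectify")
    user.total_credits = sum(c if k == "ADD" else -c for _, k, c in user.log)
    user.credit_timestamp = max(t for t, _, _ in user.log)
    return False


tcodd = User()
print(apply_credit(tcodd, "ADD", 100, ts=1.0))     # fast path
print(apply_credit(tcodd, "REMOVE", 5, ts=3.0))    # fast path
print(apply_credit(tcodd, "ADD", 20, ts=2.0))      # arrives late, so it is rectified
print(tcodd.total_credits)
```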
And if that doesn’t work...
• Lightweight transactions coming soon.
• Cassandra 2.0
• See CASSANDRA-5062
But wait there is more!!
• The next in this series: May 16th
Become a super modeler
• Final will be at the Cassandra Summit: June 11th
The world’s next top data model
Be there!!!
Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.
Thank You
Q&A
