The data model is dead,
long live the data model!!
Patrick McFadin
Senior Solutions Architect
DataStax
Thursday, May 2, 13
Bridging the divide
The era of relational everything is over
The era of Polyglot Persistence* has begun
* http://www.martinfowler.com/bliki/PolyglotPersistence.html
Coming from a relational world
Tradeoffs are hard
Feature | RDBMS | Cassandra
Single Point of Failure
Cross Datacenter
Linear Scaling
Data modeling
Background - The data model
• The data model is alive and well
• Models define the business requirements
• Define the structure of your data
• Relational is just one type (Network model anyone?)
Wait, I thought NoSQL meant no model?
Background - ACID vs CAP
ACID
Atomic - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way
CAP - Pick two
Consistency - All nodes in the cluster see the same data
Availability - Cluster always accepts writes
Partition tolerance - Cluster keeps working when nodes can’t talk to each other
Cassandra lets you tune this
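Tuning here generally means choosing, per request, how many replicas must acknowledge a read or a write. The sketch below is a minimal illustration in Python, assuming the common rule of thumb that a read sees the latest write when read replicas plus write replicas exceed the replication factor. The function names and the example numbers are hypothetical and do not come from the slides.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    for count in (write_replicas, read_replicas):
        if not 1 <= count <= replication_factor:
            raise ValueError("replica count must be between 1 and RF")
    return read_replicas + write_replicas > replication_factor


# Illustrative numbers: replication factor 3.
rf = 3
quorum_both = is_strongly_consistent(rf, quorum(rf), quorum(rf))  # 2 + 2 > 3
one_and_one = is_strongly_consistent(rf, 1, 1)                    # 1 + 1 <= 3
```

Lower replica counts favor availability and latency; higher counts favor consistency.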
Relational Background - Normal forms
• This IS the relational model
• 5 normal forms
• Need foreign keys
• Need joins
Employees
id | First | Last
1 | Edgar | Codd
2 | Raymond | Boyce
Department
id | Dept
1 | Engineering
2 | Math
Background - How Cassandra Stores Data
• Model borrowed from Bigtable*
• Row Key and a lot of columns
• Column names sorted (UTF8, Int, Timestamp, etc.)
[Diagram: one Row Key followed by columns 1 through 2 billion; each column holds a Column Name, Column Value, Timestamp, and TTL]
* http://research.google.com/archive/bigtable.html
Background - How Cassandra Stores Data
• Rows belong to a node and are replicated
• Row lookups are fast
• Randomly distributed in cluster
[Diagram: RowKey1 through RowKey12 spread across the cluster; a lookup for RowKey5 goes directly to RowKey5]
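One way to picture "randomly distributed" is hashing the row key and letting the hash pick the node. The Python sketch below is a deliberate simplification: Cassandra's real placement uses a partitioner and a token ring with replication, whereas this uses an MD5 hash and a plain modulo. The node names and the `node_for` helper are hypothetical.

```python
import hashlib

# Hypothetical four-node cluster, for illustration only.
NODES = ["node1", "node2", "node3", "node4"]


def node_for(row_key: str, nodes=NODES) -> str:
    """Map a row key to a node by hashing it (simplified, no replication)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    return nodes[token % len(nodes)]


# Every key hashes to a fixed node, so a lookup needs no search across nodes.
placement = {f"RowKey{i}": node_for(f"RowKey{i}") for i in range(1, 13)}
lookup = node_for("RowKey5")
```

Because the hash is deterministic, any client can compute where a row lives, which is why row lookups by key stay fast as the cluster grows.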
Relational Concept - Sequences
• Handy feature for auto-creation of Ids
• Guaranteed unique
• Depends on a single source of truth (one server)
INSERT INTO user (id, firstName, lastName)
VALUES (seq.NEXTVAL, 'Ted', 'Codd');
Cassandra Concept - No sequences
• Difficult in a distributed system
• Requires a lock (perf killer)
• What to do?
- Use part of the data to create a unique index, or...
- UUID to the rescue!
Concept - UUID
• Universally Unique Identifier
• 128-bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks
99051fe9-6a9c-46c2-b949-38ef78858dd0
RFC 4122 if you want a reference
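Generating a UUID on the client needs no coordination with the database. A minimal Python sketch using the standard `uuid` module follows; the variable names are illustrative, and the parsed value is the example UUID from the slide.

```python
import uuid

# Random (version 4) UUID generated on the client, no server round trip.
video_id = uuid.uuid4()

# The character form: 32 hex digits in five dash-separated groups.
text_form = str(video_id)

# Parsing the example from the slide back into a 128-bit value.
parsed = uuid.UUID("99051fe9-6a9c-46c2-b949-38ef78858dd0")
parsed_bits = parsed.int.bit_length()  # at most 128
```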
Cassandra Concept - Entity model
• User table (!!)
• Username is the unique key
• Schema is static but can be changed dynamically without downtime
CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email varchar,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);
ALTER TABLE users ADD city text;
Relational Concept - De-normalization
• To combine relations into a single row
• Used in relational modeling to avoid complex joins
Employees
id | First | Last
1 | Edgar | Codd
2 | Raymond | Boyce
Department
id | Dept
1 | Engineering
2 | Math
SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE e.id = 1
AND e.id = d.id;
Take this and then...
Relational Concept - De-normalization
• Combine table columns into a single view
• No joins
• It’s all in how you lay out the data for fast reads
SELECT First, Last, Dept
FROM employees
WHERE id = 1;
Employees
id | First | Last | Dept
1 | Edgar | Codd | Engineering
2 | Raymond | Boyce | Math
Cassandra Concept - One-to-Many
• Relationship without being relational
• Users have many videos
• Wait, where is the foreign key?
Users
username | firstname | lastname | email
tcodd | Edgar | Codd | tcodd@relational.com
rboyce | Raymond | Boyce | rboyce@relational.com
Videos
videoid | videoname | username | description | tags
99051fe9 | My funny cat | tcodd | My cat plays the piano | cats,piano,lol
b3a76c6b | Math | tcodd | Now my dog plays | dogs,piano,lol
Cassandra Concept - One-to-many
• Static table to store videos
• UUID for unique video id
• Add username to denormalize
CREATE TABLE videos (
    videoid uuid,
    videoname varchar,
    username varchar,
    description varchar,
    tags varchar,
    upload_date timestamp,
    PRIMARY KEY (videoid)
);
Cassandra Concept - One-to-Many
• Lookup video by username
• Write in two tables at once for fast lookups
CREATE TABLE username_video_index (
    username varchar,
    videoid uuid,
    upload_date timestamp,
    video_name varchar,
    PRIMARY KEY (username, videoid)
);
SELECT video_name
FROM username_video_index
WHERE username = 'tcodd'
AND videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
Creates a wide row!
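The "wide row" can be pictured as one entry per username holding a sorted set of videoid columns. The Python sketch below models that with nested dictionaries; it is only a mental model, not how Cassandra stores data on disk. The `add_video` and `videos_for` helpers are hypothetical, and the short video ids are the abbreviated ones from the slide's table.

```python
from collections import defaultdict

# videos: one entry per videoid (the static table).
videos = {}
# username_video_index: partition key (username) -> clustering key (videoid)
# -> remaining columns. All videos for one user live in one wide row.
username_video_index = defaultdict(dict)


def add_video(videoid, videoname, username):
    """Write to both tables at once so either lookup is a single read."""
    videos[videoid] = {"videoname": videoname, "username": username}
    username_video_index[username][videoid] = {"video_name": videoname}


def videos_for(username):
    """Read one wide row; columns come back ordered by videoid."""
    row = username_video_index.get(username, {})
    return [(vid, row[vid]["video_name"]) for vid in sorted(row)]


add_video("99051fe9", "My funny cat", "tcodd")
add_video("b3a76c6b", "Math", "tcodd")
tcodd_videos = videos_for("tcodd")
```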
Cassandra concept - Many-to-many
• Users and videos have many comments
Users
username | firstname | lastname | email
tcodd | Edgar | Codd | tcodd@relational.com
rboyce | Raymond | Boyce | rboyce@relational.com
Videos
videoid | videoname | username | description | tags
99051fe9 | My funny cat | tcodd | My cat plays the piano | cats,piano,lol
b3a76c6b | Math | tcodd | Now my dog plays | dogs,piano,lol
Comments
username | videoid | comment
tcodd | 99051fe9 | Sweet!
rboyce | b3a76c6b | Boring :(
Cassandra concept - Many-to-many
• Model both sides of the view
• Insert both when comment is created
• View from either side
CREATE TABLE comments_by_video (
    videoid uuid,
    username varchar,
    comment_ts timestamp,
    comment varchar,
    PRIMARY KEY (videoid, username)
);
CREATE TABLE comments_by_user (
    username varchar,
    videoid uuid,
    comment_ts timestamp,
    comment varchar,
    PRIMARY KEY (username, videoid)
);
Don’t be afraid of writes. Bring it!
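Inserting into both views when a comment is created can be sketched as one function that writes the same record twice, keyed in opposite order. This Python sketch uses in-memory dictionaries in place of the two tables. The `add_comment` helper and the optional timestamp argument are assumptions made for illustration; the sample comments come from the Comments table in the slides.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Stand-ins for the two tables, keyed the same way as their primary keys.
comments_by_video = defaultdict(dict)  # videoid -> username -> columns
comments_by_user = defaultdict(dict)   # username -> videoid -> columns


def add_comment(videoid, username, comment, comment_ts=None):
    """Write the comment to both views so it can be read from either side."""
    ts = comment_ts or datetime.now(timezone.utc)
    record = {"comment_ts": ts, "comment": comment}
    comments_by_video[videoid][username] = dict(record)
    comments_by_user[username][videoid] = dict(record)


add_comment("99051fe9", "tcodd", "Sweet!")
add_comment("b3a76c6b", "rboyce", "Boring :(")
```

With these primary keys, a second comment by the same user on the same video overwrites the first, in the sketch just as in the tables.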
Relational Concept - Transactions
• Built in and easy to use
• Can be slow and heavy so don’t use them all the time
• Normal forms force ACID writes into many tables
lock
- change table one
- change table two
- change table three
commit
-or-
lock
- change table one
- change table two
- change table three
rollback
Crazy Concept - Do you need a transaction?
• Since they were easy in an RDBMS, were they just the default?
• Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
• In a nutshell, an asynchronous transaction:
- Cashier takes your money
- Barista makes your coffee
- Error? Barista deals with it
Cassandra Concept - Transaction quality
• A transaction requires a lock, which is costly in distributed systems
• Cassandra features can be used to your advantage
- Row level isolation
- Atomic batches
Cassandra Concept - Transaction
• Track that something happened
• Use time stamps to preserve order
• Rectify when in any doubt (just like banks do)
Create this table:
CREATE TABLE credit_transaction (
    username varchar,
    type varchar,
    datetime timestamp,
    credits int,
    PRIMARY KEY (username, datetime, type)
) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);
Sort the columns in reverse order so the last action is first on the list
Cassandra Concept - Transaction
• All transactions are stored
• Think RPN calculator, latest first
Row key: tcodd
ADD:2013-04-25 21:10:32.745 | $20
REMOVE:2013-04-25 15:45:22.813 | $5
ADD:2013-04-25 07:15:12.542 | $100
Rectify account:
+ $100
- $5
+ $20
---------
= $115 Current balance
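Rectifying the account amounts to replaying the stored transactions in time order. The Python sketch below does that for the three tcodd entries shown on the slide; the `rectify` function name and the tuple layout are assumptions made for illustration.

```python
from datetime import datetime

# The three stored transactions for tcodd, latest first as on the slide:
# (datetime, type, credits)
transactions = [
    (datetime(2013, 4, 25, 21, 10, 32, 745000), "ADD", 20),
    (datetime(2013, 4, 25, 15, 45, 22, 813000), "REMOVE", 5),
    (datetime(2013, 4, 25, 7, 15, 12, 542000), "ADD", 100),
]


def rectify(rows):
    """Replay transactions oldest first and return the resulting balance."""
    balance = 0
    for _, kind, credits in sorted(rows, key=lambda row: row[0]):
        if kind == "ADD":
            balance += credits
        elif kind == "REMOVE":
            balance -= credits
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return balance


current_balance = rectify(transactions)  # 100 - 5 + 20
```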
Cassandra Concept - Transaction
Add credit
• Create credit_transaction record with ADD + Timestamp
• Read user record total_credits and credit_timestamp
• user credit_timestamp < credit_transaction timestamp?
- Yes: set credit_timestamp and incremented total_credits back in user record. Success
- No: fail transaction and rectify
Remove credit
• Create credit_transaction record with REMOVE + Timestamp
• Read user record total_credits and credit_timestamp
• user credit_timestamp < credit_transaction timestamp?
- Yes: set credit_timestamp and decremented total_credits back in user record. Success
- No: fail transaction and rectify
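The add and remove flows can be sketched as one function that always records the transaction and only updates the user record when the transaction is newer than the last one applied. This Python sketch is one reading of the flow chart. It assumes the "fail and rectify" branch is taken when the user's credit_timestamp is not older than the transaction's timestamp; `apply_credit` and the dictionary layout are hypothetical.

```python
from datetime import datetime


def apply_credit(user, log, kind, credits, ts):
    """Record the transaction, then update the user record only if it is newer."""
    if kind not in ("ADD", "REMOVE"):
        raise ValueError(f"unknown transaction type: {kind}")
    # Always create the credit_transaction record first.
    log.append((ts, kind, credits))
    # Compare the user's last applied timestamp with this transaction's.
    if not user["credit_timestamp"] < ts:
        return False  # fail: the caller rectifies from the log
    # Write back the new timestamp and the adjusted total.
    user["total_credits"] += credits if kind == "ADD" else -credits
    user["credit_timestamp"] = ts
    return True


user = {"total_credits": 0, "credit_timestamp": datetime.min}
log = []
first = apply_credit(user, log, "ADD", 100, datetime(2013, 4, 25, 7, 15, 12))
second = apply_credit(user, log, "REMOVE", 5, datetime(2013, 4, 25, 15, 45, 22))
# An out-of-order transaction is still logged but does not touch the total.
stale = apply_credit(user, log, "ADD", 20, datetime(2013, 4, 25, 7, 15, 12))
```

The log keeps every attempt, so a failed update loses nothing: the total can be rebuilt from the stored transactions.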
And if that doesn’t work...
• Lightweight transactions coming soon.
• Cassandra 2.0
• See CASSANDRA-5062
But wait there is more!!
• The next in this series: May 16th
Become a super modeler
• Final will be at the Cassandra Summit: June 11th
The world’s next top data model
Be there!!!
Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don’t miss it.
Thank You
Q&A

More Related Content

What's hot

High Availability and Disaster Recovery in PostgreSQL - EQUNIX
High Availability and Disaster Recovery in PostgreSQL - EQUNIXHigh Availability and Disaster Recovery in PostgreSQL - EQUNIX
High Availability and Disaster Recovery in PostgreSQL - EQUNIX
Julyanto SUTANDANG
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
TO THE NEW | Technology
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
Mydbops
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
Jean-François Gagné
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
Mydbops
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
Karwin Software Solutions LLC
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
MongoDB
 
Architecture of Hadoop
Architecture of HadoopArchitecture of Hadoop
Architecture of Hadoop
Knoldus Inc.
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
PgDay.Seoul
 
Monitoring all Elements of Your Database Operations With Zabbix
Monitoring all Elements of Your Database Operations With ZabbixMonitoring all Elements of Your Database Operations With Zabbix
Monitoring all Elements of Your Database Operations With Zabbix
Zabbix
 
NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"
Sushant Choudhary
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Mydbops
 
HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18
Derek Downey
 
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
PgDay.Seoul
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
I Goo Lee
 
Querying Distributed Tables in Citus
Querying Distributed Tables in CitusQuerying Distributed Tables in Citus
Querying Distributed Tables in Citus
Shubhangi Pardeshi
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
PgDay.Seoul
 
Automated master failover
Automated master failoverAutomated master failover
Automated master failover
Yoshinori Matsunobu
 
Architecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesArchitecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker services
LINE Corporation
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
DataStax Academy
 

What's hot (20)

High Availability and Disaster Recovery in PostgreSQL - EQUNIX
High Availability and Disaster Recovery in PostgreSQL - EQUNIXHigh Availability and Disaster Recovery in PostgreSQL - EQUNIX
High Availability and Disaster Recovery in PostgreSQL - EQUNIX
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
PostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability MethodsPostgreSQL Replication High Availability Methods
PostgreSQL Replication High Availability Methods
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
 
InnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick FiguresInnoDB Locking Explained with Stick Figures
InnoDB Locking Explained with Stick Figures
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
Architecture of Hadoop
Architecture of HadoopArchitecture of Hadoop
Architecture of Hadoop
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
 
Monitoring all Elements of Your Database Operations With Zabbix
Monitoring all Elements of Your Database Operations With ZabbixMonitoring all Elements of Your Database Operations With Zabbix
Monitoring all Elements of Your Database Operations With Zabbix
 
NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"NewSQL: The Best of Both "OldSQL" and "NoSQL"
NewSQL: The Best of Both "OldSQL" and "NoSQL"
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18HandsOn ProxySQL Tutorial - PLSC18
HandsOn ProxySQL Tutorial - PLSC18
 
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
[pgday.Seoul 2022] POSTGRES 테스트코드로 기여하기 - 이동욱
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Querying Distributed Tables in Citus
Querying Distributed Tables in CitusQuerying Distributed Tables in Citus
Querying Distributed Tables in Citus
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
 
Automated master failover
Automated master failoverAutomated master failover
Automated master failover
 
Architecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker servicesArchitecture Sustaining LINE Sticker services
Architecture Sustaining LINE Sticker services
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
 

Similar to The data model is dead, long live the data model

Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
DataStax Academy
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
Michael Kjellman
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
DataStax Academy
 
What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?
Ivan Zoratti
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache Cassandra
Patrick McFadin
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
Patrick McFadin
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
nickmbailey
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
Jon Haddad
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Cesare Cugnasco
 
Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)
Nenad Bozic
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
Brent Theisen
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQL
Ivan Zoratti
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
DataStax
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
Morningstar Tech Talks
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Databricks
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
Jesus Guzman
 
Cassandra at Zalando
Cassandra at ZalandoCassandra at Zalando
Cassandra at Zalando
Luis Mineiro
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Data Con LA
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
Christian Johannsen
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
Jon Haddad
 

Similar to The data model is dead, long live the data model (20)

Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
 
Hindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to CassandraHindsight is 20/20: MySQL to Cassandra
Hindsight is 20/20: MySQL to Cassandra
 
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael KjellmanC* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman
 
What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache Cassandra
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)Coming to cassandra from relational world (New)
Coming to cassandra from relational world (New)
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQL
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim HunterWeb-Scale Graph Analytics with Apache Spark with Tim Hunter
Web-Scale Graph Analytics with Apache Spark with Tim Hunter
 
Introduction to Apache Cassandra
Introduction to Apache CassandraIntroduction to Apache Cassandra
Introduction to Apache Cassandra
 
Cassandra at Zalando
Cassandra at ZalandoCassandra at Zalando
Cassandra at Zalando
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 

More from Patrick McFadin

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
Patrick McFadin
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
Patrick McFadin
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
Patrick McFadin
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelines
Patrick McFadin
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.
Patrick McFadin
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
Patrick McFadin
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
Patrick McFadin
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
Patrick McFadin
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
Patrick McFadin
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
Patrick McFadin
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
Patrick McFadin
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
Patrick McFadin
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Patrick McFadin
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
Patrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
Patrick McFadin
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
Patrick McFadin
 
Making money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guideMaking money with open source and not losing your soul: A practical guide
Making money with open source and not losing your soul: A practical guide
Patrick McFadin
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
Patrick McFadin
 
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
Patrick McFadin
 

More from Patrick McFadin (20)

Successful Architectures for Fast Data
Successful Architectures for Fast DataSuccessful Architectures for Fast Data
Successful Architectures for Fast Data
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Laying down the smack on your data pipelines
Laying down the smack on your data pipelinesLaying down the smack on your data pipelines
Laying down the smack on your data pipelines
 
Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.Help! I want to contribute to an Open Source project but my boss says no.
Help! I want to contribute to an Open Source project but my boss says no.
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
The data model is dead, long live the data model

  • 1. The data model is dead, long live the data model!! Patrick McFadin Senior Solutions Architect DataStax Thursday, May 2, 13
  • 3. Bridging the divide
      The era of relational everything is over
      The era of Polyglot Persistence* has begun
      * http://www.martinfowler.com/bliki/PolyglotPersistence.html
  • 4. Coming from a relational world
      Tradeoffs are hard
      Comparison table with columns Feature, RDBMS, Cassandra and rows Single Point of Failure, Cross Datacenter, Linear Scaling, Data modeling (cell contents were graphics)
  • 5. Background - The data model
      Wait, I thought NoSQL meant no model?
      • The data model is alive and well
      • Models define the business requirements
      • Models define the structure of your data
      • Relational is just one type (Network model, anyone?)
  • 8. Background - ACID vs CAP
      ACID
      • Atomic - All or none
      • Consistency - Only valid data is written
      • Isolation - One operation at a time
      • Durability - Once committed, it stays that way
      CAP - Pick two
      • Consistency - Every node in the cluster sees the same data
      • Availability - Cluster always accepts writes
      • Partition tolerance - Cluster keeps working when nodes can't talk to each other
      Cassandra lets you tune this
  • 9. Relational Background - Normal forms
      • This IS the relational model
      • 5 normal forms
      • Need foreign keys
      • Need joins
      Employees
        id | First   | Last
        1  | Edgar   | Codd
        2  | Raymond | Boyce
      Department
        id | Dept
        1  | Engineering
        2  | Math
  • 10. Background - How Cassandra Stores Data
      • Model brought from Bigtable*
      • Row Key and a lot of columns
      • Column names sorted (UTF8, Int, Timestamp, etc.)
      Diagram: one Row Key holding columns 1 through 2 billion; each column has a Column Name, Column Value, Timestamp, and TTL
      * http://research.google.com/archive/bigtable.html
  • 11. Background - How Cassandra Stores Data
      • Rows belong to a node and are replicated
      • Row lookups are fast
      • Randomly distributed in cluster
      Diagram: RowKey1 through RowKey12 spread across the nodes, with a lookup of RowKey5
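The placement idea on this slide can be sketched in Python. This is a toy model, not Cassandra's actual partitioner: the node names, the replication factor of 3, and the MD5-modulo placement are all assumptions made for illustration, whereas a real cluster assigns token ranges to nodes.

```python
import hashlib

# Hypothetical cluster layout; names and replication factor are assumptions.
NODES = ["node1", "node2", "node3", "node4"]
REPLICATION_FACTOR = 3


def replicas_for(row_key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes that would hold a row in this toy model.

    The row key is hashed so that keys spread pseudo-randomly over the
    nodes; the row is then copied to the next rf - 1 nodes in list order.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    start = int.from_bytes(digest, "big") % len(nodes)
    count = min(rf, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


if __name__ == "__main__":
    # Show where the twelve row keys from the diagram would land.
    for n in range(1, 13):
        key = f"RowKey{n}"
        print(key, "->", replicas_for(key))
```

Because the owner is computed from the key alone, a lookup never has to scan the cluster, which is the intuition behind "row lookups are fast".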
  • 12. Relational Concept - Sequences
      • Handy feature for auto-creation of IDs
      • Guaranteed unique
      • Depends on a single source of truth (one server)
      INSERT INTO user (id, firstName, LastName)
      VALUES (seq.nextVal(), 'Ted', 'Codd')
  • 13. Cassandra Concept - No sequences
      • Difficult in a distributed system
      • Requires a lock (perf killer)
      • What to do?
        - Use part of the data to create a unique index, or...
        - UUID to the rescue!
  • 14. Concept - UUID
      • Universally Unique Identifier
      • 128-bit number represented in character form
      • Easily generated on the client
      • Same as GUID for the MS folks
      Example: 99051fe9-6a9c-46c2-b949-38ef78858dd0
      RFC 4122 if you want a reference
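As a minimal sketch of "easily generated on the client", Python's standard `uuid` module can create and validate these IDs without contacting any server. The variable names and the `is_valid_uuid` helper are illustrative only, and the choice of a random (version 4) UUID is an assumption; the slides do not say which version to use.

```python
import uuid

# Generate a random (version 4) UUID locally; no server round trip needed.
video_id = uuid.uuid4()

# Canonical text form: 32 hex digits plus 4 hyphens, 36 characters in total.
text_form = str(video_id)

# The same value viewed as a single 128-bit integer.
as_int = video_id.int


def is_valid_uuid(value):
    """Return True if value parses as a UUID string (hypothetical helper)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(text_form)
    print(is_valid_uuid("99051fe9-6a9c-46c2-b949-38ef78858dd0"))
```

Two clients that generate IDs this way do not need a shared lock, which is the property the sequence slide was missing.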
  • 15. Cassandra Concept - Entity model
      • User table (!!)
      • Username is the unique key
      • Static but can be changed dynamically without downtime
      CREATE TABLE users (
        username varchar,
        firstname varchar,
        lastname varchar,
        email varchar,
        password varchar,
        created_date timestamp,
        PRIMARY KEY (username)
      );
      ALTER TABLE users ADD city text;
  • 16. Relational Concept - De-normalization
      • To combine relations into a single row
      • Used in relational modeling to avoid complex joins
      Employees
        id | First   | Last
        1  | Edgar   | Codd
        2  | Raymond | Boyce
      Department
        id | Dept
        1  | Engineering
        2  | Math
      SELECT e.First, e.Last, d.Dept
      FROM Department d, Employees e
      WHERE 1 = e.id AND e.id = d.id
      Take this and then...
  • 17. Relational Concept - De-normalization
      • Combine table columns into a single view
      • No joins
      • All in how you set the data for fast reads
      SELECT First, Last, Dept
      FROM employees
      WHERE id = '1'
      Employees
        id | First   | Last  | Dept
        1  | Edgar   | Codd  | Engineering
        2  | Raymond | Boyce | Math
  • 18. Cassandra Concept - One-to-Many
      • Relationship without being relational
      • Users have many videos
      • Wait? Where is the foreign key?
      Users
        username | firstname | lastname | email
        tcodd    | Edgar     | Codd     | tcodd@relational.com
        rboyce   | Raymond   | Boyce    | rboyce@relational.com
      Videos
        videoid  | videoname    | username | description            | tags
        99051fe9 | My funny cat | tcodd    | My cat plays the piano | cats,piano,lol
        b3a76c6b | Math         | tcodd    | Now my dog plays       | dogs,piano,lol
  • 19. Cassandra Concept - One-to-many
      • Static table to store videos
      • UUID for unique video id
      • Add username to denormalize
      CREATE TABLE videos (
        videoid uuid,
        videoname varchar,
        username varchar,
        description varchar,
        tags varchar,
        upload_date timestamp,
        PRIMARY KEY (videoid)
      );
  • 20. Cassandra Concept - One-to-Many
      • Lookup video by username
      • Write in two tables at once for fast lookups
      CREATE TABLE username_video_index (
        username varchar,
        videoid uuid,
        upload_date timestamp,
        video_name varchar,
        PRIMARY KEY (username, videoid)
      );
      SELECT video_name
      FROM username_video_index
      WHERE username = 'tcodd' AND videoid = '99051fe9'
      Creates a wide row!
  • 21. Cassandra concept - Many-to-many
      • Users and videos have many comments
      Users
        username | firstname | lastname | email
        tcodd    | Edgar     | Codd     | tcodd@relational.com
        rboyce   | Raymond   | Boyce    | rboyce@relational.com
      Videos
        videoid  | videoname    | username | description            | tags
        99051fe9 | My funny cat | tcodd    | My cat plays the piano | cats,piano,lol
        b3a76c6b | Math         | tcodd    | Now my dog plays       | dogs,piano,lol
      Comments
        username | videoid  | comment
        tcodd    | 99051fe9 | Sweet!
        rboyce   | b3a76c6b | Boring :(
  • 23. Cassandra concept - Many-to-many
      • Model both sides of the view
      • Insert both when comment is created
      • View from either side
      CREATE TABLE comments_by_video (
        videoid uuid,
        username varchar,
        comment_ts timestamp,
        comment varchar,
        PRIMARY KEY (videoid, username)
      );
      CREATE TABLE comments_by_user (
        username varchar,
        videoid uuid,
        comment_ts timestamp,
        comment varchar,
        PRIMARY KEY (username, videoid)
      );
      Don't be afraid of writes. Bring it!
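The "insert both when comment is created" rule can be sketched with plain Python dictionaries standing in for the two tables. This is an in-memory illustration only, with no Cassandra driver involved; the `add_comment` function and the dictionary layout are assumptions that mirror the primary keys on the slide.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-ins for the two tables, keyed like their primary keys.
comments_by_video = defaultdict(dict)  # videoid -> {username: row}
comments_by_user = defaultdict(dict)   # username -> {videoid: row}


def add_comment(videoid, username, comment, ts=None):
    """Write the same comment to both views so either side can be read."""
    ts = ts or datetime.now(timezone.utc)
    row = {"comment_ts": ts, "comment": comment}
    # Each view gets its own copy: the data is denormalized on purpose.
    comments_by_video[videoid][username] = dict(row)
    comments_by_user[username][videoid] = dict(row)


add_comment("99051fe9", "tcodd", "Sweet!")
add_comment("b3a76c6b", "rboyce", "Boring :(")

if __name__ == "__main__":
    # View from either side without a join.
    print(comments_by_video["99051fe9"])
    print(comments_by_user["rboyce"])
```

In this model a second comment by the same user on the same video overwrites the first, which is also how a Cassandra insert behaves for a row with an identical primary key.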
  • 24. Relational Concept - Transactions
      • Built in and easy to use
      • Can be slow and heavy so don't use them all the time
      • Normal forms force ACID writes into many tables
      lock
        - change table one
        - change table two
        - change table three
      commit
      -or-
      lock
        - change table one
        - change table two
        - change table three
      rollback
  • 25. Crazy Concept - Do you need a transaction?
      • Since they were easy in RDBMS, was it just default?
      • Read this article: http://www.eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
      • In a nutshell, an asynchronous transaction:
        - Cashier takes your money
        - Barista makes your coffee
        - Error? Barista deals with it
  • 26. Cassandra Concept - Transaction quality
      • Requires a lock, which is costly in distributed systems
      • Cassandra features can be used to advantage
        - Row level isolation
        - Atomic batches
  • 27. Cassandra Concept - Transaction
      • Track that something happened
      • Use time stamps to preserve order
      • Rectify when any doubt (just like banks do)
      Create this table:
      CREATE TABLE credit_transaction (
        username varchar,
        type varchar,
        datetime timestamp,
        credits int,
        PRIMARY KEY (username, datetime, type)
      ) WITH CLUSTERING ORDER BY (datetime DESC, type ASC);
      Sort the columns in reverse order so the last action is first on the list
  • 28. Cassandra Concept - Transaction
      • All transactions are stored
      • Think RPN calculator, latest first
      Row for tcodd:
        ADD:2013-04-25 21:10:32.745    | $20
        REMOVE:2013-04-25 15:45:22.813 | $5
        ADD:2013-04-25 07:15:12.542    | $100
      Rectify account: + $100 - $5 + $20 = $115 current balance
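The rectify step on this slide amounts to replaying the stored transactions in time order. A minimal sketch follows, using the three entries from the slide; the tuple layout and the `rectify` function name are assumptions for illustration.

```python
from datetime import datetime

# Transactions for user tcodd, latest first, as stored on the slide:
# (type, timestamp, credits).
ledger = [
    ("ADD", datetime(2013, 4, 25, 21, 10, 32, 745000), 20),
    ("REMOVE", datetime(2013, 4, 25, 15, 45, 22, 813000), 5),
    ("ADD", datetime(2013, 4, 25, 7, 15, 12, 542000), 100),
]


def rectify(entries):
    """Replay every entry from oldest to newest and return the balance."""
    balance = 0
    for kind, _ts, credits in sorted(entries, key=lambda e: e[1]):
        if kind == "ADD":
            balance += credits
        elif kind == "REMOVE":
            balance -= credits
        else:
            raise ValueError(f"unknown transaction type: {kind}")
    return balance


print(rectify(ledger))  # 115
```

Because every transaction is kept, the balance can be recomputed at any time instead of being trusted as a single mutable number.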
  • 29. Cassandra Concept - Transaction
      Add credit:
        1. Create credit_transaction record with ADD + Timestamp
        2. Read user record total_credits and credit_timestamp
        3. user credit_timestamp < credit_transaction timestamp?
           - Yes: set back in user record credit_timestamp and incremented total_credits (Success)
           - No: fail transaction and rectify
      Remove credit:
        1. Create credit_transaction record with REMOVE + Timestamp
        2. Read user record total_credits and credit_timestamp
        3. user credit_timestamp < credit_transaction timestamp?
           - Yes: set back in user record credit_timestamp and decremented total_credits (Success)
           - No: fail transaction and rectify
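The add and remove flows can be sketched as one function over an in-memory account. This is an illustration under stated assumptions, not driver code: the `UserRecord` and `Account` classes, `apply_credit`, and the choice to rectify by replaying the full log are all hypothetical, and only the field names `total_credits` and `credit_timestamp` come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UserRecord:
    total_credits: int = 0
    credit_timestamp: datetime = datetime.min


@dataclass
class Account:
    user: UserRecord = field(default_factory=UserRecord)
    transactions: list = field(default_factory=list)  # (type, ts, credits)


def rectify(account):
    """Rebuild the user record by replaying the whole transaction log."""
    total = 0
    latest = datetime.min
    for kind, ts, credits in sorted(account.transactions, key=lambda t: t[1]):
        total += credits if kind == "ADD" else -credits
        latest = max(latest, ts)
    account.user.total_credits = total
    account.user.credit_timestamp = latest


def apply_credit(account, kind, credits, ts):
    """Follow the slide's flow; return True on success, False if rectified."""
    if kind not in ("ADD", "REMOVE"):
        raise ValueError(f"unknown transaction type: {kind}")
    # Step 1: always record that something happened.
    account.transactions.append((kind, ts, credits))
    # Steps 2 and 3: read the user record and compare timestamps.
    user = account.user
    if user.credit_timestamp < ts:
        user.total_credits += credits if kind == "ADD" else -credits
        user.credit_timestamp = ts
        return True
    # A newer change was already applied: fail and rectify from the log.
    rectify(account)
    return False
```

A transaction that arrives with an older timestamp than the user record is still logged, so the rectify pass can include it in the corrected total.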
  • 30. And if that doesn't work...
      • Lightweight transactions coming soon
      • Cassandra 2.0
      • See CASSANDRA-5062
  • 31. But wait, there is more!!
      • The next in this series: May 16th, "Become a super modeler"
      • Final will be at the Cassandra Summit: June 11th, "The world's next top data model"
  • 32. Be there!!!
      Sony, eBay, Netflix, Intuit, Spotify... the list goes on. Don't miss it.