SlideShare a Scribd company logo
A Cassandra Data Model for Serving up 
Cat Videos 
Luke Tillman (@LukeTillman) 
Language Evangelist at DataStax
Who are you?! 
• Evangelist with a focus on the .NET Community 
• Long-time Developer 
• Recently presented at Cassandra Summit 2014 with Microsoft 
• Very Recent Denver Transplant 
2
1 What is this KillrVideo thing you speak of? 
2 Know Thy Data and Thy Queries 
3 Denormalize All the Things 
4 Building Healthy Relationships 
5 Knowing Your Limits 
3
What is this KillrVideo thing you speak of? 
4
KillrVideo, a Video Sharing Site 
• Think a YouTube competitor 
– Users add videos, rate them, comment on them, etc. 
– Can search for videos by tag
See the Live Demo, Get the Code 
• Live demo available at http://www.killrvideo.com 
– Written in C# 
– Live Demo running in Azure 
– Open source: https://github.com/luketillman/killrvideo-csharp 
• Interesting use case because of different data modeling 
challenges and the scale of something like YouTube 
– More than 1 billion unique users visit YouTube each month 
– 100 hours of video are uploaded to YouTube every minute 
6
Just How Popular are Cats on the Internet? 
7 
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Just How Popular are Cats on the Internet? 
8 
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Know Thy Data and Thy Queries
Getting to Know Your Data 
• What things do I have in the system? 
• What are the relationships between them? 
• This is your conceptual data model 
• You already do this in the RDBMS world
Some of the Entities and Relationships in KillrVideo 
11 
User 
id 
firstname 
lastname 
email 
password Video 
id 
name 
description 
location 
preview_image 
tags features 
Comment 
id 
comment 
adds 
timestamp 
posts 
timestamp 
1 
n 
n 
1 
1 
n 
n 
m 
rates 
rating
Getting to Know Your Queries 
• What are your application’s workflows? 
• How will I access the data? 
• Knowing your queries in advance is NOT optional 
• Different from RDBMS because I can’t just JOIN or create a new 
indexes to support new queries 
12
Some Application Workflows in KillrVideo 
13 
User Logs 
into site 
Show basic 
information 
about user 
Show videos 
added by a 
user 
Show 
comments 
posted by a 
user 
Search for a 
video by tag 
Show latest 
videos added 
to the site 
Show 
comments 
for a video 
Show ratings 
for a video 
Show video 
and its 
details
Some Queries in KillrVideo to Support Workflows 
14 
Users 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
Comments 
Show 
comments 
for a video 
Find comments by 
video (latest first) 
Show 
comments 
posted by a 
user 
Find comments by 
user (latest first) 
Ratings 
Show ratings 
for a video Find ratings by video
Some Queries in KillrVideo to Support Workflows 
15 
Videos 
Search for a 
video by tag Find video by tag 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
Show video 
and its 
details 
Find video by id 
Show videos 
added by a 
user 
Find videos by user 
(latest first)
Denormalize All the Things
Breaking the Relational Mindset 
• Disk is cheap now 
• Writes in Cassandra are FAST 
• Many times we end up with a “table per query” 
• Take advantage of atomic batches to write to multiple tables 
• Similar to materialized views from the RDBMS world 
17
Users – The Relational Way 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
• Single Users table with all user data and an Id Primary Key 
• Add an index on email address to allow queries by email
Users – The Cassandra Way 
User Logs 
into site 
Find user by email 
address 
Show basic 
information 
about user 
Find user by id 
CREATE 
TABLE 
user_credentials 
( 
email 
text, 
password 
text, 
userid 
uuid, 
PRIMARY 
KEY 
(email) 
); 
CREATE 
TABLE 
users 
( 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
created_date 
timestamp, 
PRIMARY 
KEY 
(userid) 
);
Videos Everywhere! 
20 
Show video 
and its 
details 
Find video by id 
Show videos 
added by a 
user 
Find videos by user 
(latest first) 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
user_videos 
( 
userid 
uuid, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
(userid, 
added_date, 
videoid) 
) 
WITH 
CLUSTERING 
ORDER 
BY 
( 
added_date 
DESC, 
videoid 
ASC);
Videos Everywhere! 
Considerations When Duplicating Data 
• Can the data change? 
• How likely is it to change or how frequently will it change? 
• Do I have all the information I need to update duplicates and 
maintain consistency? 
21 
Search for a 
video by tag Find video by tag 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first)
Building Healthy Relationships
Modeling Relationships – Collection Types 
• Cassandra doesn’t support JOINs, but your data will still have 
relationships (and you can still model that in Cassandra) 
• One tool available is CQL collection types 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
);
Modeling Relationships – Client Side Joins 
24 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
location 
text, 
location_type 
int, 
preview_image_location 
text, 
tags 
set<text>, 
added_date 
timestamp, 
PRIMARY 
KEY 
(videoid) 
); 
Currently requires query for video, 
followed by query for user by id based 
on results of first query 
CREATE 
TABLE 
users 
( 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
created_date 
timestamp, 
PRIMARY 
KEY 
(userid) 
);
Modeling Relationships – Client Side Joins 
• What is the cost? Might be OK in small situations 
• Do NOT scale 
• Avoid when possible 
25
Modeling Relationships – Client Side Joins 
26 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
... 
user_firstname 
text, 
user_lastname 
text, 
user_email 
text, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
users_by_video 
( 
videoid 
uuid, 
userid 
uuid, 
firstname 
text, 
lastname 
text, 
email 
text, 
PRIMARY 
KEY 
(videoid) 
); 
or
Modeling Relationships – Client Side Joins 
• Remember the considerations when you duplicate data 
• What happens if a user changes their name or email address? 
• Can I update the duplicated data? 
27
Knowing Your Limits
Cassandra Rules Can Impact Your Design 
• Video Ratings – use counters to track sum of all ratings and 
count of ratings 
CREATE 
TABLE 
videos 
( 
videoid 
uuid, 
userid 
uuid, 
name 
text, 
description 
text, 
... 
rating_counter 
counter, 
rating_total 
counter, 
PRIMARY 
KEY 
(videoid) 
); 
CREATE 
TABLE 
video_ratings 
( 
videoid 
uuid, 
rating_counter 
counter, 
rating_total 
counter, 
PRIMARY 
KEY 
(videoid) 
); 
• Counters are a good example of something with special rules
Single Nodes Have Limits Too 
• Latest videos are bucketed by 
day 
• Means all reads/writes to latest 
videos are going to same 
partition (and thus the same 
nodes) 
• Could create a hotspot 
30 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
CREATE 
TABLE 
latest_videos 
( 
yyyymmdd 
text, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
(yyyymmdd, 
added_date, 
videoid) 
) 
WITH 
CLUSTERING 
ORDER 
BY 
( 
added_date 
DESC, 
videoid 
ASC 
);
Single Nodes Have Limits Too 
• Mitigate by adding data to the 
Partition Key to spread load 
• Data that’s already naturally a 
part of the domain 
– Latest videos by category? 
• Arbitrary data, like a bucket 
number 
– Round robin at the app level 
31 
Show latest 
videos added 
to the site 
Find videos by date 
(latest first) 
CREATE 
TABLE 
latest_videos 
( 
yyyymmdd 
text, 
bucket_number 
int, 
added_date 
timestamp, 
videoid 
uuid, 
name 
text, 
preview_image_location 
text, 
PRIMARY 
KEY 
( 
(yyyymmdd, 
bucket_number) 
added_date, 
videoid) 
) 
...
Questions? 
32 
Follow me on Twitter for updates or to ask questions later: @LukeTillman

More Related Content

Similar to Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
Subhabrata Mukherjee
 
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Gina Montgomery, V-TSP
 
Microsoft Stream - The Video Evolution
Microsoft Stream - The Video EvolutionMicrosoft Stream - The Video Evolution
Microsoft Stream - The Video Evolution
Colin Phillips
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
Patrick McFadin
 
CBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBoxCBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBox
Ortus Solutions, Corp
 
ITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven DevelopmentITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven Development
Ortus Solutions, Corp
 
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache SparkBig Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
BigDataExpo
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
Марина Босова
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
Maxim Shaptala
 
Itproadd 01 60 minute version
Itproadd 01 60 minute versionItproadd 01 60 minute version
Itproadd 01 60 minute version
Tarique_1
 
BUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media ServicesBUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media Services
Mingfei Yan
 
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft StreamECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
European Collaboration Summit
 
Best Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company VideoBest Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company Video
Michael Greth
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Amazon Web Services
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
Derek Jacoby
 
October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101
Eric Sembrat
 
Crossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful DegradationCrossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful Degradation
C4Media
 
[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like system
Aree Oh
 
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AwsReinventSlides
 

Similar to Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos (20)

YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S...
 
Database History From Codd to Brewer
Database History From Codd to BrewerDatabase History From Codd to Brewer
Database History From Codd to Brewer
 
Microsoft Stream - The Video Evolution
Microsoft Stream - The Video EvolutionMicrosoft Stream - The Video Evolution
Microsoft Stream - The Video Evolution
 
Cassandra 3.0 advanced preview
Cassandra 3.0 advanced previewCassandra 3.0 advanced preview
Cassandra 3.0 advanced preview
 
CBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBoxCBDW2014 - Behavior Driven Development with TestBox
CBDW2014 - Behavior Driven Development with TestBox
 
ITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven DevelopmentITB2017 - Intro to Behavior Driven Development
ITB2017 - Intro to Behavior Driven Development
 
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache SparkBig Data Expo 2015 - Hortonworks Effective use of Apache Spark
Big Data Expo 2015 - Hortonworks Effective use of Apache Spark
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
 
01 introduction to entity framework
01   introduction to entity framework01   introduction to entity framework
01 introduction to entity framework
 
Itproadd 01 60 minute version
Itproadd 01 60 minute versionItproadd 01 60 minute version
Itproadd 01 60 minute version
 
BUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media ServicesBUILD 2014 - Building end-to-end video experience with Azure Media Services
BUILD 2014 - Building end-to-end video experience with Azure Media Services
 
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft StreamECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
ECS19 - Michael Greth - Best Practice with Company Video on Microsoft Stream
 
Best Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company VideoBest Practice with Microsoft Stream and Company Video
Best Practice with Microsoft Stream and Company Video
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Untangling - fall2017 - week 9
Untangling - fall2017 - week 9Untangling - fall2017 - week 9
Untangling - fall2017 - week 9
 
October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101October 2014 - USG Rock Eagle - Drupal 101
October 2014 - USG Rock Eagle - Drupal 101
 
Crossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful DegradationCrossroads of Asynchrony and Graceful Degradation
Crossroads of Asynchrony and Graceful Degradation
 
[System design] Design a tweeter-like system
[System design] Design a tweeter-like system[System design] Design a tweeter-like system
[System design] Design a tweeter-like system
 
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
AWS re:Invent 2016: Content and Data Platforms at Vevo: Rebuilding and Scalin...
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
DataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
DataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
DataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
DataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 

Recently uploaded

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 

Recently uploaded (20)

Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

  • 1. A Cassandra Data Model for Serving up Cat Videos Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time Developer • Recently presented at Cassandra Summit 2014 with Microsoft • Very Recent Denver Transplant 2
  • 3. 1 What is this KillrVideo thing you speak of? 2 Know Thy Data and Thy Queries 3 Denormalize All the Things 4 Building Healthy Relationships 5 Knowing Your Limits 3
  • 4. What is this KillrVideo thing you speak of? 4
  • 5. KillrVideo, a Video Sharing Site • Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
  • 6. See the Live Demo, Get the Code • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp • Interesting use case because of different data modeling challenges and the scale of something like YouTube – More than 1 billion unique users visit YouTube each month – 100 hours of video are uploaded to YouTube every minute 6
  • 7. Just How Popular are Cats on the Internet? 7 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 8. Just How Popular are Cats on the Internet? 8 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 9. Know Thy Data and Thy Queries
  • 10. Getting to Know Your Data • What things do I have in the system? • What are the relationships between them? • This is your conceptual data model • You already do this in the RDBMS world
  • 11. Some of the Entities and Relationships in KillrVideo 11 User id firstname lastname email password Video id name description location preview_image tags features Comment id comment adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 12. Getting to Know Your Queries • What are your application’s workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries 12
  • 13. Some Application Workflows in KillrVideo 13 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 14. Some Queries in KillrVideo to Support Workflows 14 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 15. Some Queries in KillrVideo to Support Workflows 15 Videos Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first) Show video and its details Find video by id Show videos added by a user Find videos by user (latest first)
  • 17. Breaking the Relational Mindset • Disk is cheap now • Writes in Cassandra are FAST • Many times we end up with a “table per query” • Take advantage of atomic batches to write to multiple tables • Similar to materialized views from the RDBMS world 17
  • 18. Users – The Relational Way User Logs into site Find user by email address Show basic information about user Find user by id • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email
  • 19. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 20. Videos Everywhere! 20 Show video and its details Find video by id Show videos added by a user Find videos by user (latest first) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC);
  • 21. Videos Everywhere! Considerations When Duplicating Data • Can the data change? • How likely is it to change or how frequently will it change? • Do I have all the information I need to update duplicates and maintain consistency? 21 Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first)
  • 23. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 24. Modeling Relationships – Client Side Joins 24 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); Currently requires query for video, followed by query for user by id based on results of first query CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 25. Modeling Relationships – Client Side Joins • What is the cost? Might be OK in small situations • Do NOT scale • Avoid when possible 25
  • 26. Modeling Relationships – Client Side Joins 26 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) ); or
  • 27. Modeling Relationships – Client Side Joins • Remember the considerations when you duplicate data • What happens if a user changes their name or email address? • Can I update the duplicated data? 27
  • 29. Cassandra Rules Can Impact Your Design • Video Ratings – use counters to track sum of all ratings and count of ratings CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); CREATE TABLE video_ratings ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); • Counters are a good example of something with special rules
  • 30. Single Nodes Have Limits Too • Latest videos are bucketed by day • Means all reads/writes to latest videos are going to same partition (and thus the same nodes) • Could create a hotspot 30 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (yyyymmdd, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC );
  • 31. Single Nodes Have Limits Too • Mitigate by adding data to the Partition Key to spread load • Data that’s already naturally a part of the domain – Latest videos by category? • Arbitrary data, like a bucket number – Round robin at the app level 31 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, bucket_number int, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY ( (yyyymmdd, bucket_number) added_date, videoid) ) ...
  • 32. Questions? 32 Follow me on Twitter for updates or to ask questions later: @LukeTillman