SlideShare a Scribd company logo
Building an Activity
Feed with Cassandra
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic
Disclaimer
Not an operations person.
Will pretend to be one for the purpose of this talk.
Quick Overview
What is the Behance Activity Feed?
• Actions
• Comments, Appreciations, Etc
• Entities
• Projects, Works in Progress
• Actors
• Users
Project Entity
Actions taken
by actors
Activity Fan Out
User A publishes
a new project
Write to Follower A’s feed
Write to Follower B’s feed
Write to Follower C’s feed
Write to Follower D’s feed
Now that that’s over…
MongoDB
2011
• Smaller user base (~340,000).
• Built very quickly. Worked well at the time.
• Not well researched.
Fast forward to 2014
• Frequent node failures
• Heavy disk fragmentation caused by deletes
• Slow reads from disk. Started storing in RAM.
• Primary -> Secondary caused downtime for
some.
• Scaled out vertically and horizontally.
Why Cassandra?
• Riak
• Very close. Community seemed lacking.
• Redis
• No native cluster. Too much maintenance.
• Memcached/MySQL
• Too much complex app logic.
Cassandra Wins.
• Fantastic community. #cassandra on Twitter
• Easy to read documentation
• Linearly scalable. Easy to grow cluster.
• Low maintenance overhead for ops team.
• Handles time series data very well.
Learning
• Cassandra Summit 2014
• Other team in Adobe
• Long nights reading documentation
Our Data
• Ephemeral
• “Source of truth” lives in a MySQL database
• Okay with *some* data loss
Our Rules
• User’s feed is comprised of entities with one set
of actions
• User’s feed only contains one of any given entity
• An entity’s set of actions contains up to seven of
the most recent actions taken by that user’s
network
Planning
Language Support
• Most services on Behance are PHP
• No official Datastax PHP driver
–Mark Dunphy, 2014
“Looks like I’m learning python.”
Go to Production
No, nothing is working yet. I didn’t skip a slide.
• App/cluster in production before anything works
• Test real life load
• Fail spectacularly without anybody noticing
• Deploy risky changes without fear
• Run alongside MongoDB
January 19th, 2015
Query Patterns
• “Create your data models based on the queries
you want to run” - Basically Everybody
• Wanted to…
• Read a user’s feed entities by type and time of
most recent action…separately.
• Write/Update a user’s feed entities with new
actions while knowing only user id and entity id
Data Models
–Mark Dunphy, January 2015
“An UPDATE in Cassandra works like an
UPSERT! Let’s store the user’s entire feed in a
single row in a table! It’s so simple!”
First Data Model
CREATE TYPE activity.action (
created_on timestamp,
secondary_entity_id int,
actor_id int,
verb_id int
);
CREATE TYPE activity.entity (
entity_type_id int,
entity_id int
);
CREATE TABLE activity.project_actions (
modified_on timestamp,
entity_id int,
user_id int,
actions list<frozen<action>>,
PRIMARY KEY(user_id, entity_id)
)
CREATE TABLE activity.feeds (
modified_entities list<frozen<entity>>,
modified_on timestamp,
project_ids list<int>,
user_id int,
wip_revision_ids list<int>,
PRIMARY KEY(user_id)
)
First Data Model
First Data Model
Moments Before Everything Exploded
–Mark Dunphy, January 2015
“Okay let’s keep nearly the same model, but
use INSERT and DELETE instead of always
UPDATE. Just use batch statements.”
Second Data Model
Second Data Model
This was also a very very bad idea.
• Lose the benefit of Cassandra being distributed
• All queries go through the same coordinator
which puts a lot of stress and responsibility on
one node.
• Use concurrency and prepared statements
instead. Datastax drivers make this easy.
Second Data Model
Second Data Model
Oops
Okay…
Now we’ve got it.
Winning Data Model
CREATE TYPE activity.action (
created_on timestamp,
secondary_entity_id int,
actor_id int,
verb_id int
);
CREATE TABLE activity.projects (
created_on timestamp,
user_id int,
entity_id int,
actions list<frozen<action>>,
PRIMARY KEY(user_id, created_on, entity_id)
)
CREATE TABLE activity.project_actions (
modified_on timestamp,
entity_id int,
user_id int,
actions list<frozen<action>>,
PRIMARY KEY(user_id, entity_id)
)
Much Nicer
Write Strategy
• “User A comments on Project A. User B follows
User A.”
• Request out to add the comment action to User
B’s feed
• Read existing actions for that entity (Project A) in
B’s feed. Push new action on top.
• Write new actions list into new “row” in projects
table
Read Strategy
• SELECT * FROM projects WHERE user_id
= 123 AND created_on > 123214373
• Optimized for quick/easy reads. More important
that a user’s feed loads quickly than it updating
quickly.
• Use timestamp to “page” through data.
Lessons Learned
• Duplicate your data to achieve desired queries.
Storage is cheap. Writes are cheap.
• Think outside the box. Cassandra is not
relational.
• Never ever ever ignore inserts/deletes in favor of
an update only workflow. Never. It is literally
insane.
Final Specs
• 16 node cluster on AWS EC2 c3.8xlarge
• Mix of SizeTieredCompactionStrategy and
DateTieredCompactionStrategy
• NetworkTopologyStrategy
• Replication factor 3
• ConsistencyLevel = ONE for most requests
Final Specs
• Bursty write volume. Consistent read volume.
• 5k to 80k writes per second
• 2k to 4k reads per second
Questions?
I might have answers.
Thank you!
Mark Dunphy, Software Engineer
Behance/Adobe
@dunphtastic

More Related Content

What's hot

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
Altinity Ltd
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
Gleb Kanterov
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Altinity Ltd
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
confluent
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
Altinity Ltd
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
ScyllaDB
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
DataStax
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
Flink Forward
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
Altinity Ltd
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
leanderlee2
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
Altinity Ltd
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Henning Jacobs
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
Altinity Ltd
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Kevin Weil
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
Guido Schmutz
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET Developers
Luke Tillman
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Flink Forward
 

What's hot (20)

A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
ClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and howClickHouse Monitoring 101: What to monitor and how
ClickHouse Monitoring 101: What to monitor and how
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
 
A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides A Day in the Life of a ClickHouse Query Webinar Slides
A Day in the Life of a ClickHouse Query Webinar Slides
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)Rainbird: Realtime Analytics at Twitter (Strata 2011)
Rainbird: Realtime Analytics at Twitter (Strata 2011)
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
A Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET DevelopersA Deep Dive into Apache Cassandra for .NET Developers
A Deep Dive into Apache Cassandra for .NET Developers
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 

Viewers also liked

Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
MongoDB
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
MongoDB
 
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedSocialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
MongoDB
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
Tony Tam
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
MongoDB
 
MongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic WebMongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic Web
DATAVERSITY
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB
 
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
MongoDB
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
MongoDB
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
MongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
MongoDB
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseChris Clarke
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
Alex Sharp
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Ravi Teja
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBook
MongoDB
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
MongoDB
 

Viewers also liked (20)

Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
 
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data FeedSocialite, the Open Source Status Feed Part 3: Scaling the Data Feed
Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
 
MongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic WebMongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic Web
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBook
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 

Similar to Building an Activity Feed with Cassandra

All about that reactive ui
All about that reactive uiAll about that reactive ui
All about that reactive ui
Paul van Zyl
 
F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4
malorie_pinterest
 
Bridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality MapsBridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality Maps
Malini Rao
 
Data Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference ZurichData Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference Zurich
Memi Beltrame
 
Cracking web development
Cracking web developmentCracking web development
Cracking web development
Eyal Kenig
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Databricks
 
Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)
Johannes Brodwall
 
Microservices: Architecture and Practice
Microservices:  Architecture and PracticeMicroservices:  Architecture and Practice
Microservices: Architecture and Practice
Iron.io
 
React state management with Redux and MobX
React state management with Redux and MobXReact state management with Redux and MobX
React state management with Redux and MobX
Darko Kukovec
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
Trieu Nguyen
 
Full stack conference talk slides
Full stack conference talk slidesFull stack conference talk slides
Full stack conference talk slides
Sameer Al-Sakran
 
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystemsSciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
James Howison
 
redpill Forensics
redpill Forensicsredpill Forensics
redpill Forensics
Peter Presnell
 
Drupal 8 Initiatives
Drupal 8 InitiativesDrupal 8 Initiatives
Drupal 8 Initiatives
Angela Byron
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to python
Mohamed Hegazy
 
Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)
NETUserGroupBern
 
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Thinkful
 

Similar to Building an Activity Feed with Cassandra (20)

All about that reactive ui
All about that reactive uiAll about that reactive ui
All about that reactive ui
 
2014 Picking a Platform by Anand Kulkarni
2014 Picking a Platform by Anand Kulkarni2014 Picking a Platform by Anand Kulkarni
2014 Picking a Platform by Anand Kulkarni
 
F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4F8 tech talk_pinterest_v4
F8 tech talk_pinterest_v4
 
Bridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality MapsBridging Current Reality & Future Vision with Reality Maps
Bridging Current Reality & Future Vision with Reality Maps
 
Data Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference ZurichData Driven Design - Frontend Conference Zurich
Data Driven Design - Frontend Conference Zurich
 
Cracking web development
Cracking web developmentCracking web development
Cracking web development
 
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
Solving Data Discovery Challenges at Lyft with Amundsen, an Open-source Metad...
 
Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)Getting your project off the ground (BuildStuffLt)
Getting your project off the ground (BuildStuffLt)
 
Microservices: Architecture and Practice
Microservices:  Architecture and PracticeMicroservices:  Architecture and Practice
Microservices: Architecture and Practice
 
React state management with Redux and MobX
React state management with Redux and MobXReact state management with Redux and MobX
React state management with Redux and MobX
 
Building Reactive Real-time Data Pipeline
Building Reactive Real-time Data PipelineBuilding Reactive Real-time Data Pipeline
Building Reactive Real-time Data Pipeline
 
Full stack conference talk slides
Full stack conference talk slidesFull stack conference talk slides
Full stack conference talk slides
 
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystemsSciSoftDays Talk - Howison: Spreading the work in software ecosystems
SciSoftDays Talk - Howison: Spreading the work in software ecosystems
 
306 belmont ssp08agileit
306 belmont ssp08agileit306 belmont ssp08agileit
306 belmont ssp08agileit
 
redpill Forensics
redpill Forensicsredpill Forensics
redpill Forensics
 
Drupal 8 Initiatives
Drupal 8 InitiativesDrupal 8 Initiatives
Drupal 8 Initiatives
 
GDG Helwan Introduction to python
GDG Helwan Introduction to pythonGDG Helwan Introduction to python
GDG Helwan Introduction to python
 
Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)Reaktive Programmierung mit den Reactive Extensions (Rx)
Reaktive Programmierung mit den Reactive Extensions (Rx)
 
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
Build a Web App with JavaScript and jQuery (5:18:17, Los Angeles)
 
Gaurav agarwal
Gaurav agarwalGaurav agarwal
Gaurav agarwal
 

Recently uploaded

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 

Building an Activity Feed with Cassandra

  • 1. Building an Activity Feed with Cassandra Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic
  • 2. Disclaimer Not an operations person. Will pretend to be one for the purpose of this talk.
  • 3. Quick Overview What is the Behance Activity Feed?
  • 4.
  • 5. • Actions • Comments, Appreciations, Etc • Entities • Projects, Works in Progress • Actors • Users
  • 8. User A publishes a new project Write to Follower A’s feed Write to Follower B’s feed Write to Follower C’s feed Write to Follower D’s feed
  • 11. • Smaller user base (~340,000). • Built very quickly. Worked well at the time. • Not well researched.
  • 13. • Frequent node failures • Heavy disk fragmentation caused by deletes • Slow reads from disk. Started storing in RAM. • Primary -> Secondary caused downtime for some. • Scaled out vertically and horizontally.
  • 15. • Riak • Very close. Community seemed lacking. • Redis • No native cluster. Too much maintenance. • Memcached/MySQL • Too much complex app logic.
  • 17. • Fantastic community. #cassandra on Twitter • Easy to read documentation • Linearly scalable. Easy to grow cluster. • Low maintenance overhead for ops team. • Handles time series data very well.
  • 19. • Cassandra Summit 2014 • Other team in Adobe • Long nights reading documentation
  • 21. • Ephemeral • “Source of truth” lives in a MySQL database • Okay with *some* data loss
  • 23. • User’s feed is comprised of entities with one set of actions • User’s feed only contains one of any given entity • An entity’s set of actions contains up to seven of the most recent actions taken by that user’s network
  • 25. Language Support • Most services on Behance are PHP • No official Datastax PHP driver
  • 26. –Mark Dunphy, 2014 “Looks like I’m learning python.”
  • 27. Go to Production No, nothing is working yet. I didn’t skip a slide.
  • 28. • App/cluster in production before anything works • Test real life load • Fail spectacularly without anybody noticing • Deploy risky changes without fear • Run alongside MongoDB
  • 30. Query Patterns • “Create your data models based on the queries you want to run” - Basically Everybody • Wanted to… • Read a user’s feed entities by type and time of most recent action…separately. • Write/Update a user’s feed entities with new actions while knowing only user id and entity id
  • 32. –Mark Dunphy, January 2015 “An UPDATE in Cassandra works like an UPSERT! Let’s store the user’s entire feed in a single row in a table! It’s so simple!” First Data Model
  • 33. CREATE TYPE activity.action ( created_on timestamp, secondary_entity_id int, actor_id int, verb_id int ); CREATE TYPE activity.entity ( entity_type_id int, entity_id int );
  • 34. CREATE TABLE activity.project_actions ( modified_on timestamp, entity_id int, user_id int, actions list<frozen<action>>, PRIMARY KEY(user_id, entity_id) )
  • 35. CREATE TABLE activity.feeds ( modified_entities list<frozen<entity>>, modified_on timestamp, project_ids list<int>, user_id int, wip_revision_ids list<int>, PRIMARY KEY(user_id) )
  • 37. First Data Model Moments Before Everything Exploded
  • 38. –Mark Dunphy, January 2015 “Okay let’s keep nearly the same model, but use INSERT and DELETE instead of always UPDATE. Just use batch statements.” Second Data Model
  • 39. Second Data Model This was also a very very bad idea.
  • 40. • Lose the benefit of Cassandra being distributed • All queries go through the same coordinator which puts a lot of stress and responsibility on one node. • Use concurrency and prepared statements instead. Datastax drivers make this easy. Second Data Model
  • 45. CREATE TYPE activity.action ( created_on timestamp, secondary_entity_id int, actor_id int, verb_id int );
  • 46. CREATE TABLE activity.projects ( created_on timestamp, user_id int, entity_id int, actions list<frozen<action>>, PRIMARY KEY(user_id, created_on, entity_id) )
  • 47. CREATE TABLE activity.project_actions ( modified_on timestamp, entity_id int, user_id int, actions list<frozen<action>>, PRIMARY KEY(user_id, entity_id) )
  • 49. Write Strategy • “User A comments on Project A. User B follows User A.” • Request out to add the comment action to User B’s feed • Read existing actions for that entity (Project A) in B’s feed. Push new action on top. • Write new actions list into new “row” in projects table
  • 50. Read Strategy • SELECT * FROM projects WHERE user_id = 123 AND created_on > 123214373 • Optimized for quick/easy reads. More important that a user’s feed loads quickly than it updating quickly. • Use timestamp to “page” through data.
  • 51. Lessons Learned • Duplicate your data to achieve desired queries. Storage is cheap. Writes are cheap. • Think outside the box. Cassandra is not relational. • Never ever ever ignore inserts/deletes in favor of an update only workflow. Never. It is literally insane.
  • 52. Final Specs • 16 node cluster on AWS EC2 c3.8xlarge • Mix of SizeTieredCompactionStrategy and DateTieredCompactionStrategy • NetworkTopologyStrategy • Replication factor 3 • ConsistencyLevel = ONE for most requests
  • 53. Final Specs • Bursty write volume. Consistent read volume. • 5k to 80k writes per second • 2k to 4k reads per second
  • 55. Thank you! Mark Dunphy, Software Engineer Behance/Adobe @dunphtastic