SlideShare a Scribd company logo
1 of 25
Building a Social Platform
Part 3:
Scaling the Data Feed
Socialite
• Reference Implementation
– Various Fanout Feed Models
– User Graph Implementation
– Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
– https://dropwizard.github.io/dropwizard/
• Built-in benchmarking
https://github.com/10gen-labs/socialite
Architecture
GraphServiceProxy
ContentProxy
Feed Service
• Two main functions :
– Aggregating “followed” content for a user
– Forwarding user’s content to “followers”
• Common implementation models :
– Fanout on read
• Query content of all followed users on fly
– Fanout on write
• Add to “cache” of each user’s timeline for every post
• Various storage models for the timeline
Fanout On Read
Fanout On Read
Pros
Simple implementation
No extra storage for timelines
Cons
– Timeline reads (typically) hit all shards
– Often involves reading more data than required
– May require additional indexing on Content
Fanout On Write
Fanout On Write
Pros
Timeline can be single document read
Dormant users easily excluded
Working set minimized
Cons
– Fanout for large follower lists can be expensive
– Additional storage for materialized timelines
Fanout On Write
• Three different approaches
– Time buckets
– Size buckets
– Cache
• Each has different pros & cons
Timeline Buckets - Time
Upsert to time range buckets for each user
> db.timed_buckets.find().pretty()
{
"_id" : {"_u" : "jsr", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"}
]
}
{
"_id" : {"_u" : "ian", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
]
}
{
"_id" : {"_u" : "jsr", "_t" : 516934 },
"_c" : [
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
]
}
Timeline Buckets - Size
More complex, but more consistently sized
> db.sized_buckets.find().pretty()
{
"_id" : ObjectId("...122"),
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
],
"_s" : 3,
"_u" : "jsr"
}
{
"_id" : ObjectId("...011"),
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
],
"_s" : 1,
"_u" : "ian"
}
Timeline - Cache
Store a limited cache, fall back to fanout on read
– Create single cache doc on demand with upsert
– Limit size of cache with $slice
– Timeout docs with TTL for inactive users
> db.timeline_cache.find().pretty()
{
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
],
"_u" : "jsr"
}
{
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
],
"_u" : "ian"
}
Embedding vs Linking Content
Embedded content for direct access
– Great when it is small, predictable in size
Link to content, store only metadata
– Read only desired content on demand
– Further stabilizes cache document sizes
> db.timeline_cache.findOne({”_id" : "jsr"})
{
"_c" : [
{"_id" : ObjectId("...dc1”)},
{"_id" : ObjectId("...dd2”)},
{"_id" : ObjectId("...da7”)}
],
”_id" : "jsr"
}
Socialite Feed Service
• Implemented four models as plugins
– FanoutOnRead
– FanoutOnWrite – Buckets (size)
– FanoutOnWrite – Buckets (time)
– FanoutOnWrite - Cache
• Switchable by config
• Store content by reference or value
• Benchmark-able back to back
Benchmark by feed type
Benchmarking the Feed
• Biggest challenge: scaling the feed
• High cost of "fanout on write"
• Popular user posts => # operations:
– Content collection insert: 1
– Timeline Cache: on average, 130+ cache document
updates
• SCATTER GATHER (slowest shard determines latency)
Benchmarking the Feed
• Timeline is different from content!
– "It's a Cache"
IT CAN BE REBUILT!
Benchmarking the Feed
• MongoDB as a cache
IT CAN BE REBUILT!
Effect of removing the cache and forcing drop-back to
fanout on read and rebuilding of the cache:
Benchmarking the Feed
Benchmarking the Feed
Benchmarking the Feed
Benchmarking the Feed
• Results
– last two weeks
– ran load with one million users
– ran load with ten million users (currently running)
– used avg send rate 1K/s; 2K/s; reads 10K-20k/s
– 22 AWS c3.2xlarge servers (7.5GB RAM)
– 18 across six shards (3 content, 3 user graph)
– 4 mongos and app machines
– 2 c2x4xlarge servers (30GB RAM)
– timeline feed cache (six shards)
Summary
Socialite
• Real Working Implementation
– Implements All Components
– Configurable models and options
• Built-in benchmarking
• Questions?
– We will be at "Ask The Experts" this afternoon!
https://github.com/10gen-labs/socialite
https://github.com/10gen-labs/socialite
https://github.com/10gen-labs/socialite
Thank You!

More Related Content

What's hot

Back to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documentsBack to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documents
MongoDB
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
Alex Litvinok
 
Building a Social Network with MongoDB
  Building a Social Network with MongoDB  Building a Social Network with MongoDB
Building a Social Network with MongoDB
Fred Chu
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDB
lehresman
 

What's hot (19)

MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
 
Back to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documentsBack to Basics 1: Thinking in documents
Back to Basics 1: Thinking in documents
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
MongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - InboxesMongoDB Advanced Schema Design - Inboxes
MongoDB Advanced Schema Design - Inboxes
 
Building Your First MongoDB App ~ Metadata Catalog
Building Your First MongoDB App ~ Metadata CatalogBuilding Your First MongoDB App ~ Metadata Catalog
Building Your First MongoDB App ~ Metadata Catalog
 
MongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World ExamplesMongoDB Schema Design: Four Real-World Examples
MongoDB Schema Design: Four Real-World Examples
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDB
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in Documents
 
Mongo DB schema design patterns
Mongo DB schema design patternsMongo DB schema design patterns
Mongo DB schema design patterns
 
Building a Social Network with MongoDB
  Building a Social Network with MongoDB  Building a Social Network with MongoDB
Building a Social Network with MongoDB
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
 
Modeling Data in MongoDB
Modeling Data in MongoDBModeling Data in MongoDB
Modeling Data in MongoDB
 
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'tsThe Fine Art of Schema Design in MongoDB: Dos and Don'ts
The Fine Art of Schema Design in MongoDB: Dos and Don'ts
 
Mongo db tutorials
Mongo db tutorialsMongo db tutorials
Mongo db tutorials
 
Data Modeling for the Real World
Data Modeling for the Real WorldData Modeling for the Real World
Data Modeling for the Real World
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
 

Viewers also liked

Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
Chris Clarke
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
Dan McKinley
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
MongoDB
 

Viewers also liked (20)

Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...
 
Building an Activity Feed with Cassandra
Building an Activity Feed with CassandraBuilding an Activity Feed with Cassandra
Building an Activity Feed with Cassandra
 
Building a Directed Graph with MongoDB
Building a Directed Graph with MongoDBBuilding a Directed Graph with MongoDB
Building a Directed Graph with MongoDB
 
Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2Agg framework selectgroup feb2015 v2
Agg framework selectgroup feb2015 v2
 
MongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic WebMongoGraph - MongoDB Meets the Semantic Web
MongoGraph - MongoDB Meets the Semantic Web
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Using MongoDB as a high performance graph database
Using MongoDB as a high performance graph databaseUsing MongoDB as a high performance graph database
Using MongoDB as a high performance graph database
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Intro To MongoDB
Intro To MongoDBIntro To MongoDB
Intro To MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
MongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBookMongoDB World 2016: Poster Sessions eBook
MongoDB World 2016: Poster Sessions eBook
 
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkBack to Basics Webinar 5: Introduction to the Aggregation Framework
Back to Basics Webinar 5: Introduction to the Aggregation Framework
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
How Auto Trader enables the UK's largest digital automotive marketplace
How Auto Trader enables the UK's largest digital automotive marketplaceHow Auto Trader enables the UK's largest digital automotive marketplace
How Auto Trader enables the UK's largest digital automotive marketplace
 

Similar to Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed

Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesign
MongoDB APAC
 
10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
DATAVERSITY
 

Similar to Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed (20)

Mongodb intro
Mongodb introMongodb intro
Mongodb intro
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Starting with MongoDB
Starting with MongoDBStarting with MongoDB
Starting with MongoDB
 
Webinar: Building Your First Application with MongoDB
Webinar: Building Your First Application with MongoDBWebinar: Building Your First Application with MongoDB
Webinar: Building Your First Application with MongoDB
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesign
 
10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling10gen Presents Schema Design and Data Modeling
10gen Presents Schema Design and Data Modeling
 
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
 
MongoDB NYC Python
MongoDB NYC PythonMongoDB NYC Python
MongoDB NYC Python
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 
Building your first app with MongoDB
Building your first app with MongoDBBuilding your first app with MongoDB
Building your first app with MongoDB
 
AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013
AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013
AWS CloudFormation under the Hood (DMG303) | AWS re:Invent 2013
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup10gen MongoDB Video Presentation at WebGeek DevCup
10gen MongoDB Video Presentation at WebGeek DevCup
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behl
 
Dealing with Azure Cosmos DB
Dealing with Azure Cosmos DBDealing with Azure Cosmos DB
Dealing with Azure Cosmos DB
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
Spark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross LawleySpark Summit EU talk by Ross Lawley
Spark Summit EU talk by Ross Lawley
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 

Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed

  • 1. Building a Social Platform Part 3: Scaling the Data Feed
  • 2. Socialite • Reference Implementation – Various Fanout Feed Models – User Graph Implementation – Content storage • Configurable models and options • REST API in Dropwizard (Yammer) – https://dropwizard.github.io/dropwizard/ • Built-in benchmarking https://github.com/10gen-labs/socialite
  • 4. Feed Service • Two main functions : – Aggregating “followed” content for a user – Forwarding user’s content to “followers” • Common implementation models : – Fanout on read • Query content of all followed users on fly – Fanout on write • Add to “cache” of each user’s timeline for every post • Various storage models for the timeline
  • 6. Fanout On Read Pros Simple implementation No extra storage for timelines Cons – Timeline reads (typically) hit all shards – Often involves reading more data than required – May require additional indexing on Content
  • 8. Fanout On Write Pros Timeline can be single document read Dormant users easily excluded Working set minimized Cons – Fanout for large follower lists can be expensive – Additional storage for materialized timelines
  • 9. Fanout On Write • Three different approaches – Time buckets – Size buckets – Cache • Each has different pros & cons
  • 10. Timeline Buckets - Time Upsert to time range buckets for each user > db.timed_buckets.find().pretty() { "_id" : {"_u" : "jsr", "_t" : 516935}, "_c" : [ {"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}, {"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"} ] } { "_id" : {"_u" : "ian", "_t" : 516935}, "_c" : [ {"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"} ] } { "_id" : {"_u" : "jsr", "_t" : 516934 }, "_c" : [ {"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"} ] }
  • 11. Timeline Buckets - Size More complex, but more consistently sized > db.sized_buckets.find().pretty() { "_id" : ObjectId("...122"), "_c" : [ {"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}, {"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"}, {"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"} ], "_s" : 3, "_u" : "jsr" } { "_id" : ObjectId("...011"), "_c" : [ {"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"} ], "_s" : 1, "_u" : "ian" }
  • 12. Timeline - Cache Store a limited cache, fall back to fanout on read – Create single cache doc on demand with upsert – Limit size of cache with $slice – Timeout docs with TTL for inactive users > db.timeline_cache.find().pretty() { "_c" : [ {"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}, {"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"}, {"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"} ], "_u" : "jsr" } { "_c" : [ {"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"} ], "_u" : "ian" }
  • 13. Embedding vs Linking Content Embedded content for direct access – Great when it is small, predictable in size Link to content, store only metadata – Read only desired content on demand – Further stabilizes cache document sizes > db.timeline_cache.findOne({”_id" : "jsr"}) { "_c" : [ {"_id" : ObjectId("...dc1”)}, {"_id" : ObjectId("...dd2”)}, {"_id" : ObjectId("...da7”)} ], ”_id" : "jsr" }
  • 14. Socialite Feed Service • Implemented four models as plugins – FanoutOnRead – FanoutOnWrite – Buckets (size) – FanoutOnWrite – Buckets (time) – FanoutOnWrite - Cache • Switchable by config • Store content by reference or value • Benchmark-able back to back
  • 16. Benchmarking the Feed • Biggest challenge: scaling the feed • High cost of "fanout on write" • Popular user posts => # operations: – Content collection insert: 1 – Timeline Cache: on average, 130+ cache document updates • SCATTER GATHER (slowest shard determines latency)
  • 17. Benchmarking the Feed • Timeline is different from content! – "It's a Cache" IT CAN BE REBUILT!
  • 18. Benchmarking the Feed • MongoDB as a cache
  • 19. IT CAN BE REBUILT! Effect of removing the cache and forcing drop-back to fanout on read and rebuilding of the cache: Benchmarking the Feed
  • 22. Benchmarking the Feed • Results – last two weeks – ran load with one million users – ran load with ten million users (currently running) – used avg send rate 1K/s; 2K/s; reads 10K-20k/s – 22 AWS c3.2xlarge servers (7.5GB RAM) – 18 across six shards (3 content, 3 user graph) – 4 mongos and app machines – 2 c2x4xlarge servers (30GB RAM) – timeline feed cache (six shards)
  • 24. Socialite • Real Working Implementation – Implements All Components – Configurable models and options • Built-in benchmarking • Questions? – We will be at "Ask The Experts" this afternoon! https://github.com/10gen-labs/socialite https://github.com/10gen-labs/socialite

Editor's Notes

  1. For a Social Platform to store and deliver streaming timelines over long periods of time, careful attention must be paid to the way content is stored. We provide a detailed look into storing an infinite timeline of data while optimizing indexing and sharding configuration for access the most recent window of data. We will also look at some overall performance metrics from Socialite as we scale from a single replica set to a large sharded environment.
  2. image at https://dropwizard.github.io/dropwizard of the hat 
  3. BRUTAL!!!
  4. Variants?
  5. Should you embed the messages/content into "cache"/buckets/etc. or just store references?
  6. WHICH ONE DID WE IMPLEMENT IN SOCIALITE??? All work with Async Service(? or mention later) And we did benchmark them! -> Asya
  7. examining latency of reading content by fanout type - note two types of latency – for sender and for recipient. scaling throughput... THIS WILL NOT SCALE LINEARLY(!) *RERUN WITH SEVERAL SHARDS* replace with new screenshot
  8. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users. Number of updates across all cache / number of documents updated
  9. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users.
  10. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users.
  11. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users.
  12. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users.
  13. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users.
  14. MongoDB as a cache Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ??? Cache only for active users.
  15. Some kind of wrap-up
  16. image at https://dropwizard.github.io/dropwizard of the hat 