Building a Social Platform
with MongoDB
MongoDB Inc
Darren Wood & Asya Kamsky
#MongoDBWorld

Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

There are many possible approaches to storing and querying relationships between users in social networks. This section dives into the details of storing a social user graph in MongoDB. It covers several schema designs for storing users' follower networks, compares their performance, and proposes a design optimized for insert and query performance.

Published in: Technology
  • Scaling the delivery of posts and content to the follower networks of millions of users has many challenges. In this section we look at the various approaches to fanning out posts and look at a performance comparison between them. We will highlight some tricks for caching the recent timeline of active users to drive down read latency.
  • Image of the hat from https://dropwizard.github.io/dropwizard

  • Tempting to focus on scaling content
    Follow requests rival message send rates
    Twitter enforces per day follow limits
  • Single Collection
  • How to test, show how growing documents are very painful to update.

    Add the MTV or appmetrics mtools plot showing what happens to outliers.

  • actual performance – show how inserting million users was easy – no point even trying to update embedded documents...
  • side-point of
  • Need to generate: broadcast (scatter-gather) for following, direct for followers.
    Number of total queries by number of shards to get whom the user is following.

  • talk about real life trade-offs
  • hidden in original
  • Transcript of "Socialite, the Open Source Status Feed Part 2: Managing the Social Graph"

    1. Building a Social Platform with MongoDB
        MongoDB Inc
        Darren Wood & Asya Kamsky
        #MongoDBWorld
    2. Building a Social Platform
        Part 2: Managing the Social Graph
    3. Socialite
        • Open Source
        • Reference Implementation
          – Various Fanout Feed Models
          – User Graph Implementation
          – Content storage
        • Configurable models and options
        • REST API in Dropwizard (Yammer)
          – https://dropwizard.github.io/dropwizard/
        • Built-in benchmarking
        https://github.com/10gen-labs/socialite
    4. Architecture
        GraphServiceProxy, ContentProxy
    5. Graph Data - Social
        [Diagram labels: John, Kate, Bob, Pete; "follows"]
    6. Graph Data - Social
        [Diagram labels: John, Kate, Bob, Pete; "follows", "Recommendation ?"]
    7. Graph Data - Promotional
        [Diagram labels: John, Kate, Bob, Pete, Acme Soda; "follows", "Mention", "Recommendation ?"]
    8. Graph Data - Everywhere
        • Retail
          • Complex product catalogues
          • Product recommendation engines
        • Manufacturing and Logistics
          • Tracing failures to faulty component batches
          • Determining fallout from supply interruption
        • Healthcare
          • Patient/Physician interactions
    9. Design Considerations
    10. The Tale of Two Biebers VS
    11. The Tale of Two Biebers VS
    12. Follower Churn
        • Tempting to focus on scaling content
        • Follow requests rival message send rates
        • Twitter enforces per day follow limits
    13. Edge Metadata
        • Models – friends/followers
        • Requirements typically start simple
        • Add Groups, Favorites, Relationships
    14. Storing Graphs in MongoDB
    15. Option One – Embedding Edges
    16. Embedded Edge Arrays
        • Storing connections with user (popular choice)
          – Most compact form
          – Efficient for reads
        • However…
          – User documents grow
          – Upper limit on degree (document size)
          – Difficult to annotate (and index) edge
        {
          "_id" : "djw",
          "fullname" : "Darren Wood",
          "country" : "Australia",
          "followers" : [ "jsr", "ian" ],
          "following" : [ "jsr", "pete" ]
        }
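The "upper limit on degree" point on slide 16 can be made concrete with some arithmetic. The sketch below estimates how many bytes an embedded array of follower ids occupies under the BSON encoding of a string array and compares that with MongoDB's 16 MB document size limit. The function names and the fixed 8-character id length are assumptions for illustration, and the rest of the user document is ignored.

```javascript
// Sketch: estimate the BSON size of an embedded array of follower ids.
// BSON array = int32 length + elements + 0x00 terminator. Each string element =
// 1 type byte + array index as a C string + int32 string length + UTF-8 bytes + 0x00.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB document size limit

function utf8Length(s) {
  return new TextEncoder().encode(s).length;
}

function bsonStringArraySize(ids) {
  let size = 4 + 1; // length prefix + terminator
  ids.forEach((id, i) => {
    const keyBytes = String(i).length + 1;     // "0", "1", ... plus NUL
    const valueBytes = 4 + utf8Length(id) + 1; // int32 length + bytes + NUL
    size += 1 + keyBytes + valueBytes;
  });
  return size;
}

// Hypothetical helper: how many ids of a fixed byte length fit under the limit.
function maxEmbeddedEdges(idLength) {
  let count = 0;
  let size = 4 + 1;
  for (;;) {
    const next = 1 + (String(count).length + 1) + (4 + idLength + 1);
    if (size + next > MAX_BSON_DOCUMENT_BYTES) return count;
    size += next;
    count += 1;
  }
}

console.log(bsonStringArraySize(["jsr", "ian"]));
console.log(maxEmbeddedEdges(8));
```

Under these assumptions the ceiling lands below one million edges per array, which the most-followed accounts on a large network can exceed.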
    17. Embedded Edge Arrays
        • Creating Rich Graph Information
          – Can become cumbersome
        {
          "_id" : "djw",
          "fullname" : "Darren Wood",
          "country" : "Australia",
          "friends" : [
            { "uid" : "jsr", "grp" : "school" },
            { "uid" : "ian", "grp" : "work" }
          ]
        }
        {
          "_id" : "djw",
          "fullname" : "Darren Wood",
          "country" : "Australia",
          "friends" : [ "jsr", "ian" ],
          "group" : [ "school", "work" ]
        }
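To illustrate why slide 17 calls rich embedded edges cumbersome, here is a minimal in-memory sketch of the two document shapes shown there. The helper function names are hypothetical. It shows that the parallel-array shape needs a positional join on every read and a coordinated edit on every removal.

```javascript
// Shape 1: one subdocument per friend keeps each edge and its metadata together.
const withSubdocs = {
  _id: "djw",
  friends: [
    { uid: "jsr", grp: "school" },
    { uid: "ian", grp: "work" },
  ],
};

// Shape 2: parallel arrays; the group of friends[i] lives at group[i].
const withParallelArrays = {
  _id: "djw",
  friends: ["jsr", "ian"],
  group: ["school", "work"],
};

function friendsInGroupSubdocs(user, grp) {
  return user.friends.filter((f) => f.grp === grp).map((f) => f.uid);
}

function friendsInGroupParallel(user, grp) {
  // The two arrays must be joined by position on every read.
  return user.friends.filter((_, i) => user.group[i] === grp);
}

function unfriendParallel(user, uid) {
  const i = user.friends.indexOf(uid);
  if (i === -1) return;
  // Both arrays must be edited at the same index or the metadata drifts.
  user.friends.splice(i, 1);
  user.group.splice(i, 1);
}

console.log(friendsInGroupSubdocs(withSubdocs, "work"));
console.log(friendsInGroupParallel(withParallelArrays, "work"));
unfriendParallel(withParallelArrays, "jsr");
console.log(withParallelArrays);
```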
    18. Option Two – Edge Collection
    19. Edge Collections
        • Document per edge
        • Very flexible for adding edge data
        > db.followers.findOne()
        {
          "_id" : ObjectId(…),
          "from" : "djw",
          "to" : "jsr"
        }
        > db.friends.findOne()
        {
          "_id" : ObjectId(…),
          "from" : "djw",
          "to" : "jsr",
          "grp" : "work",
          "ts" : ISODate("2013-07-10T00:00:00Z")
        }
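The flexibility claimed on slide 19 can be sketched with an in-memory stand-in for the `friends` collection. This is not a MongoDB client; `addEdge` and `findEdges` are hypothetical helpers that only mimic inserting a document and running an equality-only `find()`.

```javascript
// In-memory stand-in for the "friends" edge collection: one document per edge.
const friends = [];

function addEdge(from, to, metadata = {}) {
  // Extra fields are simply stored on the edge document.
  friends.push({ from, to, ...metadata });
}

function findEdges(query) {
  // Match documents whose fields equal every field in the query.
  return friends.filter((edge) =>
    Object.keys(query).every((k) => edge[k] === query[k])
  );
}

addEdge("djw", "jsr", { grp: "work", ts: new Date("2013-07-10") });
addEdge("djw", "ian", { grp: "school" });
addEdge("djw", "pete"); // a plain edge with no annotations

// Shell equivalent: db.friends.find({from: "djw", grp: "work"})
console.log(findEdges({ from: "djw", grp: "work" }).map((e) => e.to));
```

Annotated and plain edges coexist in the same collection, and adding a new annotation never touches the user document.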
    20. Operational issues
        • Updates of embedded arrays
          – Cost grows non-linearly with the number of indexed array elements
        • Updating an edge collection => inserts
          – Cost grows close to linearly with the existing number of edges per user
    21. Edge Insert Rate
    22. Edge Collection Indexing Strategies
    23. Finding Followers
        Consider our single followers collection:
        > db.followers.find({from : "djw"}, {_id:0, to:1})
        { "to" : "jsr" }
        Using index:
        {
          "v" : 1,
          "key" : { "from" : 1, "to" : 1 },
          "unique" : true,
          "ns" : "socialite.followers",
          "name" : "from_1_to_1"
        }
        • Covered index when searching on "from" for all followers
        • Specify "unique" only if multiple edges cannot exist
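The idea behind the covered query on slide 23 can be sketched by modeling the `{from: 1, to: 1}` index as a sorted array of keys. This is a simplified illustration, not how MongoDB stores its B-tree indexes, and all function names are hypothetical. The point is that a lookup on `from` can return every `to` value from the index entries alone, and that a unique index rejects a duplicate edge.

```javascript
// Sketch: a compound index on {from: 1, to: 1} modeled as a sorted array of keys.
const index = []; // entries are [from, to], kept sorted

function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Binary search for the first position whose key is >= the given key.
function lowerBound(key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(index[mid], key) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

function insertEdge(from, to) {
  const pos = lowerBound([from, to]);
  if (pos < index.length && compareKeys(index[pos], [from, to]) === 0) {
    throw new Error(`duplicate edge ${from} -> ${to}`); // mimics "unique" : true
  }
  index.splice(pos, 0, [from, to]);
}

// Covered query: every returned "to" is read from the index keys themselves.
function findTo(from) {
  const result = [];
  for (let i = lowerBound([from, ""]); i < index.length && index[i][0] === from; i++) {
    result.push({ to: index[i][1] });
  }
  return result;
}

insertEdge("djw", "jsr");
insertEdge("ian", "djw");
insertEdge("djw", "ian");
console.log(findTo("djw"));
```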
    24. Finding Following
        What about who a user is following?
        Can use a reverse covered index:
        {
          "v" : 1,
          "key" : { "from" : 1, "to" : 1 },
          "unique" : true,
          "ns" : "socialite.followers",
          "name" : "from_1_to_1"
        }
        {
          "v" : 1,
          "key" : { "to" : 1, "from" : 1 },
          "unique" : true,
          "ns" : "socialite.followers",
          "name" : "to_1_from_1"
        }
        Notice the flipped field order here
    25. Finding Following
        Wait! There is an issue with the reverse index… SHARDING!
        {
          "v" : 1,
          "key" : { "from" : 1, "to" : 1 },
          "unique" : true,
          "ns" : "socialite.followers",
          "name" : "from_1_to_1"
        }
        {
          "v" : 1,
          "key" : { "to" : 1, "from" : 1 },
          "unique" : true,
          "ns" : "socialite.followers",
          "name" : "to_1_from_1"
        }
        If we shard this collection by "from", looking up followers for a specific user is "targeted" to a single shard.
        To find who the user is following, however, the query must be scatter-gathered to all shards.
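The routing difference described on slide 25 can be sketched with a toy query router. The hash function, the shard count, and the function names are illustrative assumptions; MongoDB's real router works from chunk ranges, which this sketch does not model.

```javascript
// Sketch: route queries on an edge collection sharded by "from".
function shardFor(key, shardCount) {
  // Toy string hash, used only to pick a shard deterministically.
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.codePointAt(0)) >>> 0;
  return h % shardCount;
}

function shardsForQuery(query, shardKey, shardCount) {
  if (shardKey in query) {
    // The shard key is known, so the router can target a single shard.
    return [shardFor(query[shardKey], shardCount)];
  }
  // No shard key in the query: scatter-gather across every shard.
  return Array.from({ length: shardCount }, (_, i) => i);
}

const SHARDS = 3;
const followersOfDjw = shardsForQuery({ from: "djw" }, "from", SHARDS);
const followedByDjw = shardsForQuery({ to: "djw" }, "from", SHARDS);

console.log(followersOfDjw.length); // shards contacted for the followers lookup
console.log(followedByDjw.length);  // shards contacted for the following lookup
```

The reverse index still makes the lookup fast on each shard, but it does not reduce the number of shards that must be asked.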
    26. Dual Edge Collections
    27. Dual Edge Collections
        When "following" queries are common
          – Not always the case
          – Consider overhead carefully
        Can use dual collections
          – One collection for each direction
          – Edges are duplicated, reversed
          – Can be sharded independently
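A minimal in-memory sketch of the dual-collection approach from slide 27 follows. The `_f` and `_t` field names come from the Socialite documents later in the deck; which user goes in which field is an assumption inferred from the count queries on slide 29, and the helper names are hypothetical.

```javascript
// Sketch: keep two in-memory edge collections, one per direction.
const followers = []; // assumed: _f = followed user, _t = follower
const following = []; // assumed: _f = follower, _t = followed user

function follow(follower, followed) {
  // Each follow is written twice, with the edge reversed in the second copy.
  followers.push({ _f: followed, _t: follower });
  following.push({ _f: follower, _t: followed });
}

function removeEdge(collection, f, t) {
  const i = collection.findIndex((e) => e._f === f && e._t === t);
  if (i !== -1) collection.splice(i, 1);
}

function unfollow(follower, followed) {
  // Both copies must be removed to keep the collections consistent.
  removeEdge(followers, followed, follower);
  removeEdge(following, follower, followed);
}

// Both lookups filter on _f, so each could be targeted to one shard
// if its collection were sharded by _f.
const followersOf = (user) => followers.filter((e) => e._f === user).map((e) => e._t);
const followedBy = (user) => following.filter((e) => e._f === user).map((e) => e._t);

follow("jsr", "djw");
follow("ian", "djw");
follow("djw", "pete");
unfollow("ian", "djw");
console.log(followersOf("djw"));
console.log(followedBy("djw"));
```

The overhead the slide warns about is visible here: every follow and unfollow costs two writes instead of one.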
    28. Edge Query Rate Comparison
        Number of shards vs number of queries

        Shards | Followers collection with   | Two collections (followers,
               | forward and reverse indexes | following), one index each
        -------+-----------------------------+----------------------------
             1 |                      10,000 |                     10,000
             3 |                      90,000 |                     30,000
             6 |                     360,000 |                     60,000
            12 |                   1,440,000 |                    120,000
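The figures on slide 28 fit a simple model. The model is an interpretation inferred from the numbers rather than something the slides state: the request load is taken to scale with the number of shards at 10,000 requests per shard, a scatter-gather request runs once on every shard, and a targeted request runs on exactly one. The function names are hypothetical.

```javascript
// Assumed model, inferred from the table and not stated in the slides.
const QUERIES_PER_SHARD = 10000;

function singleCollectionQueries(shards) {
  // Each of the (shards * 10,000) requests fans out to all shards.
  return QUERIES_PER_SHARD * shards * shards;
}

function dualCollectionQueries(shards) {
  // Each request is targeted at exactly one shard.
  return QUERIES_PER_SHARD * shards;
}

for (const shards of [1, 3, 6, 12]) {
  console.log(shards, singleCollectionQueries(shards), dualCollectionQueries(shards));
}
```

Under this reading, the single-collection design grows quadratically with the shard count while the dual-collection design grows linearly.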
    29. Follower Counts
        How to determine these counts? Can use the edge indexes:
        > db.followers.find({_f : "djw"}).count()
        > db.following.find({_f : "djw"}).count()
        However, this can be heavyweight
          – Especially for rendering the landing page
          – Consider maintaining counts on the user document
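The suggestion on slide 29 to maintain counts on the user document can be sketched in memory as follows. The field names `followerCount` and `followingCount` and the function names are hypothetical; in MongoDB the two increments would typically be `$inc` updates on the user documents.

```javascript
// Sketch: keep follower/following counts on the user document.
const users = new Map([
  ["djw", { _id: "djw", followerCount: 0, followingCount: 0 }],
  ["jsr", { _id: "jsr", followerCount: 0, followingCount: 0 }],
]);
const edges = new Set(); // "follower->followed" strings stand in for the edge collection

function followWithCounts(follower, followed) {
  const key = `${follower}->${followed}`;
  if (edges.has(key)) return false; // already following: counts stay unchanged
  edges.add(key);
  users.get(followed).followerCount += 1;
  users.get(follower).followingCount += 1;
  return true;
}

function unfollowWithCounts(follower, followed) {
  if (!edges.delete(`${follower}->${followed}`)) return false;
  users.get(followed).followerCount -= 1;
  users.get(follower).followingCount -= 1;
  return true;
}

followWithCounts("jsr", "djw");
followWithCounts("jsr", "djw"); // duplicate follow is ignored
console.log(users.get("djw").followerCount);
console.log(users.get("jsr").followingCount);
```

Rendering a profile then reads one document instead of counting index entries. The trade-off is that the edge write and the counter updates are separate operations, so a real system would need a way to reconcile counts if one of them fails.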
    30. Socialite User Service
        • Manages user profiles and the follower graph
        • Supports arbitrary user data passthrough
        • Options for graph storage
          – Uses edge collections (can shard by _f)
          – Options for maintaining separate follower/following graphs
          – Storing counts vs counting
        { "_id" : ObjectId("52cd1d32a0ee9a1a76d369bb"), "_f" : "jsr", "_t" : "djw" }
        { "v" : 1, "key" : { "_f" : 1, "_t" : 1 }, "unique" : true }
    31. Next up @ 11:50am: Scaling the Data Feed
        • Delivering user content to followers
        • Comparing fanout models
        • Caching user timelines for fast retrieval
        • Embedding vs Linking Content
    32. Building a Social Platform with MongoDB
        MongoDB Inc
        Darren Wood & Asya Kamsky
        #MongoDBWorld