Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Building a Social Platform
with MongoDB
MongoDB Inc
Darren Wood & Asya Kamsky
#MongoDBWorld

Building a Social Platform
Part 2:
Managing the Social Graph

Socialite
• Open Source
• Reference Implementation
– Various Fanout Feed Models
– User Graph Implementation
– Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
– https://dropwizard.github.io/dropwizard/
• Built-in benchmarking
https://github.com/10gen-labs/socialite

Architecture
GraphServiceProxy
ContentProxy

Graph Data - Social
John Kate
follows
Bob
Pete

Graph Data - Social
John Kate
follows
Bob
Pete
Recommendation ?

Graph Data - Promotional
John Kate
follows
Bob
Pete
Acme
Soda
Mention
Recommendation ?

Graph Data - Everywhere
• Retail
• Complex product catalogues
• Product recommendation engines
• Manufacturing and Logistics
• Tracing failures to faulty component batches
• Determining fallout from supply interruption
• Healthcare
• Patient/Physician interactions

Follower Churn
• Tempting to focus on scaling content
• Follow requests rival message send rates
• Twitter enforces per day follow limits

Edge Metadata
• Models – friends/followers
• Requirements typically start simple
• Add Groups, Favorites, Relationships

Option One – Embedding Edges

Embedded Edge Arrays
• Storing connections with user (popular choice)
 Most compact form
 Efficient for reads
• However….
– User documents grow
– Upper limit on degree (document size)
– Difficult to annotate (and index) edge
{
"_id" : "djw",
"fullname" : "Darren Wood",
"country" : "Australia",
"followers" : [ "jsr", "ian"],
"following" : [ "jsr", "pete"]
}

Embedded Edge Arrays
• Creating Rich Graph Information
– Can become cumbersome
{
"_id" : "djw",
"friends" : [
{"uid" : "jsr", "grp" : "school"},
{"uid" : "ian", "grp" : "work"} ]
}
{
"_id" : "djw",
"friends" : [ "jsr", "ian"],
"group" : [ ”school", ”work"]
}

Option Two – Edge Collection

Edge Collections
• Document per edge
• Very flexible for adding edge data
> db.followers.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr"
}
> db.friends.findOne()
{
"_id" : ObjectId(…),
"from" : "djw",
"to" : "jsr",
"grp" : "work",
"ts" : Date("2013-07-10")
}

Operational issues
• Updates of embedded arrays
– grow non-linearly with number of indexed array
elements
• Updating edge collection => inserts
– grows close to linearly with existing number of
edges/user

Edge Collection
Indexing Strategies

Finding Followers
Consider our single followercollection :
> db.followers.find({from : "djw"}, {_id:0, to:1})
{
"to" : "jsr"
}
Using index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
Covered index when
searching on "from"
for all followers
Specify only if
multiple edges cannot
exist

Finding Following
What about who a user is following?
Can use a reverse covered index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"name" : "to_1_from_1"
}
Notice the flipped
field order here

Finding Following
Wait ! There is an issue with the reverse index…..
SHARDING !
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"name" : "to_1_from_1"
}
If we shard this collection by
"from", looking up followers
for a specific user is
"targeted" to a shard
To find who the user is
following however, it must
scatter-gather the query to
all shards

Dual Edge Collections
When "following" queries are common
– Not always the case
– Consider overhead carefully
Can use dual collections storing
– One for each direction
– Edges are duplicated reversed
– Can be sharded independently

Edge Query Rate Comparison
Number of shards
vs
Number of queries
Followers collection
with forward and
reverse indexes
Two collections,
followers, following
one index each
1 10,000 10,000
3 90,000 30,000
6 360,000 60,000
12 1,440,000 120,000

Follower Counts
Can use the edge indexes :
How to determine these counts ?
> db.followers.find({_f : "djw"}).count()
> db.following.find({_f : "djw"}).count()
However this can be heavy weight
- Especially for rendering landing page
- Consider maintaining counts on user document

Socialite User Service
• Manages user profiles and the follower graph
• Supports arbitrary user data passthrough
• Options for graph storage
– Uses edge collections (can shard by _f)
– Options for maintaining separate follower/ing graphs
– Storing counts vs counting
{
"_id" : ObjectId("52cd1d32a0ee9a1a76d369bb"),
"_f" : "jsr",
"_t" : "djw"
}
{
"v" : 1,
"key" : {"_f" : 1, "_t" : 1},
"unique" : true,
}

Next up @ 11:50am :
Scaling the Data Feed
• Delivering user content to followers
• Comparing fanout models
• Caching user timelines for fast retrieval
• Embedding vs Linking Content

Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

More Related Content

What's hot

Similar to Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

More from MongoDB

Recently uploaded

Socialite, the Open Source Status Feed Part 2: Managing the Social Graph

Editor's Notes