2. MONGO LAB’S SOCIAL DATA
• Built in 5/14 on Java framework
• Proven able to support 10 million users
with minimum hardware scaling
• Aimed at providing a production-capable,
reusable framework for social data
management
• Also tried to demonstrate best practices for
social data modeling, indexing and
querying in MongoDB
• Demonstrated how flexible a service
architecture coupled with MongoDB
sharding can create a scalable platform for
social data
5. WHY NODE?
• Node.JS shines in real-time web applications
• Push technology over web sockets
• Quick delivery
• Easier entry point than Java
6. USER GRAPH SERVICE
• Manages the users of the system and
more importantly the follower graph.
• Service must allow users to be added to and
removed from the system, and users to
“follow” the posts of others.
• Service determines which user timelines
receive the posts by any given author (the
followers of that user)
• Must be carefully designed to scale to
significant read and write loads due to the
significant churn within the graph as users
elect to follow and unfollow users
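The responsibilities listed above can be sketched with a small in-memory model. This is only an illustration, not the deck's implementation: the class name `UserGraph`, its method names, and the choice to keep edges in both directions are all assumptions made for the example.

```python
from collections import defaultdict


class UserGraph:
    """Hypothetical in-memory stand-in for the user graph service."""

    def __init__(self):
        self.users = set()
        self.following = defaultdict(set)  # user -> accounts they follow
        self.followers = defaultdict(set)  # user -> accounts following them

    def add_user(self, user):
        self.users.add(user)

    def remove_user(self, user):
        # Drop every edge that touches the user, in both directions.
        for target in self.following.pop(user, set()):
            self.followers[target].discard(user)
        for source in self.followers.pop(user, set()):
            self.following[source].discard(user)
        self.users.discard(user)

    def follow(self, follower, author):
        if follower not in self.users or author not in self.users:
            raise KeyError("both users must exist")
        self.following[follower].add(author)
        self.followers[author].add(follower)

    def unfollow(self, follower, author):
        self.following[follower].discard(author)
        self.followers[author].discard(follower)

    def followers_of(self, author):
        # The timelines that should receive a post by `author`.
        return set(self.followers.get(author, set()))
```

Storing each edge twice makes both "who do I follow" and "who follows me" a single lookup, at the cost of two writes per follow or unfollow, which is where the churn mentioned above shows up.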
9. CONTENT SERVICE
• Dedicated to storing and retrieving raw content (posts)
generated by users.
• Serves as the system of record for all posts
• Stores the full post content indexed primarily by post id
• Allows content to be added by user and allocated a unique
id
• Performs basic validation on content
• Allows content to be found by id
• Allows content to be queried by user and anchor (a content id
that represents a point in time) and returned ordered by
time
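A minimal sketch of these operations, assuming post ids increase monotonically so that ordering by id matches ordering by time, and that an anchor means "return posts older than this id". Both are assumptions for illustration, as are the class and method names.

```python
import itertools


class ContentService:
    """Hypothetical in-memory system of record for posts."""

    def __init__(self):
        self._posts = {}                # post id -> post
        self._ids = itertools.count(1)  # assumption: ids grow over time

    def add(self, author, message):
        # Basic validation: reject empty or whitespace-only posts.
        if not message or not message.strip():
            raise ValueError("post must not be empty")
        post_id = next(self._ids)
        self._posts[post_id] = {"id": post_id, "author": author,
                                "message": message}
        return post_id

    def find_by_id(self, post_id):
        return self._posts.get(post_id)

    def query_by_user(self, author, anchor=None, limit=10):
        # Newest first; with an anchor, only posts older than the anchor.
        matches = [p for p in self._posts.values()
                   if p["author"] == author
                   and (anchor is None or p["id"] < anchor)]
        matches.sort(key=lambda p: p["id"], reverse=True)
        return matches[:limit]
```

Paging backwards through a user's posts then amounts to passing the id of the last post seen as the next anchor.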
12. FEED SERVICE
• Receives posts from a user
• Forwards the content of the post to the Content Service
• Serves timeline feeds for all users based on the posts of
accounts they follow
13. FEED SERVICE
One major design decision for a feed service is how and when to
assemble the timeline feed for users. Implementations
fall into two broad categories:
• Fanout on Read
• Fanout on Write
14. THE BAD - FANOUT ON READ
Advantages
• Simple implementation - almost a pass-through
to the content service
• Storage efficient - only stores a single
copy of a message and its metadata
Disadvantages
• Multi-shard random IO - assembling
a feed will typically require reads that
span a number of shards. These
queries will involve a lot of random IO
and often read more data than is
necessary
• No caching - if two reads of the same
user’s timeline occur in quick
succession, the entire query must be
re-processed irrespective of whether
the result has changed.
• Large following lists - gathering the
timeline for a user with a large following
list will be very expensive and at some
point would require splitting into
15. THE BAD - FANOUT ON READ
When to use
The Fanout on Read model is typically best suited to systems where:
• Timelines are viewed far less frequently than posts are
made
• The number of users and the amount of content are small
and the data is unsharded
• Users typically follow few people
• It is very common for older sections of the timeline to be
viewed
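To make the fanout on read model concrete, here is a rough in-memory sketch. The function name `read_timeline` and the shape of the `following` and `posts` arguments are assumptions for the example; a real deployment would issue this as a query against the content store.

```python
def read_timeline(user, following, posts, limit=10):
    """Assemble a timeline at read time (fanout on read).

    following: dict mapping a user to the set of accounts they follow
    posts:     list of {"id", "author", "message"} dicts, one copy per post
    """
    authors = following.get(user, set())
    # Every read re-scans the posts of every followed account; nothing
    # is cached between calls.
    matches = [p for p in posts if p["author"] in authors]
    matches.sort(key=lambda p: p["id"], reverse=True)  # newest first
    return matches[:limit]


following = {"bob": {"alice", "carol"}}
posts = [
    {"id": 1, "author": "alice", "message": "hi"},
    {"id": 2, "author": "dave", "message": "unrelated"},
    {"id": 3, "author": "carol", "message": "hello"},
]
timeline = read_timeline("bob", following, posts)
```

Posting is cheap here because each post is stored once, while the read cost grows with the number of accounts the reader follows.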
16. THE GOOD - FANOUT ON WRITE
Advantages
• Faster reads - having everything
already assembled reduces
Disadvantages
• Duplication - there is a lot of
duplication of the original message,
especially in highly connected
networks where the average user has
a lot of followers
• More complicated - this
implementation must use a
findAndModify command rather than
an upsert to detect when a new
document needs to be created
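For contrast with fanout on read, a simplified in-memory sketch of fanout on write follows. It assumes one capped timeline per user and leaves out the document-creation logic mentioned above; the class name, `cache_size` parameter, and method names are hypothetical.

```python
from collections import defaultdict, deque


class FanoutOnWriteFeed:
    """Hypothetical feed that pushes each post to followers at write time."""

    def __init__(self, followers, cache_size=50):
        self.followers = followers  # author -> set of follower names
        # One pre-assembled, size-capped timeline per user (assumption).
        self.timelines = defaultdict(lambda: deque(maxlen=cache_size))

    def post(self, author, post):
        # Write cost grows with the author's follower count, and in a
        # database each follower's timeline would hold its own copy.
        for follower in self.followers.get(author, ()):
            self.timelines[follower].appendleft(post)

    def timeline(self, user, limit=10):
        # Reads are a single lookup of an already assembled list.
        return list(self.timelines[user])[:limit]


feed = FanoutOnWriteFeed({"alice": {"bob", "carol"}})
feed.post("alice", {"id": 1, "message": "first"})
feed.post("alice", {"id": 2, "message": "second"})
```

After these two posts, `feed.timeline("bob")` returns the posts newest first without touching the content store, which is the read-side benefit bought by the extra writes.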