Slides from the MongoDB meetup "IRC Bots and Activity Feeds with MongoDB - At BranchOut", presented by the San Francisco MongoDB User Group and 10gen.
http://www.meetup.com/San-Francisco-MongoDB-User-Group/events/95713262/
Over the past year, we've used MongoDB to power more and more of BranchOut's functionality, including some cool social features such as a Facebook-like activity feed. In this talk, I discuss the design decisions behind these features and outline how Mongo is used under the hood. I cover not only what makes Mongo a good technology choice but also a few of its limitations that need to be worked around.
If you have any questions regarding these slides, feel free to reach out to me on Twitter: @nate510.
Thanks!
Creating social features at BranchOut using MongoDB
1. Building Social Features
with MongoDB
Nathan Smith
BranchOut.com
Jan. 22, 2013
Tuesday, January 22, 13
2. BranchOut
A more social professional network
• Connect with your colleagues (follow)
• Activity feed of their professional activity
• Timeline of an individual’s posts
3. BranchOut
A more social professional network
• 30MM installed users
• 750MM total user records
• Average 300 connections per installed user
7. MongoDB @ BranchOut
• 100% MySQL until ~July 2012
• Much of our data fits well into a document model
• Our data design avoids RDBMS features
12. Follow System
Business logic
• Limit of 2000 followees (people you follow)
• Unlimited followers
• Both lists reflect updates in near-real time
15. Follow System
Traditional RDBMS (e.g., MySQL)
follower_uid  followee_uid  follow_time
123           456           2013-01-22 15:43:00
456           123           2013-01-22 15:52:00
Advantage: Easy inserts, deletes
Disadvantages: Poor data locality, large index size
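To make the row-per-edge model concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the deck does not show BranchOut's actual DDL). A follow is a one-row insert and an unfollow a one-row delete, which is the advantage the slide names. The cost is that every edge is its own row, and a second index is needed so "who follows X?" is also an index lookup. The table name `follows`, the index name, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (
    follower_uid INTEGER NOT NULL,
    followee_uid INTEGER NOT NULL,
    follow_time  TEXT    NOT NULL,
    PRIMARY KEY (follower_uid, followee_uid)
);
-- Second index over every edge so follower lookups are indexed too.
CREATE INDEX idx_follows_followee ON follows (followee_uid, follower_uid);
""")


def follow(follower, followee, when):
    # One-row insert; OR IGNORE turns a repeated follow into a no-op.
    conn.execute("INSERT OR IGNORE INTO follows VALUES (?, ?, ?)",
                 (follower, followee, when))


def unfollow(follower, followee):
    # One-row delete.
    conn.execute("DELETE FROM follows WHERE follower_uid = ? AND followee_uid = ?",
                 (follower, followee))


def followees(uid):
    rows = conn.execute("SELECT followee_uid FROM follows WHERE follower_uid = ? "
                        "ORDER BY followee_uid", (uid,))
    return [r[0] for r in rows]


def followers(uid):
    rows = conn.execute("SELECT follower_uid FROM follows WHERE followee_uid = ? "
                        "ORDER BY follower_uid", (uid,))
    return [r[0] for r in rows]


follow(123, 456, "2013-01-22 15:43:00")
follow(456, 123, "2013-01-22 15:52:00")
print(followees(123), followers(123))
```

Both the primary key and the secondary index grow with the total number of follow edges, which is where the index-size concern comes from.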
16. Follow System
MongoDB (first pass)
followee: {
  _id: 123,
  uids: [456, 567, 678]
}
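A minimal in-memory sketch of the first-pass document model, assuming the `followee` document lists the uids that user 123 follows and that a mirror `followers` document (implied by the "Follower document size" slides) lists who follows a user. Plain Python dicts stand in for the collections; in MongoDB the list updates would likely map to the `$addToSet` and `$pull` update operators. The class and method names are hypothetical; the 2000-followee cap comes from the business-logic slide.

```python
FOLLOWEE_LIMIT = 2000  # business rule: at most 2000 people you follow


class FollowStore:
    """Hypothetical in-memory stand-in for two denormalized collections:
    followees: {_id: uid, uids: [people uid follows]}
    followers: {_id: uid, uids: [people who follow uid]}
    """

    def __init__(self):
        self.followees = {}
        self.followers = {}

    @staticmethod
    def _uids(collection, uid):
        # Fetch (or lazily create) the document for uid and return its list.
        return collection.setdefault(uid, {"_id": uid, "uids": []})["uids"]

    def follow(self, follower, followee):
        mine = self._uids(self.followees, follower)
        if followee in mine:
            return False  # already following: a no-op, like $addToSet
        if len(mine) >= FOLLOWEE_LIMIT:
            raise ValueError("followee limit reached")
        mine.append(followee)                                  # write 1
        self._uids(self.followers, followee).append(follower)  # write 2
        return True

    def unfollow(self, follower, followee):
        # Remove the uid from both documents, like $pull on each.
        mine = self._uids(self.followees, follower)
        if followee in mine:
            mine.remove(followee)
            self._uids(self.followers, followee).remove(follower)


store = FollowStore()
store.follow(123, 456)
store.follow(123, 567)
print(store.followees[123], store.followers[456])
```

Because one follow updates two documents, and MongoDB at the time offered atomicity only within a single document, the two lists can briefly disagree if the second write fails.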
28. Follow System
Follower document size
• Max Mongo doc size: 16MB
• Number of people who follow our community manager: 30MM
• 30MM uids × 8 bytes/uid = 240MB
• Max followers per doc: ~2MM (16MB ÷ 8 bytes; BSON's per-element array overhead lowers the real figure)
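The ~2MM figure divides 16MB by 8 bytes, which ignores how BSON stores arrays: an array is a document whose keys are the indexes "0", "1", ... as NUL-terminated strings, and each element also carries a type byte. The sketch below computes the exact BSON size of a document shaped like the first-pass schema, `{_id: <int64>, uids: [<int64>, ...]}`, assuming 64-bit uids and a 16 MiB (16,777,216-byte) limit, then searches for the largest list that fits. The function names are hypothetical.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # assumed MongoDB document limit: 16 MiB


def int64_elements_size(n):
    """Bytes used by n int64 array elements. Each element is a type byte,
    its index as a C string key ("0", "1", ...), and an 8-byte int64."""
    total, digits, start = 0, 1, 0
    while start < n:
        end = min(n, 10 ** digits)  # indexes [start, end) have `digits` digits
        total += (end - start) * (1 + digits + 1 + 8)
        start, digits = end, digits + 1
    return total


def follower_doc_size(n):
    """BSON size of {_id: <int64>, uids: [n int64 values]}."""
    header = 4 + 1              # int32 length prefix + trailing NUL
    id_field = 1 + 4 + 8        # type byte + "_id\0" + int64
    uids_field = 1 + 5 + 4 + 1  # type byte + "uids\0" + array length + array NUL
    return header + id_field + uids_field + int64_elements_size(n)


def max_uids_per_doc(limit=MAX_BSON_BYTES):
    """Largest n whose document still fits in `limit` bytes (binary search)."""
    lo, hi = 0, limit           # the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if follower_doc_size(mid) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo


naive = MAX_BSON_BYTES // 8     # the slide's 8-bytes-per-uid estimate
print(naive, max_uids_per_doc())
```

Under these assumptions a single document holds about 1.05MM uids, roughly half the naive estimate, so the 30MM-follower case overshoots the limit by even more than the slide suggests.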
33. Activity Feed
Push vs Pull architecture
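The push vs. pull diagrams did not survive in this transcript. As a generic illustration (not BranchOut's design), the sketch below contrasts the two approaches: push (fan-out on write) copies each post into every follower's inbox, while pull (fan-out on read) leaves posts with their author and merges followees' posts when a feed is read. The class names and the `(time, author, text)` tuple layout are hypothetical.

```python
from collections import defaultdict


class PushFeed:
    """Fan-out on write: each post is copied into every follower's inbox."""

    def __init__(self, followers):
        self.followers = followers       # uid -> set of follower uids
        self.inbox = defaultdict(list)   # uid -> [(time, author, text)]

    def post(self, author, time, text):
        for f in self.followers.get(author, ()):
            self.inbox[f].append((time, author, text))  # one write per follower

    def read(self, uid, limit=10):
        return sorted(self.inbox[uid], reverse=True)[:limit]  # newest first


class PullFeed:
    """Fan-out on read: posts stay with the author and are merged at read time."""

    def __init__(self, followees):
        self.followees = followees       # uid -> set of followee uids
        self.outbox = defaultdict(list)  # uid -> [(time, author, text)]

    def post(self, author, time, text):
        self.outbox[author].append((time, author, text))  # a single write

    def read(self, uid, limit=10):
        items = [p for f in self.followees.get(uid, ()) for p in self.outbox[f]]
        return sorted(items, reverse=True)[:limit]  # newest first


push = PushFeed({1: {2, 3}})
pull = PullFeed({2: {1}, 3: {1}})
for feed in (push, pull):
    feed.post(1, 100, "hello")
    feed.post(1, 200, "again")
print(push.read(2), pull.read(2))
```

Push makes reads cheap but turns a post by someone with 30MM followers into 30MM writes; pull keeps writes cheap and pays at read time. The algorithm slides later in the deck, which load data for all connections per request, look closer to the pull side.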
41. Activity Feed
Business logic
• All connections and followees appear in your feed
• Reverse-chronological sort order (but should support other rankings)
• Support for an evolving set of feed event types
• Tagging creates multiple feed events for the same underlying object
• Feed events are not ephemeral: they also back each user's Timeline
44. Activity Feed
Traditional RDBMS (e.g., MySQL)
activity_id  uid  event_time           type    oid1    oid2
1            123  2013-01-22 15:43:00  photo   123abc  789ghi
2            345  2013-01-22 15:52:00  status  456def  foobar
Advantage: Easy inserts
Disadvantages: Rigid schema adapts poorly to new activity types, doesn’t scale
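One way a document model sidesteps fixed oid1/oid2 columns is to keep the shared fields (uid, event_time, type) at the top level and put type-specific fields in a free-form sub-document. The event documents below are hypothetical: the `data` field, its keys, and the `job_change` type are illustrative, not BranchOut's schema. The renderer skips types it does not know yet, so a new event type can be written before every reader understands it.

```python
from datetime import datetime

# Hypothetical feed-event documents: shared fields at the top level,
# type-specific fields in a free-form "data" sub-document.
events = [
    {"uid": 123, "event_time": datetime(2013, 1, 22, 15, 43), "type": "photo",
     "data": {"photo_id": "123abc", "album_id": "789ghi"}},
    {"uid": 345, "event_time": datetime(2013, 1, 22, 15, 52), "type": "status",
     "data": {"status_id": "456def", "text": "foobar"}},
    # A type added later simply brings its own fields; no ALTER TABLE needed.
    {"uid": 345, "event_time": datetime(2013, 1, 23, 9, 0), "type": "job_change",
     "data": {"company": "Acme", "title": "Engineer"}},
]

# Per-type renderers; supporting a new type means adding an entry here.
RENDERERS = {
    "photo": lambda d: f"posted photo {d['photo_id']}",
    "status": lambda d: f"posted status {d['status_id']}",
}


def render(event):
    """Render one event, or return None for a type this reader doesn't know yet."""
    renderer = RENDERERS.get(event["type"])
    if renderer is None:
        return None
    return f"{event['uid']} {renderer(event['data'])}"


for e in events:
    print(render(e))
```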
52. Activity Feed
Algorithm
1. Load user_feed_cards for all connections
2. Calculate which user_feed_months to load
3. Load user_feed_months
4. Aggregate events that refer to the same story
5. Sort (reverse-chronological)
6. Load content, comments, etc. and build stories
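The slides name two collections, user_feed_cards and user_feed_months, but the slides describing their layout are not in this transcript. The sketch below is a minimal in-memory illustration of the six steps under assumed shapes: a card maps each month ("YYYY-MM") to the number of events a user has in it, and a month document holds that user's events for one month. All other names (`FeedEvent`, `story_id`, `build_feed`, `page_size`) are hypothetical, and the content loading in step 6 is stubbed out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FeedEvent:
    uid: int          # the connection who acted
    event_time: int   # epoch seconds
    event_type: str   # "photo", "status", "photo_tag", ...
    story_id: str     # hypothetical: id of the underlying object


def build_feed(connection_uids, user_feed_cards, user_feed_months, page_size=20):
    """Assumed shapes:
    user_feed_cards[uid] -> {"YYYY-MM": event_count}
    user_feed_months[(uid, "YYYY-MM")] -> [FeedEvent, ...]
    """
    # 1. Load feed cards for all connections.
    cards = [user_feed_cards.get(uid, {}) for uid in connection_uids]

    # 2. Pick the newest months until they hold at least page_size events;
    #    an event in an older month cannot be among the newest page_size.
    per_month = defaultdict(int)
    for card in cards:
        for month, count in card.items():
            per_month[month] += count
    months, loaded = [], 0
    for month in sorted(per_month, reverse=True):
        months.append(month)
        loaded += per_month[month]
        if loaded >= page_size:
            break

    # 3. Load only the selected month documents.
    events = [ev
              for uid in connection_uids
              for month in months
              for ev in user_feed_months.get((uid, month), [])]

    # 4. Aggregate events that refer to the same story (e.g. a photo and its tags).
    stories = defaultdict(list)
    for ev in events:
        stories[ev.story_id].append(ev)

    # 5. Sort stories reverse-chronologically by their newest event.
    ordered = sorted(stories.values(),
                     key=lambda evs: max(e.event_time for e in evs),
                     reverse=True)

    # 6. Build story objects (loading content and comments is stubbed out).
    return [{"story_id": evs[0].story_id,
             "actors": sorted({e.uid for e in evs}),
             "latest": max(e.event_time for e in evs)}
            for evs in ordered[:page_size]]


cards = {1: {"2013-01": 2}, 2: {"2013-01": 1, "2012-12": 1}}
months = {
    (1, "2013-01"): [FeedEvent(1, 300, "photo", "p1"), FeedEvent(1, 100, "status", "s1")],
    (2, "2013-01"): [FeedEvent(2, 200, "photo_tag", "p1")],
    (2, "2012-12"): [FeedEvent(2, 50, "status", "s2")],
}
print(build_feed([1, 2], cards, months, page_size=2))
```

Here the photo and the tag on it collapse into one story with two actors, and the December month document is never loaded because January alone covers the page. Aggregation can leave a page with fewer than page_size stories; a real implementation would need to load further months in that case.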
56. Activity Feed
Performance
• Response times average under 500 ms (98th percentile under 1 sec)
• Design expected to scale well horizontally
• Need to continue to optimize
57. Building Social Features
with MongoDB
Nathan Smith
BrO: http://branchout.com/nate
FB: http://facebook.com/neocortica
Twitter: @nate510
Email: nate@branchout.com
Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack
Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
Good Quora questions on activity feeds:
http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed