Building Social Features
                         with MongoDB
                              Nathan Smith
                             BranchOut.com
                              Jan. 22, 2013




Tuesday, January 22, 13
BranchOut
                          A more social professional network



                    • Connect with your colleagues (follow)
                    • Activity feed of their professional activity
                    • Timeline of an individual’s posts


BranchOut
                          A more social professional network



                    • 30MM installed users
                    • 750MM total user records
                    • Average 300 connections per installed user


MongoDB @ BranchOut


                    • 100% MySQL until ~July 2012
                    • Much of our data fits well into a document
                          model
                    • Our data design avoids RDBMS features


Follow System
                                   Business logic



                    • Limit of 2000 followees (people you follow)
                    • Unlimited followers
                    • Both lists reflect updates in near-real time


Follow System
                                Traditional RDBMS (e.g., MySQL)

                   follower_uid          followee_uid     follow_time
                          123                456        2013-01-22 15:43:00

                          456                123        2013-01-22 15:52:00




         Advantage: Easy inserts, deletes

         Disadvantages: Poor data locality, large index size



Follow System
                           MongoDB (first pass)


                           followee: {
                             _id: 123,
                             uids: [456, 567, 678]
                           }




         Advantage: Compact data, read locality
         Disadvantage: Can’t display a user’s followers


Follow System
                          Can’t display a user’s followers (easily)


                                   followee: {
                                     _id: 123,
                                     uids: [456, 567, 678]
                                   }
                                   ...with multi-key index on uids




               db.follow.find({uids: 456}, {_id: 1});

                          Expensive! Also, no guarantee of order.

Follow System
                                     MongoDB (second pass)
                          followee: {
                            _id: 1,
                            uids: [2, 3]
                          },
                          followee: {
                            _id: 2,
                            uids: [1, 3]
                          }

                          follower: {
                            _id: 1,
                            uids: [2]
                          },
                          follower: {
                            _id: 2,
                            uids: [1]
                          },
                          follower: {
                            _id: 3,
                            uids: [1, 2]
                          }



         Advantages: Local data, fast selects
         Disadvantages: Follower doc size
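
A minimal sketch of what the second pass implies for writes, assuming plain in-memory maps stand in for the two collections. The names `follow`, `addToSet`, and `FOLLOWEE_LIMIT` are illustrative and do not come from the talk; the only point is that each follow has to touch one followee document and one follower document.

```javascript
// In-memory stand-ins for the two collections; map keys play the role of _id.
const followee = new Map(); // uid -> uids this user follows
const follower = new Map(); // uid -> uids that follow this user

const FOLLOWEE_LIMIT = 2000; // limit from the business-logic slide

// Append uid to the list stored under id, skipping duplicates.
function addToSet(coll, id, uid) {
  const uids = coll.get(id) || [];
  if (!uids.includes(uid)) uids.push(uid);
  coll.set(id, uids);
}

// User a starts following user b: two writes, one per direction.
function follow(a, b) {
  const following = followee.get(a) || [];
  if (!following.includes(b) && following.length >= FOLLOWEE_LIMIT) {
    throw new Error(`user ${a} already follows ${FOLLOWEE_LIMIT} people`);
  }
  addToSet(followee, a, b);
  addToSet(follower, b, a);
}

// Reproduce the example documents from the slide.
follow(1, 2);
follow(1, 3);
follow(2, 1);
follow(2, 3);
console.log(followee.get(1), follower.get(3));
```

Reading either list is then a single lookup by `_id`, which is where the "local data, fast selects" advantage comes from.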

Follow System
                                 Follower document size

                • Max Mongo doc size: 16MB
                • Number of people who follow our
                          community manager: 30MM
                • 30MM uids × 8 bytes/uid = 240MB
                • Max followers per doc: ~2MM
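
The arithmetic on this slide can be checked in a few lines. This is a back-of-the-envelope sketch that assumes the 16MB limit means 16 × 1024 × 1024 bytes and takes the slide's figure of 8 bytes per uid at face value; real BSON arrays add per-element overhead, so actual capacity would be lower. All constant names are illustrative.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // assumed reading of the 16MB limit
const BYTES_PER_UID = 8;                // figure from the slide, ignores BSON overhead
const FOLLOWER_COUNT = 30e6;            // 30MM followers of the community manager

// Raw bytes needed to store every follower uid in one list.
const bytesNeeded = FOLLOWER_COUNT * BYTES_PER_UID; // 240,000,000 bytes

// Upper bound on uids that fit in a single document.
const maxUidsPerDoc = Math.floor(MAX_DOC_BYTES / BYTES_PER_UID); // 2,097,152

// Minimum number of documents needed to hold the whole list.
const docsNeeded = Math.ceil(FOLLOWER_COUNT / maxUidsPerDoc); // 15

console.log({ bytesNeeded, maxUidsPerDoc, docsNeeded });
```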

Follow System
                                       MongoDB (final pass)
                          followee: {
                            _id: 1,
                            uids: [2, 3]
                          },
                          followee: {
                            _id: 2,
                            uids: [1, 3]
                          }

                          follower: {
                            _id: "1",
                            uids: [2,3,4,...],
                            count: 20001,    // slide overlay: becomes 10001
                            next_page: 2     // slide overlay: apparently becomes 3
                          },
                          follower: {
                            _id: "1_p2",
                            uids: [23,24,25,...],
                            count: 10000
                          }




           Asynchronous thread manages follower documents
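
The slides do not spell out what the asynchronous thread does, so the following is only one plausible reading, sketched over in-memory maps. It assumes the overlaid values on the slide (count 20001 then 10001, next_page 2 then 3) show the head document being split once it passes 20000 uids, with the oldest 10000 moved to a new page that is linked in right after the head. `PAGE_SIZE`, `SPLIT_AT`, `addFollower`, `splitHead`, and `followers` are hypothetical names.

```javascript
const PAGE_SIZE = 10000;        // assumed from count: 10000 on the "1_p2" page
const SPLIT_AT = 2 * PAGE_SIZE; // assumed: split once the head exceeds 20000

// Synchronous path: new followers are always appended to the head document.
function addFollower(docs, uid, followerUid) {
  const id = String(uid);
  const head = docs.get(id) || { _id: id, uids: [], count: 0 };
  head.uids.push(followerUid);
  head.count = head.uids.length;
  docs.set(id, head);
}

// Asynchronous path (hypothetical): move the oldest PAGE_SIZE uids to a new page.
function splitHead(docs, uid) {
  const head = docs.get(String(uid));
  if (!head || head.count <= SPLIT_AT) return false;
  const pageNo = (head.next_page || 1) + 1;
  const page = {
    _id: `${uid}_p${pageNo}`,
    uids: head.uids.splice(0, PAGE_SIZE),
    count: PAGE_SIZE,
  };
  if (head.next_page) page.next_page = head.next_page; // keep the chain intact
  head.count = head.uids.length;
  head.next_page = pageNo;
  docs.set(page._id, page);
  return true;
}

// Walk the chain of pages starting from the head document.
function* followers(docs, uid) {
  let doc = docs.get(String(uid));
  while (doc) {
    yield* doc.uids;
    doc = doc.next_page ? docs.get(`${uid}_p${doc.next_page}`) : undefined;
  }
}
```

Under these assumptions the head document stays small enough for cheap appends, and a reader pages through followers by following `next_page` from one document to the next.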


Activity Feed
                          Push vs Pull architecture




Activity Feed
                                           Business logic

                •         All connections and followees appear in your feed

                •         Reverse-chronological sort order (but should support other
                          rankings)

                •         Support for an evolving set of feed event types

                •         Tagging creates multiple feed events for the same
                          underlying object

                •         Feed events are not ephemeral; they also feed the Timeline


Activity Feed
                              Traditional RDBMS (e.g., MySQL)

           activity_id         uid     event_time          type     oid1     oid2
                          1    123   2013-01-22 15:43:00   photo    123abc   789ghi

                          2    345   2013-01-22 15:52:00   status   456def   foobar




         Advantage: Easy inserts
         Disadvantages: Rigid schema adapts poorly to
         new activity types, doesn’t scale


Activity Feed
                                            MongoDB

                     user_feed_card

                     ufc: {
                       _id: 123, // UID
                       total_events: 18,
                       2013_01_total: 4,
                       2012_12_total: 8,
                       2012_11_total: 6,
                       ...other counts...
                     }

                     user_feed_month

                     ufm: {
                       _id: "123_2013_01",
                       events: [
                         {
                           uid: 123,
                           type: "photo_upload",
                           content_id: "abcd9876",
                           timestamp: 1358824502,
                           ...more metadata...
                         },
                         ...more events...
                       ]
                     }
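
A rough sketch of how a new event might be written into these two documents, using in-memory maps in place of the collections. The key formats follow the slide (`123_2013_01`, `2013_01_total`); bucketing by UTC month and the names `monthKey` and `recordEvent` are assumptions made for illustration.

```javascript
// Derive the "YYYY_MM" bucket from a Unix timestamp in seconds (UTC assumed).
function monthKey(timestampSec) {
  const d = new Date(timestampSec * 1000);
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${yyyy}_${mm}`;
}

// Append the event to its user_feed_month and bump the user_feed_card counters.
function recordEvent(cards, months, event) {
  const key = monthKey(event.timestamp);

  const monthId = `${event.uid}_${key}`;
  const month = months.get(monthId) || { _id: monthId, events: [] };
  month.events.push(event);
  months.set(monthId, month);

  const card = cards.get(event.uid) || { _id: event.uid, total_events: 0 };
  card.total_events += 1;
  card[`${key}_total`] = (card[`${key}_total`] || 0) + 1;
  cards.set(event.uid, card);
}
```

The card stays tiny because it holds only counters, while the bulkier event payloads are grouped by user and month.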




Activity Feed
                                      Algorithm
                1. Load user_feed_cards for all connections
                2. Calculate which user_feed_months to load
                3. Load user_feed_months
                4. Aggregate events that refer to the same story
                5. Sort (reverse chron)
                6. Load content, comments, etc. and build stories
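
Steps 1 through 5 above can be sketched as a single function over in-memory maps. This is an illustration under several assumptions, not the production code: months are chosen newest first until the per-month counters cover the number of events wanted, events count as the same story when they share a `content_id`, and step 6 is left out. `buildFeed` and its parameters are hypothetical names.

```javascript
function buildFeed(cards, months, connectionUids, wanted) {
  // 1. Load user_feed_cards for all connections.
  const loaded = connectionUids.map((uid) => cards.get(uid)).filter(Boolean);

  // 2. Calculate which months to load: newest first, until enough events are covered.
  const monthKeys = new Set();
  for (const card of loaded) {
    for (const field of Object.keys(card)) {
      if (/^\d{4}_\d{2}_total$/.test(field)) monthKeys.add(field.slice(0, 7));
    }
  }
  const chosen = [];
  let covered = 0;
  for (const key of [...monthKeys].sort().reverse()) {
    if (covered >= wanted) break;
    chosen.push(key);
    for (const card of loaded) covered += card[`${key}_total`] || 0;
  }

  // 3. Load the selected user_feed_months.
  const events = [];
  for (const card of loaded) {
    for (const key of chosen) {
      const month = months.get(`${card._id}_${key}`);
      if (month) events.push(...month.events);
    }
  }

  // 4. Aggregate events that refer to the same story (same content_id assumed).
  const stories = new Map();
  for (const e of events) {
    const story = stories.get(e.content_id) ||
      { content_id: e.content_id, timestamp: 0, events: [] };
    story.events.push(e);
    story.timestamp = Math.max(story.timestamp, e.timestamp);
    stories.set(e.content_id, story);
  }

  // 5. Sort in reverse-chronological order and trim to the requested size.
  return [...stories.values()]
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, wanted);
}
```

Because the cards carry only counters, step 2 can decide which month documents are worth fetching before any event payloads are read.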



Activity Feed
                                        Performance


                • Response times average under 500 ms (98th
                          percentile under 1 sec)
                • Design expected to scale well horizontally
                • Need to continue to optimize


Building Social Features
                         with MongoDB
                                                                  Nathan Smith
                                                       BrO: http://branchout.com/nate
                                                      FB: http://facebook.com/neocortica
                                                              Twitter: @nate510
                                                        Email: nate@branchout.com




                    Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack

                   Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

                                                 Good Quora questions on activity feeds:
                   http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
                            http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed




Creating social features at BranchOut using MongoDB

  • 1. Building Social Features with MongoDB Nathan Smith BranchOut.com Jan. 22, 2013 Tuesday, January 22, 13
  • 2. BranchOut A more social professional network • Connect with your colleagues (follow) • Activity feed of their professional activity • Timeline of an individual’s posts Tuesday, January 22, 13
  • 3. BranchOut A more social professional network • 30M installed users • 750MM total user records • Average 300 connections per installed user Tuesday, January 22, 13
  • 5. MongoDB @ BranchOut • 100% MySQL until ~July 2012 Tuesday, January 22, 13
  • 6. MongoDB @ BranchOut • 100% MySQL until ~July 2012 • Much of our data fits well into a document model Tuesday, January 22, 13
  • 7. MongoDB @ BranchOut • 100% MySQL until ~July 2012 • Much of our data fits well into a document model • Our data design avoids RDBMS features Tuesday, January 22, 13
  • 12. Follow System: Business logic
      • Limit of 2000 followees (people you follow)
      • Unlimited followers
      • Both lists reflect updates in near-real time
  • 15. Follow System: Traditional RDBMS (e.g. MySQL)
        follower_uid   followee_uid   follow_time
        123            456            2013-01-22 15:43:00
        456            123            2013-01-22 15:52:00
      Advantage: Easy inserts and deletes
      Disadvantages: Poor data locality, large index size
  • 18. Follow System: MongoDB (first pass)
        followee: {
          _id: 123,
          uids: [456, 567, 678]
        }
      Advantage: Compact data, read locality
      Disadvantage: Can’t display a user’s followers
  • 20. Follow System: Can’t display a user’s followers (easily)
        followee: {
          _id: 123,
          uids: [456, 567, 678]
        }
      ...with multi-key index on uids:
        db.follow.find({uids: 456}, {_id: 1});
      Expensive! Also, no guarantee of order.
  • 23. Follow System: MongoDB (second pass)
        follower: {
          _id: 1,
          uids: [2]
        },
        follower: {
          _id: 2,
          uids: [1]
        },
        follower: {
          _id: 3,
          uids: [1, 2]
        }

        followee: {
          _id: 1,
          uids: [2, 3]
        },
        followee: {
          _id: 2,
          uids: [1, 3]
        }
      Advantages: Local data, fast selects
      Disadvantage: Follower doc size
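The second-pass layout implies that every follow has to touch two documents: the follower's followee document and the followee's follower document. A minimal sketch of that double write follows, using in-memory Maps as stand-ins for the two collections. The helper names (`getDoc`, `follow`) are hypothetical and not taken from the talk; the 2000 limit comes from the business-logic slide.

```javascript
// In-memory stand-ins for the two collections (assumption: real code would
// issue MongoDB updates instead of mutating Maps).
const followee = new Map(); // _id -> { _id, uids: [people this user follows] }
const follower = new Map(); // _id -> { _id, uids: [people who follow this user] }

const MAX_FOLLOWEES = 2000; // limit stated on the business-logic slide

// Fetch a document, creating an empty one on first use.
function getDoc(collection, id) {
  if (!collection.has(id)) collection.set(id, { _id: id, uids: [] });
  return collection.get(id);
}

// Record that user `a` follows user `b` by updating both documents.
// Returns false if `a` already follows `b` or has hit the followee limit.
function follow(a, b) {
  const out = getDoc(followee, a);
  if (out.uids.includes(b)) return false;
  if (out.uids.length >= MAX_FOLLOWEES) return false;
  out.uids.push(b);
  getDoc(follower, b).uids.push(a);
  return true;
}

// Reproduce the slide's example: 1 follows 2 and 3; 2 follows 1 and 3.
follow(1, 2);
follow(1, 3);
follow(2, 1);
follow(2, 3);
```

After these four calls the two Maps hold the same five documents shown on the slide. In a real deployment the two writes are not atomic across documents, so some repair or retry strategy would presumably be needed; the talk does not say how this was handled.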
  • 28. Follow System: Follower document size
      • Max Mongo doc size: 16MB
      • Number of people who follow our community manager: 30MM
      • 30MM uids × 8 bytes/uid = 240MB
      • Max followers per doc: ~2MM
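The arithmetic on this slide can be checked directly. The sketch below uses the slide's own figures (16MB limit, 8 bytes per uid, 30MM followers); it ignores BSON's per-element array overhead, so the real ceiling per document would be somewhat lower than the number computed here.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's 16MB document limit
const BYTES_PER_UID = 8;                // figure used on the slide
const followers = 30e6;                 // 30MM followers of the community manager

// Raw bytes needed to hold every follower uid: 240,000,000 (the slide's 240MB).
const bytesNeeded = followers * BYTES_PER_UID;

// Upper bound on uids per document: 2,097,152 (the slide's ~2MM).
const maxUidsPerDoc = Math.floor(MAX_DOC_BYTES / BYTES_PER_UID);

// Minimum number of documents needed to hold all followers: 15.
const docsNeeded = Math.ceil(bytesNeeded / MAX_DOC_BYTES);
```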
  • 29. Follow System: MongoDB (final pass)
        follower: {
          _id: "1",
          uids: [2,3,4,...],
          count: 20001,
          next_page: 2
        },
        follower: {
          _id: "1_p2",
          uids: [23,24,25,...],
          count: 10000
        }

        followee: {
          _id: 1,
          uids: [2, 3]
        },
        followee: {
          _id: 2,
          uids: [1, 3]
        }
  • 31. Follow System: MongoDB (final pass)
      Same documents as slide 29, with the first follower document apparently updated: count: 20001 → 10001, next_page: 2 → 3.
      Asynchronous thread manages follower documents
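One way to read the final pass is that a long follower list is split across several page documents chained by `next_page`. The sketch below illustrates that reading only. The id scheme ("1", "1_p2", ...) is copied from the slide, but the page size of 10000, the meaning of `count` as a per-page count, and the function names are assumptions; the talk does not spell out how its asynchronous thread actually splits pages.

```javascript
const PAGE_SIZE = 10000; // assumption, suggested by `count: 10000` on the slide

// Id scheme copied from the slide: "1" for the first page, then "1_p2", "1_p3", ...
function pageId(userId, page) {
  return page === 1 ? String(userId) : `${userId}_p${page}`;
}

// Split a full follower list into page documents linked by next_page.
// The last page has no next_page field.
function buildPages(userId, uids, pageSize = PAGE_SIZE) {
  const docs = [];
  for (let start = 0, page = 1; start < uids.length; start += pageSize, page++) {
    const slice = uids.slice(start, start + pageSize);
    const doc = { _id: pageId(userId, page), uids: slice, count: slice.length };
    if (start + pageSize < uids.length) doc.next_page = page + 1;
    docs.push(doc);
  }
  return docs;
}

// Walk the chain from the first page and collect every follower uid.
// `store` is a Map keyed by _id, standing in for the follower collection.
function readAllFollowers(userId, store) {
  const all = [];
  let page = 1;
  while (page !== undefined) {
    const doc = store.get(pageId(userId, page));
    if (!doc) break;
    for (const uid of doc.uids) all.push(uid);
    page = doc.next_page;
  }
  return all;
}
```

A paged layout like this keeps each document well under the 16MB limit and lets a reader fetch only the first page when it needs just the most recent followers.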
  • 35. Activity Feed: Push vs Pull architecture
  • 41. Activity Feed: Business logic
      • All connections and followees appear in your feed
      • Reverse-chronological sort order (but should support other rankings)
      • Support for an evolving set of feed event types
      • Tagging creates multiple feed events for the same underlying object
      • Feed events are not ephemeral (needed for the Timeline)
  • 44. Activity Feed: Traditional RDBMS (e.g. MySQL)
        activity_id   uid   event_time            type     oid1     oid2
        1             123   2013-01-22 15:43:00   photo    123abc   789ghi
        2             345   2013-01-22 15:52:00   status   456def   foobar
      Advantage: Easy inserts
      Disadvantages: Rigid schema adapts poorly to new activity types, doesn’t scale
  • 45. Activity Feed: MongoDB
      user_feed_card:
        ufc: {
          _id: 123, // UID
          total_events: 18,
          2013_01_total: 4,
          2012_12_total: 8,
          2012_11_total: 6,
          ...other counts...
        }
      user_feed_month:
        ufm: {
          _id: "123_2013_01",
          events: [
            {
              uid: 123,
              type: "photo_upload",
              content_id: "abcd9876",
              timestamp: 1358824502,
              ...more metadata...
            },
            ...more events...
          ]
        }
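The two documents on this slide have to stay in step: each new event lands in one user_feed_month document and bumps the matching counters on the user's user_feed_card. The sketch below shows that bookkeeping with in-memory Maps. The function names are hypothetical, and treating `timestamp` as Unix seconds in UTC is an assumption based on the slide's example value. Against MongoDB this would presumably be an `$inc` on the card and a `$push` on the month document, both as upserts.

```javascript
// Build the user_feed_month _id, e.g. "123_2013_01", from a uid and a
// Unix timestamp in seconds (assumption: months are bucketed in UTC).
function monthKey(uid, timestampSec) {
  const d = new Date(timestampSec * 1000);
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${uid}_${yyyy}_${mm}`;
}

// Append an event to its month document and update the card's counters.
// `cards` and `months` are Maps standing in for the two collections.
function recordEvent(cards, months, event) {
  const id = monthKey(event.uid, event.timestamp);
  // Strip the "<uid>_" prefix to get the counter name, e.g. "2013_01_total".
  const counter = id.slice(String(event.uid).length + 1) + "_total";

  const card = cards.get(event.uid) || { _id: event.uid, total_events: 0 };
  card.total_events += 1;
  card[counter] = (card[counter] || 0) + 1;
  cards.set(event.uid, card);

  const month = months.get(id) || { _id: id, events: [] };
  month.events.push(event);
  months.set(id, month);
}

// Usage with the event shown on the slide.
const cards = new Map();
const months = new Map();
recordEvent(cards, months, {
  uid: 123,
  type: "photo_upload",
  content_id: "abcd9876",
  timestamp: 1358824502,
});
```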
  • 52. Activity Feed: Algorithm
      1. Load user_feed_cards for all connections
      2. Calculate which user_feed_months to load
      3. Load user_feed_months
      4. Aggregate events that refer to the same story
      5. Sort (reverse chronological)
      6. Load content, comments, etc. and build stories
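Steps 1 to 5 of the algorithm can be sketched as a pull-style read over the card and month documents. Everything below is an illustration under assumptions: the function names are hypothetical, Maps stand in for the collections, "same story" is taken to mean "same content_id", and the rule for choosing months (newest first until enough events are covered) is a guess at what step 2 does. Step 6 is omitted because it depends on content stores the slides do not describe.

```javascript
// Step 2: using the per-month counters on the cards, pick the newest months
// until their combined event count reaches `limit`.
function monthsToLoad(cards, limit) {
  const totals = new Map(); // "2013_01" -> events across all connections
  for (const card of cards) {
    for (const [key, n] of Object.entries(card)) {
      const m = /^(\d{4}_\d{2})_total$/.exec(key);
      if (m) totals.set(m[1], (totals.get(m[1]) || 0) + n);
    }
  }
  const newestFirst = [...totals.keys()].sort().reverse();
  const chosen = [];
  let seen = 0;
  for (const month of newestFirst) {
    chosen.push(month);
    seen += totals.get(month);
    if (seen >= limit) break;
  }
  return chosen;
}

function buildFeed(connectionUids, cards, months, limit) {
  // Step 1: load the cards for all connections.
  const loaded = connectionUids.map((uid) => cards.get(uid)).filter(Boolean);
  // Step 2: decide which months are worth loading.
  const wanted = monthsToLoad(loaded, limit);
  // Step 3: load the month documents and flatten their events.
  const events = [];
  for (const card of loaded) {
    for (const month of wanted) {
      const doc = months.get(`${card._id}_${month}`);
      if (doc) for (const e of doc.events) events.push(e);
    }
  }
  // Step 4: group events that refer to the same underlying object.
  const stories = new Map();
  for (const e of events) {
    const story = stories.get(e.content_id) ||
      { content_id: e.content_id, timestamp: 0, events: [] };
    story.events.push(e);
    story.timestamp = Math.max(story.timestamp, e.timestamp);
    stories.set(e.content_id, story);
  }
  // Step 5: newest story first.
  return [...stories.values()]
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit);
}
```

Because aggregation can merge several events into one story, this sketch may return fewer than `limit` stories; a fuller version would presumably load further months when that happens.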
  • 56. Activity Feed: Performance
      • Response times average under 500 ms (98th percentile under 1 sec)
      • Design expected to scale well horizontally
      • Need to continue to optimize
  • 57. Building Social Features with MongoDB
      Nathan Smith
      BrO: http://branchout.com/nate
      FB: http://facebook.com/neocortica
      Twitter: @nate510
      Email: nate@branchout.com
      Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack
      Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
      Good Quora questions on activity feeds:
        http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
        http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed