Building Social Features
                         with MongoDB
                              Nathan Smith
                             BranchOut.com
                              Jan. 22, 2013




Tuesday, January 22, 13
BranchOut
                          A more social professional network



                    • Connect with your colleagues (follow)
                    • Activity feed of their professional activity
                    • Timeline of an individual’s posts


BranchOut
                          A more social professional network



                    • 30MM installed users
                    • 750MM total user records
                    • Average 300 connections per installed user


MongoDB @ BranchOut


                    • 100% MySQL until ~July 2012
                    • Much of our data fits well into a document
                          model
                    • Our data design avoids RDBMS features


Follow System
                                   Business logic



                    • Limit of 2000 followees (people you follow)
                    • Unlimited followers
                    • Both lists reflect updates in near-real time


Follow System
                                Traditional RDBMS (e.g. MySQL)

                   follower_uid          followee_uid     follow_time
                          123                456        2013-01-22 15:43:00

                          456                123        2013-01-22 15:52:00




         Advantage: Easy inserts, deletes

         Disadvantage: Data locality, index size



Tuesday, January 22, 13
Follow System
                           MongoDB (first pass)


                           followee: {
                             _id: 123
                             uids: [456, 567, 678]
                           }




         Advantage: Compact data, read locality
         Disadvantage: Can’t display a user’s followers


Follow System
                          Can’t display a user’s followers (easily)


                                   followee: {
                                     _id: 123
                                     uids: [456, 567, 678]
                                   }
                                                                ...with multi-key index on
                                                                uids




               db.follow.find({uids: 456}, {_id: 1});

                          Expensive! Also, no guarantee of order.
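The asymmetry of the first-pass model can be sketched in plain JavaScript. This is only an illustration: the `follow` array stands in for the collection, the documents for uids 456 and 567 are made-up sample data, and the in-memory scan shows the shape of the reverse query rather than its real cost with a multi-key index.

```javascript
// First-pass model: one document per user, listing the uids that user follows.
// Only the document for 123 comes from the slide; the others are sample data.
const follow = [
  { _id: 123, uids: [456, 567, 678] },
  { _id: 456, uids: [123] },
  { _id: 567, uids: [456] },
];

// Cheap direction: who does this user follow?
// In MongoDB this is a single lookup by _id; an array search stands in for it.
function followeesOf(uid) {
  const doc = follow.find((d) => d._id === uid);
  return doc ? doc.uids : [];
}

// Expensive direction: who follows this user?
// Same shape as db.follow.find({uids: uid}, {_id: 1}): every document whose
// uids array contains the uid has to be located, and the result order is
// whatever order the documents happen to come back in.
function followersOf(uid) {
  return follow.filter((d) => d.uids.includes(uid)).map((d) => d._id);
}
```

Here `followeesOf(123)` reads one document, while `followersOf(456)` has to consult every document that mentions 456.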

Follow System
                                     MongoDB (second pass)

                          followee: {                follower: {
                            _id: 1,                    _id: 1,
                            uids: [2, 3]               uids: [2]
                          },                         },
                          followee: {                follower: {
                            _id: 2,                    _id: 2,
                            uids: [1, 3]               uids: [1]
                          }                          },
                                                     follower: {
                                                       _id: 3,
                                                       uids: [1, 2]
                                                     }



         Advantages: Local data, fast selects
         Disadvantages: Follower doc size
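A minimal sketch of what the second pass implies for writes, assuming each follow or unfollow updates both collections. The two `Map` objects are hypothetical in-memory stand-ins for the followee and follower collections, and the function names are illustrative; the 2000 limit is the one from the business-logic slide.

```javascript
const MAX_FOLLOWEES = 2000; // limit stated on the business-logic slide

const followee = new Map(); // uid -> uids this user follows
const follower = new Map(); // uid -> uids following this user

// Return the list stored under uid, creating an empty one if needed.
function listFor(map, uid) {
  if (!map.has(uid)) map.set(uid, []);
  return map.get(uid);
}

// a starts following b: one write per collection.
function follow(a, b) {
  const following = listFor(followee, a);
  if (following.includes(b)) return;
  if (following.length >= MAX_FOLLOWEES) throw new Error("followee limit reached");
  following.push(b);
  listFor(follower, b).push(a);
}

// a stops following b: remove the uid from both lists.
function unfollow(a, b) {
  for (const [map, key, value] of [[followee, a, b], [follower, b, a]]) {
    const list = listFor(map, key);
    const i = list.indexOf(value);
    if (i !== -1) list.splice(i, 1);
  }
}

// Rebuild the example documents from the slide.
follow(1, 2);
follow(1, 3);
follow(2, 1);
follow(2, 3);
```

Both directions are now a single lookup by uid. The price is that every change is two separate writes, and the follower list of a popular user has no upper bound.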

Follow System
                                 Follower document size

                • Max Mongo doc size: 16MB
                • Number of people who follow our
                          community manager: 30MM
                • 30MM uids × 8 bytes/uid = 240MB
                • Max followers per doc: ~2MM
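The arithmetic on this slide, written out as a small sketch. It uses the slide's own figure of 8 bytes per uid and takes 16MB as 16 × 1024 × 1024 bytes; the variable names are illustrative.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's 16MB document limit
const BYTES_PER_UID = 8;                // the slide's figure: one 64-bit uid
const followers = 30e6;                 // followers of the community manager

// Bytes needed to hold every follower uid in a single array.
const neededBytes = followers * BYTES_PER_UID;

// Upper bound on uids that fit in one document (the slide's "~2MM").
const maxUidsPerDoc = Math.floor(MAX_DOC_BYTES / BYTES_PER_UID);

// Minimum number of documents needed for this one user's followers.
const minDocs = Math.ceil(followers / maxUidsPerDoc);
```

This is a best case. BSON stores each array element with a type byte and its index as a string key, so the real capacity per document is lower than the 8 bytes/uid estimate suggests.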

Follow System
                                       MongoDB (final pass)
                          followee: {               follower: {
                            _id: 1,                   _id: "1",
                            uids: [2, 3]              uids: [2,3,4,...],
                          },                          count: 20001 -> 10001,
                          followee: {                 next_page: 2 -> 3
                            _id: 2,                 },
                            uids: [1, 3]            follower: {
                          }                           _id: "1_p2",
                                                      uids: [23,24,25,...],
                                                      count: 10000
                                                    }




           Asynchronous thread manages follower documents
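One possible reading of the paged follower documents, sketched in memory. The slides do not describe the split rule, so everything here is an assumption: the 20000 threshold, moving the older half of the head page into a new page, and linking that page in directly after the head. `pages` is a hypothetical `Map` standing in for the follower collection.

```javascript
const PAGE_LIMIT = 20000; // assumed threshold; the slide only shows a page at 20001

// Page 1 uses the bare uid as _id; later pages use "<uid>_p<n>" as on the slide.
function pageId(uid, page) {
  return page === 1 ? String(uid) : `${uid}_p${page}`;
}

// Work for the asynchronous thread: if the head page is over the limit,
// move its older half into a new page and link that page in after the head.
function splitHeadPage(pages, uid, newPageNumber) {
  const head = pages.get(pageId(uid, 1));
  if (!head || head.count <= PAGE_LIMIT) return false;

  // Assumes new followers are appended, so the front of the array is oldest.
  const moved = head.uids.splice(0, Math.floor(head.count / 2));
  const page = { _id: pageId(uid, newPageNumber), uids: moved, count: moved.length };
  if (head.next_page !== undefined) page.next_page = head.next_page;
  pages.set(page._id, page);

  head.count = head.uids.length;
  head.next_page = newPageNumber;
  return true;
}

// Read every follower by walking the chain of pages from the head.
function* followersOf(pages, uid) {
  let page = pages.get(pageId(uid, 1));
  while (page) {
    yield* page.uids;
    page = page.next_page ? pages.get(pageId(uid, page.next_page)) : undefined;
  }
}
```

Because the split runs outside the request path, a follow only ever appends to the head page, and readers see either the old chain or the new one.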


Activity Feed
                          Push vs Pull architecture




Activity Feed
                                           Business logic

                •         All connections and followees appear in your feed

                •         Reverse-chronological sort order (but should support other
                          rankings)

                •         Support for an evolving set of feed event types

                •         Tagging creates multiple feed events for the same
                          underlying object

                •         Feed events are not ephemeral (the Timeline shows them too)


Activity Feed
                              Traditional RDBMS (e.g. MySQL)

           activity_id         uid     event_time          type     oid1     oid2
                          1    123   2013-01-22 15:43:00   photo    123abc   789ghi

                          2    345   2013-01-22 15:52:00   status   456def   foobar




         Advantage: Easy inserts
         Disadvantages: Rigid schema adapts poorly to
         new activity types, doesn’t scale


Activity Feed
                                            MongoDB

                          user_feed_card

                     ufc: {
                       _id: 123, // UID
                       total_events: 18,
                       2013_01_total: 4,
                       2012_12_total: 8,
                       2012_11_total: 6,
                       ...other counts...
                     }

                          user_feed_month

                     ufm: {
                       _id: "123_2013_01",
                       events: [
                         {
                           uid: 123,
                           type: "photo_upload",
                           content_id: "abcd9876",
                           timestamp: 1358824502,
                           ...more metadata...
                         },
                         ...more events...
                       ]
                     }
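A sketch of how one new event might touch both documents, assuming the month is derived from the event timestamp in UTC and that the card counters are kept in step on every write. The slides do not show the write path, so `monthKey`, `recordEvent`, and the two `Map` stand-ins for the collections are all hypothetical.

```javascript
// "2013_01" for a Unix timestamp in seconds (UTC is an assumption).
function monthKey(timestamp) {
  const d = new Date(timestamp * 1000);
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}_${mm}`;
}

// Append the event to the user's month document and bump the card counters.
function recordEvent(cards, months, event) {
  const key = monthKey(event.timestamp);
  const monthId = `${event.uid}_${key}`;

  if (!months.has(monthId)) months.set(monthId, { _id: monthId, events: [] });
  months.get(monthId).events.push(event);

  if (!cards.has(event.uid)) cards.set(event.uid, { _id: event.uid, total_events: 0 });
  const card = cards.get(event.uid);
  card.total_events += 1;
  card[`${key}_total`] = (card[`${key}_total`] || 0) + 1;
}
```

With the event from the slide (uid 123, timestamp 1358824502), the month document id comes out as `123_2013_01` and the counter that changes is `2013_01_total`.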




Activity Feed
                                      Algorithm
                1. Load user_feed_cards for all connections
                2. Calculate which user_feed_months to load
                3. Load user_feed_months
                4. Aggregate events that refer to the same story
                5. Sort (reverse chron)
                6. Load content, comments, etc. and build stories
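Steps 2 to 5 can be sketched as follows, under stated assumptions. The cards are taken as already loaded (step 1), the months to load are picked by walking from the newest month backwards until the counters promise enough events, and events are treated as the same story when they share a `content_id`. None of these rules is spelled out on the slides, and `monthsToLoad` and `buildFeed` are hypothetical names.

```javascript
// Step 2: pick month document ids, newest month first, until the card
// counters promise at least `wanted` events.
function monthsToLoad(cards, wanted) {
  const keys = new Set();
  for (const card of cards) {
    for (const field of Object.keys(card)) {
      if (/^\d{4}_\d{2}_total$/.test(field)) keys.add(field.slice(0, 7));
    }
  }
  const ids = [];
  let promised = 0;
  // "YYYY_MM" keys sort chronologically as plain strings.
  for (const key of [...keys].sort().reverse()) {
    if (promised >= wanted) break;
    for (const card of cards) {
      const n = card[`${key}_total`] || 0;
      if (n > 0) {
        ids.push(`${card._id}_${key}`);
        promised += n;
      }
    }
  }
  return ids;
}

// Steps 3 to 5: load the months, merge events that share a content_id,
// and sort the resulting stories newest first.
function buildFeed(cards, months, wanted) {
  const stories = new Map();
  for (const id of monthsToLoad(cards, wanted)) {
    const month = months.get(id); // stand-in for a lookup by _id
    if (!month) continue;
    for (const ev of month.events) {
      const story = stories.get(ev.content_id) ||
        { content_id: ev.content_id, timestamp: 0, events: [] };
      story.events.push(ev);
      story.timestamp = Math.max(story.timestamp, ev.timestamp);
      stories.set(ev.content_id, story);
    }
  }
  return [...stories.values()]
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, wanted);
}
```

Because aggregation can merge several events into one story, the feed may come back shorter than `wanted`; a fuller version would keep loading older months until it has enough stories.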



Activity Feed
                                        Performance


                • Response times average under 500 ms (98th
                          percentile under 1 sec)
                • Design expected to scale well horizontally
                • Need to continue to optimize


Building Social Features
                         with MongoDB
                                                                  Nathan Smith
                                                       BrO: http://branchout.com/nate
                                                      FB: http://facebook.com/neocortica
                                                              Twitter: @nate510
                                                        Email: nate@branchout.com




                    Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack

                   Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

                                                 Good Quora questions on activity feeds:
                   http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
                            http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed




PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Creating social features at BranchOut using MongoDB

  • 1. Building Social Features with MongoDB Nathan Smith BranchOut.com Jan. 22, 2013 Tuesday, January 22, 13
  • 2. BranchOut: A more social professional network
      • Connect with your colleagues (follow)
      • Activity feed of their professional activity
      • Timeline of an individual’s posts
  • 3. BranchOut: A more social professional network
      • 30MM installed users
      • 750MM total user records
      • Average 300 connections per installed user
  • 5-7. MongoDB @ BranchOut
      • 100% MySQL until ~July 2012
      • Much of our data fits well into a document model
      • Our data design avoids RDBMS features
  • 9-12. Follow System: Business logic
      • Limit of 2000 followees (people you follow)
      • Unlimited followers
      • Both lists reflect updates in near-real time
  • 13-15. Follow System: Traditional RDBMS (e.g. MySQL)
        follower_uid   followee_uid   follow_time
        123            456            2013-01-22 15:43:00
        456            123            2013-01-22 15:52:00
      Advantage: Easy inserts, deletes
      Disadvantage: Data locality, index size
  • 16-18. Follow System: MongoDB (first pass)
        followee: {
          _id: 123,
          uids: [456, 567, 678]
        }
      Advantage: Compact data, read locality
      Disadvantage: Can’t display a user’s followers
  • 19-20. Follow System: Can’t display a user’s followers (easily)
        followee: {
          _id: 123,
          uids: [456, 567, 678]
        }
      ...with multi-key index on uids:
        db.follow.find({uids: 456}, {_id: 1});
      Expensive! Also, no guarantee of order.
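To make the query on slides 19-20 concrete, here is a rough in-memory sketch of what it asks for: every document whose uids array contains 456, projected down to _id. The array standing in for the follow collection and the helper name are assumptions for illustration. MongoDB would answer this from the multi-key index rather than by scanning, and the order of the results is still not guaranteed.

```javascript
// Hypothetical in-memory stand-in for the `follow` collection on the slide.
const follow = [
  { _id: 123, uids: [456, 567, 678] },
  { _id: 234, uids: [456] },
  { _id: 345, uids: [567] },
];

// Same result set as db.follow.find({uids: 456}, {_id: 1}): match every
// document whose uids array contains the value, then keep only _id.
function findFollowersByScan(collection, uid) {
  return collection
    .filter((doc) => doc.uids.includes(uid))
    .map((doc) => ({ _id: doc._id }));
}

const followersOf456 = findFollowersByScan(follow, 456);
console.log(followersOf456.map((doc) => doc._id));
```

Each matching _id is a user who follows 456, so the follower list is scattered across as many documents as there are followers.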
  • 21-23. Follow System: MongoDB (second pass)
        follower: {
          _id: 1,
          uids: [2]
        },
        follower: {
          _id: 2,
          uids: [1]
        },
        follower: {
          _id: 3,
          uids: [1, 2]
        }

        followee: {
          _id: 1,
          uids: [2, 3]
        },
        followee: {
          _id: 2,
          uids: [1, 3]
        }
      Advantages: Local data, fast selects
      Disadvantages: Follower doc size
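The second-pass design implies that every follow or unfollow touches two documents: the followee document of the user who follows and the follower document of the user being followed. A minimal sketch of that dual write, assuming in-memory Maps in place of the two collections and hypothetical helper names (getDoc, followUser, unfollowUser); the 2000 limit comes from the business-logic slide.

```javascript
const MAX_FOLLOWEES = 2000; // limit from the business-logic slide

// Hypothetical in-memory stand-ins for the two collections, keyed by _id.
const followee = new Map(); // _id -> { _id, uids: people this user follows }
const follower = new Map(); // _id -> { _id, uids: people who follow this user }

// Fetch a document, creating an empty one on first use.
function getDoc(collection, id) {
  if (!collection.has(id)) collection.set(id, { _id: id, uids: [] });
  return collection.get(id);
}

// Record that `uid` follows `targetUid` in both documents.
function followUser(uid, targetUid) {
  const followeeDoc = getDoc(followee, uid);
  if (followeeDoc.uids.includes(targetUid)) return false; // already following
  if (followeeDoc.uids.length >= MAX_FOLLOWEES) {
    throw new Error(`user ${uid} already follows ${MAX_FOLLOWEES} people`);
  }
  followeeDoc.uids.push(targetUid);
  getDoc(follower, targetUid).uids.push(uid);
  return true;
}

// Remove the relationship from both documents.
function unfollowUser(uid, targetUid) {
  const followeeDoc = getDoc(followee, uid);
  const followerDoc = getDoc(follower, targetUid);
  followeeDoc.uids = followeeDoc.uids.filter((id) => id !== targetUid);
  followerDoc.uids = followerDoc.uids.filter((id) => id !== uid);
}

// Rebuild the example from the slide.
followUser(1, 2);
followUser(1, 3);
followUser(2, 1);
followUser(2, 3);
console.log(followee.get(1).uids); // [2, 3]
console.log(follower.get(3).uids); // [1, 2]
```

Only the followee side needs a cap, which is why the follower document is the one that can grow without bound.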
  • 24-28. Follow System: Follower document size
      • Max Mongo doc size: 16MB
      • Number of people who follow our community manager: 30MM
      • 30MM uids × 8 bytes/uid = 240MB
      • Max followers per doc: ~2MM
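The arithmetic on these slides can be checked in a few lines. This sketch takes the slide's 8 bytes per uid at face value and reads 16MB as 16 × 1024 × 1024 bytes; the constant names are illustrative. It ignores BSON's per-element overhead for array entries, so the real ceiling per document would be somewhat lower than the figure computed here.

```javascript
const BYTES_PER_UID = 8;                // from the slide
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16MB document limit
const FOLLOWER_COUNT = 30e6;            // 30MM followers of the community manager

// Raw size of all follower uids in a single array.
const neededBytes = FOLLOWER_COUNT * BYTES_PER_UID;

// Upper bound on uids per document, and the number of documents needed.
const maxUidsPerDoc = Math.floor(MAX_DOC_BYTES / BYTES_PER_UID);
const docsNeeded = Math.ceil(FOLLOWER_COUNT / maxUidsPerDoc);

console.log(neededBytes / 1e6); // 240 (MB)
console.log(maxUidsPerDoc);     // 2097152 (~2MM)
console.log(docsNeeded);        // 15
```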
  • 29-31. Follow System: MongoDB (final pass)
        follower: {
          _id: “1”,
          uids: [2,3,4,...],
          count: 20001 → 10001,
          next_page: 2 → 3
        },
        follower: {
          _id: “1_p2”,
          uids: [23,24,25,...],
          count: 10000
        }

        followee: {
          _id: 1,
          uids: [2, 3]
        },
        followee: {
          _id: 2,
          uids: [1, 3]
        }
      Asynchronous thread manages follower documents
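Reading a complete follower list from the paged layout means walking the chain of page documents. A minimal sketch, assuming that next_page holds the number of the next page, that page _ids follow the "1" / "1_p2" pattern shown on the slide, and that a missing next_page marks the last page. The Map and the helper names (pageId, walkFollowers) are hypothetical stand-ins, and the sample pages are shortened to three uids each.

```javascript
// Hypothetical in-memory stand-in for the follower collection, keyed by _id.
const followerPages = new Map([
  ["1", { _id: "1", uids: [2, 3, 4], count: 3, next_page: 2 }],
  ["1_p2", { _id: "1_p2", uids: [23, 24, 25], count: 3 }],
]);

// _id convention taken from the slide: "<uid>" for the first page,
// "<uid>_p<n>" for page n.
function pageId(uid, page) {
  return page === 1 ? String(uid) : `${uid}_p${page}`;
}

// Generator that yields every follower uid by following next_page
// from document to document until a page has no next_page.
function* walkFollowers(collection, uid) {
  let page = 1;
  while (page !== undefined) {
    const doc = collection.get(pageId(uid, page));
    if (!doc) return; // chain ends early if a page is missing
    yield* doc.uids;
    page = doc.next_page;
  }
}

const allFollowers = [...walkFollowers(followerPages, 1)];
console.log(allFollowers); // [2, 3, 4, 23, 24, 25]
```

A generator keeps memory flat for very large lists, since a caller that only needs the first screen of followers never loads the later pages.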
  • 33-35. Activity Feed: Push vs Pull architecture
  • 36-41. Activity Feed: Business logic
      • All connections and followees appear in your feed
      • Reverse chron sort order (but should support other rankings)
      • Support for evolving set of feed event types
      • Tagging creates multiple feed events for the same underlying object
      • Feed events are not ephemeral -- Timeline
  • 42-44. Activity Feed: Traditional RDBMS (e.g. MySQL)
        activity_id   uid   event_time            type     oid1     oid2
        1             123   2013-01-22 15:43:00   photo    123abc   789ghi
        2             345   2013-01-22 15:52:00   status   456def   foobar
      Advantage: Easy inserts
      Disadvantages: Rigid schema adapts poorly to new activity types, doesn’t scale
  • 45. Activity Feed: MongoDB
      user_feed_card
        ufc: {
          _id: 123, // UID
          total_events: 18,
          2013_01_total: 4,
          2012_12_total: 8,
          2012_11_total: 6,
          ...other counts...
        }
      user_feed_month
        ufm: {
          _id: “123_2013_01”,
          events: [
            {
              uid: 123,
              type: “photo_upload”,
              content_id: “abcd9876”,
              timestamp: 1358824502,
              ...more metadata...
            },
            ...more events...
          ]
        }
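The two document shapes suggest that writing one feed event touches both of them: the event is appended to the user's month document, and the matching counters on the user's card are incremented. A rough sketch of that write path, assuming in-memory Maps for the collections, timestamps in Unix seconds, and UTC month boundaries; monthKey and recordEvent are hypothetical names, not taken from the talk.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const userFeedCard = new Map();  // uid -> ufc document
const userFeedMonth = new Map(); // "<uid>_<yyyy>_<mm>" -> ufm document

// "2013_01" style key for the month a Unix timestamp (seconds) falls in, UTC.
function monthKey(timestamp) {
  const d = new Date(timestamp * 1000);
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}_${mm}`;
}

// Append the event to its month document and bump the card counters.
function recordEvent(event) {
  const key = monthKey(event.timestamp);

  const ufmId = `${event.uid}_${key}`;
  if (!userFeedMonth.has(ufmId)) {
    userFeedMonth.set(ufmId, { _id: ufmId, events: [] });
  }
  userFeedMonth.get(ufmId).events.push(event);

  if (!userFeedCard.has(event.uid)) {
    userFeedCard.set(event.uid, { _id: event.uid, total_events: 0 });
  }
  const card = userFeedCard.get(event.uid);
  card.total_events += 1;
  card[`${key}_total`] = (card[`${key}_total`] || 0) + 1;
}

// The sample event from the slide lands in the "123_2013_01" document.
recordEvent({
  uid: 123,
  type: "photo_upload",
  content_id: "abcd9876",
  timestamp: 1358824502,
});
console.log(userFeedCard.get(123));
```

Keeping the counts on the small card document lets a reader decide which month documents are worth loading without opening any of them.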
  • 46-52. Activity Feed: Algorithm
      1. Load user_feed_cards for all connections
      2. Calculate which user_feed_months to load
      3. Load user_feed_months
      4. Aggregate events that refer to the same story
      5. Sort (reverse chron)
      6. Load content, comments, etc. and build stories
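Steps 1 to 5 above can be sketched against in-memory data shaped like the slide-45 documents. Several choices here are assumptions rather than details from the talk: events are treated as the same story when they share a content_id, months are added newest-first until the card counters cover the requested number of events, and step 6 (loading content and comments) is left out. The sample data, the "photo_tag" type, and the function names are hypothetical.

```javascript
// Hypothetical in-memory data in the shape of the slide-45 documents.
const cards = new Map([
  [123, { _id: 123, total_events: 2, "2013_01_total": 1, "2012_12_total": 1 }],
  [345, { _id: 345, total_events: 1, "2013_01_total": 1 }],
]);
const months = new Map([
  ["123_2013_01", { _id: "123_2013_01", events: [
    { uid: 123, type: "photo_upload", content_id: "abcd9876", timestamp: 1358824502 }] }],
  ["123_2012_12", { _id: "123_2012_12", events: [
    { uid: 123, type: "status", content_id: "efgh1234", timestamp: 1356000000 }] }],
  ["345_2013_01", { _id: "345_2013_01", events: [
    { uid: 345, type: "photo_tag", content_id: "abcd9876", timestamp: 1358824600 }] }],
]);

// Step 2: take month keys newest-first and stop once the per-month counters
// across all cards cover the requested number of events.
function monthsToLoad(cardDocs, limit) {
  const keys = new Set();
  for (const card of cardDocs) {
    for (const field of Object.keys(card)) {
      if (/^\d{4}_\d{2}_total$/.test(field)) keys.add(field.slice(0, 7));
    }
  }
  const chosen = [];
  let covered = 0;
  for (const key of [...keys].sort().reverse()) {
    chosen.push(key);
    for (const card of cardDocs) covered += card[`${key}_total`] || 0;
    if (covered >= limit) break;
  }
  return chosen;
}

function buildFeed(connectionUids, limit) {
  // Step 1: load the card for every connection that has one.
  const cardDocs = connectionUids.map((uid) => cards.get(uid)).filter(Boolean);
  // Step 2: decide which months are needed.
  const monthKeys = monthsToLoad(cardDocs, limit);
  // Step 3: load only month documents whose counter is non-zero.
  const events = [];
  for (const card of cardDocs) {
    for (const key of monthKeys) {
      if (!card[`${key}_total`]) continue;
      const doc = months.get(`${card._id}_${key}`);
      if (doc) events.push(...doc.events);
    }
  }
  // Step 4: group events that point at the same content_id into one story.
  const stories = new Map();
  for (const ev of events) {
    const story = stories.get(ev.content_id) ||
      { content_id: ev.content_id, events: [], latest: 0 };
    story.events.push(ev);
    story.latest = Math.max(story.latest, ev.timestamp);
    stories.set(ev.content_id, story);
  }
  // Step 5: newest story first.
  return [...stories.values()].sort((a, b) => b.latest - a.latest).slice(0, limit);
}

const feed = buildFeed([123, 345], 3);
console.log(feed.map((story) => story.content_id)); // ["abcd9876", "efgh1234"]
```

Because aggregation merges events, the result can hold fewer stories than the requested limit; a fuller implementation would presumably load further months when that happens.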
  • 53-56. Activity Feed: Performance
      • Response times average under 500 ms (98th percentile under 1 sec)
      • Design expected to scale well horizontally
      • Need to continue to optimize
  • 57. Building Social Features with MongoDB
      Nathan Smith
      BrO: http://branchout.com/nate
      FB: http://facebook.com/neocortica
      Twitter: @nate510
      Email: nate@branchout.com
      Aditya Agarwal on Facebook’s architecture: http://www.infoq.com/presentations/Facebook-Software-Stack
      Dan McKinley on Etsy’s activity feed: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
      Good Quora questions on activity feeds:
        http://www.quora.com/What-are-the-scaling-issues-to-keep-in-mind-while-developing-a-social-network-feed
        http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed