Socialite, the Open Source Status Feed Part 1: Design Overview and Scaling for Infinite Content

Published in: Technology
Notes
  • give intuitive sense of how the presentation will be structured...

  • News/Social Status Feed: popular and common

    Internal goals: implement different schema options, with built-in benchmarking to compare them

    External goals: low latency from end-user perspective, linear scaling from operational perspective
  • Image of the hat from https://dropwizard.github.io/dropwizard

  • Socialite, the Open Source Status Feed Part 1: Design Overview and Scaling for Infinite Content

    1. Building a Social Platform, Part 1: Design Overview; Storing Infinite Content
    2. Solutions Engineering
       • Identify Popular Use Cases
         – Directly from MongoDB Users
         – Addressing "limitations"
       • Go beyond documentation and blogs
       • Create open source project
       • Run it!
    3. Social Status Feed
    4. Agenda
       • What is a status feed and why build it with MongoDB
       • Application overview (goals, non-goals)
       • Architecture overview (arch diagram)
       • Operational overview (benchmarks, automation)
       • Describe components
         – Describe options
       • For each component
         – Options tried
         – Results
         – Option chosen
    5. Socialite
       • News/Social Status Feed: popular and common
       • Appears misleadingly simple: turns out to have many tricky problems to solve to have good performance
       • We created a reference implementation
         – Configurable models and options
         – Built-in benchmarking
       • Used this implementation to test out different options.
       • This talk will summarize
    6. Status Feed
    7. Status Feed
    8. Socialite
       • Open Source
       • Reference Implementation
         – Various Fanout Feed Models
         – User Graph Implementation
         – Content storage
       • Configurable models and options
       • REST API in Dropwizard (Yammer)
         – https://dropwizard.github.io/dropwizard/
       • Built-in benchmarking
       https://github.com/10gen-labs/socialite
    9. Architecture (diagram; labels: GraphServiceProxy, ContentProxy)
    10. Pluggable Services
       • Major components each have an interface
         – see com.mongodb.socialite.services
       • Configuration selects implementation to use
       • ServiceManager organizes:
         – Default implementations
         – Lifecycle
         – Binding configuration
         – Wiring dependencies
         – see com.mongodb.socialite.ServiceManager
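The pluggable-service idea on this slide can be sketched roughly as follows. Socialite itself is written in Java; this Python sketch is only an illustration, and every name in it (`ContentService`, `InMemoryContentService`, `DiscardingContentService`, `ServiceRegistry`, the `"content"` configuration key) is hypothetical rather than taken from `com.mongodb.socialite`.

```python
from typing import Callable, Dict, List, Protocol


class ContentService(Protocol):
    """Hypothetical service interface (not Socialite's actual Java interface)."""

    def publish(self, user_id: str, message: str) -> None: ...
    def posts_by(self, user_id: str) -> List[str]: ...


class InMemoryContentService:
    """One implementation: keeps each user's posts in a dict."""

    def __init__(self) -> None:
        self._posts: Dict[str, List[str]] = {}

    def publish(self, user_id: str, message: str) -> None:
        self._posts.setdefault(user_id, []).append(message)

    def posts_by(self, user_id: str) -> List[str]:
        return list(self._posts.get(user_id, []))


class DiscardingContentService:
    """Another implementation: accepts writes and drops them."""

    def publish(self, user_id: str, message: str) -> None:
        pass

    def posts_by(self, user_id: str) -> List[str]:
        return []


class ServiceRegistry:
    """Maps (service kind, implementation name) to a factory and tracks defaults."""

    def __init__(self) -> None:
        self._factories: Dict[str, Dict[str, Callable[[], object]]] = {}
        self._defaults: Dict[str, str] = {}

    def register(self, kind: str, name: str, factory: Callable[[], object],
                 default: bool = False) -> None:
        self._factories.setdefault(kind, {})[name] = factory
        if default:
            self._defaults[kind] = name

    def create(self, kind: str, config: Dict[str, str]) -> object:
        # The configuration picks the implementation; otherwise use the default.
        name = config.get(kind, self._defaults[kind])
        return self._factories[kind][name]()


def build_registry() -> ServiceRegistry:
    registry = ServiceRegistry()
    registry.register("content", "memory", InMemoryContentService, default=True)
    registry.register("content", "discard", DiscardingContentService)
    return registry
```

With an empty configuration, `build_registry().create("content", {})` returns the default in-memory implementation; passing `{"content": "discard"}` swaps in the other one without any caller changing.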
    11. Simple Interface
       GET    /users/{user_id}                     Get a user by their ID
       DELETE /users/{user_id}                     Remove a user by their ID
       POST   /users/{user_id}/posts               Send a message from this user
       GET    /users/{user_id}/followers           Get a list of followers of a user
       GET    /users/{user_id}/followers_count     Get the number of followers of a user
       GET    /users/{user_id}/following           Get the list of users this user is following
       GET    /users/{user_id}/following_count     Get the number of users this user follows
       GET    /users/{user_id}/posts               Get the messages sent by a user
       GET    /users/{user_id}/timeline            Get the timeline for this user
       PUT    /users/{user_id}                     Create a new user
       PUT    /users/{user_id}/following/{target}  Follow a user
       DELETE /users/{user_id}/following/{target}  Unfollow a user
       https://github.com/10gen-labs/socialite
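To make the follow-related endpoints concrete, here is a minimal in-memory sketch of the operations they expose. It is not Socialite's user graph implementation (which is Java and backed by MongoDB); the class name `InMemoryUserGraph` and its method names are hypothetical, and each comment only indicates which endpoint from the list a method would sit behind.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class InMemoryUserGraph:
    """Hypothetical stand-in for a user graph service."""

    def __init__(self) -> None:
        self._following: DefaultDict[str, Set[str]] = defaultdict(set)
        self._followers: DefaultDict[str, Set[str]] = defaultdict(set)

    # PUT /users/{user_id}/following/{target}
    def follow(self, user_id: str, target: str) -> None:
        self._following[user_id].add(target)
        self._followers[target].add(user_id)

    # DELETE /users/{user_id}/following/{target}
    def unfollow(self, user_id: str, target: str) -> None:
        self._following[user_id].discard(target)
        self._followers[target].discard(user_id)

    # GET /users/{user_id}/followers
    def followers(self, user_id: str) -> List[str]:
        return sorted(self._followers.get(user_id, ()))

    # GET /users/{user_id}/following
    def following(self, user_id: str) -> List[str]:
        return sorted(self._following.get(user_id, ()))

    # GET /users/{user_id}/followers_count
    def followers_count(self, user_id: str) -> int:
        return len(self._followers.get(user_id, ()))

    # GET /users/{user_id}/following_count
    def following_count(self, user_id: str) -> int:
        return len(self._following.get(user_id, ()))
```

Keeping both directions of the relationship lets each read endpoint answer from a single lookup, at the cost of two writes per follow or unfollow.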
    12. Technical Decisions
       • User timeline cache
       • Schema
       • Indexing
       • Horizontal Scaling
    13. Operational Setup
    14. Operational Testing
       • Real-life validation of our choices.
       • Most important criteria?
         – User-facing latency
         – Linear scaling of resources
    15. Scaling Goals
       • Realistic real-life-scale workload
         – compared to Twitter, etc.
       • Understanding of HW required
         – containing costs
       • Confirm architecture scales linearly
         – without loss of responsiveness
    16. Architecture (diagram; labels: GraphServiceProxy, ContentProxy)
    17. DB Architecture
       The storage layer is separate from the Socialite services, and each service has its own URI, that is, its own MongoDB server or cluster, which can be configured differently from the others. This allows us to physically optimize each service's DB for the workload we'll be running on it. It also allows us to scale out whichever DB is currently the limiting factor (the bottleneck) in our setup.
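A per-service storage configuration of the kind described on this slide might look like the sketch below. The service names, hostnames, and database names are all made up for illustration; only the idea of one `mongodb://` URI per service comes from the slide.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical configuration: one connection URI per service, so each
# backing server or cluster can be sized and tuned independently.
SERVICE_URIS: Dict[str, str] = {
    "user_graph": "mongodb://graph-db.example.com:27017/socialite_graph",
    "content": "mongodb://content-mongos.example.com:27017/socialite_content",
    "feed": "mongodb://feed-mongos.example.com:27017/socialite_feed",
}


def storage_target(service: str) -> Tuple[Optional[str], Optional[int], str]:
    """Return (host, port, database) for the given service's URI."""
    parsed = urlparse(SERVICE_URIS[service])
    return parsed.hostname, parsed.port, parsed.path.lstrip("/")


def shared_hosts(uris: Dict[str, str]) -> Dict[str, List[str]]:
    """Return hosts that back more than one service (empty if fully separated)."""
    by_host: Dict[str, List[str]] = {}
    for service, uri in uris.items():
        by_host.setdefault(urlparse(uri).hostname, []).append(service)
    return {host: names for host, names in by_host.items() if len(names) > 1}
```

With this layout, scaling out the bottleneck means changing a single URI (for example, pointing `content` at a larger sharded cluster) while the other services stay untouched.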
    18-30. Operational Testing (thirteen slides carrying only this title)
    31. Operational Testing
       • Built-in benchmark capability
    32. Operational Testing
       • All hosts in AWS
       • Each service used its own DB, cluster or shards
       • All benchmarks through `mongos` (sharded config)
       • Used MMS monitoring for measuring throughput
       • Used internal benchmarks for measuring latency
       • Based volume tested on real life social metrics
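The slides do not show how the internal latency benchmarks are implemented. As a generic sketch of the technique, latency can be measured by timing each call and summarizing the samples with percentiles; the function names below are hypothetical and the nearest-rank percentile is one common choice among several.

```python
import math
import time
from typing import Callable, List, Sequence


def measure_latencies(operation: Callable[[], object], iterations: int) -> List[float]:
    """Call `operation` repeatedly and return per-call latencies in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Reporting a high percentile (such as the 95th or 99th) rather than the mean keeps a few slow requests from being hidden, which matters when the goal is user-facing latency.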
    33. Scaling for Infinite Content
    34. Architecture (diagram; labels: GraphServiceProxy, ContentProxy)
    35. Socialite Content Service
       • System of record for all user content
       • Initially very simple (no search)
       • Mainly designed to support feed
         – Lookup/indexed by _id and userid
         – Time based anchors/pagination
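Anchor-based pagination over `_id` and `userid` can be sketched as below. This assumes that `_id` values sort in creation order (as time-based identifiers do), which the slide implies but does not spell out. `page_query` only builds a filter, sort, and limit in the shape MongoDB queries use; `run_page` is a hypothetical in-memory stand-in so the sketch runs without a database, with plain integers standing in for `_id` values.

```python
from typing import Any, Dict, List, Optional, Tuple


def page_query(user_id: str, anchor: Optional[Any] = None,
               limit: int = 50) -> Tuple[Dict[str, Any], List[Tuple[str, int]], int]:
    """Build (filter, sort, limit) for one page of a user's posts, newest first."""
    query: Dict[str, Any] = {"userid": user_id}
    if anchor is not None:
        # Only posts strictly older than the last one the client has seen.
        query["_id"] = {"$lt": anchor}
    return query, [("_id", -1)], limit


def run_page(posts: List[Dict[str, Any]], user_id: str,
             anchor: Optional[Any] = None, limit: int = 50) -> List[Dict[str, Any]]:
    """In-memory equivalent of executing page_query against a list of documents."""
    matches = [
        p for p in posts
        if p["userid"] == user_id and (anchor is None or p["_id"] < anchor)
    ]
    matches.sort(key=lambda p: p["_id"], reverse=True)
    return matches[:limit]
```

The client passes the `_id` of the last post on the current page as the next `anchor`. Unlike offset-based paging, this stays stable while new posts arrive at the head of the list.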
    36. Social Data Ages Fast
       • Half-life of most content is 1 day!
       • Popular content usually < 1 month
       • Access to old data is rare
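To get a feel for what a one-day half-life means, the arithmetic can be worked through under a simple assumption: that interest in a post decays exponentially with age. The slide gives only the half-life figure, so the decay model and the function name below are assumptions for illustration.

```python
def remaining_fraction(age_days: float, half_life_days: float = 1.0) -> float:
    """Fraction of initial interest left after `age_days`, assuming exponential decay."""
    return 0.5 ** (age_days / half_life_days)
```

Under that assumption, a post retains half its initial interest after one day and one eighth after three days, and by thirty days the remaining fraction is below one in a billion, which is consistent with the observation that access to old data is rare.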