Socialite, the Open Source Status Feed Part 1: Design Overview and Scaling for Infinite Content


Published in: Technology
  • give intuitive sense of how the presentation will be structured...

  • News/Social Status Feed: popular and common

    Internal goals: implement different schema options, built-in benchmarking for comparison

    External goals: low latency from end-user perspective, linear scaling from operational perspective
  • image at of the hat 

  • Transcript

    • 1. Building a Social Platform Part 1: Design Overview; Storing Infinite Content
    • 2. Solutions Engineering • Identify Popular Use Cases – Directly from MongoDB Users – Addressing "limitations" • Go beyond documentation and blogs • Create open source project • Run it!
    • 3. Social Status Feed
    • 4. Agenda • What is a status feed and why build it w/MongoDB • Application overview (goals, non-goals) • Architecture overview (arch diagram) • Operational overview (benchmarks, automation) • Describe components – Describe options • For each component – Options tried – Results – Option chosen
    • 5. Socialite • News/Social Status Feed: popular and common • Appears deceptively simple, but good performance requires solving many tricky problems • We created a reference implementation – Configurable models and options – Built-in benchmarking • Used this implementation to test out different options • This talk will summarize
    • 6. Status Feed
    • 7. Status Feed
    • 8. Socialite • Open Source • Reference Implementation – Various Fanout Feed Models – User Graph Implementation – Content storage • Configurable models and options • REST API in Dropwizard (Yammer) – • Built-in benchmarking
    • 9. Architecture GraphServiceProxy ContentProxy
    • 10. Pluggable Services • Major components each have an interface – see • Configuration selects implementation to use • ServiceManager organizes: – Default implementations – Lifecycle – Binding configuration – Wiring dependencies – see com.mongodb.socialite.ServiceManager
    • 11. Simple Interface
        – GET /users/{user_id}: Get a user by their ID
        – DELETE /users/{user_id}: Remove a user by their ID
        – POST /users/{user_id}/posts: Send a message from this user
        – GET /users/{user_id}/followers: Get the list of followers of a user
        – GET /users/{user_id}/followers_count: Get the number of followers of a user
        – GET /users/{user_id}/following: Get the list of users this user is following
        – GET /users/{user_id}/following_count: Get the number of users this user follows
        – GET /users/{user_id}/posts: Get the messages sent by a user
        – GET /users/{user_id}/timeline: Get the timeline for this user
        – PUT /users/{user_id}: Create a new user
        – PUT /users/{user_id}/following/{target}: Follow a user
        – DELETE /users/{user_id}/following/{target}: Unfollow a user
    • 12. Technical Decisions • User timeline cache • Schema • Indexing • Horizontal scaling
    • 13. Operational Setup
    • 14. Operational Testing • Real-life validation of our choices • Most important criteria? – User-facing latency – Linear scaling of resources
    • 15. Scaling Goals • Realistic real-life-scale workload – compared to Twitter, etc. • Understanding of HW required – containing costs • Confirm architecture scales linearly – without loss of responsiveness
    • 16. Architecture GraphServiceProxy ContentProxy
    • 17. DB Architecture The storage layer is separate from Socialite services, and each service has its own URI: its own MongoDB server or cluster that can be configured differently from the others. This allows us to physically optimize each service's DB for the workload we'll be running on it. It also allows us to scale out the DB that's currently the limiting factor (the bottleneck) in our setup.
    • 18-30. Operational Testing
    • 31. Operational Testing • Built-in benchmark capability
    • 32. Operational Testing • All hosts in AWS • Each service used its own DB, cluster or shards • All benchmarks through `mongos` (sharded config) • Used MMS monitoring for measuring throughput • Used internal benchmarks for measuring latency • Based volume tested on real life social metrics
    • 33. Scaling for Infinite Content
    • 34. Architecture GraphServiceProxy ContentProxy
    • 35. Socialite Content Service • System of record for all user content • Initially very simple (no search) • Mainly designed to support feed – Lookup/indexed by _id and userid – Time based anchors/pagination
    • 36. Social Data Ages Fast • Half-life of most content is 1 day! • Popular content usually < 1 month • Access to old data is rare
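
Slides 10 and 17 describe services that sit behind interfaces, with configuration selecting the implementation and giving each service its own database URI. The sketch below illustrates that idea in Python rather than Socialite's own Java. Every name in it (`ServiceRegistry`, the `model` and `uri` keys, the example hosts) is a hypothetical stand-in and not the API of `com.mongodb.socialite.ServiceManager`.

```python
from abc import ABC, abstractmethod

DEFAULT_URI = "mongodb://localhost:27017"  # assumed default, for illustration only


class ContentService(ABC):
    """Interface that every content-service implementation must satisfy."""

    @abstractmethod
    def describe(self) -> str:
        ...


class InMemoryContentService(ContentService):
    def __init__(self, uri: str) -> None:
        self.uri = uri  # unused: this variant keeps everything in process memory

    def describe(self) -> str:
        return "in-memory content service"


class MongoContentService(ContentService):
    def __init__(self, uri: str) -> None:
        self.uri = uri  # a real implementation would open a database client here

    def describe(self) -> str:
        return f"mongo-backed content service at {self.uri}"


class ServiceRegistry:
    """Hypothetical registry: maps (service, model name) to a factory."""

    def __init__(self) -> None:
        self._factories = {}
        self._defaults = {}

    def register(self, service, name, factory, default=False):
        self._factories[(service, name)] = factory
        if default:
            self._defaults[service] = name

    def build(self, service, config):
        # Each service reads only its own config section, so each one
        # can point at a different server or cluster.
        section = config.get(service, {})
        name = section.get("model", self._defaults[service])
        if (service, name) not in self._factories:
            raise ValueError(f"unknown {service} model: {name}")
        return self._factories[(service, name)](section.get("uri", DEFAULT_URI))


registry = ServiceRegistry()
registry.register("content", "memory", InMemoryContentService, default=True)
registry.register("content", "mongo", MongoContentService)

config = {"content": {"model": "mongo", "uri": "mongodb://content-db.example:27017"}}
print(registry.build("content", config).describe())
```

Because the URI lives in the per-service section, moving one service to a larger cluster is a configuration change and leaves the other services untouched.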
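
The REST interface on slide 11 is a set of method and path-template pairs. As a rough illustration of how such templates resolve to operations, here is a minimal route table in Python. The operation names on the right are made up for this sketch, and Socialite itself serves these routes through Dropwizard, not through code like this.

```python
import re

# (method, path template, hypothetical operation name)
ROUTES = [
    ("GET", "/users/{user_id}", "get_user"),
    ("PUT", "/users/{user_id}", "create_user"),
    ("DELETE", "/users/{user_id}", "remove_user"),
    ("POST", "/users/{user_id}/posts", "send_post"),
    ("GET", "/users/{user_id}/posts", "list_posts"),
    ("GET", "/users/{user_id}/timeline", "get_timeline"),
    ("GET", "/users/{user_id}/followers", "list_followers"),
    ("GET", "/users/{user_id}/followers_count", "count_followers"),
    ("GET", "/users/{user_id}/following", "list_following"),
    # Written with an underscore here, mirroring followers_count.
    ("GET", "/users/{user_id}/following_count", "count_following"),
    ("PUT", "/users/{user_id}/following/{target}", "follow"),
    ("DELETE", "/users/{user_id}/following/{target}", "unfollow"),
]


def compile_template(template):
    """Turn '/users/{user_id}' into a regex with one named group per placeholder."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")


COMPILED = [(method, compile_template(tpl), name) for method, tpl, name in ROUTES]


def resolve(method, path):
    """Return (operation name, path parameters), or None if nothing matches."""
    for route_method, pattern, name in COMPILED:
        if route_method != method:
            continue
        match = pattern.match(path)
        if match:
            return name, match.groupdict()
    return None


print(resolve("PUT", "/users/alice/following/bob"))
print(resolve("GET", "/users/alice/timeline"))
```

Each placeholder matches a single path segment, so `PUT /users/alice/following/bob` cannot be mistaken for the shorter `PUT /users/{user_id}` route.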
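
Slide 35 says content is looked up by `_id` and by user id, with time-based anchors for pagination. The sketch below shows one way anchor-based paging can work, using an in-memory store in place of MongoDB. It assumes ids increase with creation time (MongoDB ObjectIds begin with a timestamp, which gives a similar ordering); the `ContentStore` class and its method names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Post:
    id: int  # assumed to increase with creation time
    user_id: str
    text: str


class ContentStore:
    """In-memory stand-in for a content collection indexed by id and by user."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self._by_id: Dict[int, Post] = {}
        self._ids_by_user: Dict[str, List[int]] = {}  # ascending ids per user

    def add(self, user_id: str, text: str) -> Post:
        post = Post(next(self._next_id), user_id, text)
        self._by_id[post.id] = post
        self._ids_by_user.setdefault(user_id, []).append(post.id)
        return post

    def get(self, post_id: int) -> Optional[Post]:
        return self._by_id.get(post_id)

    def page(self, user_id: str, limit: int, before: Optional[int] = None) -> List[Post]:
        """Newest-first page of a user's posts, strictly older than the anchor id."""
        ids = self._ids_by_user.get(user_id, [])
        end = len(ids) if before is None else bisect_left(ids, before)
        start = max(0, end - limit)
        return [self._by_id[i] for i in reversed(ids[start:end])]


store = ContentStore()
for n in range(1, 6):
    store.add("alice", f"post {n}")

first = store.page("alice", limit=2)
# The last post of one page becomes the anchor for the next page.
second = store.page("alice", limit=2, before=first[-1].id)
print([p.text for p in first], [p.text for p in second])
```

Paging by anchor instead of by offset means new posts arriving at the head of the feed do not shift the items on later pages.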
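
Slide 36 states that the half-life of most content is one day. To get a feel for what that implies, the sketch below assumes interest decays exponentially, which is an assumption of this example and not a claim from the slides. Under that model, the share of interest left after `t` days is `0.5 ** t`.

```python
def remaining_fraction(age_days: float, half_life_days: float = 1.0) -> float:
    """Share of the original interest left after age_days, under exponential decay."""
    return 0.5 ** (age_days / half_life_days)


for age in (1, 7, 30):
    print(f"after {age:>2} day(s): {remaining_fraction(age):.2e}")
```

With a one-day half-life, a week-old post retains 1/128 (under 1%) of its initial interest, which is consistent with the slide's point that access to old data is rare.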