Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive


In this presentation, SNAP Interactive explains how they use MongoDB to scale against Facebook's real-time updates endpoint. SNAP Interactive wanted to instantly update people's profiles based on users' individual "likes". In this talk, Justin Medoy and Mike Sherov of SNAP Interactive walk through how they configured MongoDB to handle thousands of user profile updates each minute.



  1. Scaling the Facebook Realtime Endpoint Using MongoDB
     Presented by: Justin Medoy and Mike Sherov, SNAP Interactive (mikesherov@snap-interactive)
  2. Redefining the Way People Meet & Socialize Online
  3. What are Facebook Realtime Updates?
     Facebook says: "Real-time updates enable your application to subscribe to changes in data in Facebook."
     What it means: "You provide a URL, Facebook pings it when users do stuff."
  4. Pings from Facebook
     ● Every minute we get around 20 pings from Facebook that contain data for around 11,000 users:
       {
         "object": "user",
         "entry": [
           {
             "uid": 1335845740,
             "changed_fields": ["name", "picture"],
             "time": 232323
           },
           ...
         ]
       }
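The talk's stack was PHP, but the shape of a ping handler can be sketched in a few lines of Python. This is an illustrative sketch, not SNAP's code; the payload below mirrors the slide's example, and `uids_to_refresh` is a hypothetical helper name.

```python
import json

# Example payload shaped like the realtime pings on the slide
# (uids and timestamps are illustrative, not real data).
ping = '''
{
  "object": "user",
  "entry": [
    {"uid": 1335845740, "changed_fields": ["name", "picture"], "time": 232323},
    {"uid": 1335845741, "changed_fields": ["music"], "time": 232324}
  ]
}
'''

def uids_to_refresh(raw):
    """Return the user ids whose data Facebook says has changed."""
    payload = json.loads(raw)
    if payload.get("object") != "user":
        return []
    return [entry["uid"] for entry in payload["entry"]]

print(uids_to_refresh(ping))  # [1335845740, 1335845741]
```

Note that the ping names the changed fields but carries no values, which is exactly the problem the next slide raises.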
  5. WHAT?!? Where's the data?
     ● Facebook tells you that something about the field changed, but not what the current data is.
  6. Retrieving User Data from the Graph
     ● Solution: go back to Facebook and grab the user's data:
       ids=<USERID>&fields=music,movies,likes
       * This will only get data that the user has made publicly available.
     ● To avoid timeouts, each call to Facebook only asks for the data for 25 users.
       * Our cURL timeouts for Facebook have been lowered from the default 60 seconds to 25 seconds.
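The batching described above (25 users per Graph call) can be sketched as follows. A hedged illustration rather than SNAP's PHP implementation: `batches` and `graph_query` are hypothetical helper names, and only the query string is built here, with no actual HTTP request.

```python
from itertools import islice

GRAPH_FIELDS = "music,movies,likes"
BATCH_SIZE = 25  # per the talk: small batches to avoid Graph API timeouts

def batches(uids, size=BATCH_SIZE):
    """Yield successive chunks of at most `size` uids."""
    it = iter(uids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def graph_query(chunk):
    # One request per batch: ids=<comma-separated uids>&fields=...
    return "ids=%s&fields=%s" % (",".join(str(u) for u in chunk), GRAPH_FIELDS)

uids = range(1, 61)  # 60 changed users -> 3 requests (25 + 25 + 10)
requests = [graph_query(c) for c in batches(uids)]
print(len(requests))  # 3
```

Each query string would then be sent to the Graph endpoint with the shortened 25-second timeout mentioned on the slide.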
  7. Update the user's profile
     ● Facebook won't tell you exactly what's changed, but we can figure it out from our own data:
       All Data - Stored Data = Changed Data
     ● The next step is to update the user's profile with this changed data.
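The "All Data - Stored Data = Changed Data" step above amounts to a field-by-field diff. A minimal Python sketch, assuming profiles are represented as dicts (`changed_fields` and the sample values are illustrative, not from the talk):

```python
def changed_fields(fresh, stored):
    """All Data - Stored Data = Changed Data:
    keep only the fields whose fresh value differs from what we stored."""
    return {k: v for k, v in fresh.items() if stored.get(k) != v}

stored = {"name": "Ada", "music": ["Daft Punk"], "movies": ["Tron"]}
fresh  = {"name": "Ada", "music": ["Daft Punk", "Justice"], "movies": ["Tron"]}

delta = changed_fields(fresh, stored)
print(delta)  # {'music': ['Daft Punk', 'Justice']}
```

Only the fields in the delta need to be written back to the profile, which keeps each update small.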
  8. Mongo Architecture
     ● Mongo 2.0.2
     ● Mongo PHP driver 1.2.10
     ● Two separate replica sets:
       ○ User data
       ○ Interest data
     ● Why separate replica sets?
       ○ Keep as much of the index as possible in memory
       ○ Disk reads are expensive
  9. User Data Replica Set
     Design challenge:
     ● Random access pattern over 106 million documents
  10. User Data Replica Set
      ● Large $in queries
      ● High page faults in MMS
      ● We upgraded from 32GB to 128GB on each node
  11. Indexes
      ● We added duplicates of some of our indexes with reversed fields
      ● Updating all of these extra indexes was a huge bottleneck
  12. Indexes
      ● Unique index: uid_1
      ● profile.sync_1_installed_1_platforms.facebook_1
      ● email_1
      ● uid_1_installed_1
      ● last_login_1_uid_1
  13. Indexes
      ● There were certain minutes when Facebook would tell us that the data had changed for more than 40,000 users
        ○ Limit the amount of data Facebook can send in one minute
      ● A high number of writes and a large number of indexes prevented the secondaries from reading the oplog because of the global write lock
        ○ Increase the size of the oplog
        ○ This is fixed in 2.2.1
  14. Indexes and the realtime endpoint
      profile.sync_1_installed_1_platforms.facebook_1
      ● Filtered 11,000 users a minute down to a few hundred
        ○ Moved filtering logic out of PHP into the index
      ● Added efficiency from a covered index
        ○ All we need is platforms.facebook, which is part of the index
  15. Interest Replica Set
      A different set of challenges than the User replica set:
      ● Needs to power the typeahead
      ● 64 million interests
      ● Access pattern based on interest popularity
        ○ Lady Gaga is going to get accessed more than Ladybug or Javascript
  16. The Typeahead
      {
        "_id": ObjectId("4f511a230624967b7d000003"),
        "name": "Rubiks Cube",
        "search": "rubiks cube",
        "subsearch": [
          "r", "ru", "rub", "rubi", "rubik", "rubiks",
          "rubiks ", "rubiks c", "rubiks cu", "rubiks cub"
        ],
        "popularity": NumberLong(907)
      }
  17. The Typeahead
      ● Add an array with the first few characters of the interest
      ● Add an index on that field
      ● This allows us to have 10 entries in 1 index instead of 10 separate indexes
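Building the subsearch array above is a simple prefix expansion. A Python sketch, assuming the ten-prefix cap implied by the slide's example document (`subsearch` as a function name and `MAX_PREFIX` are my labels, not SNAP's):

```python
MAX_PREFIX = 10  # the example document stores roughly the first ten prefixes

def subsearch(name, max_prefix=MAX_PREFIX):
    """Lowercase the interest name and expand it into its leading prefixes."""
    s = name.lower()
    return [s[:i] for i in range(1, min(len(s), max_prefix) + 1)]

doc = {
    "name": "Rubiks Cube",
    "search": "rubiks cube",
    "subsearch": subsearch("Rubiks Cube"),
    "popularity": 907,
}
print(doc["subsearch"])
# ['r', 'ru', 'rub', 'rubi', 'rubik', 'rubiks',
#  'rubiks ', 'rubiks c', 'rubiks cu', 'rubiks cub']
```

Because `subsearch` is an array, a single multikey index on it covers every prefix length at once, which is the "10 entries in 1 index" point from the slide.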
  18. Typeahead indexes
      subsearch_1_popularity_-1
      ● Specifying -1 for the popularity component of the index naturally causes the typeahead to show more popular interests first
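The lookup served by the subsearch_1_popularity_-1 index can be simulated in memory to show the access pattern. This is an illustrative stand-in, not SNAP's code: in MongoDB the equivalent query would be a find on the subsearch field sorted by popularity descending, and the sample interests and popularity figures below are invented.

```python
# In-memory stand-in for the interests collection; the popularity
# numbers here are made up for illustration.
interests = [
    {"name": "Lady Gaga",  "subsearch": ["l", "la", "lad", "lady"],          "popularity": 99000},
    {"name": "Ladybug",    "subsearch": ["l", "la", "lad", "lady", "ladyb"], "popularity": 120},
    {"name": "Javascript", "subsearch": ["j", "ja", "jav"],                  "popularity": 800},
]

def typeahead(prefix, limit=10):
    """Match documents whose subsearch array contains the typed prefix,
    then return the most popular names first (the -1 sort component)."""
    matches = [d for d in interests if prefix.lower() in d["subsearch"]]
    matches.sort(key=lambda d: d["popularity"], reverse=True)
    return [d["name"] for d in matches[:limit]]

print(typeahead("lady"))  # ['Lady Gaga', 'Ladybug']
```

With the compound index, Mongo can walk matching entries already ordered by descending popularity, so no separate sort stage is needed.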
  19. Lessons Learned
      ● Don't over-index
      ● Use covered indexes when possible
      ● Use indexes to reduce the size of returned data
      ● Keep everything in memory
      ● Use a multikey index for typeaheads
      ● Utilize -1 in the index for natural sorting
  20. SNAP Interactive, Inc. Contact Information
      ● Justin Medoy, Team Lead / Software Engineer
      ● Mike Sherov, Lead Developer (@mikesherov)
      ● For more information on our open positions, email us or check our website — meet people like you