Scaling Facebook's Realtime Endpoint with MongoDB, Snap Interactive
 


In this presentation, Snap Interactive goes through how they use MongoDB to scale Facebook's Real-time endpoint. In building AreYouInterested.com, SNAP Interactive wanted to instantly update people's profiles based on users' individual "likes". In this talk, Mike and Justin from SNAP Interactive will go through how they configured MongoDB to handle thousands of user profile updates.

    Presentation Transcript

    • Scaling the Facebook Realtime Endpoint Using MongoDB
      Presented by: Justin Medoy and Mike Sherov, SNAP Interactive
      jmedoy@snap-interactive.com, mikesherov@snap-interactive
    • Redefining the Way People Meet & Socialize Online
    • What are Facebook Realtime Updates?
      Facebook says: "Real-time updates enable your application to subscribe to changes in data in Facebook."
      What it means: "You provide a URL, Facebook pings it when users do stuff."
    • Pings from Facebook
      ● Every minute we get around 20 pings from Facebook that contain data for around 11,000 users
        { "object": "user", "entry": [ { "uid": 1335845740, "changed_fields": [ "name", "picture" ], "time": 232323 }, .... ] }
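
As a rough illustration of handling a ping in the shape shown on this slide, the sketch below collects the changed fields per user. The second entry in the sample payload, the function name `changed_users`, and the optional `watched_fields` filter are assumptions made for the example, not part of the talk.

```python
import json

# Sample payload in the shape shown on the slide.
# The second entry is made up purely for illustration.
SAMPLE_PING = """
{
  "object": "user",
  "entry": [
    {"uid": 1335845740, "changed_fields": ["name", "picture"], "time": 232323},
    {"uid": 1335845741, "changed_fields": ["likes"], "time": 232324}
  ]
}
"""


def changed_users(raw_body, watched_fields=None):
    """Map each uid in a ping to the set of field names reported as changed.

    If watched_fields is given, only those fields are kept and users with
    no watched change are dropped.
    """
    payload = json.loads(raw_body)
    if payload.get("object") != "user":
        return {}
    result = {}
    for entry in payload.get("entry", []):
        fields = set(entry.get("changed_fields", []))
        if watched_fields is not None:
            fields &= set(watched_fields)
        if fields:
            # A uid may appear more than once in a ping; merge its fields.
            result.setdefault(entry["uid"], set()).update(fields)
    return result


if __name__ == "__main__":
    print(changed_users(SAMPLE_PING, watched_fields=["likes"]))
```

Note that the result only says which fields changed; the ping itself carries no field values.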
    • WHAT?!? Where's the data?
      ● Facebook tells you that something about the field changed, but not what the current data is.
    • Retrieving User Data from the Graph
      ● Solution: go back to Facebook and grab the user's data
        https://graph.facebook.com?ids=<USERID>&fields=music,movies,likes
        * This will only get data that the user has made publicly available
      ● To avoid timeouts, each call to Facebook only asks for the data for 25 users
        * Our cURL timeouts for Facebook have been lowered from the default 60 seconds to 25 seconds
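
The batching described on this slide might look roughly like the following. This is a minimal sketch that only builds the request URLs; it assumes the `ids` parameter takes a comma-separated list, leaves out access tokens and the HTTP call itself, and the names `chunked` and `graph_urls` are hypothetical.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"
BATCH_SIZE = 25       # users per request, per the slide
TIMEOUT_SECONDS = 25  # timeout from the slide; pass it to whichever HTTP client is used


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def graph_urls(uids, fields=("music", "movies", "likes")):
    """Build one Graph URL per batch of up to BATCH_SIZE user ids."""
    urls = []
    for batch in chunked(list(uids), BATCH_SIZE):
        query = urlencode(
            {"ids": ",".join(str(uid) for uid in batch),
             "fields": ",".join(fields)},
            safe=",",  # keep commas readable instead of percent-encoding them
        )
        urls.append(f"{GRAPH_URL}?{query}")
    return urls


if __name__ == "__main__":
    for url in graph_urls(range(1, 61)):
        print(url)
```

With 60 user ids this produces three URLs: two with 25 ids each and one with the remaining 10.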
    • Update the user's profile
      ● Facebook won't tell you exactly what's changed, but we can figure it out from our own data
        All Data - Stored Data = Changed Data
      ● The next step is to update the user's profile with this changed data
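
The subtraction "All Data - Stored Data = Changed Data" can be read as a per-field set difference. The sketch below assumes each field holds a list of interest names, which the talk does not spell out, and it only detects additions; the function name `changed_data` and the sample values are illustrative.

```python
def changed_data(all_data, stored_data):
    """Return, per field, the items in the fresh data that are not stored yet."""
    changed = {}
    for field, fresh_items in all_data.items():
        new_items = set(fresh_items) - set(stored_data.get(field, ()))
        if new_items:
            changed[field] = new_items
    return changed


if __name__ == "__main__":
    fresh = {"music": ["Lady Gaga", "Daft Punk"], "movies": ["Alien"]}
    stored = {"music": ["Lady Gaga"], "movies": ["Alien"]}
    print(changed_data(fresh, stored))
```

Detecting removed interests would need the reverse difference (stored minus fresh) as well.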
    • Mongo Architecture
      ● Mongo 2.0.2
      ● Mongo PHP driver 1.2.10
      ● Two separate replica sets
        ○ User data
        ○ Interest data
      ● Why separate replica sets?
        ○ Keep as much of the index as possible in memory
        ○ Disk reads are expensive
    • User Data Replica Set: Design Challenge
      ● Random access pattern over 106 million documents
    • User Data Replica Set
      ● Large $in queries
      ● High page faults in MMS
      ● We upgraded from 32G to 128G on each node
    • Indexes
      ● We added duplicates of some of our indexes with reversed fields
      ● Updating all of these extra indexes was a huge bottleneck
    • Indexes
      ● Unique index uid_1
      ● profile.sync_1_installed_1_platforms.facebook_1
      ● email_1
      ● uid_1_installed_1
      ● last_login_1_uid_1
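
The names in this list follow MongoDB's default index naming, where each key and its direction are joined with underscores. The small helper below (a hypothetical name, `index_name`) shows how a key specification maps to such a name, which makes the lists on these slides easier to read back into key order.

```python
def index_name(keys):
    """Build the default-style index name from (field, direction) pairs."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


if __name__ == "__main__":
    print(index_name([("uid", 1), ("installed", 1)]))
    print(index_name([("profile.sync", 1), ("installed", 1), ("platforms.facebook", 1)]))
```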
    • Indexes
      ● There were certain minutes when Facebook would tell us that the data had changed for more than 40,000 users
        ○ Limit the amount of data Facebook can send in one minute
      ● High number of writes and a large number of indexes prevented the secondaries from reading the oplog because of the global write lock
        ○ Increase the size of the oplog
        ○ This is fixed in 2.2.1
    • Indexes and the realtime endpoint
      profile.sync_1_installed_1_platforms.facebook_1
      ● Filtered 11,000 users a minute down to a few hundred
        ○ Moved filtering logic out of PHP into the index
      ● Added efficiency from covered index
        ○ All we need is platforms.facebook, which is part of the index
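
To make the covered-index point concrete, here is a simplified check of whether a query could be answered from an index alone: every filtered and returned field must be part of the index, and `_id` must be excluded from the projection. The filter values are hypothetical (the talk does not show the actual query), and the check ignores query operators such as `$or` as well as version-specific restrictions.

```python
# Key order taken from the index name on the slide.
INDEX_KEYS = [("profile.sync", 1), ("installed", 1), ("platforms.facebook", 1)]

# Hypothetical filter values; the real ones are not given in the talk.
FILTER = {"profile.sync": True, "installed": True}


def is_covered(index_keys, filter_doc, projection):
    """Simplified test: can the index alone satisfy this filter and projection?"""
    indexed = {field for field, _direction in index_keys}
    returned = {field for field, include in projection.items()
                if include and field != "_id"}
    # _id is returned by default, so it must be excluded explicitly.
    id_excluded = projection.get("_id", 1) == 0
    return id_excluded and set(filter_doc) <= indexed and returned <= indexed


if __name__ == "__main__":
    print(is_covered(INDEX_KEYS, FILTER, {"platforms.facebook": 1, "_id": 0}))
    print(is_covered(INDEX_KEYS, FILTER, {"platforms.facebook": 1}))
```

The second call shows the common pitfall: forgetting to exclude `_id` forces a document fetch.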
    • Interest Replica Set
      Different set of challenges than the user replica set
      ● Needs to power typeahead
      ● 64 million interests
      ● Access pattern based on interest popularity
        ○ Lady Gaga is going to get accessed more than Ladybug, Javascript
    • The Typeahead
      {
        "_id" : ObjectId("4f511a230624967b7d000003"),
        "name" : "Rubiks Cube",
        "search" : "rubiks cube",
        "subsearch" : [ "r", "ru", "rub", "rubi", "rubik", "rubiks", "rubiks ", "rubiks c", "rubiks cu", "rubiks cub" ],
        "popularity" : NumberLong(907)
      }
    • The Typeahead
      ● Add an array with the first few characters of the interest
      ● Add an index on that field
      ● This allows us to have 10 entries in 1 index instead of 10 separate indexes
      http://docs.mongodb.org/manual/core/indexes/#index-type-multikey
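
One way to generate the `subsearch` array is sketched below. The cap of 10 prefixes is inferred from the "Rubiks Cube" example document, and the function name `build_interest` is hypothetical; `_id` is left out and `NumberLong` is shown as a plain integer.

```python
def build_interest(name, popularity, max_prefixes=10):
    """Build an interest document with lowercase prefixes for typeahead lookups."""
    search = name.lower()
    # Prefixes of length 1..max_prefixes (or fewer for short names).
    subsearch = [search[:n] for n in range(1, min(len(search), max_prefixes) + 1)]
    return {
        "name": name,
        "search": search,
        "subsearch": subsearch,
        "popularity": popularity,
    }


if __name__ == "__main__":
    print(build_interest("Rubiks Cube", 907)["subsearch"])
```

Because `subsearch` is an array, a single index on it holds one entry per prefix, which is the multikey behavior the slide refers to.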
    • Typeahead indexes
      subsearch_1_popularity_-1
      ● Specifying -1 for the popularity component of the index naturally causes the typeahead to show more popular interests first
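
The effect of the `subsearch_1_popularity_-1` index can be imitated in memory: match the typed text against the prefix array, then return the most popular matches first. This is only a simulation of the ordering, not a database query; the popularity numbers and the names `prefixes` and `typeahead` are made up for the example.

```python
def prefixes(text, max_prefixes=10):
    """Lowercase prefixes of length 1..max_prefixes."""
    text = text.lower()
    return [text[:n] for n in range(1, min(len(text), max_prefixes) + 1)]


# Interest names come from the slides; popularity values are invented.
INTERESTS = [
    {"name": name, "subsearch": prefixes(name), "popularity": popularity}
    for name, popularity in [("Ladybug", 12), ("Lady Gaga", 90000), ("Javascript", 40)]
]


def typeahead(interests, typed, limit=5):
    """Return names whose prefix array contains `typed`, most popular first."""
    wanted = typed.lower()
    matches = [doc for doc in interests if wanted in doc["subsearch"]]
    matches.sort(key=lambda doc: doc["popularity"], reverse=True)
    return [doc["name"] for doc in matches[:limit]]


if __name__ == "__main__":
    print(typeahead(INTERESTS, "lad"))
```

With the index in place the database can return results already in this order, so no separate sort step is needed.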
    • Lessons Learned
      ● Don't over-index
      ● Use covered indexes when possible
      ● Use indexes to reduce the size of returned data
      ● Keep everything in memory
      ● Use a multikey index for typeaheads
      ● Utilize -1 in the index for natural sorting
    • SNAP Interactive, Inc. Contact Information
      ● SNAP Interactive, Inc.: SNAP-Interactive.com
      ● Justin Medoy, Team Lead / Software Engineer: JMedoy@snap-interactive.com
      ● Mike Sherov, Lead Developer: mike@snap-interactive.com, @mikesherov
      ● For more information on our open positions, email jobs@snap-interactive.com or check our website at www.snap-interactive.com/jobs/job-openings