Real-time Location Based Social Discovery using MongoDB

The slides from my MongoSV 2012 presentation

Published in: Technology

  1. Real-time Location Based Social Discovery using MongoDB
     Fredrik Björk, Director of Engineering
     MongoSV, Dec 4th 2012
  2. What is Banjo?
     • The most powerful location based mobile technology that brings you the moments you would otherwise miss
     • Aggregates geo tagged posts from Facebook, Twitter, Instagram and Foursquare in real-time
  4. Stats
     • Launched June 2011
     • 3 million users
     • Social graph of 400 million profiles
     • 50 billion connections
     • ~200 geo posts created per second
  5. Why MongoDB?
     • Developer friendly
     • Easy to maintain and scale
     • Automatic failover
     • Rapid prototyping of features
     • Good fit for consuming, storing and presenting JSON data
     • Geospatial features out of the box
  6. Infrastructure
     • ~160 EC2 instances (75% MongoDB, 25% Redis)
     • SSD drives for low latency
     • App servers (Sinatra & Rails) hosted on Heroku
     • Mongos with authentication running on dedicated servers
  7. Geo tagged posts
     • Consumed as JSON from social network APIs: streaming, polling & real-time callbacks
     • Exposed via REST APIs as JSON to the Banjo iOS and Android apps
  8. Schema design
     https://twitter.com/fbjork/status/262989592561606656
  9. • _id is composed of provider (Facebook: 1, Twitter: 2 etc.) and post id for uniqueness
     https://twitter.com/fbjork/status/262989592561606656

     > db.posts.find({ _id: '2:262989592561606656' })
     {
       _id: "2:262989592561606656",
       username: "fbjork",
       text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
       ...
     }
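The composite _id described on slide 9 can be sketched in a few lines of Ruby. This is only an illustration: the `PROVIDERS` table and the helper names `build_post_id` and `parse_post_id` are assumptions, and the slide confirms only the codes for Facebook (1) and Twitter (2).

```ruby
# Hypothetical provider table; the slide confirms only Facebook: 1 and Twitter: 2.
PROVIDERS = { facebook: 1, twitter: 2 }.freeze

# Build the composite _id string "<provider code>:<post id>".
def build_post_id(provider, post_id)
  code = PROVIDERS.fetch(provider) # raises KeyError for an unknown provider
  "#{code}:#{post_id}"
end

# Split a composite _id back into [provider symbol, post id string].
def parse_post_id(id)
  code, post_id = id.split(':', 2)
  [PROVIDERS.key(Integer(code)), post_id]
end

id = build_post_id(:twitter, 262989592561606656)
puts id # prints 2:262989592561606656
```

Prefixing the provider code keeps post ids from different networks from colliding, and a single _id lookup is enough to fetch a post.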
  10. • Coordinates are stored inside an array with latitude, longitude

      {
        _id: "2:262989592561606656",
        username: "fbjork",
        text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
        coordinates: [37.784234, -122.438212],
        ...
      }
  11. • Friends are stored inside an array

      {
        _id: "2:262989592561606656",
        username: "fbjork",
        text: "Will give a presentation at #MongoSV on how we use @MongoDB for real-time location based social discovery at @Banjo http://www.10gen.com/events/mongosv",
        coordinates: [37.784234, -122.438212],
        friend_ids: [8816792, 10324882, 2006261, ...]
      }
  13. Geospatial Indexing
      • Create the geo index:

      > db.posts.ensureIndex({ coordinates: '2d' })
  14. Find nearby posts in Miami:

      > db.posts.find({ coordinates: { $near: [25.792627, -80.226142] } })
      { _id: "2:809438082", coordinates: [25.792610, -80.226100], username: "Rebecca_Boorsma", text: "I love Miami!", ... }
      { _id: "2:1234567", coordinates: [25.781324, -80.431423], username: "foo", text: "Another day, another dollar", ... }
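A `$near` query against a 2d index returns documents ordered from nearest to farthest, using flat (planar) distance. The Ruby sketch below imitates that ordering in memory for the two posts on slide 14. It is not how the server walks the index; `planar_distance` and `nearest_first` are illustrative names. MongoDB's documentation recommends storing legacy coordinate pairs as longitude, latitude; the slides use latitude, longitude, and the sketch follows the slides.

```ruby
# Planar (flat) distance between two [lat, lng] pairs, in degrees.
def planar_distance(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Return posts ordered nearest-first relative to point.
def nearest_first(posts, point)
  posts.sort_by { |post| planar_distance(post[:coordinates], point) }
end

miami = [25.792627, -80.226142]
posts = [
  { _id: '2:1234567',   coordinates: [25.781324, -80.431423] },
  { _id: '2:809438082', coordinates: [25.792610, -80.226100] }
]

# Prints 2:809438082 first, then 2:1234567.
nearest_first(posts, miami).each { |post| puts post[:_id] }
```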
  16. Find friend posts globally:

      > db.posts.find({ friend_ids: { $in: [2006261] } })
      {
        _id: "2:10248172",
        username: "fbjork",
        friend_ids: [8816792, 10324882, 2006261, ...],
        ...
      }
  17. Find friend posts in a location:

      > db.posts.find({ coordinates: { $near: [25.792627, -80.226142] }, friend_ids: { $in: [2006261] } })
      {
        _id: "2:10248172",
        username: "fbjork",
        friend_ids: [8816792, 10324882, 2006261, ...],
        ...
      }
  18. Compound geo indexes
      • Create a compound index on coordinates and friend_ids:

      > db.posts.ensureIndex({ coordinates: '2d', friend_ids: 1 })
  19. • Fails for compound indexes with large arrays
      • Geospatial indexes have a size limit of 1000 bytes

      > db.posts.ensureIndex({ coordinates: '2d', friend_ids: 1 })
      Error: Key too large to index
  20. Geospatial query performance
      • Do we need a compound index at all?
      • Geospatial index is usually restrictive enough
      • Problem: Array traversal (using $in) is CPU hungry for large arrays
      • Solution: Pre-sharded array fields
  21. Pre-sharded array fields
      • When dealing with large arrays, e.g. @BarackObama follower ids
      • Partition fields using pre-sharding
      • shard = Hash(key) MOD shard_count
      • Keep array sizes in the low hundreds
  22. # shard_example.rb
      require 'zlib'

      SHARDS = 3
      friend_ids = [1000, 1001, 1002, 1003, 1004, 1005, 1006]
      # Print the shard number for each friend id, one per line
      friend_ids.each { |f| puts Zlib.crc32(f.to_s) % SHARDS }

      Output (one value per line): 0 2 0 2 1 2 0

      Resulting document fields:
      {
        friends_0: [1000, 1002, 1006],
        friends_1: [1004],
        friends_2: [1001, 1003, 1005]
      }
  23. Find friend posts using pre-sharding of the friend arrays:

      > db.posts.find({ coordinates: { $near: [25.792627, -80.226142] }, friends_0: { $in: [1000] } })
      {
        friends_0: [1000, 1002, 1006],
        friends_1: [1004],
        friends_2: [1001, 1003, 1005]
      }
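Both sides of pre-sharding can be sketched together: the write path splits one friend array into `friends_N` fields, and the read path works out which single field to query for a given user. This is a minimal sketch that reuses CRC32 and the shard count of 3 from the slide example; the helper names `shard_for`, `shard_friend_ids` and `friend_posts_query` are assumptions, and the query is built as a plain hash rather than sent to a server.

```ruby
require 'zlib'

SHARDS = 3 # same shard count as the slide example

# shard = Hash(key) MOD shard_count, using CRC32 as in shard_example.rb.
def shard_for(id)
  Zlib.crc32(id.to_s) % SHARDS
end

# Write path: split one large friend array into friends_<shard> fields.
def shard_friend_ids(friend_ids)
  friend_ids.group_by { |id| shard_for(id) }
            .map { |shard, ids| ["friends_#{shard}", ids] }
            .to_h
end

# Read path: query only the one field that can contain viewer_id.
def friend_posts_query(viewer_id, near)
  {
    'coordinates' => { '$near' => near },
    "friends_#{shard_for(viewer_id)}" => { '$in' => [viewer_id] }
  }
end

fields = shard_friend_ids([1000, 1001, 1002, 1003, 1004, 1005, 1006])
query  = friend_posts_query(1000, [25.792627, -80.226142])
p fields
p query
```

Because the same hash picks the field on write and on read, the `$in` match only has to traverse one of the smaller arrays instead of the full friend list.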
  24. Capped collections
      • Good fit for storing a feed of posts for a period of time
      • Eliminates need to expire old posts
      • Documents can't grow
      • Documents can't be deleted
      • Resizing collections is painful
      • Can't be sharded
  25. TTL collections
      • We switched to TTL collections with MongoDB 2.2
      • Deleting and growing documents is now possible
      • Easier to change expiration times
      • Can be sharded (not by geo)
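A TTL collection is an ordinary collection with an index on a date field that carries the `expireAfterSeconds` option; a background task on the server then removes documents once that field is old enough, so removal is periodic rather than instantaneous. The slides do not show Banjo's index definition, so the `created_at` field name and the one-day lifetime below are assumptions. The sketch holds the index specification as plain Ruby data and mirrors the expiry rule client-side.

```ruby
# Hypothetical field name and lifetime; the slides give neither.
TTL_FIELD   = 'created_at'
TTL_SECONDS = 24 * 60 * 60

# Index keys and options as they would be handed to a driver. In the mongo shell:
#   db.posts.ensureIndex({ created_at: 1 }, { expireAfterSeconds: 86400 })
TTL_INDEX_KEYS    = { TTL_FIELD => 1 }.freeze
TTL_INDEX_OPTIONS = { 'expireAfterSeconds' => TTL_SECONDS }.freeze

# Client-side mirror of the server rule: a document becomes eligible for
# removal once its date field is at least TTL_SECONDS old.
def expired?(doc, now = Time.now)
  now - doc[TTL_FIELD] >= TTL_SECONDS
end

old_post = { '_id' => '2:262989592561606656', TTL_FIELD => Time.now - 2 * TTL_SECONDS }
puts expired?(old_post) # prints true
```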
  26. Questions
  27. Thank you!
      Available: iPhone and Android
      fredrik@teambanjo.com
      @fbjork
