MongoSF 2011 - Using MongoDB for IGN's Social Platform


Using MongoDB for IGN's Social Platform - slides from my talk at MongoSF 2011 on 05/24/2011

Published in: Technology

  1. Using MongoDB for IGN’s Social Platform
     MongoSF
     Tuesday May 24th, 2011
  2. Agenda
     - About
     - Architecture
     - MongoDB Usage
     - ActivityStreams
     - Configuration, Monitoring, Maintenance
     - Backup
     - Tools
     - Lessons Learned, Next steps
  3. About
     - About IGN
       - We have the largest audience of gamers in the world
       - Over 70M Monthly Uniques
     - About IGN’s Social Platform:
       - An API to connect the gamer community with editors, games, and other gamers, and to help lay the foundation for premium content discovery as well as UGC
       - Launched Sept 2010
       - ~7M activities
       - 30M API calls per day (24h), ~9ms response times
  4. Architecture
     - REST based API, built in Java
     - Entities are People, MediaItems, Activities, Comments, Notifications, Status
     - Interfaces across as well as other social networks
     - Caching tier based on memcached
     - MySQL and MongoDB as persistence
     - PHP/Zend front end
  5. MongoDB Usage
     - Activity Streams : standard
     - Activity Caching : (more on this later!)
     - Activity Commenting
     - Points, Leaderboards : Also extend to badges
     - Block lists, Ban lists
     - Notifications for conversations
     - Analytics : Activity snapshot for a user
  6. Challenges with ActivityStreams
     - Lots of data!
       - Large amount of data coming out as a result
     - Reverse sorting
       - The data has to be sorted in reverse natural order ($natural : -1), and we do not use capped collections
     - Aggregation of similar activities
       - Impacts pagination
     - Fetching self activities (profile), and newsfeed (self + friends)
     - Filtering based on the activity type
       - People want to see Game Updates or Blog updates from their friends
     - Hydration of activities for dynamic data
       - The thumbnail and level of the actor or commenter may change
     - Activity Comments
       - When an activity is rendered, the initial comments and the comment count have to be pulled ($slice). Not having a $sizeOf type operator hurts.
     - No Embedding or References
       - We build data on the fly as a part of the hydration process
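The comment-count complaint on this slide can be illustrated with a small sketch. One common workaround, which the slides do not confirm IGN used, is to store a counter next to the comment array and increment it on every push, so a read can combine a `$slice` projection with the stored count. The field names `comments` and `commentCount` and both helper functions are hypothetical; the sketch only builds the update and projection documents as plain Python dicts and does not talk to a server.

```python
def add_comment_update(comment):
    """Build an update document that appends a comment and bumps a counter.

    `comments` and `commentCount` are assumed field names for illustration.
    """
    return {
        "$push": {"comments": comment},
        "$inc": {"commentCount": 1},
    }


def initial_comments_projection(initial_comments=3):
    """Build a projection that returns only the first few comments.

    A $slice projection used on its own leaves the other fields in the
    result, so the stored counter comes back with the same read.
    """
    return {"comments": {"$slice": initial_comments}}


update_doc = add_comment_update({"author": "1234", "text": "Nice review!"})
projection_doc = initial_comments_projection(2)
```

The two dicts would be passed as the update and projection arguments of whichever driver is in use; the trade-off is one extra field to keep consistent in exchange for not loading the whole array just to count it.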
  7. Caching using MongoDB
     - Caching the entire streams
       - A bad idea (or bad implementation?)
       - The expired objects sat in the db, bloating the database
       - The removal did not free up space, so we ran out
       - Batch removals clogged the slaves
     - Use Mongo as a cache-key-index
       - Cache the streams in Memcached
       - For invalidation, keep the index of the memcached keys in MongoDB.
       - Works!
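The cache-key-index idea from this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not IGN's implementation: a plain dict stands in for memcached, a second dict stands in for the MongoDB collection that indexes cache keys per user, and the class name, method names, and key format are all hypothetical.

```python
class StreamCache:
    """Cached streams plus a per-user index of the keys that hold them."""

    def __init__(self):
        self.cache = {}      # stand-in for memcached: cache key -> stream
        self.key_index = {}  # stand-in for MongoDB: user id -> set of cache keys

    def put(self, user_id, key, stream):
        # Store the stream, and record the key so it can be found later.
        self.cache[key] = stream
        self.key_index.setdefault(user_id, set()).add(key)

    def get(self, key):
        return self.cache.get(key)

    def invalidate(self, user_id):
        # Look up every key recorded for the user and drop it from the cache.
        keys = self.key_index.pop(user_id, set())
        for key in keys:
            self.cache.pop(key, None)
        return len(keys)


streams = StreamCache()
streams.put("u1", "u1:newsfeed:page1", ["a3", "a2"])
streams.put("u1", "u1:profile:page1", ["a1"])
streams.put("u2", "u2:newsfeed:page1", ["b1"])
dropped = streams.invalidate("u1")
```

The point of the split is that the bulky stream data expires in the cache on its own, while the database holds only small key lists, which avoids the bloat described for caching whole streams in the database.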
  8. Configuration
     - Server:
       - 1 Master, 2 Slaves (load balanced through NetScaler)
       - 2 extra slaves which are not queried (replicate!!)
       - Version 1.6.1
       - 1.8.1 with Journaling is being tested in Stage
     - Clients:
       - Java Driver (2.1)
       - Ruby Driver (1.2)
     - Mappers:
       - Morphia for Java, MongoMapper for Ruby
     - Connections per host : 200, #hosts = 4
     - Oplog Size: 1GB, gives us ~272 hours
     - Syncdelay: 60s (default)
     - Hardware: 2 core, 6 GB virtualized machine
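The oplog figure on this slide implies an average write rate, which a short calculation makes explicit. This is back-of-the-envelope arithmetic only: it assumes 1GB means 1024 MB and that the write rate is roughly constant, neither of which the slides state, and the function names are hypothetical.

```python
OPLOG_SIZE_MB = 1024      # "1GB" from the slide, assumed to mean 1024 MB
OPLOG_WINDOW_HOURS = 272  # "~272 hours" from the slide


def oplog_mb_per_hour(size_mb, window_hours):
    """Average oplog growth implied by a given size and time window."""
    return size_mb / window_hours


def window_hours_for(size_mb, mb_per_hour):
    """Time window a given oplog size would cover at a steady write rate."""
    return size_mb / mb_per_hour


rate = oplog_mb_per_hour(OPLOG_SIZE_MB, OPLOG_WINDOW_HOURS)
window_days = OPLOG_WINDOW_HOURS / 24
doubled_window = window_hours_for(2 * OPLOG_SIZE_MB, rate)
```

Under these assumptions the oplog grows by a little under 4 MB per hour and covers about 11 days, which is the time a slave could be offline and still catch up without a full resync.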
  9. Monitoring
     - Slow Query Logs after every new build
     - Nagios
       - TCP Port Monitoring
       - Disk space monitoring
       - CPU monitoring
     - Munin
       - Mongo connections
       - Memory usage
       - Ops/second
       - Write Lock %
       - Collection Sizes (in terms of # of documents)
     - MMS
       - Started using it 2 weeks ago as a beta customer
  10. Maintenance
     - Data defragmentation
       - Slaves – by running it on a different port
       - Master – by having a downtime
     - Collection trimming
       - The scripts block during remove
       - Bulk removes kill the slaves, spiking CPU to 100%
  11. Backup or prepping for O S***!
     - NetApp Filer based, snapshots
       - Make sure to do {fsync:1} and {lock:1} on one slave
     - Hourly dumps via a cron job
       - Using mongodump
     - Incremental backup via the oplog
       - Replay the oplog instead of relying on a snapshot
     - Delayed slaves
       - Not recommended as it almost guarantees data loss proportional to the delay, which is inversely proportional to the time-to-react
  12. Tools to be familiar with
     - mongostat
       - Look at queue lengths, memory, connections and operation mix
     - db.serverStatus()
       - Server status with sync, pagefaults, locks, index misses
     - atop
     - iostat/vm_stat
     - db.stats()
       - Overall info at the database level
     - db.<coll_name>.stats()
       - Overall info at the collection level
     - db.printReplicationInfo()
       - Info about the oplog size and length in time
     - db.printSlaveReplicationInfo()
       - Info about the master, the last sync timestamp, and how far the slave is behind the master. If the numbers look wonky, the apparent delay may simply mean there have been no writes on the master.
  13. What we’ve learned
     - Keep an eye on
       - Page Faults
       - Index misses
       - Queue lengths
       - Write Lock %
       - Database sizes on disk due to reuse vs. release
     - Use .explain()
       - Watch for nscanned and indexBounds
     - Use limit() when using find
     - While updating, try to load that object in memory so that it’s in the working set (findAndModify)
     - Try to keep the fields being selected to a minimum
     - Do not use writeconcerns
     - Elegant schema design might bite you – design for performance and ease of programming
     - Write to multiple collections instead of doing mapreduce operations
  14. Next Steps
     - Move to replica sets on 1.8.1
     - Move relationship graphs to MongoDB
     - Shard the relationships based on the userId
     - Run multiple mongo processes, splitting out collections among multiple databases
     - Fan-out architecture instead of queries – using HornetQ and Scala (Akka)
  15. Extra: Why Fanout vs. Query
     Mon May 9 14:43:00 [conn63907] query ignsocial.activities
     ntoreturn:200 scanAndOrder reslen:7836 nscanned:135727
     {query: { isActive: true, actorType: "PERSON", actorId: {
     $in: [ "230", "1529", "1872", "1915", "2103", "4606",
     "5759", "5925", "7235", "7580", "9254", "10226", "14508",
     "16758", "20282", "21246", "21546", "22302", "22376",
     "23104", "25657", "26421", "28381", "30094", "33409",
     "33918", "34749", "34901", "35136", "36327", "37473",
     "37760", "40984", "41701", "44708", "45348", "45950",
     "47529", "47654", "48249", "49157", "49160", "51094",
     "51256", "52680", "53301", "53337", "54261", "54270",
     "56900", "60724", "61119", "61983", "62888", "63546",
     "64251", "65911", "67058", "70065", "70196", "73863",
     "74918", "75547", "75993", "77017", "77950", "78211",
     "78473", "78659", "78858", "82535", "85376", "85384",
     "86909", "87883", "88489", "88818", "88975", "89783",
     "90029", "90587", "91206", "93051", "93502", "94200", ..36,203 such lines
     …] }, created: {$gte: new Date(1302385379514) },
     activityObjects.type: { $in: [ "BLOG_ENTRY" ] } }, orderby:{ created: -1 } }
     nreturned:200 1054ms
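The log entry above shows the cost of building a newsfeed at read time: a very large `$in` list of friend IDs, 135,727 documents scanned and sorted to return 200, and just over a second of latency. A fan-out design moves that work to write time. The sketch below is a synchronous, in-memory illustration of the idea only; the slides name HornetQ and Akka for the real thing, and the class, method names, and the 200-item cap (borrowed from `ntoreturn:200`) are assumptions.

```python
from collections import defaultdict, deque


class FanoutFeeds:
    """Fan-out on write: each publish is copied into every follower's feed."""

    def __init__(self, max_items=200):
        self.followers = defaultdict(set)  # actor id -> ids of users following them
        # One bounded feed per user; the oldest entries fall off the far end.
        self.feeds = defaultdict(lambda: deque(maxlen=max_items))

    def follow(self, follower_id, actor_id):
        self.followers[actor_id].add(follower_id)

    def publish(self, actor_id, activity):
        # Write once per recipient (followers plus the actor), newest first.
        for user_id in self.followers[actor_id] | {actor_id}:
            self.feeds[user_id].appendleft(activity)

    def newsfeed(self, user_id, limit=200):
        # Reading is a bounded slice of one feed: no $in list, no sort.
        return list(self.feeds[user_id])[:limit]


feeds = FanoutFeeds(max_items=2)
feeds.follow("alice", "bob")
for activity in ("post-1", "post-2", "post-3"):
    feeds.publish("bob", activity)
alice_feed = feeds.newsfeed("alice")
```

The trade-off is more writes and more storage per activity in exchange for a read whose cost no longer grows with the number of friends.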
  16. About Me
     Manish Pandit
     Engineering Manager, API Platform
     IGN Entertainment
     @lobster1234
  17. We are hiring
     - Software Engineers to help us with exciting initiatives at IGN
     - Technologies we use
       - RoR, Java (no J2EE!), Scala, Spring, Play! Framework
       - PHP/Zend, jQuery, HTML5, CSS3, Sencha Touch, PhoneGap
       - MongoDB, memcached, Redis, Solr, ElasticSearch
       - NewRelic for monitoring, 3Scale for Open APIs
     - @ignjobs
  18. References
     - IGN’s Social Platform
     - Mongo Munin Plugins
     - Morphia