Using MongoDB for IGN’s Social Platform
MongoSF, Tuesday, May 24th, 2011
Agenda
- About
- Architecture
- MongoDB Usage
- Activity Streams
- Configuration, Monitoring, Maintenance
- Backup
- Tools
- Lessons Learned, Next Steps
About
About IGN:
- We have the largest audience of gamers in the world
- Over 70M monthly uniques
About IGN’s Social Platform:
- An API that connects the gamer community with editors, games, and other gamers, and helps lay the foundation for premium content discovery as well as UGC
- Launched Sept 2010
- ~7M activities and ~30M API calls per day (24h), ~9ms response times
Architecture
- REST-based API, built in Java
- Entities: People, MediaItems, Activities, Comments, Notifications, Status
- Interfaces across IGN.com as well as other social networks
- Caching tier based on memcached
- MySQL and MongoDB for persistence
- PHP/Zend front end
MongoDB Usage
- Activity streams: ActivityStrea.ms standard
- Activity caching (more on this later!)
- Activity commenting
- Points and leaderboards: also extend to badges
- Block lists, ban lists
- Notifications for conversations
- Analytics: activity snapshot for a user
Challenges with Activity Streams
- Lots of data! A large amount of data comes out as a result.
- Reverse sorting: the data has to be sorted in reverse natural order ($natural: -1), and we do not use capped collections.
- Aggregation of similar activities: impacts pagination.
- Fetching self activities (profile) and the newsfeed (self + friends); see the query sketch after this list.
- Filtering based on activity type: people want to see Game Updates or Blog updates from their friends.
- Hydration of activities for dynamic data: the thumbnail and level of the actor or commenter may change.
- Activity comments: when an activity is rendered, the initial comments and count have to be pulled ($slice). Not having a $sizeOf-type operator hurts.
- No embedding or references: we build data on the fly as part of the hydration process.
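A minimal sketch of how such a newsfeed read might look with the 2.x Java driver. The collection name (ignsocial.activities) and the fields isActive, actorId, activityObjects.type, and created are taken from the slow-query log at the end of this deck; the embedded comments field and the page sizes are assumptions for illustration only.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class NewsfeedQuerySketch {
    public static void main(String[] args) throws UnknownHostException {
        Mongo mongo = new Mongo("localhost", 27017);   // placeholder host
        DBCollection activities = mongo.getDB("ignsocial").getCollection("activities");

        List<String> friendIds = Arrays.asList("230", "1529", "1872");   // self + friends

        // Newsfeed: activities by self + friends, filtered by activity type,
        // newest first (reverse sort on 'created' rather than a capped collection).
        DBObject query = new BasicDBObject("isActive", true)
                .append("actorId", new BasicDBObject("$in", friendIds))
                .append("activityObjects.type", new BasicDBObject("$in", Arrays.asList("BLOG_ENTRY")));

        // Pull only the first few embedded comments with $slice; with no $sizeOf-style
        // operator, the total comment count has to be stored or derived separately.
        DBObject fields = new BasicDBObject("comments", new BasicDBObject("$slice", 3));

        DBCursor cursor = activities.find(query, fields)
                .sort(new BasicDBObject("created", -1))
                .limit(20);   // always page with limit()
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
        mongo.close();
    }
}
```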
Caching using MongoDB
- Caching the entire streams: a bad idea (or a bad implementation?)
  - The expired objects sat in the db, bloating the database
  - The removal did not free up space, so we ran out
  - Batch removals clogged the slaves
- Use Mongo as a cache-key index:
  - Cache the streams in memcached
  - For invalidation, keep the index of the memcached keys in MongoDB (see the sketch below)
  - Works!
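A minimal sketch of the cache-key-index pattern, assuming a hypothetical cache_keys collection and the spymemcached client; the collection name, field names, and key layout are illustrative, not the actual IGN schema.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import net.spy.memcached.MemcachedClient;

public class CacheKeyIndexSketch {
    private final DBCollection cacheKeys;
    private final MemcachedClient memcached;

    public CacheKeyIndexSketch(Mongo mongo, MemcachedClient memcached) {
        // Hypothetical collection mapping a user to every memcached key
        // that holds one of that user's cached stream pages.
        this.cacheKeys = mongo.getDB("ignsocial").getCollection("cache_keys");
        this.memcached = memcached;
    }

    // Record the key whenever a stream page is written to memcached.
    public void recordKey(String userId, String memcachedKey) {
        cacheKeys.insert(new BasicDBObject("userId", userId).append("key", memcachedKey));
    }

    // When the user's stream changes, look up all of their cached pages,
    // delete them from memcached, and drop the index entries.
    public void invalidate(String userId) {
        BasicDBObject byUser = new BasicDBObject("userId", userId);
        DBCursor cursor = cacheKeys.find(byUser);
        while (cursor.hasNext()) {
            memcached.delete((String) cursor.next().get("key"));
        }
        cacheKeys.remove(byUser);
    }
}
```

The Mongo side exists only for invalidation: memcached cannot enumerate keys by user, so the index is what makes "delete everything cached for user X" a single query.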
Configuration
Server:
- 1 master, 2 slaves (load balanced through NetScaler)
- 2 extra slaves which are not queried (replicate!!)
- Version 1.6.1; 1.8.1 with journaling is being tested in stage
Clients:
- Java driver (2.1), Ruby driver (1.2)
- Mappers: Morphia for Java, MongoMapper for Ruby
- Connections per host: 200, #hosts = 4 (see the driver-options sketch below)
Oplog size: 1 GB, which gives us ~272 hours
Syncdelay: 60s (default)
Hardware: 2-core, 6 GB virtualized machines
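A minimal sketch of what the client side of that configuration might look like with the 2.x Java driver; the host name is a placeholder, and slaveOk() is shown only because reads here go to load-balanced slaves.

```java
import com.mongodb.DB;
import com.mongodb.Mongo;
import com.mongodb.MongoOptions;
import com.mongodb.ServerAddress;

import java.net.UnknownHostException;

public class DriverConfigSketch {
    public static void main(String[] args) throws UnknownHostException {
        MongoOptions options = new MongoOptions();
        options.connectionsPerHost = 200;   // matches the 200-connections-per-host setting

        // Placeholder host: a VIP in front of the queried slaves. Since reads are
        // served by slaves, the driver is told that slave reads are acceptable.
        Mongo mongo = new Mongo(new ServerAddress("mongo-slave-vip.example.com", 27017), options);
        mongo.slaveOk();

        DB db = mongo.getDB("ignsocial");
        System.out.println(db.getCollectionNames());
    }
}
```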
Monitoring
- Slow query logs after every new build
- Nagios: TCP port monitoring, disk space monitoring, CPU monitoring
- Munin: Mongo connections, memory usage, ops/second, write lock %, collection sizes (in terms of # of documents); see the serverStatus sketch below
- MMS: started using it 2 weeks ago as a beta customer
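Most of those Munin numbers come out of serverStatus. A small sketch of reading it from the 2.x Java driver; the field names shown are from 1.8-era output and may differ in other versions.

```java
import com.mongodb.CommandResult;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class ServerStatusPollSketch {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);   // placeholder host

        // serverStatus is the same command the Munin/Nagios plugins read from.
        CommandResult status = mongo.getDB("admin").command("serverStatus");

        DBObject connections = (DBObject) status.get("connections");
        DBObject opcounters  = (DBObject) status.get("opcounters");
        DBObject globalLock  = (DBObject) status.get("globalLock");

        System.out.println("current connections: " + connections.get("current"));
        System.out.println("queries so far:      " + opcounters.get("query"));
        System.out.println("write lock ratio:    " + globalLock.get("ratio"));

        mongo.close();
    }
}
```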
Maintenance
- Data defragmentation:
  - Slaves: by running the defragmentation on a different port
  - Master: by taking downtime
- Collection trimming:
  - The scripts block during remove
  - Bulk removes kill the slaves, spiking CPU to 100% (see the batched-remove sketch below)
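One way to make the trim less punishing is to remove in small batches with pauses in between, rather than one giant remove. A rough sketch, with the cutoff, batch size, sleep, and collection name all assumed for illustration.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class BatchedTrimSketch {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);   // placeholder host
        DBCollection activities = mongo.getDB("ignsocial").getCollection("activities");

        // Assumed retention window of 90 days, purely for illustration.
        Date cutoff = new Date(System.currentTimeMillis() - 90L * 24 * 60 * 60 * 1000);
        BasicDBObject expired = new BasicDBObject("created", new BasicDBObject("$lt", cutoff));

        while (true) {
            // Trim in small batches instead of one huge remove, so replication
            // and the slaves get a chance to keep up between batches.
            DBCursor batch = activities.find(expired, new BasicDBObject("_id", 1)).limit(500);
            List<Object> ids = new ArrayList<Object>();
            while (batch.hasNext()) {
                ids.add(batch.next().get("_id"));
            }
            if (ids.isEmpty()) {
                break;
            }
            activities.remove(new BasicDBObject("_id", new BasicDBObject("$in", ids)));
            Thread.sleep(1000);   // breathing room for the oplog and slaves
        }
        mongo.close();
    }
}
```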
Backup, or prepping for “Oh S***!”
- NetApp filer-based snapshots
  - Make sure to do {fsync: 1} and {lock: 1} on one slave first (see the sketch below)
- Hourly dumps via a cron job, using mongodump
- Incremental backup via the oplog: replay the oplog instead of relying on a snapshot
- Delayed slaves: not recommended, as it almost guarantees data loss proportional to the delay, which is inversely proportional to the time-to-react
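A minimal sketch of the fsync+lock step around a snapshot, run against the slave being snapshotted; the host name is a placeholder, and the unlock call mirrors what the 1.8-era shell does with db.$cmd.sys.unlock.findOne().

```java
import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.Mongo;

public class SnapshotLockSketch {
    public static void main(String[] args) throws Exception {
        // Point this at the slave that will be snapshotted, not the master.
        Mongo slave = new Mongo("mongo-slave-1.example.com", 27017);   // placeholder host
        DB admin = slave.getDB("admin");

        // Flush dirty data to disk and block writes for the duration of the snapshot.
        CommandResult result = admin.command(new BasicDBObject("fsync", 1).append("lock", 1));
        if (!result.ok()) {
            throw new RuntimeException("fsync+lock failed: " + result.getErrorMessage());
        }

        try {
            // ... trigger the NetApp filer snapshot here ...
        } finally {
            // Release the lock: the pseudo-collection read below is what the
            // shell's db.$cmd.sys.unlock.findOne() does under the hood.
            admin.getCollection("$cmd.sys.unlock").findOne();
        }
        slave.close();
    }
}
```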
Tools to be familiar with
- mongostat: look at queue lengths, memory, connections, and the operation mix
- db.serverStatus(): server status with sync, page faults, locks, index misses
- atop, iostat/vm_stat
- db.stats(): overall info at the database level (see the sketch after this list)
- db.<coll_name>.stats(): overall info at the collection level
- db.printReplicationInfo(): info about the oplog size and its length in time
- db.printSlaveReplicationInfo(): info about the master, the last sync timestamp, and how far behind the slave is from the master. If the numbers look wonky, the delay may simply mean there have been no writes on the master.
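The shell helpers db.stats() and db.<coll>.stats() are wrappers around the dbstats and collstats commands, so the same numbers can be pulled from application code. A small sketch with the 2.x Java driver; the collection name is assumed.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.Mongo;

public class StatsSketch {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);   // placeholder host
        DB db = mongo.getDB("ignsocial");

        // Database-level and collection-level stats, as used by db.stats()
        // and db.<coll>.stats() in the shell.
        CommandResult dbStats   = db.command("dbstats");
        CommandResult collStats = db.command(new BasicDBObject("collstats", "activities"));

        System.out.println("database storageSize: " + dbStats.get("storageSize"));
        System.out.println("activities count:     " + collStats.get("count"));
        System.out.println("activities indexSize: " + collStats.get("totalIndexSize"));

        mongo.close();
    }
}
```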
What we’ve learned
- Keep an eye on:
  - Page faults
  - Index misses
  - Queue lengths
  - Write lock %
  - Database sizes on disk, due to space reuse vs. release
- Use .explain(): watch nscanned and indexBounds (see the sketch below)
- Use limit() when using find()
- While updating, try to load the object into memory first so that it is in the working set (findAndModify)
- Try to keep the fields being selected to a minimum
- Do not use write concerns
- Elegant schema design might bite you: design for performance and ease of programming
- Write to multiple collections instead of doing map/reduce operations
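A small sketch of checking a query plan from the 2.x Java driver; the query itself is illustrative, the point is reading cursor, nscanned, and indexBounds out of explain().

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

public class ExplainSketch {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost", 27017);   // placeholder host
        DBCollection activities = mongo.getDB("ignsocial").getCollection("activities");

        // explain() on the cursor shows whether an index covers the query:
        // a high nscanned relative to nreturned (as in the 135,727-document scan
        // in the slow-query log at the end of this deck) means it does not.
        DBObject plan = activities.find(new BasicDBObject("actorId", "230"))
                .sort(new BasicDBObject("created", -1))
                .limit(20)
                .explain();

        System.out.println("cursor:      " + plan.get("cursor"));
        System.out.println("nscanned:    " + plan.get("nscanned"));
        System.out.println("indexBounds: " + plan.get("indexBounds"));

        mongo.close();
    }
}
```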
Next Steps
- Move to replica sets on 1.8.1
- Move relationship graphs to MongoDB; shard the relationships based on the userId
- Run multiple mongo processes, splitting out collections among multiple databases
- Fan-out architecture instead of queries, using HornetQ and Scala (Akka); see the sketch below
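The actual fan-out is planned on HornetQ and Akka, per the slide above. Purely to illustrate the idea (materialize per-follower timelines at write time, so reads stop being giant $in queries like the one on the next slide), here is a hypothetical Java sketch; the collection and field names are assumptions.

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

import java.util.Date;
import java.util.List;

public class FanoutSketch {
    private final DBCollection timelines;

    public FanoutSketch(Mongo mongo) {
        // Hypothetical per-follower timeline collection, indexed on (ownerId, created).
        this.timelines = mongo.getDB("ignsocial").getCollection("timelines");
    }

    // Called (e.g. from a queue consumer) when an activity is posted: copy a
    // reference to the activity into every follower's timeline, so a newsfeed
    // read becomes a cheap single-key query instead of a multi-thousand-id $in.
    public void fanOut(Object activityId, String actorId, List<String> followerIds) {
        Date created = new Date();
        for (String followerId : followerIds) {
            DBObject entry = new BasicDBObject("ownerId", followerId)
                    .append("activityId", activityId)
                    .append("actorId", actorId)
                    .append("created", created);
            timelines.insert(entry);
        }
    }
}
```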
Extra: Why Fan-out vs. Query

Mon May 9 14:43:00 [conn63907] query ignsocial.activities ntoreturn:200 scanAndOrder reslen:7836 nscanned:135727 { query: { isActive: true, actorType: "PERSON", actorId: { $in: [ "230", "1529", "1872", "1915", "2103", "4606", "5759", "5925", "7235", "7580", "9254", "10226", "14508", "16758", "20282", "21246", "21546", "22302", "22376", "23104", "25657", "26421", "28381", "30094", "33409", "33918", "34749", "34901", "35136", "36327", "37473", "37760", "40984", "41701", "44708", "45348", "45950", "47529", "47654", "48249", "49157", "49160", "51094", "51256", "52680", "53301", "53337", "54261", "54270", "56900", "60724", "61119", "61983", "62888", "63546", "64251", "65911", "67058", "70065", "70196", "73863", "74918", "75547", "75993", "77017", "77950", "78211", "78473", "78659", "78858", "82535", "85376", "85384", "86909", "87883", "88489", "88818", "88975", "89783", "90029", "90587", "91206", "93051", "93502", "94200", … 36,203 such lines … ] }, created: { $gte: new Date(1302385379514) }, activityObjects.type: { $in: [ "BLOG_ENTRY" ] } }, orderby: { created: -1 } } nreturned:200 1054ms
About Me
Manish Pandit
Engineering Manager, API Platform
IGN Entertainment
@lobster1234
We are hiring
Software engineers to help us with exciting initiatives at IGN.
Technologies we use:
- RoR, Java (no J2EE!), Scala, Spring, Play! Framework
- PHP/Zend, jQuery, HTML5, CSS3, Sencha Touch, PhoneGap
- MongoDB, memcached, Redis, Solr, ElasticSearch
- NewRelic for monitoring, 3scale for open APIs
http://corp.ign.com/careers
@ignjobs
References
- IGN’s Social Platform: http://my.ign.com, http://people.ign.com/ign-labs
- Mongo Munin plugins: https://github.com/erh/mongo-munin, https://github.com/lobster1234/munin-mongo-collections
- Morphia: http://code.google.com/p/morphia/