Mongo SV
  • Keeping the lights on with MongoDB
  • Tony Tam
  • 12/3/2010
Presentation Overview
  • Data >>> code
  • Treat it appropriately
  • Manage and maintain Mongo
  • Mongo is young (and robust!)
  • Performance and Features
  • The right hooks exist
Who is Wordnik
  • Wordnik is:
    • The world’s largest English Language reference
    • ~10M words!
    • Mapping every word, based on real data
    • (free) API to add word information, everywhere
Wordnik’s MongoDB Deployment
  • Over 12 Months with Mongo
  • Corpus/UGC/Structured Data/Statistics
  • Master/Slave
  • ~3TB data
  • ~12B records
  • We love Mongo’s performance
  • Read more: http://blog.wordnik.com/12-months-with-mongodb
Engineering + IT Ops
  • First, Guiding Principles
  • Know your data
  • Don’t rely on IT magic
  • Equal Importance in WebApps / SaaS
  • Hold hands and be friends
  • If you can’t manage it, don’t deploy it
Admins: Be Prepared
  • ok, this sucks.
How?
  • Replicate!
    • Is that enough?
    • Well, not if your company is on the line
  • Snapshot
    • Every minute???
  • Export often
    • Really???
Then What?
  • Yes, Mongo can do Incremental
  • Use the mongo slave mechanism
    • It’s exposed
    • It’s supported
    • It’s very easy
    • It’s extremely fast
  • How?
    • Snapshot your data
    • Stream write ops to disk
    • Repeat
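The snapshot, stream, repeat loop can be sketched in a few lines. This is a minimal illustration, not the Wordnik tools: the function name `stream_ops_to_disk`, the JSON-lines file format, and the integer `ts` field are assumptions made for the example, and the ops come from a plain Python list rather than a live oplog.

```python
import json
import os
import tempfile


def stream_ops_to_disk(ops, path, last_ts=0):
    """Append ops newer than last_ts to path, one JSON object per line.

    Returns the newest timestamp written, so the next run can resume there.
    """
    newest = last_ts
    with open(path, "a", encoding="utf-8") as out:
        for op in ops:
            if op["ts"] <= last_ts:
                continue  # already captured by an earlier run
            out.write(json.dumps(op) + "\n")
            newest = max(newest, op["ts"])
    return newest


# Hypothetical write ops; a real oplog entry carries more fields.
ops = [
    {"ts": 1, "op": "i", "ns": "db.words", "o": {"_id": 1, "word": "cat"}},
    {"ts": 2, "op": "i", "ns": "db.words", "o": {"_id": 2, "word": "dog"}},
]

backup_path = os.path.join(tempfile.mkdtemp(), "incremental.jsonl")
checkpoint = stream_ops_to_disk(ops, backup_path)  # first pass
ops.append({"ts": 3, "op": "d", "ns": "db.words", "o": {"_id": 1}})
checkpoint = stream_ops_to_disk(ops, backup_path, checkpoint)  # repeat
```

Keeping the returned checkpoint between runs is what makes the backup incremental: each pass writes only the ops that arrived since the previous one.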
Better than Free
  • Take our tools
  • They work!!!
  • SnapshotUtil
    • Selectively snapshot in BSON
    • Index info too!
  • IncrementalBackupUtil
    • Tail the oplog, stream to disk
    • Only the collections you want!
    • Compress & rotate
  • RestoreUtil
    • Recover your snapshots
    • Apply indexes yourself
  • ReplayUtil
    • Apply your Incremental backups
What-if Scenarios
  • One collection gets corrupt?
    • Restore it
    • Apply all operations to it
  • “My top developer dropped a collection!”
    • Restore just that one
    • Apply operations to it until that point in time (POT)
  • “We got hacked!”
    • Restore it all
    • Apply operations until that POT
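Restoring one collection and replaying operations up to a point in time can be sketched as follows. This is an in-memory illustration under stated assumptions, not ReplayUtil itself: `replay_until` is a hypothetical name, timestamps are plain integers, and updates are modelled as whole-document replacements (update operators and commands such as a collection drop are not handled). The `op`, `ns`, `o`, and `o2` field names follow the general shape of MongoDB oplog entries.

```python
import copy


def replay_until(snapshot, ops, namespace, until_ts):
    """Rebuild one collection from a snapshot plus ops with ts <= until_ts."""
    docs = {doc["_id"]: copy.deepcopy(doc) for doc in snapshot}
    for op in sorted(ops, key=lambda o: o["ts"]):
        if op["ts"] > until_ts:
            break  # stop before the unwanted operation
        if op["ns"] != namespace:
            continue  # only the collection being restored
        if op["op"] == "i":
            docs[op["o"]["_id"]] = copy.deepcopy(op["o"])
        elif op["op"] == "u":
            # Simplification: treat the update as a full replacement.
            doc_id = op["o2"]["_id"]
            docs[doc_id] = {**copy.deepcopy(op["o"]), "_id": doc_id}
        elif op["op"] == "d":
            docs.pop(op["o"]["_id"], None)
    return docs


snapshot = [{"_id": 1, "word": "cat"}]
ops = [
    {"ts": 10, "op": "i", "ns": "db.words", "o": {"_id": 2, "word": "dog"}},
    {"ts": 20, "op": "u", "ns": "db.words", "o2": {"_id": 1},
     "o": {"_id": 1, "word": "cats"}},
    {"ts": 25, "op": "i", "ns": "db.other", "o": {"_id": 9}},
    {"ts": 30, "op": "d", "ns": "db.words", "o": {"_id": 2}},  # the mistake
]

# Replay everything up to, but not including, the bad delete at ts 30.
restored = replay_until(snapshot, ops, "db.words", until_ts=25)
```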
What else is possible?
  • Replication
    • Why not use built-in?
    • Control, of course
  • Same logic as Incremental + Replay
  • Add some filters and it gets interesting
Hot Datacenter
  • Create incremental backups
  • Compress
  • Push to DC in batch
  • Apply to master
  • [Diagram: Primary Datacenter (Master), Incremental Backup Files, SCP, Replay Util, Hot Datacenter (Master)]
Dev Environment
  • Developers need production-ish data
  • Anonymize while replicating to dev server
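One way to picture anonymizing while replicating is a filter that rewrites each operation before it is applied to the dev server. The sketch below is an assumption-laden illustration: the field names in `SENSITIVE_FIELDS`, the helper names, and the choice of a truncated SHA-256 digest are all hypothetical and not taken from the Wordnik tools.

```python
import hashlib

# Hypothetical list of fields that must not reach the dev server.
SENSITIVE_FIELDS = ("email", "name")


def anonymize(doc, fields=SENSITIVE_FIELDS):
    """Return a copy of doc with sensitive fields replaced by a short digest."""
    clean = dict(doc)
    for field in fields:
        if field in clean:
            digest = hashlib.sha256(str(clean[field]).encode("utf-8")).hexdigest()
            clean[field] = digest[:12]
    return clean


def anonymize_ops(ops):
    """Yield ops with inserted or updated documents anonymized."""
    for op in ops:
        if op["op"] in ("i", "u"):
            yield {**op, "o": anonymize(op["o"])}
        else:
            yield op  # deletes carry no sensitive payload in this sketch


ops = [
    {"op": "i", "ns": "db.users", "o": {"_id": 1, "email": "a@example.com", "plan": "free"}},
    {"op": "d", "ns": "db.users", "o": {"_id": 7}},
]
dev_ops = list(anonymize_ops(ops))
```

Hashing rather than blanking keeps values distinct and repeatable, so lookups and joins on the field still behave on the dev copy.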
Multiple Upstream Masters
  • Aggregate to single collection
  • Target can be a master!
  • [Diagram: Master A, Master B, and Master C, each with db.page_views, feeding a single db.page_views]
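Aggregating several upstream masters into one collection amounts to merging their operation streams and applying the result to a single target. The sketch below assumes each stream is already sorted by an integer `ts` and that `_id` values never collide across masters; the function names and sample documents are hypothetical.

```python
import heapq


def merge_upstreams(*streams):
    """Merge per-master op streams (each sorted by ts) into one ordered stream."""
    return heapq.merge(*streams, key=lambda op: op["ts"])


def apply_inserts(target, ops, namespace):
    """Apply insert ops for one namespace to an in-memory target collection."""
    for op in ops:
        if op["op"] == "i" and op["ns"] == namespace:
            target[op["o"]["_id"]] = op["o"]
    return target


master_a = [
    {"ts": 1, "op": "i", "ns": "db.page_views", "o": {"_id": "a1", "url": "/"}},
    {"ts": 4, "op": "i", "ns": "db.page_views", "o": {"_id": "a2", "url": "/about"}},
]
master_b = [
    {"ts": 2, "op": "i", "ns": "db.page_views", "o": {"_id": "b1", "url": "/"}},
]
master_c = [
    {"ts": 3, "op": "i", "ns": "db.page_views", "o": {"_id": "c1", "url": "/api"}},
]

page_views = apply_inserts(
    {}, merge_upstreams(master_a, master_b, master_c), "db.page_views"
)
```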
Unblock MapReduce
  • Map Reduce can lock up your server
  • Replicate source data to another mongod
  • Replicate results back to master
  • [Diagram: Master and MR Server, with db.source_data and db.summary_data]
Mesh Mode
  • Write to Multiple Masters
  • Filter by “Server Identifier”
    > db.documents.find().limit(2)
    {"_id":99887,"src":2,"title":"favorite.png","fsid":33774}
    {"_id":128773,"src":1,"title":"select.png","fsid":837743}
  • [Diagram: Master 1 (db.documents, filter documents.src != 1) and Master 2 (db.documents, filter documents.src != 2)]
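The role of the server identifier is easier to see in a toy model: each master stamps its own writes with `src`, and when it pulls from a peer it skips anything whose `src` is its own id, so a write is never replayed back to where it started. The `Master` class below is a hypothetical in-memory stand-in with exactly two nodes, not a description of how the Wordnik tools are built.

```python
class Master:
    """Toy master: a dict of documents plus an append-only op log."""

    def __init__(self, server_id):
        self.server_id = server_id
        self.docs = {}
        self.oplog = []

    def write(self, doc):
        """Local write: stamp the document with this server's id."""
        self._apply({**doc, "src": self.server_id})

    def _apply(self, doc):
        self.docs[doc["_id"]] = doc
        self.oplog.append({"op": "i", "o": doc})

    def pull_from(self, other, start=0):
        """Apply the peer's ops from position start, skipping our own writes.

        Returns the position to resume from on the next pull.
        """
        for op in other.oplog[start:]:
            if op["o"]["src"] != self.server_id:  # the "src != me" filter
                self._apply(op["o"])
        return len(other.oplog)


m1, m2 = Master(1), Master(2)
m2.write({"_id": 99887, "title": "favorite.png", "fsid": 33774})
m1.write({"_id": 128773, "title": "select.png", "fsid": 837743})

seen_by_1 = m1.pull_from(m2)             # m1 receives the src=2 document
seen_by_2 = m2.pull_from(m1)             # m2 receives src=1, skips its own src=2
seen_by_1 = m1.pull_from(m2, seen_by_1)  # nothing new: remaining op is src=1
seen_by_2 = m2.pull_from(m1, seen_by_2)  # nothing new
```

Without the filter, each applied op would be logged, pulled back by the peer, and applied again indefinitely.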
What’s Next
  • Multi-Master in Wordnik Production
  • Multiple Datacenter Presence
  • More data => more challenges
Try it out
  • http://blog.wordnik.com/mongoutils
  • Questions?
