Keeping the Lights On with MongoDB


A presentation by Tony Tam at the MongoSV conference in Silicon Valley, hosted by MongoDB creator 10gen.

Published in: Technology

  1. Mongo SV
       • Keeping the lights on with MongoDB
       • Tony Tam
       • 12/3/2010
  2. Presentation Overview
       • Data >>> code
       • Treat it appropriately
       • Manage and maintain Mongo
       • Mongo is young (and robust!)
       • Performance and Features
       • The right hooks exist
  3. Who is Wordnik
       • Wordnik is:
       • The world’s largest English-language reference
       • ~10M words!
       • Mapping every word, based on real data
       • (Free) API to add word information, everywhere
  4. Wordnik’s MongoDB Deployment
       • Over 12 Months with Mongo
       • Corpus/UGC/Structured Data/Statistics
       • Master/Slave
       • ~3TB data
       • ~12B records
       • We love Mongo’s performance
       • Read more:
  5. Engineering + IT Ops
       • First, Guiding Principles
       • Know your data
       • Don’t rely on IT magic
       • Equal Importance in WebApps / SaaS
       • Hold hands and be friends
       • If you can’t manage it, don’t deploy it
  6. Admins: Be Prepared
       • OK, this sucks.
  7. How?
       • Replicate! Is that enough? Well, not if your company is on the line
       • Snapshot: every minute???
       • Export often: really???
  8. Then What?
       • Yes, Mongo can do Incremental
       • Use the mongo slave mechanism: it’s exposed, it’s supported, it’s very easy, it’s extremely fast
       • How? Snapshot your data, stream write ops to disk, repeat
  9. Better than Free
       • Take our tools: they work!
       • SnapshotUtil: selectively snapshot in BSON; index info too!
       • IncrementalBackupUtil: tail the oplog, stream to disk; only the collections you want! Compress & rotate
       • RestoreUtil: recover your snapshots; apply indexes yourself
       • ReplayUtil: apply your incremental backups
  10. What-if Scenarios
        • One collection gets corrupted? Restore it; apply all operations to it
        • “My top developer dropped a collection!” Restore just that one; apply operations to it until that point in time (POT)
        • “We got hacked!” Restore it all; apply operations until that POT
  11. What else is possible?
        • Replication
        • Why not use built-in? Control, of course
        • Same logic as Incremental + Replay
        • Add some filters and it gets interesting
  12. Hot Datacenter
        • Create incremental backups
        • Compress
        • Push to DC in batch
        • Apply to master
        • [Diagram labels: Primary Datacenter, Hot Datacenter, Incremental Backup Files, Master, Master, Replay Util, SCP]
  13. Dev Environment
        • Developers need production-ish data
        • Anonymize while replicating to dev server
  14. Multiple Upstream Masters
        • Aggregate to single collection
        • Target can be a master!
        • [Diagram labels: Master A, Master B, Master C, db.page_views, db.page_views]
  15. Unblock MapReduce
        • Map Reduce can lock up your server
        • Replicate source data to another mongod
        • Replicate results back to master
        • [Diagram labels: Master, MR Server, db.source_data, db.summary_data]
  16. Mesh Mode
        • Write to Multiple Masters
        • Filter by “Server Identifier”
            > db.documents.find().limit(2)
            {"_id":99887,"src":2,"title":"favorite.png","fsid":33774}
            {"_id":128773,"src":1,"title":"select.png","fsid":837743}
        • [Diagram labels: Master 1 with db.documents and filter documents.src != 1; Master 2 with db.documents and filter documents.src != 2]
  17. What’s Next
        • Multi-Master in Wordnik Production
        • Multiple Datacenter Presence
        • More data => more challenges
  18. Try it out
        • Questions?
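
The snapshot-then-replay flow from slides 8 to 10, and the filtering idea from slides 11 and 16, can be sketched without a running database. What follows is a minimal in-memory illustration in Python, not Wordnik's tools. The `replay` function, its parameters, and the `app.*` namespaces are hypothetical. The `ts`, `op`, `ns`, `o`, and `o2` keys mirror the field names of MongoDB oplog entries, but as simplifying assumptions the timestamps are plain integers, updates are treated as whole-document replacements, and commands such as a collection drop are ignored.

```python
import copy


def replay(snapshot, ops, until_ts=None, namespaces=None, keep=None):
    """Return a copy of `snapshot` with `ops` applied in timestamp order.

    snapshot   -- {namespace: {_id: document}}, the restored snapshot
    ops        -- oplog-style dicts with keys ts, op, ns, o (plus o2 for updates)
    until_ts   -- apply nothing newer than this timestamp (point-in-time restore)
    namespaces -- if given, only replay ops for these collections
    keep       -- optional predicate on an op, e.g. a "server identifier" filter
    """
    data = copy.deepcopy(snapshot)  # never mutate the snapshot itself
    for entry in sorted(ops, key=lambda e: e["ts"]):
        if until_ts is not None and entry["ts"] > until_ts:
            break
        if namespaces is not None and entry["ns"] not in namespaces:
            continue
        if keep is not None and not keep(entry):
            continue
        coll = data.setdefault(entry["ns"], {})
        if entry["op"] == "i":      # insert
            coll[entry["o"]["_id"]] = copy.deepcopy(entry["o"])
        elif entry["op"] == "u":    # update: o2 selects the document, o replaces it
            coll[entry["o2"]["_id"]] = copy.deepcopy(entry["o"])
        elif entry["op"] == "d":    # delete
            coll.pop(entry["o"]["_id"], None)
    return data


# Hypothetical sample data, loosely modeled on the documents shown on slide 16.
snapshot = {
    "app.documents": {99887: {"_id": 99887, "src": 2, "title": "favorite.png"}},
}
ops = [
    {"ts": 1, "op": "i", "ns": "app.documents",
     "o": {"_id": 128773, "src": 1, "title": "select.png"}},
    {"ts": 2, "op": "u", "ns": "app.documents", "o2": {"_id": 99887},
     "o": {"_id": 99887, "src": 2, "title": "fav.png"}},
    {"ts": 3, "op": "i", "ns": "app.page_views",
     "o": {"_id": 1, "src": 1, "path": "/"}},
    {"ts": 4, "op": "d", "ns": "app.documents", "o": {"_id": 99887}},  # the mistake
]

# Full replay: every captured write is applied, including the bad delete.
full = replay(snapshot, ops)

# Restore one collection to the point in time just before the mistake.
before_mistake = replay(snapshot, ops, until_ts=3, namespaces={"app.documents"})

# Mesh-style filter: skip writes that carry server identifier src == 1.
not_from_server_1 = replay(snapshot, ops, until_ts=3,
                           keep=lambda e: e["o"].get("src") != 1)
```

In this sketch `until_ts` plays the role of the POT on slide 10, `namespaces` corresponds to "only the collections you want" on slide 9, and `keep` stands in for the `documents.src != 1` style filter on slide 16. A predicate like `keep` could equally return a rewritten entry to anonymize data on its way to a dev server, though that variation is not shown here.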