View stunning SlideShares in full-screen with the new iOS app!Introducing SlideShare for AndroidExplore all your favorite topics in the SlideShare appGet the SlideShare app to Save for Later — even offline
View stunning SlideShares in full-screen with the new Android app!View stunning SlideShares in full-screen with the new iOS app!
Mongo SV Keeping the lights on with MongoDB Tony Tam 12/3/2010
Presentation Overview Data >>> code Treat it appropriately Manage and maintain Mongo Mongo is young (and robust!) Performance and Features The right hooks exist
Who is Wordnik Wordnik is: The world’s largest English Language reference ~10M words! Mapping every word, based on real data (free ) API to add word information, everywhere
Wordnik’s MongoDB Deployment Over 12 Months with Mongo Corpus/UGC/Structured Data/Statistics Master/Slave ~3TB data ~12B records We love Mongo’s performance Read more: http://blog.wordnik.com/12-months-with-mongodb
Engineering + IT Ops First, Guiding Principles Know your data Don’t rely on IT magic Equal Importance in WebApps / SaaS Hold hands and be friends If you can’t manage it, don’t deploy it
How? Replicate! Is that enough? Well, not if your company is on the line Snapshot Every minute??? Export often Really???
Then What? Yes, Mongo can do Incremental Use the mongo slave mechanism It’s exposed It’s supported It’s very easy It’s extremely fast How? Snapshot your data Stream write ops to disk Repeat
Better than Free Take our tools-They work!!! SnapshotUtil Selectively snapshot in BSON Index info too! IncrementalBackupUtil Tail the oplog, stream to disk Only the collections you want! Compress & rotate RestoreUtil Recover your snapshots Apply indexes yourself ReplayUtil Apply your Incremental backups
What if Scenarios One collection gets corrupt? Restore it Apply all operations to it “My top developer dropped a collection!” Restore just that one Apply operations to it until that POT “We got hacked!” Restore it all Apply operations until that POT
What else is possible? Replication Why not use built-in? Control, of course Same logic as Incremental + Replay Add some filters and it gets interesting
Hot Datacenter Create incremental backups Compress Push to DC in batch Apply to master Primary Datacenter Hot Datacenter Incremental Backup Files Master Master Replay Util SCP
Dev Environment Developers need production-ish data Anonymize while replicating to dev server
Multiple Upstream Masters Aggregate to single collection Target can be a master! Master A Master B Master C db.page_views db.page_views
Unblock MapReduce Map Reduce can lock up your server Replicate source data to another mongod Replicate results back to master Master MR Server db.source_data db.summary_data