Keeping the Lights On with MongoDB

Mongo SV Keeping the lights on with MongoDB Tony Tam 12/3/2010

Presentation Overview Data >>> code Treat it appropriately Manage and maintain Mongo Mongo is young (and robust!) Performance and Features The right hooks exist

Who is Wordnik Wordnik is: The world’s largest English Language reference ~10M words! Mapping every word, based on real data (free ) API to add word information, everywhere

Wordnik’s MongoDB Deployment Over 12 Months with Mongo Corpus/UGC/Structured Data/Statistics Master/Slave ~3TB data ~12B records We love Mongo’s performance Read more: http://blog.wordnik.com/12-months-with-mongodb

Engineering + IT Ops First, Guiding Principles Know your data Don’t rely on IT magic Equal Importance in WebApps / SaaS Hold hands and be friends If you can’t manage it, don’t deploy it

Admins: Be Prepared ok, this sucks.

How? Replicate! Is that enough? Well, not if your company is on the line Snapshot Every minute??? Export often Really???

Then What? Yes, Mongo can do Incremental Use the mongo slave mechanism It’s exposed It’s supported It’s very easy It’s extremely fast How? Snapshot your data Stream write ops to disk Repeat

Better than Free Take our tools-They work!!! SnapshotUtil Selectively snapshot in BSON Index info too! IncrementalBackupUtil Tail the oplog, stream to disk Only the collections you want! Compress & rotate RestoreUtil Recover your snapshots Apply indexes yourself ReplayUtil Apply your Incremental backups

What if Scenarios One collection gets corrupt? Restore it Apply all operations to it “My top developer dropped a collection!” Restore just that one Apply operations to it until that POT “We got hacked!” Restore it all Apply operations until that POT

What else is possible? Replication Why not use built-in? Control, of course Same logic as Incremental + Replay Add some filters and it gets interesting

Hot Datacenter Create incremental backups Compress Push to DC in batch Apply to master Primary Datacenter Hot Datacenter Incremental Backup Files Master Master Replay Util SCP

Dev Environment Developers need production-ish data Anonymize while replicating to dev server

Multiple Upstream Masters Aggregate to single collection Target can be a master! Master A Master B Master C db.page_views db.page_views

Unblock MapReduce Map Reduce can lock up your server Replicate source data to another mongod Replicate results back to master Master MR Server db.source_data db.summary_data

Mesh Mode Write to Multiple Masters Filter by “Server Identifier” > db.documents.find().limit(2) {"_id":99887,"src":2,"title":"favorite.png","fsid":33774} {"_id":128773,"src":1,"title":"select.png","fsid":837743} db.documents documents.src != 1 Master 1 Master 2 db.documents documents.src != 2

What’s Next Multi-Master in Wordnik Production Multiple Datacenter Presence More data => more challenges

Try it out http://blog.wordnik.com/mongoutils Questions?

Keeping the Lights On with MongoDB

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Keeping the Lights On with MongoDB

Similar to Keeping the Lights On with MongoDB (20)

More from Tony Tam

More from Tony Tam (17)

Recently uploaded

Recently uploaded (20)

Keeping the Lights On with MongoDB