Keeping the Lights On with MongoDB

A presentation by Tony Tam at the MongoSV conference in Silicon Valley, hosted by MongoDB creator 10gen.

Transcript

  • 1. Mongo SV
    Keeping the lights on with MongoDB
    Tony Tam
    12/3/2010
  • 2. Presentation Overview
    Data >>> code
    Treat it appropriately
    Manage and maintain Mongo
    Mongo is young (and robust!)
    Performance and Features
    The right hooks exist
  • 3. Who is Wordnik
    Wordnik is:
    The world’s largest English Language reference
    ~10M words!
    Mapping every word, based on real data
    (Free) API to add word information, everywhere
  • 4. Wordnik’s MongoDB Deployment
    Over 12 Months with Mongo
    Corpus/UGC/Structured Data/Statistics
    Master/Slave
    ~3TB data
    ~12B records
    We love Mongo’s performance
    Read more:
    http://blog.wordnik.com/12-months-with-mongodb
  • 5. Engineering + IT Ops
    First, Guiding Principles
    Know your data
    Don’t rely on IT magic
    Equal Importance in WebApps / SaaS
    Hold hands and be friends
    If you can’t manage it, don’t deploy it
  • 6. Admins: Be Prepared
    ok, this sucks.
  • 7. How?
    Replicate!
    Is that enough?
    Well, not if your company is on the line
    Snapshot
    Every minute???
    Export often
    Really???
  • 8. Then What?
    Yes, Mongo can do Incremental
    Use the mongo slave mechanism
    It’s exposed
    It’s supported
    It’s very easy
    It’s extremely fast
    How?
    Snapshot your data
    Stream write ops to disk
    Repeat
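
The snapshot / stream / repeat loop on this slide can be sketched roughly as follows. This is a simulation under stated assumptions, not Wordnik's tooling: the oplog is a plain Python list of dicts shaped like MongoDB oplog entries (`ts`, `op`, `ns`, `o`), timestamps are simplified to integers, and `stream_ops_to_disk` is a hypothetical helper name. A real implementation would tail the oplog collection on the master rather than re-scan a list.

```python
import json
import os
import tempfile

def stream_ops_to_disk(oplog, last_ts, path):
    """Append every op newer than last_ts to a JSON-lines file.

    Returns the timestamp of the newest op written (or last_ts if none),
    so the next pass can resume where this one stopped.
    """
    newest = last_ts
    with open(path, "a", encoding="utf-8") as out:
        for entry in oplog:
            if entry["ts"] <= last_ts:
                continue  # already covered by the snapshot or an earlier pass
            out.write(json.dumps(entry) + "\n")
            newest = max(newest, entry["ts"])
    return newest

# Simulated oplog (assumption: integer timestamps for simplicity).
oplog = [
    {"ts": 1, "op": "i", "ns": "db.words", "o": {"_id": 1, "word": "cat"}},
    {"ts": 2, "op": "i", "ns": "db.words", "o": {"_id": 2, "word": "dog"}},
]

backup_path = os.path.join(tempfile.mkdtemp(), "incremental.jsonl")

# Pass 1: the snapshot was taken at ts=1, so only later ops are streamed.
checkpoint = stream_ops_to_disk(oplog, last_ts=1, path=backup_path)

# New writes arrive; pass 2 resumes from the checkpoint.
oplog.append({"ts": 3, "op": "d", "ns": "db.words", "o": {"_id": 1}})
checkpoint = stream_ops_to_disk(oplog, last_ts=checkpoint, path=backup_path)

with open(backup_path, encoding="utf-8") as f:
    saved = [json.loads(line) for line in f]
```

Keeping the checkpoint between passes is what makes the backup incremental: each pass writes only what the previous one has not seen.
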
  • 9. Better than Free
    Take our tools. They work!!!
    SnapshotUtil
    Selectively snapshot in BSON
    Index info too!
    IncrementalBackupUtil
    Tail the oplog, stream to disk
    Only the collections you want!
    Compress & rotate
    RestoreUtil
    Recover your snapshots
    Apply indexes yourself
    ReplayUtil
    Apply your Incremental backups
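
To illustrate the "only the collections you want" and "compress & rotate" ideas, here is a small sketch. It does not reproduce IncrementalBackupUtil: the function names, the file naming scheme, and the rotate-by-op-count policy are all assumptions made for the example, and the ops are simulated dicts rather than entries read from a live oplog.

```python
import gzip
import json
import os
import tempfile

def write_rotating_backups(ops, wanted_namespaces, out_dir, max_ops_per_file=2):
    """Write ops for the wanted namespaces into gzip files, rotating by op count."""
    paths = []
    # Keep only the collections we were asked to back up.
    selected = [op for op in ops if op["ns"] in wanted_namespaces]
    for start in range(0, len(selected), max_ops_per_file):
        chunk = selected[start:start + max_ops_per_file]
        path = os.path.join(out_dir, f"ops-{len(paths):04d}.jsonl.gz")
        with gzip.open(path, "wt", encoding="utf-8") as out:
            for op in chunk:
                out.write(json.dumps(op) + "\n")
        paths.append(path)
    return paths

def read_backup(path):
    """Read one compressed backup file back into a list of ops."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

ops = [
    {"ts": 1, "op": "i", "ns": "db.words", "o": {"_id": 1}},
    {"ts": 2, "op": "i", "ns": "db.words", "o": {"_id": 2}},
    {"ts": 3, "op": "i", "ns": "db.logs", "o": {"_id": 900}},
    {"ts": 4, "op": "i", "ns": "db.words", "o": {"_id": 3}},
]

paths = write_rotating_backups(ops, {"db.words"}, tempfile.mkdtemp())
first_ids = [op["o"]["_id"] for op in read_backup(paths[0])]
second_ids = [op["o"]["_id"] for op in read_backup(paths[1])]
```

Rotating by size or by time window would work the same way; op count is used here only because it keeps the example deterministic.
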
  • 10. What if Scenarios
    One collection gets corrupted?
    Restore it
    Apply all operations to it
    “My top developer dropped a collection!”
    Restore just that one
    Apply operations to it up to that point in time
    “We got hacked!”
    Restore it all
    Apply operations up to that point in time
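
The restore-then-replay recovery described in these scenarios can be sketched as below. This is a simplified model, not ReplayUtil itself: `replay` is a hypothetical function, timestamps are integers, updates are treated as full-document replacements (real oplog updates may carry modifiers such as `$set`), and command entries such as a collection drop are not modeled.

```python
def replay(snapshot, ops, namespace, until_ts):
    """Apply ops for one namespace to a snapshot, stopping after until_ts."""
    docs = {doc["_id"]: dict(doc) for doc in snapshot}
    for entry in sorted(ops, key=lambda e: e["ts"]):
        if entry["ts"] > until_ts:
            break  # everything after the chosen point in time is ignored
        if entry["ns"] != namespace:
            continue  # restore just the one collection
        if entry["op"] == "i":
            docs[entry["o"]["_id"]] = dict(entry["o"])
        elif entry["op"] == "u":
            # Simplification: "o" is a full replacement, "o2" selects the document.
            target_id = entry["o2"]["_id"]
            docs[target_id] = dict(entry["o"], _id=target_id)
        elif entry["op"] == "d":
            docs.pop(entry["o"]["_id"], None)
    return docs

snapshot = [{"_id": 1, "word": "cat"}]
ops = [
    {"ts": 10, "op": "i", "ns": "db.words", "o": {"_id": 2, "word": "dog"}},
    {"ts": 20, "op": "u", "ns": "db.words", "o2": {"_id": 1}, "o": {"word": "kitten"}},
    # The mistake: every document is removed at ts 30 and 31.
    {"ts": 30, "op": "d", "ns": "db.words", "o": {"_id": 1}},
    {"ts": 31, "op": "d", "ns": "db.words", "o": {"_id": 2}},
]

# Recover the collection as it was just before the mistake.
restored = replay(snapshot, ops, "db.words", until_ts=20)
```

Choosing `until_ts` is the operator's job: it is the last timestamp known to be good.
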
  • 11. What else is possible?
    Replication
    Why not use built-in?
    Control, of course
    Same logic as Incremental + Replay
    Add some filters and it gets interesting
  • 12. Hot Datacenter
    Create incremental backups
    Compress
    Push to DC in batch
    Apply to master
    [Diagram: Primary Datacenter master -> incremental backup files -> SCP -> Replay Util -> Hot Datacenter master]
  • 13. Dev Environment
    Developers need production-ish data
    Anonymize while replicating to dev server
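
One way to picture anonymizing during replication is a transform applied to each op before it reaches the dev server. The sketch below is illustrative only: the field names in `SENSITIVE_FIELDS`, the hashing scheme, and both function names are assumptions, and a truncated unsalted hash is a placeholder rather than a vetted anonymization method.

```python
import hashlib

SENSITIVE_FIELDS = {"email", "name"}  # hypothetical field names

def anonymize(doc):
    """Return a copy of doc with sensitive fields replaced by a short hash."""
    clean = {}
    for key, value in doc.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            clean[key] = digest[:12]
        else:
            clean[key] = value
    return clean

def replicate_to_dev(ops):
    """Yield ops for the dev server, masking sensitive fields on inserts."""
    for entry in ops:
        if entry["op"] == "i":
            yield dict(entry, o=anonymize(entry["o"]))
        else:
            yield entry  # deletes carry only an _id, so pass them through

ops = [
    {"ts": 1, "op": "i", "ns": "db.users",
     "o": {"_id": 7, "name": "Ada", "email": "ada@example.com", "plan": "pro"}},
    {"ts": 2, "op": "d", "ns": "db.users", "o": {"_id": 3}},
]

dev_ops = list(replicate_to_dev(ops))
```

Because the transform copies each entry, the production-side ops are left untouched while the dev copy keeps the same shape and identifiers.
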
  • 14. Multiple Upstream Masters
    Aggregate to single collection
    Target can be a master!
    [Diagram: Master A, Master B, and Master C each feed db.page_views into a single target db.page_views]
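
The fan-in on this slide can be modeled as merging several op streams into one. The sketch below assumes each upstream stream is already sorted by an integer `ts`; the `source` tag added to each op and the `merge_upstreams` name are hypothetical, and the sketch does not address `_id` collisions between masters.

```python
import heapq

def merge_upstreams(streams, target_ns):
    """Merge per-master op lists (each sorted by ts) into one ordered stream.

    Every op is retargeted at target_ns and tagged with the master it came from.
    """
    tagged = [
        [dict(op, ns=target_ns, source=name) for op in ops]
        for name, ops in streams.items()
    ]
    return list(heapq.merge(*tagged, key=lambda op: op["ts"]))

streams = {
    "A": [{"ts": 1, "op": "i", "ns": "db.page_views", "o": {"_id": "a1"}},
          {"ts": 4, "op": "i", "ns": "db.page_views", "o": {"_id": "a2"}}],
    "B": [{"ts": 2, "op": "i", "ns": "db.page_views", "o": {"_id": "b1"}}],
    "C": [{"ts": 3, "op": "i", "ns": "db.page_views", "o": {"_id": "c1"}}],
}

merged = merge_upstreams(streams, "db.page_views")
```

Since the target can itself be a master, the merged stream could be replayed there like any other incremental backup.
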
  • 15. Unblock MapReduce
    Map Reduce can lock up your server
    Replicate source data to another mongod
    Replicate results back to master
    [Diagram: Master replicates db.source_data to the MR Server; the MR Server replicates db.summary_data back to the Master]
  • 16. Mesh Mode
    Write to Multiple Masters
    Filter by “Server Identifier”
    > db.documents.find().limit(2)
    {"_id":99887,"src":2,"title":"favorite.png","fsid":33774}
    {"_id":128773,"src":1,"title":"select.png","fsid":837743}
    [Diagram: Master 1 and Master 2 each hold db.documents; replication is filtered on documents.src != 1 for Master 1 and documents.src != 2 for Master 2]
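
The server-identifier filter can be sketched with the two documents shown on this slide. The assumption here is that each master stamps its own writes with `src` and, when pulling ops from a peer, applies only those that did not originate locally, which keeps writes from bouncing back and forth. `ops_to_apply` is a hypothetical name, and only insert ops are modeled.

```python
def ops_to_apply(remote_ops, local_server_id):
    """Keep only insert ops whose document did not originate on this server."""
    return [
        entry for entry in remote_ops
        if entry["op"] == "i" and entry["o"].get("src") != local_server_id
    ]

ops = [
    {"ts": 1, "op": "i", "ns": "db.documents",
     "o": {"_id": 99887, "src": 2, "title": "favorite.png", "fsid": 33774}},
    {"ts": 2, "op": "i", "ns": "db.documents",
     "o": {"_id": 128773, "src": 1, "title": "select.png", "fsid": 837743}},
]

for_master_1 = ops_to_apply(ops, local_server_id=1)  # documents.src != 1
for_master_2 = ops_to_apply(ops, local_server_id=2)  # documents.src != 2
```

With this filter, Master 1 picks up only the document written on Master 2, and vice versa.
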
  • 17. What’s Next
    Multi-Master in Wordnik Production
    Multiple Datacenter Presence
    More data => more challenges
  • 18. Try it out
    http://blog.wordnik.com/mongoutils
    Questions?