Strategies for Backing Up MongoDB (#MongoBoston)
Jeff Yemin, Engineering Manager, 10gen

Strategies For Backing Up Mongo Db 10.2012 Copy

Presentation by Jeff Yemin @ MongoBoston.

Speaker notes
  • Do the fsyncLock/fsyncUnlock demo
  • I need a picture for the first bullet
  • Make the point that while you can turn journaling off, you shouldn't. Without journaling, the approach is quite straightforward: there is a one-to-one mapping of data files to memory, and once either the OS or an explicit fsync flushes that memory, your data is safe on disk. With journaling we do some tricks. The journal is a write-ahead log; that is, we write the data to the journal before we update the data files themselves. Each file is mapped twice: once to a private view, which is marked copy-on-write, and once to the shared view (shared in the sense that the disk has access to this memory). Every time we do a write, we record the region of memory that was written to. These writes are batched into group commits, compressed, and appended to a special journal file on disk. Once that data has been written to disk, a remapping phase copies the changes into the shared view, at which point they can be synced to disk. Once that data is synced to disk, it is safe (barring hardware failure). If there is a failure before the shared view is written to disk, we simply apply, in order, all the changes journaled since the data files were last synced, and we get back to a consistent view of the data.
  • LVM: Logical Volume Manager. LVM is a program that abstracts disk images from physical devices, and provides a number of raw disk manipulation and snapshot capabilities useful for system management.
    lvcreate: The command on the slide creates an LVM snapshot (with the --snapshot option) named mdb-snap01 of the mongodb volume in the vg0 volume group. The snapshot is located at /dev/vg0/mdb-snap01. The location and paths to your system's volume groups and devices may vary slightly depending on your operating system's LVM configuration. The snapshot has a cap of 100 megabytes, because of the parameter --size 100M. This size does not reflect the total amount of data on the disk, but rather the quantity of differences between the current state of /dev/vg0/mongodb and its state when the snapshot (i.e. /dev/vg0/mdb-snap01) was created. Make sure you size this big enough.
    EBS: If your deployment depends on Amazon's Elastic Block Store (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As a result you may: 1. Flush all writes to disk and create a write lock to ensure consistent state during the backup process. If you choose this option, see Backup Without Journaling. 2. Configure LVM to run and hold your MongoDB data files on top of the RAID within your system. If you choose this option, perform the LVM backup operation described in Create Snapshot.
  • If the secondary is hidden, then options are more varied. Killing and locking are valid options, so long as there is enough spare capacity in the system to catch up after the backup is complete
  • Ok if you have enough space to store all the data on all the shards
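
The journaling note above describes a write-ahead log: a change is appended to the journal before the data files are updated, and after a crash the journaled changes are reapplied in order. A minimal Python sketch of that idea follows. It is a toy model, not MongoDB's implementation: the class name ToyStore and its methods are hypothetical, and it leaves out the private and shared views, group commits, and compression mentioned in the note.

```python
class ToyStore:
    """Toy write-ahead log (hypothetical model, not MongoDB code)."""

    def __init__(self):
        self.journal = []     # durable, append-only record of writes
        self.data_files = {}  # durable data files, updated only on flush
        self.pending = []     # in-memory writes not yet applied to data_files

    def write(self, key, value):
        # Write-ahead: the change reaches the journal before the data files.
        self.journal.append((key, value))
        self.pending.append((key, value))

    def flush(self):
        # Apply journaled writes to the data files, then truncate the journal.
        for key, value in self.pending:
            self.data_files[key] = value
        self.pending.clear()
        self.journal.clear()

    def crash_and_recover(self):
        # A crash loses in-memory state; the journal and data files survive.
        self.pending = []
        for key, value in self.journal:  # replay in the original order
            self.data_files[key] = value
        self.journal.clear()
        return self.data_files


store = ToyStore()
store.write("a", 1)
store.flush()          # "a" is now in the data files
store.write("a", 2)    # journaled, but not yet in the data files
store.write("b", 3)
recovered = store.crash_and_recover()
print(recovered)
```

Because the unflushed writes were journaled first, recovery brings the data files to the state they would have reached without the crash. That is the property that makes a filesystem-level snapshot containing both the data files and the journal usable as a backup.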
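
The LVM note above stresses that --size caps how much change the snapshot can absorb, not how much data the volume holds. The Python sketch below assembles the lvcreate command from the slides and pairs it with a rough sizing estimate. The helper names, the write rate, the backup duration, and the 1.5x safety factor are all illustrative assumptions; measure your own workload before relying on any number.

```python
import math
import shlex


def estimate_snapshot_mb(write_mb_per_s, backup_seconds, safety_factor=1.5):
    """Rough estimate of change accumulated while the snapshot exists (assumed heuristic)."""
    return math.ceil(write_mb_per_s * backup_seconds * safety_factor)


def lvm_snapshot_command(volume, name, size_mb):
    """Build the lvcreate argv using the flags shown on the slide."""
    return ["lvcreate", "--size", f"{size_mb}M", "--snapshot", "--name", name, volume]


# Assumed workload: 0.5 MB/s of writes during a 120-second archive of the snapshot.
size_mb = estimate_snapshot_mb(write_mb_per_s=0.5, backup_seconds=120)
command = lvm_snapshot_command("/dev/vg0/mongodb", "mdb-snap01", size_mb)
print(shlex.join(command))
# To actually create the snapshot (needs root and a real volume group):
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string keeps each argument intact if a volume path ever contains unusual characters.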

    1. Strategies for Backing Up MongoDB (#MongoBoston). Jeff Yemin, Engineering Manager, 10gen
    2. File and Directory Layout
       • A set of files per database
    3. Insert with write concern of {fsync: true}
    4. Archive the data directory
    5. Restore the data directory
    6. Start mongod on restored data directory
    7. Everything is fine, right?
       • No, it's not
       • But you can't tell until you look
    8. Try validating the collection
       • In the shell, run the validate command
    9. How can we get a clean backup?
       • kill mongod
       • fsyncLock / fsyncUnlock
    10. How can we get a clean backup?
       • mongodump
    11. mongodump
       • Snapshot of each collection
         – Does NOT represent a point in time, even for a single collection
       • Can NOT be combined with fsyncLock
         – Remember, you can't read…
       • You CAN dump directly from data files to get a point-in-time backup
         – mongodump --dbpath
       • Can be costlier than archiving at the FS level
    12. Snapshot Query [diagram: numbered documents visited by a snapshot query; layout not recoverable]
    13. How can we get a clean backup?
       • journaling
    14. Journaling
       • Write-ahead log
       • Guarantees a consistent view even after a hard crash
       • Default behavior as of 2.0
       • Journal stored in the journal folder under --dbpath
       • --journalCommitInterval* (2ms - 300ms)
    15. Journaling implications for backup
       • Logical Volume Manager (LVM)
       • LVM snapshots to the rescue
         – lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
       • No shutdown or fsyncLock needed
       • True point-in-time backup for a single instance
    16. Replica Sets
    17. Backing up a replica set
       • Back up a (hidden) secondary
         – kill mongod
         – fsyncLock
         – mongodump
         – LVM snapshot
    18. Mongodump for replica sets
       • True point in time
         – mongodump --oplog
         – mongorestore --oplogReplay
       • Snapshot query of each collection, then replay the oplog at the end
         – Similar to how a new secondary does an initial sync
    19. Sharded clusters [diagram: mongos, config servers, and balancer above chunks 1 to 48 distributed across Shard 1, Shard 2, Shard 3, and Shard 4]
    20. Backing up a sharded cluster
       • mongodump through mongos
         – (but no --oplog)
       • mongorestore through mongos
    21. Backup a Sharded Cluster
       1. Stop balancer, and wait until inactive (state: 0)
          db.settings.update( { _id: "balancer" }, { $set : { stopped: true } }, true )
       2. Stop a config server
          Back up data
          – Each shard
          – Config server (mongodump --db config)
       3. Restart config server
       4. Resume balancer
    22. Thank You. Jeff Yemin, Engineering Manager, 10gen (#MongoBoston)
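
Slide 18 describes mongodump --oplog as a snapshot query of each collection followed by an oplog replay. The simulation below, in plain Python with no MongoDB involved, illustrates why that yields a point-in-time result: collections copied at different moments disagree with each other until the writes recorded during the dump are replayed. The function names and the (collection, id, document) oplog format are simplifications invented for this sketch; real oplog entries are richer, and the replay is safe here only because each entry is an idempotent whole-document set.

```python
import copy


def apply_op(db, op):
    """Apply one idempotent entry: set collection[doc_id] to a full document."""
    collection, doc_id, doc = op
    db.setdefault(collection, {})[doc_id] = doc


def dump_with_oplog(live, collections, writes_during_dump):
    """Copy collections one at a time while writes keep arriving; record those writes."""
    dump, oplog = {}, []
    for index, name in enumerate(collections):
        dump[name] = copy.deepcopy(live.get(name, {}))
        # Writes that land after this collection was copied but before the dump ends.
        for op in writes_during_dump.get(index, []):
            apply_op(live, op)
            oplog.append(op)
    return dump, oplog


def restore_with_oplog_replay(dump, oplog):
    """Load the dumped collections, then replay the recorded writes in order."""
    restored = copy.deepcopy(dump)
    for op in oplog:
        apply_op(restored, op)
    return restored


live = {"orders": {1: {"total": 10}}, "stock": {1: {"qty": 5}}}
# While "orders" has been copied and "stock" has not, a new order arrives.
writes = {0: [("orders", 2, {"total": 7}), ("stock", 1, {"qty": 4})]}
dump, oplog = dump_with_oplog(live, ["orders", "stock"], writes)

inconsistent = 2 not in dump["orders"] and dump["stock"][1] == {"qty": 4}
restored = restore_with_oplog_replay(dump, oplog)
print(inconsistent, restored == live)
```

The raw dump shows the reduced stock without the order that caused it. Replaying the recorded writes brings every collection to the same moment, the end of the dump, and reapplying the stock update is harmless because it sets the same document again.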