14. mongodump
$ mongodump --help
Export MongoDB data to BSON files.
options:
  --help                    produce help message
  -v [ --verbose ]          be more verbose (include multiple times for more
                            verbosity e.g. -vvvvv)
  --version                 print the program's version and exit
  -h [ --host ] arg         mongo host to connect to ( /s1,s2 for
  --port arg                server port. Can also use --host hostname
  -u [ --username ] arg     username
  -p [ --password ] arg     password
  --dbpath arg              directly access mongod database files in
                            path, instead of connecting to a mongod
                            needs to lock the data directory, so can
                            if a mongod is currently accessing the s
  -d [ --db ] arg           database to use
  -c [ --collection ] arg   collection to use (some commands)
  -o [ --out ] arg (=dump)  output directory or "-" for stdout
  -q [ --query ] arg        json query
  --oplog                   Use oplog for point-in-time snapshotting
15. mongodump
• Dumps collections to *.bson files
• Mirrors your structure
• --db to dump a specific database
• --collection to dump a specific collection
• --oplog to record oplog while backing up
• --query/filter selective dump
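The options on this slide can be combined as in the following minimal sketch. It only assembles and prints mongodump command lines, using flags that appear in the help output above. The database name "shop", the collection name "orders", the query, and the /backups paths are hypothetical placeholders. --oplog is shown only on the full dump, on the assumption that it applies to whole-instance dumps rather than to a single database or collection.

```python
import shlex


def mongodump_args(db=None, collection=None, query=None, out=None, oplog=False):
    """Assemble a mongodump argument list from the options listed on the slide."""
    args = ["mongodump"]
    if db:
        args += ["--db", db]
    if collection:
        args += ["--collection", collection]
    if query:
        args += ["--query", query]
    if out:
        args += ["--out", out]
    if oplog:
        args.append("--oplog")
    return args


# All names and paths below are hypothetical examples.
full = mongodump_args(out="/backups/full", oplog=True)
one_db = mongodump_args(db="shop", out="/backups/shop")
filtered = mongodump_args(
    db="shop",
    collection="orders",
    query='{"status": "open"}',
    out="/backups/open-orders",
)

for cmd in (full, one_db, filtered):
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Each list can also be handed to subprocess.run on a host where mongodump is installed; printing first makes it easy to review the command before it touches a server.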
16. File System Backups
• Must use journaling
• Copy /data/db files
• Snapshot
• Seriously, always use journaling
18. File System Backups - Pros and Cons
• Entire database
• Backup files will be large
• Fastest way to create a backup
• Fastest way to restore a backup
30. Deploy a Resilient Topology
• Redundancy
• Multiple Datacenters
• Multiple Regions
• Delayed Replication
• Can support HA and DR requirements
– HA by providing intra and inter datacenter failover
– DR by creating geographically dispersed copies of data
– DR by configuring a delay between the primary and one
or more secondaries
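As a rough illustration of the delayed-secondary idea, here is a sketch of a replica set configuration document written as a Python dictionary. The set name, host names, and the one-hour delay are assumptions made for the example. The delay field is called slaveDelay in older MongoDB releases and secondaryDelaySecs in newer ones, so check the name against the server version in use.

```python
DELAY_SECONDS = 3600  # assumed delay of one hour; pick a value that fits your RPO

# Hypothetical three-member set: two members in one datacenter and a
# delayed, hidden member in a second datacenter.
replica_set_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1-a.example.net:27017"},
        {"_id": 1, "host": "dc1-b.example.net:27017"},
        {
            "_id": 2,
            "host": "dc2-a.example.net:27017",
            "priority": 0,    # a delayed member should never become primary
            "hidden": True,   # keep client reads away from stale data
            "slaveDelay": DELAY_SECONDS,  # "secondaryDelaySecs" on newer servers
        },
    ],
}


def delayed_members(config):
    """Return the members configured to lag behind the primary."""
    return [m for m in config["members"] if m.get("slaveDelay", 0) > 0]


for member in delayed_members(replica_set_config):
    print(member["host"], "lags by", member["slaveDelay"], "seconds")
```

The delay gives a window in which an operator error, such as a dropped collection, has not yet reached the delayed copy.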
32. Choose the Right Tool
• RPO on the order of seconds or minutes?
– Use Replication
• RPO on the order of hours?
– Maybe backups will suffice
• RTO on the order of seconds or minutes?
– Use Replication
• RTO on the order of hours or days?
– Use backups with warm/cold standby
• Need HA and DR?
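The RPO and RTO questions above can be read as a small decision rule, sketched below. The one-hour cutoff between "seconds or minutes" and "hours" is an assumption made for illustration; the slide does not give exact thresholds.

```python
HOUR = 3600  # assumed boundary between "seconds or minutes" and "hours"


def recommend(rpo_seconds, rto_seconds):
    """Map RPO/RTO targets (in seconds) to the slide's suggestions."""
    picks = []
    if rpo_seconds < HOUR:
        picks.append("RPO: use replication")
    else:
        picks.append("RPO: backups may suffice")
    if rto_seconds < HOUR:
        picks.append("RTO: use replication")
    else:
        picks.append("RTO: use backups with warm/cold standby")
    return picks


# Example: lose at most 30 seconds of data, be back online within 4 hours.
for line in recommend(rpo_seconds=30, rto_seconds=4 * HOUR):
    print(line)
```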
Close to 20 years developing software and systems
Across industries including scientific research, military command and control, e-commerce, telecom, finance and government (intel, DoD and civilian agencies)
I’ve spent the last 7 years working with NoSQL databases, focusing on delivering solutions for big data problems to federal, state and local governments
Story about how system is designed for DR but not HA
Business requirements for DR are typically defined in terms of RPO and RTO
The good news is that MongoDB provides a number of tools and features that let you design a solution to meet your needs. The best of these greatly simplifies the solution in cases where RPO and RTO are approaching zero.
I started this presentation off with an overview of DR because this is probably the most common reason why people do backups. However, there may be other business needs for backups: data archival, system testing, etc. So let’s talk about doing backups in MongoDB.
Story about the navy’s failed DR plan for on-ship IT
Obtains a write lock for the duration of the backup. Should be run against dedicated/hidden secondaries in a replica set. Use --oplog to ensure a consistent point-in-time backup (PIT is the time that the backup completes).
There are cases where you can do file system backups but…
RPO and RTO on the order of hours
Can be used to recover an individual node or a complete system
So far I’ve covered the standard backup and restore capabilities that you would expect from any enterprise-class database. This is all great, but what do you really care about? You care that your data stays online and that you don’t lose it. As systems grow to handle the gobs and gobs of data that they do today, it becomes impractical to rely on these brute-force backup procedures to protect their data. So, while backups may cover the DR requirements for some businesses, many businesses desire much smaller RPOs and RTOs than what backups can provide. What you really want is to have your data replicated in real time so that you are essentially backing your data up as it changes. This is exactly what MongoDB’s Replication features provide.
Drive home the point that backups have very practical limitations in systems dealing with Big Data. For example, how long would it take to do a backup of 100TB? How long would it take to restore that backup?
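To put a rough number on the 100TB question, the sketch below divides the data size by a sustained throughput. The throughput figures (100 MB/s and 1000 MB/s) are assumptions chosen for illustration, not measurements of any particular system, and the estimate ignores compression, indexing on restore, and network contention.

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def transfer_hours(size_bytes, throughput_bytes_per_sec):
    """Hours needed to move size_bytes at a constant throughput."""
    return size_bytes / throughput_bytes_per_sec / 3600


size = 100 * TB
for mb_per_sec in (100, 1000):  # assumed sustained rates
    hours = transfer_hours(size, mb_per_sec * MB)
    print(f"{mb_per_sec} MB/s: {hours:.1f} hours ({hours / 24:.1f} days)")
```

At an assumed 100 MB/s the copy alone takes roughly 278 hours, or about 11.6 days, and a restore has to move the same volume back.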