In this webinar, we'll discuss the different ways to back up and restore your single servers, replica sets, and sharded clusters in case of a disaster scenario. We'll review various approaches, including taking filesystem snapshots, using mongodump and mongorestore, and leveraging MongoDB Management Service to back up and restore.
2. 2
• Curious about backups and disaster recovery
• Proactively planning for your deployment
• Currently facing an impending or ongoing crisis
What Brought You Here?
3. 3
• Discuss recovery vs. availability
• Review backup and restore tools
– What is the relative complexity of each option?
• Understand high availability
– What are you trying to protect against?
Goals
7. 7
• Don’t confuse the two
• Distinctly different business requirements
• Technical solutions may converge
Availability vs. Recovery
8. 8
• Availability
– Data is readable/writable in the face of infrastructure failures
– Tunable resiliency depending upon failure scenarios
• Recovery
– Data is safe, nothing is lost
– Returning from failure is a straightforward task
Definitions
9. 9
• How much data can you afford to lose?
• How long can you afford to be off-line?
• Cost is impacted by these decisions.
Considerations
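The first two questions are commonly stated as a recovery point objective (RPO) and a recovery time objective (RTO). The sketch below shows one way to check a backup schedule against them. It assumes each backup captures the data as of its start time, so the worst case is a failure just before an in-progress backup completes. The function names and the example durations are illustrative, not from the webinar.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Upper bound on lost writes under the assumption stated above.

    If a failure hits just before the current backup finishes, the newest
    usable backup started one interval plus one backup duration ago.
    """
    return backup_interval + backup_duration


def meets_objectives(backup_interval, backup_duration, restore_duration, rpo, rto) -> bool:
    """True when the schedule satisfies both the RPO and the RTO."""
    loss = worst_case_data_loss(backup_interval, backup_duration)
    return loss <= rpo and restore_duration <= rto


# Hypothetical numbers: a nightly backup that takes 1 hour, a 3 hour restore,
# and a business requirement of at most 4 hours of lost data and downtime.
nightly_ok = meets_objectives(
    backup_interval=timedelta(hours=24),
    backup_duration=timedelta(hours=1),
    restore_duration=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
```

With these example numbers the nightly schedule fails the RPO, which is the kind of gap that drives cost: closing it means more frequent backups or a point-in-time approach.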
15. 15
• Can be run in live or offline mode
• Oplog-aware for point-in-time operations
• Filter can be applied in both directions
• http://docs.mongodb.org/manual/reference/program/mongodump/
• http://docs.mongodb.org/manual/reference/program/mongorestore/
mongodump / mongorestore
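As a rough illustration of the oplog-aware mode, the sketch below builds the argument lists for a full-instance dump with `--oplog` and a matching restore with `--oplogReplay`. It only constructs the commands and does not run them. The host and directory values are placeholders, and the helper names are made up for this example.

```python
def mongodump_args(out_dir, host="localhost:27017", oplog=True):
    """Argument list for a full-instance dump.

    --oplog records operations that occur during the dump so the result
    reflects a single point in time. It applies to whole-instance dumps
    of a replica set member, so no --db or --collection is given here.
    """
    args = ["mongodump", "--host", host, "--out", out_dir]
    if oplog:
        args.append("--oplog")
    return args


def mongorestore_args(dump_dir, host="localhost:27017", oplog_replay=True):
    """Argument list for restoring a dump directory.

    --oplogReplay applies the oplog entries captured by mongodump --oplog
    after the data has been loaded.
    """
    args = ["mongorestore", "--host", host]
    if oplog_replay:
        args.append("--oplogReplay")
    args.append(dump_dir)  # the dump directory is a positional argument
    return args


# Placeholder path; pass either list to subprocess.run(args, check=True).
dump_cmd = mongodump_args("/backups/2014-01-01")
restore_cmd = mongorestore_args("/backups/2014-01-01")
```

Building the list rather than a shell string avoids quoting problems when paths contain spaces.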
16. 16
• Copy files in your data directory (e.g. /data/db)
• Filesystem or block storage snapshot
• Considerations
– Journaling (on by default*)
– Ensuring consistency
Storage-level Backups
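One common way to address the consistency point, when the journal is disabled or does not live on the same volume as the data files, is to flush and lock writes around the copy or snapshot. The sketch below assumes a pymongo-style client whose `admin.command(...)` forwards database commands, and a server version that exposes `fsyncUnlock` as a command; older drivers and servers use different helpers. `take_snapshot` in the usage comment is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(client):
    """Flush pending writes to disk and block new writes inside the block.

    `client` is assumed to behave like pymongo.MongoClient, i.e.
    client.admin.command("fsync", lock=True) sends {fsync: 1, lock: true}.
    """
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step fails.
        client.admin.command("fsyncUnlock")


# Usage sketch (take_snapshot is a hypothetical function you would supply):
#
#     with writes_locked(client):
#         take_snapshot("/data/db")
```

Keeping the locked window as short as possible matters, because writes to that member block until the unlock.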
17. 17
• Entire database is backed up
– Backup files will be large
• Fastest way to create a backup
• Fastest way to restore a backup
• Ongoing management requires devops expertise
Storage-level Considerations
19. 19
• Recovery at scale is challenging
– Deployments are growing all the time
• MongoDB needed a simple, scalable approach to secure recovery
• Ongoing management of manual solutions was difficult
• Overhead needed to be minimal
MMS Backup: Background
20. 20
MMS Backup: Features
Available
• Cloud-based service
• Archived across DCs
Secure
• Data is encrypted in transit
• 2-factor auth for restores
Managed
• Developed and monitored by MongoDB
• Point-in-time backups
Overhead
• Lightweight agent, processes oplog
Restores
• Free, unlimited
• Seed new environments
30. 30
• Increased data redundancy
• Replica Set “majority” drives availability
• Deploy across multiple levels
– Racks, Regions, Data Centers
• Can support recovery and availability requirements
– Recovery: geographically dispersed copies of data
– Availability: multi-level failover protection
Resilient Topology
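To make the "majority" point concrete, the sketch below computes the majority for a given number of voting members and checks whether a replica set can still elect a primary after losing one site. The site names and member counts are made-up examples.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def survives_loss_of(site: str, members_per_site: dict) -> bool:
    """True if the members outside `site` still form a majority."""
    total = sum(members_per_site.values())
    remaining = total - members_per_site[site]
    return remaining >= majority(total)


# Hypothetical layouts: voting members per data center.
two_sites = {"dc1": 2, "dc2": 1}
three_sites = {"dc1": 2, "dc2": 2, "dc3": 1}

two_sites_ok = all(survives_loss_of(s, two_sites) for s in two_sites)
three_sites_ok = all(survives_loss_of(s, three_sites) for s in three_sites)
```

In the two-site layout, losing dc1 leaves one member out of three, which is not a majority, so writes stop until dc1 returns. Spreading five members across three sites keeps a majority after any single site failure.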
32. 32
• Stop the balancer (wait), or use a scheduled window
• Stop one config server (data is still r/w)
• Back up config server
• Execute backup across all shards
• Restart config server
• Resume balancer
Backup: Sharded Cluster
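The sequence on this slide can be sketched as a small orchestration function. The point of the sketch is the ordering and the cleanup: the config server is restarted and the balancer resumed even when a backup step fails. The `ops` object and all of its method names are hypothetical; each method would wrap whatever tooling you actually use for that step.

```python
def backup_sharded_cluster(ops):
    """Run the sharded-cluster backup steps in order with guaranteed cleanup.

    `ops` is a hypothetical object providing one method per step.
    """
    ops.stop_balancer()  # wait for in-flight migrations, or use a scheduled window
    try:
        ops.stop_config_server()  # one config server only; data stays readable/writable
        try:
            ops.backup_config_server()
            ops.backup_all_shards()
        finally:
            ops.restart_config_server()
    finally:
        ops.resume_balancer()
```

Nesting the two `try`/`finally` blocks means the cluster is returned to its normal state in reverse order of the steps that changed it.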
33. 33
• Goals:
– Return to normal operation/configuration, or
– Amend configuration parameters?
• Normal operation:
– Stop mongod and mongos processes
– Restore each shard as a replica set restore
– Restore config server data
– Restart mongos and mongod
• Amended deployment (advanced!):
– Shard key, shard topology, config hostnames
Restore: Sharded Cluster
34. 34
• mongodump/mongorestore
– --oplog[Replay]
– --objcheck/--repair
– --dbpath
– --query/--filter
• bsondump
– inspect data at console
• lvm snapshot time/space trade-off
– Multi EBS backup
– clean up snapshots
Other Recovery Tools
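As an example of the `--query` option, the sketch below builds a mongodump command that exports only the documents matching a filter. It assumes the filter is passed as a JSON document and that `--query` is combined with `--db` and `--collection`. The database, collection, field, and path names are placeholders, and the command is only constructed, not executed.

```python
import json


def filtered_dump_args(db, collection, query, out_dir):
    """Argument list for dumping a subset of one collection.

    `query` is a plain dict; it is serialized to JSON for --query.
    """
    return [
        "mongodump",
        "--db", db,
        "--collection", collection,
        "--query", json.dumps(query),
        "--out", out_dir,
    ]


# Placeholder names for illustration only.
partial_dump = filtered_dump_args("shop", "orders", {"status": "open"}, "/backups/partial")
```

The resulting `.bson` file can then be inspected at the console with bsondump before deciding whether to restore it.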
36. 36
• Pick the easiest solution and back up immediately
– Then test the restore process
• Interim solutions are ok – data safety is paramount
• Iterate into a long-term scalable approach
• Happiness is at the intersection of availability and recovery
• Manage complexity and scalability
Work Towards a Balanced Solution