On MongoDB backup

A gentle overview of MongoDB backup issues in a real sharding + replication production environment.

Published in: Software, Technology
  • Version 1.0: 2014-05-02 @ Gogolook R&D internal training, 30 minutes.
  • Source: http://docs.mongodb.org/manual/core/sharded-cluster-architectures-production/
  • @diagram: http://docs.mongodb.org/manual/core/sharding-introduction/
  • Figure: Ch 10 - Components of a Replica Set, in MongoDB: The Definitive Guide, 2nd edition.

    1. On MongoDB Backup, 2014.05.02, 葉秉哲 (William Yeh)
    2. Why backup?
        • Availability vs. recovery
        • Cost is impacted by these decisions…
            • How much data can you afford to lose?
            • How long can you afford to be off-line?
        • Ref: “MongoDB Backups & Disaster Recovery” http://www.mongodb.com/presentations/webinar-backups-and-disaster-recovery
    3. Tasks: Backup | Restore, each either One-time snapshot or Incremental
    4. Two kinds of backup
        • One-time snapshot
            • Capture a consistent snapshot at a specific point in time
            • Not easy in a distributed zero-downtime production system.
        • Incremental backup
            • Capture differences since last one-time snapshot
            • Oplog is N/A in MongoDB config servers (to be discussed later)
    5. Cluster architecture in production
    6. How about MMS? MongoDB Management Service: http://mms.mongodb.com
    7. $$
    8. Task grid: Backup | Restore; One-time snapshot | Incremental
    9. One-time snapshot steps
        1. Stop balancer (mongos + config)
        2. Take snapshot of config data
        3. For each shard, take snapshot of one secondary
        4. Re-start the balancer
        5. Copy snapshots in steps 2 & 3
        • Ref: “Backup a Sharded Cluster with Filesystem Snapshots” http://bit.ly/1hqO1kj
    10. [Diagram with steps 1 to 5 marked; labels: “Take snapshot”, “Copy out”]
    11. #1 Stop Balancer
    12. Stop balancer… Why?
        • “If MongoDB migrates a chunk during a backup, you can end with an inconsistent snapshot of your sharded cluster. Never run a backup while the balancer is active.”
        • Ref: “Disable Balancing During Backups” http://bit.ly/1rnRMfV
    13. #1: Stop balancer
            var sleep_time_in_ms = 10000;
            // Disable the balancer so that no new balancing round starts.
            sh.setBalancerState(false);
            // Wait until any in-progress balancing round has finished.
            while (sh.getBalancerState() || sh.isBalancerRunning()) {
                sleep(sleep_time_in_ms);
            }
            // now the balancer is stopped…
        • Ref: “Disable Balancing During Backups” http://bit.ly/1rnRMfV
    14. #2 Snapshot - config data
    15. #3 Snapshot - secondary of each shard
    16. Snapshot the “dbpath”
        • The snapshotting steps are almost the same:
            • #2: Config server
            • #3: Secondary
        • … but with just one difference:
            • “Never use db.fsyncLock() on config databases.”
        • Ref: “Backup a Sharded Cluster with Filesystem Snapshots” http://bit.ly/1hqO1kj
    17. Snapshot approaches
        • “True” snapshot
            • LVM, ZFS, Btrfs, etc.
            • Cloud snapshot (EC2, GCE, etc.)
        • “Fake-and-slowwwww” dump
            • mongodump
            • fsyncLock() + tar + pigz
            • Wordnik tool: https://github.com/wordnik/wordnik-oss
    18. Considerations for snapshot
        • Timing
            • “Freeze” time period
            • Archive (including compression & transmission)
        • Storage efficiency
            • Copy-on-write snapshot is better
    19. Snapshot: copy on write
        • Ref: “磁碟配額 (Quota) 與進階檔案系統管理” (Disk Quota and Advanced File System Management) http://bit.ly/R67Yrd
    20. #4 Re-start Balancer
    21. #4: Re-start balancer
            sh.setBalancerState(true);
            // now the balancer is re-started…
        • Ref: “Enable the Balancer” http://bit.ly/1hfKFl2
    22. #5 Copy Snapshot
    23. Considerations for snapshot
        • Timing
            • “Freeze” time
            • Archive (including compression & transmission)
        • Storage efficiency
            • Copy-on-write snapshot is better
    24. Task grid: Backup | Restore; One-time snapshot | Incremental
    25. Restore approaches
        • “True” snapshot
            • LVM, ZFS, Btrfs, etc.
            • Cloud snapshot (EC2, GCE, etc.)
        • “Fake-and-slowwwww” dump
            • mongorestore
            • Daemon stop + untar
            • Wordnik tool: https://github.com/wordnik/wordnik-oss
    26. Task grid: Backup | Restore; One-time snapshot | Incremental
    27. Oplog
        • Ordered list of write operations
    28. Oplog tools
        • Ready-to-use command line tools
            • mongodump + mongorestore
            • Wordnik tool: https://github.com/wordnik/wordnik-oss
            • Tayra tool: http://www.jroller.com/DhavalDalal/entry/tayra_an_incremental_backup_tool
            • mongosync: http://nosqldb.org/topic/5173d275cbce24580a033bd8
        • Still many others:
            • https://github.com/search?q=mongo+oplog&type=Repositories
    29. Oplog in real-life
        • Replication lag
            • Inspect it in a real production environment!
        • Oplog is N/A in MongoDB config servers
            • … Still need to deal with the balancer
            • … Still need to use snapshot techniques
    30. Replication lag in real-life
    31. References
        • MongoDB: The Definitive Guide, 2/e
        • MongoDB Manual
            • http://docs.mongodb.org/manual/administration/backup/
        • MongoDB Backups & Disaster Recovery
            • http://www.mongodb.com/presentations/webinar-backups-and-disaster-recovery
        • Backup Strategies: Keeping Your Data Safe
            • http://www.mongodb.com/presentations/backup-strategies-keeping-your-data-safe
            • http://www.slideshare.net/fehguy/keeping-mongodb-data-safe
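
The five steps on slide 9, together with the balancer loop on slide 13 and the fsyncLock() rule on slide 16, could be combined roughly as follows. This is a minimal sketch and not the author's script: `sh` stands for the mongo shell's sharding helper, while `backupShardedCluster`, `withFsyncLock`, `snapshotConfig`, `snapshotSecondary`, `copyOut`, and the `shards` array are hypothetical names that the caller would supply. It also assumes that each secondary is locked with db.fsyncLock() while its snapshot is taken.

```javascript
// Run `takeSnapshot` while writes on a shard secondary are blocked.
// `db` is expected to expose the shell's db.fsyncLock() / db.fsyncUnlock().
function withFsyncLock(db, takeSnapshot) {
  db.fsyncLock();
  try {
    return takeSnapshot();
  } finally {
    db.fsyncUnlock(); // always release the lock, even if the snapshot fails
  }
}

// Hypothetical orchestration of the five steps. Everything in `env` other
// than `sh` and `sleep` is an assumed, caller-supplied callback or value.
function backupShardedCluster(env) {
  const { sh, sleep, shards, snapshotConfig, snapshotSecondary, copyOut } = env;
  const pollMs = env.pollMs || 10000;
  const snapshots = [];

  // Step 1: disable the balancer and wait for any in-progress round.
  sh.setBalancerState(false);
  while (sh.getBalancerState() || sh.isBalancerRunning()) {
    sleep(pollMs);
  }

  try {
    // Step 2: snapshot the config data (no fsyncLock on config databases).
    snapshots.push(snapshotConfig());

    // Step 3: snapshot one secondary of each shard, locked during the snapshot.
    for (const shard of shards) {
      snapshots.push(
        withFsyncLock(shard.secondaryDb, () => snapshotSecondary(shard))
      );
    }
  } finally {
    // Step 4: re-enable the balancer, even if a snapshot step threw.
    sh.setBalancerState(true);
  }

  // Step 5: copy the snapshots out once the balancer is enabled again.
  for (const snapshot of snapshots) {
    copyOut(snapshot);
  }
  return snapshots;
}
```

The try/finally blocks are the main design choice here: the balancer is re-enabled and each secondary is unlocked even when a snapshot step throws, so a failed backup does not leave the cluster frozen. In the legacy mongo shell, `sh` and `sleep` are built-in globals and could be passed in directly.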

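Slide 27 describes the oplog as an ordered list of write operations, and slide 4 defines an incremental backup as the differences since the last one-time snapshot. A minimal sketch of that selection step, under simplifying assumptions: entries are plain objects carrying the oplog's `ts`, `op`, and `ns` fields, timestamps are reduced to `{ t, i }` pairs (seconds plus an ordinal within the second), and `compareTs` and `opsSince` are hypothetical helper names.

```javascript
// Compare two simplified timestamps of the form { t: seconds, i: ordinal }.
// Returns a negative number, zero, or a positive number, like a sort comparator.
function compareTs(a, b) {
  return a.t !== b.t ? a.t - b.t : a.i - b.i;
}

// Return the operations recorded strictly after `lastTs`, in oplog order,
// skipping no-op ("n") entries. The input array is left unmodified.
function opsSince(entries, lastTs) {
  return entries
    .filter((e) => e.op !== "n" && compareTs(e.ts, lastTs) > 0)
    .sort((a, b) => compareTs(a.ts, b.ts));
}

// Example: the last snapshot covered everything up to { t: 100, i: 1 }.
const sampleEntries = [
  { ts: { t: 100, i: 2 }, op: "u", ns: "app.users" },
  { ts: { t: 100, i: 1 }, op: "i", ns: "app.users" },
  { ts: { t: 101, i: 1 }, op: "n", ns: "" },
  { ts: { t: 102, i: 1 }, op: "d", ns: "app.users" },
];
const pending = opsSince(sampleEntries, { t: 100, i: 1 });
console.log(pending.map((e) => e.op).join(","));
```

Against a live replica set member, the same idea would be expressed as a query on the `local.oplog.rs` collection filtered by `ts`; the in-memory version only makes the ordering and the cut-off explicit.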
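
Slide 29 asks readers to inspect replication lag in a real production environment. One way to read it, sketched under assumptions: the input is an object shaped like the output of the shell's rs.status(), using its `members`, `name`, `stateStr`, and `optimeDate` fields, and `replicationLagSeconds` is a hypothetical helper name.

```javascript
// Compute, for every SECONDARY member, how many seconds its last applied
// operation trails the PRIMARY's. `status` is assumed to look like rs.status().
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in replica set status");
  }
  const lags = {};
  for (const member of status.members) {
    if (member.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lags[member.name] = (primary.optimeDate - member.optimeDate) / 1000;
    }
  }
  return lags;
}
```

In the legacy mongo shell this could be called as `replicationLagSeconds(rs.status())` on a replica set member. A secondary whose lag is large at backup time yields a snapshot that is correspondingly stale, which is one reason to check it before choosing the member to snapshot.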