WalB: Real-time and Incremental Backup System for Block Devices

WalB is an open-source backup system that consists of block devices, called WalB devices, and userland utilities, called WalB tools. A WalB device records write I/Os. WalB tools extract them to create restorable snapshots incrementally.
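
The idea of recording write I/Os and extracting them into incremental, restorable snapshots can be illustrated with a small in-memory sketch. This is not WalB's on-disk log format or its tool API: `WriteLog`, `make_diff`, and `apply_diff` are hypothetical names chosen for illustration, and block images are modeled as plain dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteLog:
    """One logged write I/O (hypothetical record layout, not WalB's)."""
    seq: int      # position in the time series of write I/Os
    block: int    # block address on the data device
    data: bytes   # contents that were written


def make_diff(logs, since_seq):
    """Collapse writes newer than since_seq into {block: latest data}."""
    diff = {}
    for log in sorted(logs, key=lambda entry: entry.seq):
        if log.seq > since_seq:
            diff[log.block] = log.data  # a later write to the same block wins
    return diff


def apply_diff(image, diff):
    """Return a new image (block -> data) with the diff applied."""
    restored = dict(image)
    restored.update(diff)
    return restored


logs = [
    WriteLog(1, 0, b"A'"),
    WriteLog(2, 1, b"B'"),
    WriteLog(3, 0, b"A''"),
]
base = {0: b"A", 1: b"B"}

diff = make_diff(logs, since_seq=0)   # everything written after the base
latest = apply_diff(base, diff)       # base image plus the diff
```

Because only logged writes are read, the work in this sketch grows with the number of writes rather than with the size of the volume.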

Compared with dm-snap and dm-thin, WalB is designed to achieve small I/O latency overhead and short backup time. We conducted an experiment that took an incremental backup of a volume under a random write workload. The results confirm these advantages of WalB.
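
For contrast, the slides describe a dm-snap backup as full-scanning an old snapshot and a new snapshot and comparing them. A minimal sketch of that comparison, assuming snapshots are modeled as equal-length lists of blocks (`full_scan_diff` is a hypothetical helper, not a dm-snap interface):

```python
def full_scan_diff(old_snapshot, new_snapshot):
    """Compare two equally sized snapshots block by block.

    Returns (diff, blocks_read), where diff maps block index -> new contents.
    """
    if len(old_snapshot) != len(new_snapshot):
        raise ValueError("snapshots must have the same number of blocks")
    diff = {}
    blocks_read = 0
    for index, (old, new) in enumerate(zip(old_snapshot, new_snapshot)):
        blocks_read += 2  # one read from each snapshot
        if old != new:
            diff[index] = new
    return diff, blocks_read


old = [b"A", b"B", b"C", b"D"]
new = [b"A'", b"B", b"C", b"D"]
changed, blocks_read = full_scan_diff(old, new)
```

Here a single changed block still costs a read of every block in both snapshots, which is why the cost of this approach follows the volume size rather than the amount of change.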

Cybozu's cloud platform, which hosts 500 TB of volumes and processes 25 TB of write I/Os per day, must achieve (1) stable workload performance without I/O spikes that could degrade the user experience of applications and (2) the short backup interval specified in our service level objective. WalB satisfies both requirements, whereas dm-snap does not and dm-thin is not expected to.

Published in: Software

  1. WalB: A Fast and Low Latency Backup System for Block Devices Cybozu Meetup #8 SRE WalB Kota Uchida September 25, 2017
  2. About me ▌Kota Uchida ▌SRE team at Cybozu, Inc. ▌A WalB developer
  3. About Cybozu ▌A large cloud service vendor in Japan. ▌Largest market share in the field of collaborative software. ▌We serve web applications on our own cloud platform. ■ kintone: a low-code business app platform ■ and more
  4. ▌Customer companies: 20,000+ ▌Accesses / day: 210 million ▌Write IOs / day: 24.5 TiB
  5. Service Level Objective ▌24/7 nonstop service ▌99.99% availability (4 min / month) ▌Daily backup (retention period is 14 days) ▌Disaster recovery: copy data to a remote site once a day
  6. Architecture of our platform [Diagram labels: L7LB, Application Server, Database Server, Blob Server, Storage Server (dm-snap) x 2, RAID 1, Backup Server (Diff), Remote Site (Diff), "The scope of this talk"]
  7. Snapshot Management with dm-snap [Diagram: the logical structure shows the Latest Image and the Snapshot Image; the physical structure shows the Original Volume Area, the Snapshot Area, and Mapping Info. Steps shown for Write A’ and Write B’: (1) CoW, (2) Write]
  8. Backup using dm-snap ▌(1) Full-scan an old snapshot ▌(2) Full-scan a new snapshot ▌(3) Generate a diff image by comparing two snapshots [Diagram: Snapshot0 and Snapshot1 in the logical structure]
  9. Full-scan at night [Chart: backup processing time by time of day (o’clock), with daytime marked]
  10. UX degradation during a full-scan [Chart, with the full-scanning period marked]
  11. We have no more “nights” ▌Until now: A full scan was allowed only when the access rate was low, i.e., at night. ▌From now on: We have to handle accesses from multiple time zones. ▌We must be able to back up at any time without UX degradation.
  12. New Solution ▌We need a new solution with: ■ No IO spikes ■ Short backup time ▌We compared dm-thin with WalB
  13. What is dm-thin? ▌dm-thin provides thin-provisioning volume management to ■ share the same data among volumes ■ reduce disk usage using snapshots ▌In the mainline Linux kernel
  14. Snapshot Management with dm-thin [Diagram: logical structure with the Latest Image; physical structure with the Latest Tree and block A]
  15. Snapshot Management with dm-thin [Diagram: logical structure with the Snapshot and the Latest Image; physical structure with the Snapshot Tree, the Latest Tree, and block A]
  16. Snapshot Management with dm-thin [Diagram: same structures with blocks A and A’. Steps shown for Write A’: (1) CoW, (1) CoW, (2) Write, (2) Update]
  17. Backup using dm-thin ▌Generate a diff image using dm-thin metadata [Diagram: Snapshot0 and Snapshot1 in the logical and physical structures, with blocks A, B, A’, B’]
  18. What is WalB? ▌A real-time and incremental backup system ■ developed at Cybozu Labs ▌Can back up block devices without IO spikes [Charts: dm-snap, full scanning; WalB, no spikes]
  19. Special Block Devices for WalB [Diagram: any application (file system, DBMS, etc.) reads from and writes to a WalB device, which sits on a data device (linear mapped) and a log device (ring buffer)]
  20. Write IO Logging and Backup with WalB [Diagram: Data Device with blocks A and B; Log Device with slots 0 to 4; time series of write I/Os over time]
  21. Write IO Logging and Backup with WalB [Diagram: Write A’ reaches the Data Device and is recorded in the Log Device] ▌Scan the log device and generate a diff image
  22. Write IO Logging and Backup with WalB [Diagram: Write A’ and Write B’ reach the Data Device and are recorded in the Log Device] ▌Scan the log device and generate a diff image
  23. Performance test ▌Compared dm-snap, dm-thin, and WalB ▌Executed a workload during a backup ■ The workload and the backup affect each other ▌Measured the following metrics: ■ Latencies of the workload ■ Backup time
  24. Environment & Settings ▌Test environment: ■ CPU: 2.40 GHz x 12 cores ■ MEM: 192 GiB ■ HDD: 4 TB HDD, RAID 6 (8D2P) ■ NIC: 10 Gbps x 2 ■ Kernel: 4.11 (latest upstream) ▌Test settings: ■ 100 GiB volumes ■ Workload: 4 KiB random writes for a 5 GiB range
  25. Measuring the Backup Time (dm-snap, dm-thin) ▌dm-snap: take a snapshot & scan the full image ▌dm-thin: get the structure of the snapshot trees, find modified blocks, and read these blocks [Diagram: 4 KiB random writes hit a 5 GiB range; the remaining 95 GiB is unchanged. dm-snap: scan full image. dm-thin: scan changed chunks (tree traversal)]
  26. Measuring the Backup Time (WalB) ▌WalB: scan logs from a log device & send them to a backup server continuously [Diagram: 4 KiB random writes hit a 5 GiB range; the remaining 95 GiB is unchanged. WalB: scan logs. Labels: WalB Device, Log Device, Write IO logs, Network, Backup Server (Diff)]
  27. Write I/O latency [Chart series: dm-thin, dm-snap, WalB, no-backup. Annotations: "IO spikes due to CoW, worse than dm-snap!", "Small overhead", "large due to CoW"]
  28. Backup time [Chart values: 1146, 2260, 1.2. Annotations: "slower than dm-snap", "so fast!"]
  29. Conclusion ▌dm-snap & dm-thin ■ High I/O latency during a backup ■ Long backup time ▌WalB ■ Stable and low I/O latency (no spikes) ■ Short backup time ▌WalB satisfies our requirements for production use.
  30. Try WalB! ▌Project page ■ https://walb-linux.github.io/ ▌Tutorial ■ https://github.com/walb-linux/walb-tools/tree/master/misc/vagrant/ ■ Vagrantfile for Ubuntu 16.04 and CentOS 7
  31. Incremental backup ▌Daily backup (retention period is 14 days) ▌A WalB worker daemon selects diff files older than 14 days and applies them to a base image. [Diagram labels: Backup Host, Remote Host, Volume, Base, Diff files for 14 days, Apply everyday]
  32. Restoring a volume ▌To restore the latest state of a volume: ■ take a snapshot of a base image, and ■ apply all diff files to it. [Diagram labels: Remote Host, Base, Diff, Base' (writable snapshot), Apply all diffs]
  33. Make restoration faster 1/2 ▌Fast restoration by preparing a read-only snapshot for each day [Diagram labels: Remote Host, Base, Diff, dm-thin snapshots for each day]
  34. Make restoration faster 2/2 ▌Apply some diffs to the appropriate snapshot. ▌At most 24 hours of diffs need to be applied. Faster! [Diagram labels: Remote Host, Base, Diff]
  35. Worldline: restoring a whole environment ▌"Worldline" means a parallel world. ▌We back up configurations in addition to user data. ■ Configurations: definitions for each customer (ID, FQDN, Apps, …), application version definition, host definition, etc. ▌It is important to use application versions that are consistent with the user data backed up earlier.
  36. Worldline: restoring a whole environment ▌A daily script takes a snapshot of the whole environment. ▌A weekly script restores the latest backup, so we can use it for investigating failures or developing our services. [Diagram labels: User data, Diff, Snapshot, Config DB, Config DB', Backup, Restore, Worldline, Spare hosts]
  37. Q&A email: kota-uchida@cybozu.co.jp twitter: @uchan_nos
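
The retention and restore procedures described in the slides on incremental backup and restoring a volume can be sketched as follows. This is an illustration under simplifying assumptions, not the WalB worker daemon: images and diffs are modeled as dictionaries of block -> data, and `roll_base_forward`, `restore_latest`, and `RETENTION_DAYS` are hypothetical names.

```python
from datetime import date, timedelta

RETENTION_DAYS = 14  # retention period stated in the slides


def apply_diff(image, diff):
    """Return a new image (block -> data) with the diff applied."""
    merged = dict(image)
    merged.update(diff)
    return merged


def roll_base_forward(base, diffs, today):
    """Apply diffs older than the retention period to the base image.

    diffs is a list of (day, diff) pairs sorted oldest first.
    Returns (new_base, remaining_diffs).
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    kept = []
    for day, diff in diffs:
        if day < cutoff:
            base = apply_diff(base, diff)  # fold the old diff into the base
        else:
            kept.append((day, diff))       # still inside the retention window
    return base, kept


def restore_latest(base, diffs):
    """Copy the base (stand-in for a writable snapshot) and apply all diffs."""
    image = dict(base)
    for _, diff in diffs:
        image = apply_diff(image, diff)
    return image


today = date(2017, 9, 25)
base = {0: b"A", 1: b"B"}
diffs = [
    (today - timedelta(days=16), {0: b"v1"}),
    (today - timedelta(days=14), {1: b"v2"}),
    (today - timedelta(days=1), {0: b"v3"}),
]

new_base, kept = roll_base_forward(base, diffs, today)
restored = restore_latest(new_base, kept)
```

Folding old diffs into the base keeps the number of diffs to replay bounded by the retention window, while the restored image stays the same as replaying every diff from the original base.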
