
Truly non-intrusive OpenStack Cinder backup for mission critical systems

One of the requirements for mission critical systems is reliable volume backup without impacting the running system. The recommended way to take a cinder backup is to unmount the volume first, so that the backup is not merely crash-consistent. Unmounting is intrusive by nature and may not be feasible for mission critical systems.

This presentation focuses on a strategy to achieve non-intrusive cinder backup. It was given at the OpenStack Summit in Sydney on 06 Nov 2017.

https://www.openstack.org/videos/sydney-2017/truly-non-intrusive-openstack-cinder-backup-for-mission-critical-systems



  1. 1. Truly non-intrusive cinder backup for mission critical systems Lightning talk by Dipak Kumar Singh & Deepak Gupta on 06 Nov 2017 at OpenStack Summit, Sydney, Australia
  2. 2. Table of Contents  Challenges of Backup of a Live System Example of a sanity error in data, implication of the OS buffer cache on sanity at crash or live backup, etc.  Current Approach & Proposed Approach The current approach, the idea on which the proposed solution is based, etc.  Proposed Solution & POC Design information of the POC, results of the POC's validation, known limitations, next steps, etc.  Appendix Common questions, references, experimental data, etc.
  3. 3. 3 Background ▌Reliability of data and its availability are key requirements for mission critical systems. ▌OpenStack ensures data availability by keeping multiple copies of data on storage nodes. It also facilitates backup for Disaster Recovery. ▌However, for Point-in-Time backup of a live system, OpenStack relies on volume snapshots and VM pause. These solutions are not foolproof: the impact of the OS buffer cache is not accounted for, and file-system journaling & fsck do not always work. ▌This presentation discusses the problem scenarios and probable solutions for truly non-intrusive backup in OpenStack.
  4. 4. Challenges of Backup of live system
  5. 5. 5 Simple example of Sanity of Data (1/3) Let's first understand sanity of data in the context of backup. During a backup process, an application makes two changes, producing three states over time:  State0 (initially): PatientsRecord.txt (200 bytes) holds the record "Name: Rahul, Id: 11755, Blood Test Status: Wait, Blood Test File: (none)".  State1 (at T1), first change: a new file Test79.pdf (200 bytes) is created.  State2 (at T2), second change: two lines of PatientsRecord.txt are modified (now 209 bytes), so the record reads "Blood Test Status: OK, Blood Test File: Test79.pdf". One of the changes at T2 points to the file created at T1.
  6. 6. 6 Simple example of Sanity of Data (2/3) In total, the application creates three states during the backup process: State0 (initially), State1 (at T1) and State2 (at T2). A data restore must bring the system back to exactly one of these three states. Restoring any other state is a sanity error.
  7. 7. 7 Simple example of Sanity of Data (3/3) What if the second change is saved in the backup but not the first? The recovered file system (Restored, at T3) then contains PatientsRecord.txt (209 bytes) with "Blood Test Status: OK, Blood Test File: Test79.pdf", but Test79.pdf itself is missing. The restore has produced a state that was never generated by the application.
  8. 8. 8 Point In Time Backup & Snapshot ▌A backup must capture the system as it was at a single point in time in its history, for example State0, State1 or State2 in the example of the previous slide. ▌Technically, this is called 'Point In Time' (PIT) backup. PIT is commonly used in the context of Disaster Recovery, and backup software is expected to take PIT backups. ▌A volume snapshot provides PIT data of the volume in use; the backup is then taken from the snapshot. ▌However, since the snapshot is created at the volume level, data still in the operating system's buffer cache is not captured. A snapshot alone is therefore not enough for a PIT backup of a live system.
  9. 9. 9 Implication of Buffer Cache on sanity of data ▌Data from the buffer cache is written to disk in an order chosen for better I/O performance. Invariably, the order in which applications write to the OS differs from the order in which data reaches the disk; this is technically called out-of-order write (an example is shown in the diagram on the next slide). As a result, the sanity of the data is lost. ▌Journaling is used to preserve sanity, but journaling has its own I/O cost. The default ext4 journaling mode therefore ensures file-system consistency only, not the consistency of file data. An experiment showed that files created within 30 seconds of a crash had zero size on ext4 mounted with default options; refer to the Appendix for the experimental data. Note that in the OS, file-system integrity is traded off against performance.
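The zero-size-file symptom described above can be avoided from the application side by forcing a file's data to disk before depending on it. A minimal sketch, assuming GNU coreutils 8.24 or newer (whose sync accepts a file argument); the file name is illustrative:

```shell
# Write a small file, then force its data (not just metadata) to disk.
echo "hello" > NewFile.txt
# 'sync FILE' issues fsync(2) on that one file (coreutils >= 8.24);
# fall back to a full sync on older systems.
sync NewFile.txt 2>/dev/null || sync
```

With this guard, a crash after the sync should not leave NewFile.txt at zero size, regardless of the journaling mode.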
  10. 10. 10 Example of Out-of-Order Write of OS buffer cache data As written by the application, the OS passes through states S0 (at T0), S1 (at T1: new file Test79.pdf created) and S2 (at T2: Patients.txt updated to point to Test79.pdf). With out-of-order write to disk, the second application write can be flushed to the volume before the first: at T2' the volume holds BadState1, in which Patients.txt already points to Test79.pdf but the file does not exist. No such state ever existed in the OS. What if this goes into the snapshot? Only at T2'', when both writes are flushed, does the volume reach the application's state S2.
  11. 11. Current Approach & Proposed Approach
  12. 12. 12 Current approach to take backup of a live system ▌The most common solution for backing up a live volume is a two-step process: a) Create a snapshot of the live volume, which may involve momentarily pausing the volume and effectively the VM. b) Take the backup from the snapshot. ▌The pause in (a) is usually imperceptible in current virtual machine and storage implementations, making the approach practically non-intrusive. ▌However, this approach is equivalent to pulling the power cable from a machine and then backing up its disks. ▌Journaling would ensure the sanity of the file system but not of file data, unless the performance-costly 'metadata+data' journaling mode is used.
  13. 13. 13 Idea of Proposed Solution – Briefly stop buffer caching in the OS ▌The proposed solution is based on a very simple idea: disable the buffer cache briefly while taking the snapshot. ▌That effectively makes the OS write-through. ▌A simple CLI call on Linux does this job. ▌The proposed design and POC are shared in the subsequent slides of this presentation. ▌Feedback from the audience on its real-world benefit, the proposed design and the POC is solicited so that this idea can see the light of day.
  14. 14. 14 How to disable the buffer cache on Linux? ▌/proc/sys/vm/dirty_bytes defines the maximum amount of dirty data, in bytes, allowed in the Linux buffer cache. ▌Once this limit of dirty data is reached, subsequent write() calls effectively become write-through. ▌dirty_bytes can be changed to a low value (not zero) on a running Linux system using the sysctl CLI. ▌A low dirty_bytes value increases write() latency by milliseconds, so the original value is restored as soon as the snapshot is created.
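The dirty_bytes manipulation above can be sketched as a small shell wrapper. This is a sketch, not the authors' POC code: sysctl -w needs root and a live OpenStack, so the script only records the commands it would run (DRY_RUN=1 by default), and the volume name is a placeholder.

```shell
# Dry-run sketch of the write-through window around a snapshot.
DRY_RUN=${DRY_RUN:-1}
PLAN=""
run() {
    PLAN="${PLAN}$*
"                                  # record the command
    [ "$DRY_RUN" = "1" ] || "$@"   # execute only when DRY_RUN=0
}

# 1. Save the current dirty-data budget. dirty_bytes and dirty_ratio are
#    mutually exclusive: setting one zeroes the other.
ORIG_BYTES=$(cat /proc/sys/vm/dirty_bytes 2>/dev/null || echo 0)
ORIG_RATIO=$(cat /proc/sys/vm/dirty_ratio 2>/dev/null || echo 20)

# 2. Shrink dirty_bytes to the documented minimum of two pages
#    (8192 bytes with 4 KiB pages): write() becomes near write-through.
run sysctl -w vm.dirty_bytes=8192

# 3. The snapshot would be taken here (volume name is a placeholder).
run cinder snapshot-create --force True mission-critical-vol

# 4. Restore whichever knob was originally in effect.
if [ "$ORIG_BYTES" != "0" ]; then
    run sysctl -w vm.dirty_bytes="$ORIG_BYTES"
else
    run sysctl -w vm.dirty_ratio="$ORIG_RATIO"
fi
printf '%s' "$PLAN"
```

Restoring dirty_bytes to its saved value (or falling back to dirty_ratio) keeps the latency impact confined to the snapshot window.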
  15. 15. Proposed Solution & POC
  16. 16. 16 Design Guidelines ▌Disable the buffer cache of the Guest Linux OS temporarily. ▌Use the other standard steps of Cinder Cinder's snapshot and backup-from-snapshot are used in the solution so that the code change is minimal. ▌Output file: The backup file produced by a live-system backup is exactly the same as Cinder's regular backup files. Therefore exactly the same standard OpenStack restore process is used for restoring the data. ▌Proposed Use Model  Adding a new option to the cinder CLI looks like a good solution $ cinder backup-create --livebackup <instanceId> ...
  17. 17. 17 Sequence of Steps for Backup The Live Backup Controller orchestrates the Guest OS and Cinder: 1. Retrieve current buffer cache config (Guest OS) 2. Make buffer cache near zero (Guest OS) 3. Generate snapshot of volume(s) attached to the Guest OS (Cinder) 4. Restore the buffer cache config (Guest OS) 5. Take backup from snapshot(s) (Cinder) 6. Delete snapshot(s) (Cinder). Access to the Guest OS is required. The Guest OS continues to run; the only impact is on write I/O latency between steps 2 and 4.
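The six steps above can be sketched end to end from the controller's point of view. Host, volume and snapshot identifiers are illustrative placeholders, and the commands are only collected and printed, not executed; the cinder subcommands (snapshot-create, backup-create --snapshot-id, snapshot-delete) are standard python-cinderclient calls.

```shell
# Collect the orchestration steps so the flow can be read end to end.
PLAN=""
plan() { PLAN="${PLAN}+ $*
"; }

plan ssh root@guest-vm 'sysctl -n vm.dirty_bytes'              # 1. retrieve buffer cache config
plan ssh root@guest-vm 'sysctl -w vm.dirty_bytes=8192'         # 2. make buffer cache near zero
plan cinder snapshot-create --force True vol-1234              # 3. snapshot attached volume(s)
plan ssh root@guest-vm 'sysctl -w vm.dirty_bytes=0'            # 4. restore config (0 re-enables dirty_ratio)
plan cinder backup-create --snapshot-id snap-5678 vol-1234     # 5. take backup from snapshot
plan cinder snapshot-delete snap-5678                          # 6. delete snapshot
printf '%s' "$PLAN"
```

Note that only steps 1, 2 and 4 touch the Guest OS; the snapshot, backup and cleanup stay on the Cinder side, which is why the code change to Cinder itself can be minimal.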
  18. 18. 18 POC – Overview ▌The POC was based on the "Sequence of Steps for Backup" in the previous slide. ▌Entities (see the entity relationship diagram): the POC code, Cinder and its database on the Controller Node, the Guest OS (Linux), and SWIFT storage. ▌User input: 1. Instance Id, 2. Guest OS login & password. ▌The backup is produced on SWIFT as usual.
  19. 19. 19 POC Validation - Using count of 'new files with zero size' ▌The POC was able to take a backup and restore the data. ▌The POC was also validated by checking the impact of ext4's 'delayed block allocation' under the new approach. ▌Steps of Validation  Take a backup of the Guest OS while the following script is running:
     for i in {0..600}                # create a new file every second
     do
       echo hello > NewFile_$i.txt    # each new file holds 6 bytes
       sleep 1
     done
      Restore the backup and count the number of zero-size files. Does the count decrease with the new approach? Observations are shared on the next slide.
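The counting part of the validation can be done with find. A small self-contained demonstration, using a made-up sample of five files (two of them empty) in place of a real restored volume:

```shell
# Build a sample 'restored' directory: three 6-byte files, two empty ones.
mkdir -p restored && cd restored
for i in 0 1 2; do echo hello > "NewFile_$i.txt"; done
: > NewFile_3.txt
: > NewFile_4.txt

# Count the zero-size files, as in the validation step after restore.
ZERO_COUNT=$(find . -maxdepth 1 -name 'NewFile_*.txt' -size 0c | wc -l)
echo "zero-size files: $ZERO_COUNT"   # 2 for this sample
cd ..
```

On a real restored volume the same find invocation, run at the top of the file system, yields the counts reported on the next slide.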
  20. 20. 20 POC Validation – Result of 'new files with zero size' ▌Number of zero-size files seen after restoring a live backup of the Linux Guest OS. At most one zero-size file is expected.
     Standard OpenStack (forced snapshot, then backup): Run 1: 32, Run 2: 6, Run 3: 34, Run 4: 30
     POC enabled (dirty_bytes made very low during snapshot): Run a: 1, Run b: 1, Run c: 1, Run d: 1
     Run environment: Guest OS Ubuntu 16 (4.4.0-97-generic); default ext4 mount options (barrier=1,data=ordered); no load on the hypervisor or Guest OS during testing.
▌Result: The number of zero-size files is drastically reduced.
  21. 21. 21 Known Limitation & Next Step ▌Known Limitation The minimum value supported by dirty_bytes is two pages; it cannot be set to zero, so buffer caching cannot be disabled completely. This is documented Linux behavior. The reason for the limit is being explored to find a way to make it absolute zero. ▌Next Step The next course of action items will be decided based on the feedback received on the idea and the POC results.
  22. 22. Appendix Common Questions, References etc.
  23. 23. 23 Common Questions ▌ Can all volumes attached to a Guest OS be backed up together? The solution does facilitate it; however, the snapshot feature of OpenStack must support it too. ▌ Can multiple Guest OSes be backed up together? Same answer as above. ▌ Is it truly non-intrusive?  Yes, the Guest OS continues to run.  However, write I/O latency increases for some time.  Depending on the snapshot feature of Cinder, the Guest OS might be momentarily paused. If a SAN's hardware-level snapshot is used, no pause is involved.
  24. 24. 24 References and Useful reading (1/2) ▌Impact of 'Delayed Block Allocation' on ext4's sanity  Linus Torvalds Upset over Ext3 and Ext4 http://www.linux-magazine.com/Online/News/Linus-Torvalds-Upset-over-Ext3-and-Ext4  ext4 and data loss by Jonathan Corbet https://lwn.net/Articles/322823/ ▌Filesystem Journal  Section '2.1 File-system consistency' of https://www.usenix.org/system/files/conference/fast12/chidambaram.pdf  Ext4 journal options in section '3. Options' of https://www.kernel.org/doc/Documentation/filesystems/ext4.txt ▌Linux buffer cache  http://www.tldp.org/LDP/sag/html/buffer-cache.html  https://www.kernel.org/doc/Documentation/sysctl/vm.txt  Section '14.3.2 Writeback Parameters' of https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.memory.html#cha.tuning.memory.vm
  25. 25. 25 References and Useful reading (2/2) ▌Alternative solutions to take backup of live systems  Volume Shadow Copy Service (VSS) on Windows  vmsync on the VMware hypervisor ▌OpenStack's backup and restore  Back up and restore volumes and snapshots https://docs.openstack.org/cinder/latest/admin/blockstorage-volume-backups.html  Cinder CLI https://docs.openstack.org/python-cinderclient/latest/cli/details.html ▌Code used in experiments https://github.com/saurabh0095/Unix-IO-test
  26. 26. 26 Experiment – Impact of 'Delayed Block Allocation' on ext4 (1/2) ▌Test Objective: Demonstrate the magnitude of data loss at an OS crash caused by delayed block allocation. ▌Test Steps & Observations  Create new files with a small amount of data every second.  Crash the system by removing the power cable.  On recovery, 30-35 of the most recent files have zero size. The expectation is that at most one file, the one being written at the moment of the crash, should have zero size. ▌Cause The default journaling mode of ext4 writes metadata first, but the actual data write is delayed by 'Delayed Block Allocation', leading to inconsistency: the file's data is lost while its metadata survives.
  27. 27. 27 Experimental Data – 'Delayed Block Allocation' on ext4 (2/2) ▌ Number of zero-size files seen after the Guest OS was crashed by simulating power cable removal, on two different virtual machines. At most one zero-size file is expected; around 30 were seen.
     Zero-size file count per run (Runs 1-10): 35, 30, 32, 35, 33, 36, 30, 35, 33, 29
     Two run environments, same result: (1) VirtualBox 4.3.28 on Windows with Ubuntu 16 (4.4.0-97-generic); (2) Microsoft Hyper-V (2016 Standard) with RHEL 7 (3.10.0-123.el7.x86_64). Default ext4 mount options used (barrier=1,data=ordered); no load on the hypervisor or Guest OS during testing.
  28. 28. 28 Contact Information of Authors Dipak Kumar Singh Deepak Gupta Senior Solutions Architect, IT Platforms Deputy General Manager, IT Platform NEC Technologies India Pvt. Ltd. dipak.singh@india.nec.com deepak.gupta@india.nec.com dipak123@gmail.com dkumargupta@gmail.com http://linkedin.com/in/dipak123 https://www.linkedin.com/in/dkumargupta/ • https://www.openstack.org/summit/sydney-2017/summit-schedule/events/19305/truly-non-intrusive-openstack-cinder- backup-for-mission-critical-systems • https://www.openstack.org/assets/presentation-media/OpenStack-Truly-non-intrusive-Cinder-backup-1.1.pptx
