Security Level: VM Live Migration Speedup          in Xen           Xiaowei Yang, Tao Hong                                ...
Agenda• Background• Issues and Proposals• Evaluation
Background• VM live migration scenarios  • Host evacuation on maintenance  • Load balance between hosts  • Power consumpti...
Goals• Evaluate the function and performance of VM livemigration in Xen• Increase VM live migration success ratio, especia...
VM Live Migration Algorithm               Source                  Destination           Connect socket            Send con...
Agenda• Background• Issues and Proposals• Evaluation
Issue 1: Timeout                                       Source                  Destination                                ...
Proposal• Main reason for timeout  • Long delay before Xen watch triggers, when dom0 serves multi-    VM and is stressed. ...
Issue 2: Lack of Fallback         Source                    Destination     Connect socket                                ...
Proposal• Network / socket connection breaks  • Source: domain resume is OK - VM’s memory still exists; devices    needs r...
Issue 3: Poor Performance                Migration    VM        Dom0 Throughput           Perf                 time (s) do...
Proposal – Compression        Migration    VM                Dom0              Throughput   Compression         time (s) d...
Proposal – Selective Compression                    Idle                                                VDI• Compression r...
Proposal – Multithread• Executes compression and transfer simultaneously                  Migration time (s) Compression t...
Agenda• Background• Issues and Proposals• Evaluation
Experimental Environment                                    Host Configure   Processor                    2x Intel X5620 @...
VDI – 1VM                                    • Migration time is reduced from                                    26.3s to ...
VDI – Multi-VM• Total migration time is reduced from 171.0s to 57.0s(3X faster)• The advantage of scalability is more obvi...
Webserver                                        45s                                     24s• Total migration time is redu...
Takeaways• To increase VM live migration success ratio  • Avoid timeout: Improve the notification efficiency between    mi...
Thank youwww.huawei.com
Upcoming SlideShare
Loading in...5
×

VM Live Migration Speedup in Xen

3,160

Published on

VM live migration from one physical server to another is a key advantage of virtualization. It's used widely in the scenarios such as load balance / power consumption optimization inside the cluster and host maintenance, etc. Being able to do VM live migration as quickly as possible with no service interruption is regarded as a key competitiveness of the virtualization platform.

Xen has supported live migration for many years. However our recent study shows that Xen still has lots of room to improve, in the aspects of live migration elapsed time, service downtime and concurrency instance number. Several experimental enhancements have been added and the initial result looks pretty good. For instance, merely using memory comparison before migration can speed up the elapsed time by >2X in some cases per our evaluation. The policy to balance the CPU utilization and the compression ratio is also considered.

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,160
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
85
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

VM Live Migration Speedup in Xen

  1. 1. Security Level: VM Live Migration Speedup in Xen Xiaowei Yang, Tao Hong www.huawei.comHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  2. 2. Agenda• Background• Issues and Proposals• Evaluation
  3. 3. Background• VM live migration scenarios • Host evacuation on maintenance • Load balance between hosts • Power consumption optimization inside the cluster• Challenges • Success ratio • Fallback on exception • Migration time • 1-VM or multi-VMs • VM downtime • VM performance impact • Required resources (CPU%, network TX/RX) HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  4. 4. Goals• Evaluate the function and performance of VM livemigration in Xen• Increase VM live migration success ratio, especiallywhen the host is heavily loaded; fallback on exception• Increase VM live migration efficiency in terms of theindicators mentioned in the previous page HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  5. 5. VM Live Migration Algorithm Source Destination Connect socket Send config Create VM Enable logdirty Send whole memory Receive/restore Send dirty memory memory Migration Time Suspend Send dirty memory Receive/restore Send VM context VM Context VM Downtime Save/Send Qemu Receive/start Qemu Close socket Unpause VM Destroy VMHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  6. 6. Agenda• Background• Issues and Proposals• Evaluation
  7. 7. Issue 1: Timeout Source Destination Connect socket Xenstore Send config Create VM watch Qemu: Enter Enable logdirty logdirty mode 5s timeout Send whole memory Receive/restore Xenstore memory watch / Send dirty memory Guest OS: Evtchn suspend Suspend 60s timeout Send dirty memory Receive/restore Xenstore Send VM context VM Context Qemu: save watch states Save/Send Qemu Receive/start Qemu 10s timeout Close socket Unpause VM Destroy VMHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  8. 8. Proposal• Main reason for timeout • Long delay before Xen watch triggers, when dom0 serves multi- VM and is stressed. Extreme case: delay = 13s• Root cause • There are claims that xenstore has performance & scalability issue, e.g. Oxenstored paper• Workaround • Inter domain: use evtchn • Inter process: use Linux IPC, e.g. shared memory, semaphore, named pipe HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  9. 9. Issue 2: Lack of Fallback Source Destination Connect socket Without fallback Send config Create VM on failure, VM will stop execution on Enable logdirty either side! Send whole memory Receive/restore Send dirty memory memory Possible reasons of failure after Suspend suspension: Send dirty memory • Network / socket Receive/restore connection breaks Send VM context VM Context • VM fails to boot Save/Send Qemu Receive/start Qemu up, e.g. Qemu dump image Close socket Unpause VM corrupts, etc. Destroy VMHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  10. 10. Proposal• Network / socket connection breaks • Source: domain resume is OK - VM’s memory still exists; devices needs recreation• VM fails to boot up at destination • Source: socket connection remains till the very end • Destination: • On failure, nack Source -> domain resume • On success, ack Source -> domain destroy HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  11. 11. Issue 3: Poor Performance Migration VM Dom0 Throughput Perf time (s) downtime (s) CPU% (Mbps) impact Idle 25.8 1.2 81.6 633.0 n/a VDI 26.3 1.4 115.0 661.0 n/aWebserver 44.6 ~2.0 108.0 799.0 ~30%• Migration time: long• Required resource: high • Network throughput becomes a bottleneck for multi-VM live migration• VM downtime: long• Performance Impact: high HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  12. 12. Proposal – Compression Migration VM Dom0 Throughput Compression time (s) downtime (s) CPU% (Mbps) ratio %Orig 25.8 1.2 81.6 633.0 1:1Libz 25.7 1.2 105.2 32.0 22:1• Throughput decreases dramatically• Migration time unchanged. Why? Compression time (s) Transfer time (s)Orig 0 22.6Libz 21.2 0.7• Compression (libz) consumes most of the time HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  13. 13. Proposal – Selective Compression Idle VDI• Compression ratio complies to principle of locality somehow• Introduce selective compression: skip xMB if compression ratio < y:1 Migration Compression Transfer Throughput Compression time (s) time (s) time (s) (Mbps) ratio % Orig 25.8 0 22.6 633.0 1:1 Libz 25.7 21.2 0.7 32.0 22:1Libz+opt 21.9 15.4 2.6 99.6 7.6:1 HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  14. 14. Proposal – Multithread• Executes compression and transfer simultaneously Migration time (s) Compression time (s) Transfer time (s) Orig 25.8 0 22.6 Libz 25.7 21.2 0.7 Libz+opt 21.9 15.4 2.6Libz+opt+m.t. 20.2 15.3 3.2 Throughput (Mbps) Dom0 CPU% Compression ratio % Orig 633.0 81.6% 1:1 Libz 32.0 105.2 22:1 Libz+opt 99.6 104.0 7.6:1Libz+opt+m.t. 114.0 120.0 7.6:1 HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  15. 15. Agenda• Background• Issues and Proposals• Evaluation
  16. 16. Experimental Environment Host Configure Processor 2x Intel X5620 @ 2.40GHz, SMT enabled Memory 96GB (8GB 800GMHz DDR3 x12) Storage Huawei Oceanspace S5300, thru IPSAN Network Intel Corporation 82576 Gigabit VM Configure VDI Scenario 2 vCPU, 2G vMem; Windows 7 Workload: Office, IE, PDF, Java Webserver 2 vCPU, 2G vMem; Suse11 sp1 Scenario Workload: Siego, 1024 current users Data: 100K * 35000URLs = ~ 3.5GB Client accesses all URLs randomlyHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  17. 17. VDI – 1VM • Migration time is reduced from 26.3s to 10.7s (2.4X faster) • Multithread (compression/TX) helps further • With compression (lzo2), • TX time down • Throughput down – more scalableHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  18. 18. VDI – Multi-VM• Total migration time is reduced from 171.0s to 57.0s(3X faster)• The advantage of scalability is more obvious HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  19. 19. Webserver 45s 24s• Total migration time is reduced from 44.6s to 23.8s(1.9X faster)• Performance impact is big – need further improve HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  20. 20. Takeaways• To increase VM live migration success ratio • Avoid timeout: Improve the notification efficiency between migration and other processes • Make sure VM lives at least on one side: fallback whenever there is a failure at Destination• To increase VM live migration efficiency • Compress memory data before TX. Different algorithms matter a lot • Do compression and data transfer parallelly • Choose a wise compression ratio to skip data block • Live migration can be 1.9X – 3X faster per our test• There is still room to improve HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
  21. 21. Thank youwww.huawei.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×