• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
VM Live Migration Speedup in Xen
 

VM Live Migration Speedup in Xen

on

  • 3,192 views

VM live migration from one physical server to another is a key advantage of virtualization. It's used widely in the scenarios such as load balance / power consumption optimization inside the cluster ...

VM live migration from one physical server to another is a key advantage of virtualization. It's used widely in the scenarios such as load balance / power consumption optimization inside the cluster and host maintenance, etc. Being able to do VM live migration as quickly as possible with no service interruption is regarded as a key competitiveness of the virtualization platform.

Xen has supported live migration for many years. However our recent study shows that Xen still has lots of room to improve, in the aspects of live migration elapsed time, service downtime and concurrency instance number. Several experimental enhancements have been added and the initial result looks pretty good. For instance, merely using memory comparison before migration can speed up the elapsed time by >2X in some cases per our evaluation. The policy to balance the CPU utilization and the compression ratio is also considered.

Statistics

Views

Total Views
3,192
Views on SlideShare
2,536
Embed Views
656

Actions

Likes
3
Downloads
53
Comments
0

7 Embeds 656

http://www.xen.org 475
http://xen.org 138
http://www-archive.xenproject.org 32
http://staging.xen.org 5
http://xen.xensource.com 4
http://translate.googleusercontent.com 1
http://webcache.googleusercontent.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    VM Live Migration Speedup in Xen VM Live Migration Speedup in Xen Presentation Transcript

    • Security Level: VM Live Migration Speedup in Xen Xiaowei Yang, Tao Hong www.huawei.comHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Agenda• Background• Issues and Proposals• Evaluation
    • Background• VM live migration scenarios • Host evacuation on maintenance • Load balance between hosts • Power consumption optimization inside the cluster• Challenges • Success ratio • Fallback on exception • Migration time • 1-VM or multi-VMs • VM downtime • VM performance impact • Required resources (CPU%, network TX/RX) HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Goals• Evaluate the function and performance of VM livemigration in Xen• Increase VM live migration success ratio, especiallywhen the host is heavily loaded; fallback on exception• Increase VM live migration efficiency in terms of theindicators mentioned in the previous page HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • VM Live Migration Algorithm Source Destination Connect socket Send config Create VM Enable logdirty Send whole memory Receive/restore Send dirty memory memory Migration Time Suspend Send dirty memory Receive/restore Send VM context VM Context VM Downtime Save/Send Qemu Receive/start Qemu Close socket Unpause VM Destroy VMHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Agenda• Background• Issues and Proposals• Evaluation
    • Issue 1: Timeout Source Destination Connect socket Xenstore Send config Create VM watch Qemu: Enter Enable logdirty logdirty mode 5s timeout Send whole memory Receive/restore Xenstore memory watch / Send dirty memory Guest OS: Evtchn suspend Suspend 60s timeout Send dirty memory Receive/restore Xenstore Send VM context VM Context Qemu: save watch states Save/Send Qemu Receive/start Qemu 10s timeout Close socket Unpause VM Destroy VMHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Proposal• Main reason for timeout • Long delay before Xen watch triggers, when dom0 serves multi- VM and is stressed. Extreme case: delay = 13s• Root cause • There are claims that xenstore has performance & scalability issue, e.g. Oxenstored paper• Workaround • Inter domain: use evtchn • Inter process: use Linux IPC, e.g. shared memory, semaphore, named pipe HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Issue 2: Lack of Fallback Source Destination Connect socket Without fallback Send config Create VM on failure, VM will stop execution on Enable logdirty either side! Send whole memory Receive/restore Send dirty memory memory Possible reasons of failure after Suspend suspension: Send dirty memory • Network / socket Receive/restore connection breaks Send VM context VM Context • VM fails to boot Save/Send Qemu Receive/start Qemu up, e.g. Qemu dump image Close socket Unpause VM corrupts, etc. Destroy VMHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Proposal• Network / socket connection breaks • Source: domain resume is OK - VM’s memory still exists; devices needs recreation• VM fails to boot up at destination • Source: socket connection remains till the very end • Destination: • On failure, nack Source -> domain resume • On success, ack Source -> domain destroy HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Issue 3: Poor Performance Migration VM Dom0 Throughput Perf time (s) downtime (s) CPU% (Mbps) impact Idle 25.8 1.2 81.6 633.0 n/a VDI 26.3 1.4 115.0 661.0 n/aWebserver 44.6 ~2.0 108.0 799.0 ~30%• Migration time: long• Required resource: high • Network throughput becomes a bottleneck for multi-VM live migration• VM downtime: long• Performance Impact: high HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Proposal – Compression Migration VM Dom0 Throughput Compression time (s) downtime (s) CPU% (Mbps) ratio %Orig 25.8 1.2 81.6 633.0 1:1Libz 25.7 1.2 105.2 32.0 22:1• Throughput decreases dramatically• Migration time unchanged. Why? Compression time (s) Transfer time (s)Orig 0 22.6Libz 21.2 0.7• Compression (libz) consumes most of the time HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Proposal – Selective Compression Idle VDI• Compression ratio complies to principle of locality somehow• Introduce selective compression: skip xMB if compression ratio < y:1 Migration Compression Transfer Throughput Compression time (s) time (s) time (s) (Mbps) ratio % Orig 25.8 0 22.6 633.0 1:1 Libz 25.7 21.2 0.7 32.0 22:1Libz+opt 21.9 15.4 2.6 99.6 7.6:1 HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Proposal – Multithread• Executes compression and transfer simultaneously Migration time (s) Compression time (s) Transfer time (s) Orig 25.8 0 22.6 Libz 25.7 21.2 0.7 Libz+opt 21.9 15.4 2.6Libz+opt+m.t. 20.2 15.3 3.2 Throughput (Mbps) Dom0 CPU% Compression ratio % Orig 633.0 81.6% 1:1 Libz 32.0 105.2 22:1 Libz+opt 99.6 104.0 7.6:1Libz+opt+m.t. 114.0 120.0 7.6:1 HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Agenda• Background• Issues and Proposals• Evaluation
    • Experimental Environment Host Configure Processor 2x Intel X5620 @ 2.40GHz, SMT enabled Memory 96GB (8GB 800GMHz DDR3 x12) Storage Huawei Oceanspace S5300, thru IPSAN Network Intel Corporation 82576 Gigabit VM Configure VDI Scenario 2 vCPU, 2G vMem; Windows 7 Workload: Office, IE, PDF, Java Webserver 2 vCPU, 2G vMem; Suse11 sp1 Scenario Workload: Siego, 1024 current users Data: 100K * 35000URLs = ~ 3.5GB Client accesses all URLs randomlyHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • VDI – 1VM • Migration time is reduced from 26.3s to 10.7s (2.4X faster) • Multithread (compression/TX) helps further • With compression (lzo2), • TX time down • Throughput down – more scalableHUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • VDI – Multi-VM• Total migration time is reduced from 171.0s to 57.0s(3X faster)• The advantage of scalability is more obvious HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Webserver 45s 24s• Total migration time is reduced from 44.6s to 23.8s(1.9X faster)• Performance impact is big – need further improve HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Takeaways• To increase VM live migration success ratio • Avoid timeout: Improve the notification efficiency between migration and other processes • Make sure VM lives at least on one side: fallback whenever there is a failure at Destination• To increase VM live migration efficiency • Compress memory data before TX. Different algorithms matter a lot • Do compression and data transfer parallelly • Choose a wise compression ratio to skip data block • Live migration can be 1.9X – 3X faster per our test• There is still room to improve HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
    • Thank youwww.huawei.com