VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applications
Published in: Technology, Business

  1. Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applications
     Benoit Hudzia, Sr. Researcher, SAP Research CEC Belfast (UK)
     Aidan Shribman, Sr. Researcher, SAP Research Israel
     With the contribution of Roei Tell, Steve Walsh, Peter Izsak and Chaim Bendelac
  2. Agenda
     - Introduction
     - Live Migration
     - Memory Intensive Apps
     - Pre-Copy Live Migration
     - Post-Copy Live Migration
     - Summary
     © 2012 SAP AG. All rights reserved.
  3. Introduction: The challenge of live migrating memory intensive applications
  4. What is live migration?
     - Moving a guest VM running on host A (the source host) to host B (the destination host) without noticeable service disruption.
     - Typically used for dynamic resource load balancing in cloud environments, so the faster the migration and the smaller the service disruption, the better.
  5. What is VM migration?
     - CPU state and device driver state migration
       - MBs of information, which can be transferred while the VM is stopped
     - Memory state migration
       - GBs of information, which must (at least mostly) be transferred while the VM is live
       - Requirement: memory can be migrated during guest execution
     - Storage state migration
       - TBs of information (requiring minutes up to hours to move)
       - Recommendation: use shared storage (accessed via NFS, etc.)
     - NIC relocation
       - ARP announcement that the IP address has been taken over
       - Requirement: both hosts reside on the same LAN
  6. Memory Intensive Applications
     - An application dominated by a high rate of memory read/write accesses
     - Why must memory be transferred mid-execution?
       - 16 GB takes roughly 12 seconds to migrate over a 10 GbE (1.25 GB/s) interconnect (16 GB / 1.25 GB/s = 12.8 s)
       - 12 seconds of downtime would cause severe service disruption
     - The challenge (for live migration implementations) is to transfer memory amid high rates of memory churn
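The transfer-time figure on slide 6 can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `stop_and_copy_seconds` is hypothetical, and it assumes the link runs at its full nominal rate with no protocol overhead.

```python
def stop_and_copy_seconds(memory_gb: float, link_gbit_per_s: float) -> float:
    """Time to copy all guest memory while the VM is stopped.

    Assumes an ideal link: nominal bit rate, no protocol overhead.
    """
    link_gbyte_per_s = link_gbit_per_s / 8.0  # 10 Gbit/s -> 1.25 GB/s
    return memory_gb / link_gbyte_per_s


if __name__ == "__main__":
    # 16 GB of guest memory over 10 GbE
    print(f"{stop_and_copy_seconds(16, 10):.1f} s")
```

Any real transfer would take longer than this lower bound, which is why the memory has to move while the guest keeps running.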
  7. Pre-Copy Live Migration: Reducing the amount of page re-sends and the cost of page re-sends
  8. Pre-Copy Memory Transfer
     - Memory is transferred before VM relocation
     - The problem is how to copy memory while it is re-dirtied over and over again by the guest VM
     - Solved by first copying all the memory, followed by intervals of copying newly dirtied pages until the remaining state is small enough
     - Implemented by most (all) hypervisors: VMware, Xen, QEMU
     - Challenged by fast memory dirtying applications
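The iterative scheme on slide 8 can be illustrated with a toy model. The sketch below is not how any hypervisor implements it: it assumes a constant dirtying rate and a constant link rate (both in pages per second), and the names `simulate_precopy` and `PreCopyResult` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class PreCopyResult:
    rounds: int               # pre-copy rounds performed
    remaining_pages: float    # pages left for the stop-and-copy phase
    precopy_seconds: float    # time spent copying while the guest runs
    downtime_seconds: float   # time to send the remainder with the guest stopped
    converged: bool           # True if the remainder fell under the threshold


def simulate_precopy(mem_pages, dirty_pages_per_s, link_pages_per_s,
                     stop_threshold_pages, max_rounds=30):
    """Toy model: round 1 sends all memory, later rounds send what was dirtied."""
    to_send = float(mem_pages)
    elapsed = 0.0
    rounds = 0
    while rounds < max_rounds:
        round_time = to_send / link_pages_per_s
        elapsed += round_time
        rounds += 1
        # Pages dirtied while this round was in flight (capped at total memory).
        to_send = min(float(mem_pages), dirty_pages_per_s * round_time)
        if to_send <= stop_threshold_pages:
            break
    converged = to_send <= stop_threshold_pages
    return PreCopyResult(rounds, to_send, elapsed,
                         to_send / link_pages_per_s, converged)


if __name__ == "__main__":
    print(simulate_precopy(1024, 256, 1024, 16))     # shrinks every round
    print(simulate_precopy(1024, 2048, 1024, 16, 5)) # dirtying outpaces the link
```

In this model the remainder shrinks only when the dirtying rate is below the link rate, which is one way to read "challenged by fast memory dirtying applications".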
  9. Pre-Copy Live Migration (pre-migration)
     [Figure: one guest VM shown, with Host A and Host B. Timeline phases: Pre-migrate, Reservation, Iterative Pre-copy (X Rounds), Stop and Copy, Commit. VM states: Live on A, Degraded on A, Downtime, Live on B. The whole span is the Total Migration Time.]
  10. Pre-Copy Live Migration (reservation)
      [Figure: same timeline; a guest VM shown on both hosts.]
  11. Pre-Copy Live Migration (pre-copy)
      [Figure: same timeline; a guest VM shown on both hosts.]
  12. Pre-Copy Live Migration (stop and copy)
      [Figure: same timeline; a guest VM shown on both hosts.]
  13. Pre-Copy Live Migration (commit)
      [Figure: same timeline; one guest VM shown.]
  14. LRU Page Reordering (reducing number of pages transferred)
      - An LRU queue is used for tracking all historic page dirty events (i.e. the application wrote to memory)
      - QEMU (responsible for the live migration process) queries the Linux kernel KVM hypervisor for dirty page events. It then inserts the dirty page events into the LRU queue (after a random shuffle)
      - Dirty pages (maintained in a separate LRU queue) are inserted according to the historic LRU queue and transmitted head first
      - LRU reordering may have an adverse effect for some workloads, requiring a smarter cache (ARC?)
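One possible reading of the reordering on slide 14 is sketched below. It assumes that "head first" means the least recently dirtied pages are sent first, so that hot pages are sent last and have less chance of being re-dirtied after transmission. The class `DirtyHistory` and its methods are hypothetical, and the random shuffle mentioned on the slide is left out.

```python
from collections import OrderedDict


class DirtyHistory:
    """Historic LRU queue of page dirty events (illustrative only)."""

    def __init__(self):
        # Keys are page numbers; order runs from least to most recently dirtied.
        self._lru = OrderedDict()

    def record_dirty(self, page_no: int) -> None:
        """Note that a page was written; it becomes the most recent entry."""
        self._lru[page_no] = True
        self._lru.move_to_end(page_no)

    def send_order(self, dirty_pages) -> list:
        """Order the current dirty set by history, coldest page first."""
        dirty = set(dirty_pages)
        ordered = [p for p in self._lru if p in dirty]
        # Pages with no recorded history go first, in ascending order.
        unseen = sorted(dirty - set(self._lru))
        return unseen + ordered


if __name__ == "__main__":
    history = DirtyHistory()
    for page in (1, 2, 3, 1):        # page 1 is written twice, so it is "hot"
        history.record_dirty(page)
    print(history.send_order({1, 2, 3}))
```

A workload that cycles through a large set of pages would defeat this ordering, which matches the slide's caveat about needing a smarter cache.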
  15. XBZRLE Delta Encoder (reducing cost of page re-sends)
      - Cache the pages with the highest dirtying rate during the send operation
      - If the old page is found in the cache, delta compress it; else, transfer the new page uncompressed
      - Implemented using an LRU cache configured to fit the working set size
      [Figure panels: Classic Pre-Copy; Delta Compression Pre-Copy]
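The hit-or-miss decision on slide 15 can be sketched with a small LRU cache of previously sent pages. This is an illustration, not QEMU's cache: `SentPageCache` is a hypothetical name, the capacity is counted in pages, and the "delta" here is a plain byte-wise XOR with no further compression.

```python
from collections import OrderedDict


class SentPageCache:
    """LRU cache of the last version sent for each page (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._pages = OrderedDict()  # page number -> bytes last sent

    def send(self, page_no: int, data: bytes):
        """Return ("delta", xor_bytes) on a cache hit, ("full", data) on a miss."""
        old = self._pages.get(page_no)
        if old is not None and len(old) == len(data):
            payload = ("delta", bytes(a ^ b for a, b in zip(old, data)))
        else:
            payload = ("full", data)
        # Remember the new version and mark it most recently used.
        self._pages[page_no] = data
        self._pages.move_to_end(page_no)
        # Evict least recently used pages beyond the configured capacity.
        while len(self._pages) > self.capacity:
            self._pages.popitem(last=False)
        return payload


if __name__ == "__main__":
    cache = SentPageCache(capacity=2)
    print(cache.send(1, b"abcd"))  # miss: full page
    print(cache.send(1, b"abcf"))  # hit: XOR delta, mostly zero bytes
```

Sizing the cache to the working set matters because an evicted page falls back to a full, uncompressed transfer on its next re-send.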
  16. XBZRLE Delta Encoder - Example
      Offset  0     1     2     3     4     5     6     7     Size
      Old     ‘a’   ‘b’   ‘c’   ‘d’   ‘e’   ‘f’   ‘g’   ‘h’   8
      New     ‘a’   ‘b’   ‘c’   0x00  0x03  ‘f’   ‘g’   ‘h’   8
      XOR     0x00  0x00  0x00  ‘d’   ‘f’   0x00  0x00  0x00  8
      RLE     0x00  0x03  ‘d’   0x01  ‘f’   0x01  0x00  0x03  8
      ZRLE    0x00  0x03  ‘d’   ‘f’   0x00  0x03              6
      - XBRLE (XOR + RLE): presented at VEE 2011 in "Evaluation of Delta Compression Techniques for Efficient Live Migration of Large Virtual Machines" by Svärd, Hudzia, Tordsson and Elmroth
      - XBZRLE (XOR + ZRLE): the current effort, already released to the QEMU community
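The XOR + ZRLE idea from slide 16 can be written out as a short encoder and decoder. This sketch follows the simplified layout of the slide, where a run of zero bytes becomes the pair (0x00, run length) and non-zero bytes are copied literally. It is not the wire format used by QEMU, and the function names and sample inputs are illustrative.

```python
def xor_delta(old: bytes, new: bytes) -> bytes:
    """Byte-wise XOR of two equally sized pages; unchanged bytes become 0x00."""
    if len(old) != len(new):
        raise ValueError("pages must have the same length")
    return bytes(a ^ b for a, b in zip(old, new))


def zrle_encode(data: bytes) -> bytes:
    """Run-length encode zero runs only; literals are never 0x00."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes((0, run))   # marker byte, then run length (1..255)
            i += run
        else:
            out.append(data[i])      # non-zero byte copied as-is
            i += 1
    return bytes(out)


def zrle_decode(encoded: bytes) -> bytes:
    """Inverse of zrle_encode."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == 0:
            out += b"\x00" * encoded[i + 1]
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    old, new = b"abcdefgh", b"abc\x00\x03fgh"
    encoded = zrle_encode(xor_delta(old, new))
    # The receiver holds the old page, so XOR with the decoded delta restores the new one.
    restored = xor_delta(old, zrle_decode(encoded))
    print(len(encoded), restored == new)
```

Because XOR is its own inverse, the destination only needs its copy of the old page plus the decoded delta to rebuild the new page.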
  17. XBZRLE Delta Encoder - Micro Benchmark
              Sparse  Medium  Dense  Very Dense
      Jump    1111    701     203    121
      Dirty   12      33      41     43
  18. Pre-copy Live Migration Evaluation
      - Panel 1: Host: 1 core; 2 GB. Benchmark: synthetic, dirtying 512 MB; 1 T
      - Panel 2: Host: 2 core; 8 GB; Bandwidth 240 Mbps. Benchmark: appmembench, dirtying 512 MB at 1 GB/s; 8 T
  19. Post-Copy Live Migration: Fast live migration … but at a cost?
  20. Post-Copy Memory Transfer
      - Memory is transferred only after VM relocation
      - The problem is how to ensure that VM performance is not degraded (after relocation) due to expensive network bound page faults
      - Solved by using fast interconnects and an improved page fault mechanism
      - References: Yabusame PoC for KVM Linux (using char device user mode transfer)
      - Challenged by fast memory reading applications
  21. Post-Copy Live Migration (pre-migration)
      [Figure: one guest VM shown, with Host A and Host B. Timeline phases: Pre-migrate, Reservation, Stop and Copy, Page Pushing (1 Round), Commit. VM states: Live on A, Downtime, Degraded on B, Live on B. The whole span is the Total Migration Time.]
  22. Post-Copy Live Migration (reservation)
      [Figure: same timeline; a guest VM shown on both hosts.]
  23. Post-Copy Live Migration (stop and copy)
      [Figure: same timeline; a guest VM shown on both hosts.]
  24. Post-Copy Live Migration (post-copy)
      [Figure: same timeline; a guest VM shown on both hosts, with "Page fault" and "Page push" arrows between them.]
  25. Post-Copy Live Migration (commit)
      [Figure: same timeline; one guest VM shown.]
  26. Reducing Effects of Network Bound Page Faults
      - Low latency RDMA page transfer protocol (reducing latency/cost of page faults)
        - Implemented fully in kernel mode (OFED verbs)
        - Can use the fastest RDMA hardware available (IB, iWARP, RoCE)
      - Demand pre-paging (pre-fetching) mechanism (reducing the number of page faults)
        - Currently only a simple fetching of the pages surrounding the page on which the fault occurred
      - Full Linux MMU integration (reducing the system-wide effects/cost of page faults)
        - Enables transparent page fault handling (only pausing the requesting thread)
      - Hybrid post-copy live migration (reducing the number of page faults)
        - Moving some of the pages to the destination in a bounded pre-copy phase prior to relocation
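The effect of the simple pre-paging described on slide 26 can be estimated with a toy fault counter. The sketch assumes the destination starts with no resident pages and that each fault also fetches a fixed window of neighbouring pages. `count_network_faults` and `prefetch_radius` are hypothetical names, and the model ignores latency, background page pushing and the hybrid pre-copy phase.

```python
def count_network_faults(accesses, total_pages, prefetch_radius=0):
    """Count page faults served over the network for a sequence of page accesses.

    On each fault the faulting page is fetched together with up to
    `prefetch_radius` pages on either side of it.
    """
    resident = set()
    faults = 0
    for page in accesses:
        if page in resident:
            continue  # already on the destination, no network round trip
        faults += 1
        first = max(0, page - prefetch_radius)
        last = min(total_pages - 1, page + prefetch_radius)
        resident.update(range(first, last + 1))
    return faults


if __name__ == "__main__":
    sequential = list(range(16))
    print(count_network_faults(sequential, 16))                     # no pre-paging
    print(count_network_faults(sequential, 16, prefetch_radius=2))  # with pre-paging
```

The benefit depends on spatial locality: a sequential scan gains a lot from the window, while a random access pattern would fault almost as often as without it.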
  27. Evaluation: Pre-Copy vs. Post-Copy
      - Specification
        - Benchmark: Google Stress Application Test (SAT) with a 1 GB working set
        - Communication: SoftiWARP over 1 GbE
        - Guest VM: 2 x vCPU; 1 GB to 14 GB memory
      - Results
        - The ratio of total migration time, pre-copy to post-copy, is roughly 1:10
  28. Summary
  29. Key takeaways
      - Pre-copy and post-copy optimizations enhance the support for live migrating memory intensive applications
      - To be released as open source under GPLv2 and LGPL licenses to the QEMU and Linux communities
      - Developed by SAP Research Technology Infrastructure (TI) Practice
  30. Thank you
      Benoit Hudzia, Sr. Researcher, SAP Research CEC Belfast, benoit.hudzia@sap.com
      Aidan Shribman, Sr. Researcher, SAP Research Israel, aidan.Shribman@sap.com
  31. Legal Disclaimer
      The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP's strategy and possible future developments, products and/or platforms directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information in this document is not a commitment, promise or legal obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
      All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
