Live Migration of Direct-Access Devices



WIOV 2008 talk

How to perform live migration of VMs which have directly attached pass-through devices?

  • Thank you. Hi, I’m Asim. I will talk about how to perform live migration of virtual machines that access devices directly.
  • Let us start by looking at live migration. Live migration is the process of migrating guest virtual machines across different hosts without any perceptible downtime to applications. For I/O, this means that all connections should remain active and accessible to applications: no quiescing of device state, and no shutting down of the device being accessed. Live migration is used to consolidate virtual machines and save power, and to perform online maintenance of hosts; it is the number-one requested feature in the Microsoft hypervisor. To maintain active connections, current implementations rely on hypervisor mediation, because the guest OS is unaware of the migration and the hypervisor must ensure connections stay active. For example, storage is shared and accessible to both the source and destination physical hosts, and for network devices the hypervisor exposes a virtual interface to the guest.
  • Consider a live migration scenario with a virtual device. During migration, the guest driver is detached from the virtual device, the actual migration happens, and the driver is re-attached to a virtual device at the destination host by the hypervisor. The hypervisor ensures the virtual device is in a correct state during migration and tells the outside world that a migration happened (for a network device, this may be a change of MAC address). Maintaining active connections is the hypervisor's responsibility, and it can do so easily since it has continuous access to the device state. There is no need to shut down the virtual device, and no need to configure the hardware device at the destination, which is already up and running. Heterogeneous devices pose no problem, because the hypervisor abstracts the real device from the guest OS.
  • Let us look at virtual I/O more closely. In current implementations in modern hypervisors like Xen and VMware, the guest operating system runs only a virtual driver. Any I/O from the guest OS goes to the virtual driver and is trapped by the hypervisor, which routes the request to the real driver. The real driver executes in the physical host, or in a management domain run by the hypervisor. As a result, the hypervisor mediates all I/O and has constant access to the state of the device. Because of all the trapping, performance is only moderate: relatively high CPU utilization with moderate throughput. However, this architecture has many benefits. The guest OS is completely independent of the hardware and need not carry the correct driver; a single physical device can be shared across different guest virtual machines; and live migration works, because the hypervisor can transparently connect to a different virtual device without any interruption in I/O.
  • To solve the performance problem, there is another form of I/O called direct I/O, also known as pass-through I/O. Here the real drivers run inside the guest OS, and all I/O requests go directly from the guest OS to the physical device. Because there is no hypervisor trapping, one gets near-native performance. However, since the hypervisor is completely unaware of the device state, live migration is not possible.
  • We now look at a migration scenario using direct I/O. At the source physical host, the guest OS directly accesses the physical hardware device; the hypervisor is by no means involved in this I/O. When migration happens, at the destination host there is no one to connect the real driver to the new hardware device, and the device has no idea that a migration just occurred. The driver is already running and cannot simply continue to work with unconfigured hardware; nobody tells the device that a migration happened. And with heterogeneous devices, the old real driver is simply irrelevant at the destination.
  • Guest OS drivers are inconsistent with the device after migration. The hypervisor does not mediate I/O, so it is unable to maintain active connections, and the driver running at the source becomes irrelevant at the destination. Problems with the existing solutions: (1) unacceptable downtime; (2) the host may not have two devices, and there is no performance guarantee; (3) modifying the whole stack is not compatible with existing devices and drivers, and a simpler solution exists.
  • In this talk, I describe how to get the performance of direct I/O with the flexibility of live migration. The solution we propose uses shadow drivers to capture the device state and to transparently re-attach the driver to the device after migration. We also claim that our solution is generic and independent of the driver and device, supports heterogeneous devices, and has low performance overhead. (A shadow driver is a kernel agent that runs inside the guest VM and mediates I/O.)
  • Here is the outline of my talk. I will describe the architecture of our solution, followed by the prototype implementation, and then the evaluation, before I conclude.
  • Before we get into the architecture of our design, here are our goals for live migration with direct device access. First, since migration is a relatively rare event, our solution should have minimal cost in the steady state when no migration is taking place: no migration, no cost. Second, the downtime should be as close as possible to migration with virtual devices. Third, our code should not delay the migration in any way. The solution we propose takes a para-virtualization approach: we introduce an agent in the guest operating system that cooperates with the hypervisor to perform migration, and we use shadow drivers as that agent. Now we look at what shadow drivers are.
  • A shadow driver is a kernel agent that mediates all communication between the kernel and the driver. It was originally used to provide online recovery from driver failures. Shadow drivers are class drivers: there is a single implementation for a class of drivers, independent of the device driver itself and dependent only on the kernel interface to that device class.
  • Now we look at the operation of shadow drivers, which run in two modes. In normal operation, the shadow driver does three things. It intercepts calls, and the results of those calls, between the driver and the kernel using taps. It uses this interception to track objects allocated by the driver, for example the network device structure and in-flight packets, so that at any point it can free all driver resources without executing driver code and unload the driver. And it logs state-changing actions, such as configuration requests, so the driver can later be brought back to its original state. During recovery, the taps switch behavior: the shadow driver proxies as the driver to the kernel, concealing the failure so applications never see it. This is strikingly similar to what a migration agent requires.
  • For migration, the shadow driver brings the driver back to its pre-migration state, exactly as in recovery. Recap: the shadow driver performs all the actions the hypervisor performs for virtual I/O, but in a device-independent way, with no need to change the device or driver. This meets our goals.
  • The two additional contributions of shadow drivers here are transparency and state preservation. While the correct driver is being reloaded, the taps relay all I/O requests to the shadow driver, which proxies for the driver and gives applications the illusion that the device is up, so the OS and applications never see the driver restart. For state preservation, the shadow driver does not log everything; it keeps only the entries affecting the current absolute state. For example, if the mode is changed to 100 Mb/s and then to 1 Gb/s, the 100 Mb/s setting can be forgotten. The log size is therefore proportional to the configuration state of the driver and remains constant throughout the operation of the device.
  • We now walk through a sample migration scenario for direct-access devices using shadow drivers. The tracker and log are present from the start; during migration, first the tracker is discarded, then the log is replayed.
  • Note: this runs completely unmodified Linux device drivers.
  • Xen blocks migration of VMs with pass-through PCI devices: it prohibits migration when there is a virtual PCI bus. We block unloading of the driver on PCI unplug so that, from the perspective of the upper levels of the network stack, the device does not disappear from the kernel, and we notify the guest when it is time to run the shadow driver mechanism.
  • The shadow driver implementation came from Nooks, with the isolation machinery removed, ported from the 2.4 to the 2.6 kernel. Object tracker: which objects we track (the net_device structure, interrupts, packets) and when it forgets them. Taps: wrappers around the kernel-driver interface. Log: which functions are logged (ioctls, sending a packet, and so on). Recovery code: for network devices, call module_init, open the network device, and the timer interrupt goes off.
  • We now look at what we have to evaluate. First, the cost in the steady state when no migration is happening, because migration is a rare event. Second, the latency of the migration process.
  • This is the platform we used for our evaluation: a couple of desktop-class machines with two different direct-access devices. For our tests, we used netperf to measure throughput; no applications were running inside the VM while migration was happening. For liveness tests, we used a third physical host to send packets to the migrating IP address.
  • We first look at the throughput of the system at steady state, when no migration is happening. With virtual I/O we get moderate performance; with direct I/O we get near-native performance. With our shadow driver implementation, throughput is within 1% of the original. Shadow drivers have a very minimal performance impact when no migration is taking place.
  • We now look at CPU utilization at steady state using virtual I/O, direct I/O, and direct I/O with shadow drivers. CPU utilization is high for virtual I/O because of hypervisor trapping. With shadow drivers, CPU utilization is slightly higher than plain direct I/O because of the cost of the taps on all calls and of tracking the state of in-flight packets.
  • We now look at the latency of migration as a timeline, to determine where the time is spent. Before t = 0, migration has already started; we only consider the time during which the guest VM is unavailable. Walk through all the steps (the driver here is the e1000). Driver initialization dominates: unmodified drivers come up as in a cold boot, with a lot of probing of the hardware. Drivers could be written to start faster.
  • To conclude: in this talk I described how to use shadow drivers as an agent to perform live migration of VMs using direct device access. Our design supports heterogeneous devices and requires no driver or hardware changes. There is minimal performance overhead at steady state and a small latency during migration. We demonstrated a prototype implementation using network devices inside a Linux VM, but our design is portable to other devices, operating systems, and hypervisors.
  • Walk through the entire migration process for a network driver.

    1. Live Migration of Direct-Access Devices. Asim Kadav and Michael M. Swift, University of Wisconsin-Madison
    2. Live Migration • Migrating VMs across different hosts without noticeable downtime • Uses of live migration – Reducing energy consumption by hardware consolidation – Performing non-disruptive hardware maintenance • Relies on hypervisor mediation to maintain connections to I/O devices – Shared storage – Virtual NICs
    3. Live Migration with Virtual I/O [Figure: source host and destination host, each with a hardware device and a hypervisor + OS; the guest OS runs a virtual driver.] The VMM abstracts the hardware; device state is available to the hypervisor.
    4. Access to I/O Devices [Figure: virtual I/O architecture; the guest OS runs a virtual driver, and the hypervisor + OS sit between it and the physical hardware device.] Virtual I/O • Guest accesses virtual devices • Drivers run outside guest OS • Hypervisor mediates all I/O – Moderate performance + Device independence + Device sharing + Enables migration
    5. Direct Access to I/O Devices [Figure: direct I/O (pass-through I/O) architecture; the real driver runs in the guest OS and accesses the physical hardware device directly, bypassing the hypervisor.] Direct I/O • Drivers run in guest OS • Guest directly accesses device + Near-native performance – No migration
    6. Live Migration with Direct I/O [Figure: the guest OS and its real driver migrate from the source host to the destination host; the hardware device at the destination is unconfigured.] – Device state lost – No heterogeneous devices
    7. Live Migration with Direct I/O • Why not both performance and migration? – Hypervisor unaware of device state – Heterogeneous devices/drivers at source and destination • Existing solutions: – Detach device interface and perform migration [Xen 3.3] – Detach device and divert traffic to virtual I/O [Zhai, OLS 08] – Modify driver and device [Varley (Intel), ISSOO3 08]
    8. Overview • Problem – Direct I/O provides native throughput – Live migration with direct I/O is broken • Solution – Shadow drivers in guest OS capture device/driver state – Transparently re-attach driver after migration • Benefits – Requires no modifications to the driver or the device – Supports migration between different devices – Causes minimal performance overhead
    9. Outline • Introduction • Architecture • Implementation • Evaluation • Conclusions
    10. Architecture • Goals for live migration – Low performance cost when not migrating – Minimal downtime during migration – No activity executing in guest pre-migration • Our solution – Introduce agent in guest OS to manage migration – Leverage shadow drivers as the agent [Swift, OSDI 04]
    11. Shadow Drivers • Kernel agent that monitors the state of the driver • Recovers from driver failures • Driver independent • One implementation per device type [Figure: the shadow driver interposes, via taps, between the guest OS kernel and the device driver, which drives the device.]
    12. Shadow Driver Operation [Figure: shadow driver with taps, an object tracker, and a log, between the guest OS kernel and the device driver.] • Normal operation – Intercept calls – Track shared objects – Log state-changing operations • Recovery – Proxy to kernel – Release old objects – Restart driver – Replay log
    13. Shadow Drivers for Migration • Pre-migration – Record driver/device state in a driver-independent way – Shadow driver logs state-changing operations (configuration requests, outstanding packets) • Post-migration – Unload old driver – Start new driver – Replay log to configure driver
    14. Shadow Drivers for Migration • Transparency – Taps route all I/O requests to the shadow driver – Shadow driver can give the illusion that the device is up • State preservation – Store only the absolute current state – No history of changes maintained – Log size dependent only on the current state of the driver
    15. Migration with Shadow Drivers [Figure: the guest OS kernel with shadow driver, taps, tracker, and log; across the source and destination hypervisor + OS, the source network driver and source network card are replaced by the destination network driver and destination network card.]
    16. Outline • Introduction • Overview • Architecture • Implementation • Evaluation • Conclusions
    17. Implementation • Prototype implementation – VMM: Xen 3.2 hypervisor – Guest VM based on the Linux kernel • New code: shadow driver implementation inside the guest OS • Changed code: Xen hypervisor, to enable migration • Unchanged code: device drivers
    18. Changes to Xen Hypervisor • Migration code modified to – Allow migration of PCI devices – Unmap I/O memory mapped at the source – Detach the virtual PCI bus just before VM suspension at the source and reconnect it at the destination – Add the ability to migrate between different devices [Figure: guest VM connected through a PCI bus to the Xen hypervisor.]
    19. Modifications to the Guest OS • Ported shadow drivers to the kernel – Taps – Object tracker – Log • Implemented shadow driver for network devices – Proxy by temporarily disabling the device – Log ioctl calls, multicast addresses – Recovery code
    20. Outline • Introduction • Architecture • Implementation • Evaluation • Conclusions
    21. Evaluation 1. Cost when not migrating 2. Latency of migration
    22. Evaluation Platform • Host machines – 2.2 GHz AMD machines – 1 GB memory • Direct-access devices – Intel Pro/1000 gigabit Ethernet NIC – NVIDIA MCP55 Pro gigabit NIC • Tests – Netperf on local network – Migration with no applications inside the VM – Liveness tests from a third physical host
    23. Throughput [Bar chart, Mbits/second. Intel Pro/1000: virtual I/O 698, direct I/O 773, direct I/O (SD) 769. NVIDIA MCP55: virtual I/O 706.2, direct I/O 940.5, direct I/O (SD) 937.7.] Direct I/O is 33% better than virtual I/O. Direct I/O with shadow driver throughput is within 1% of original.
    24. CPU Utilization [Bar chart, CPU utilization in the guest VM. Intel Pro/1000: virtual I/O 13.80%, direct I/O 2.80%, direct I/O (SD) 4.00%. NVIDIA MCP55: virtual I/O 17.70%, direct I/O 8.03%, direct I/O (SD) 9.16%.] Direct I/O with shadow driver CPU utilization is within 1% of original.
    25. Migration Time [Timeline chart of events during migration: PCI unplug, VM migration, PCI replug, shadow recovery, driver uptime, ARP sent; time in seconds.] Total downtime = 3.97 seconds. Significant migration latency (56%) is in driver uptime.
    26. Conclusions • Shadow drivers are an agent to perform live migration of VMs using direct device access • Supports heterogeneous devices • Requires no driver or hardware changes • Minimal performance overhead and latency during migration • Portable to other devices, OSes, and hypervisors
    27. Questions Contact: {kadav, swift} More details:
    28. Backup Slides
    29. Complexity of Implementation • ~19,000 LOC • Bulk of this (~70%) is wrappers around functions – Can be automatically generated by scripts
    30. Migration Time [Backup: same timeline chart as slide 25: PCI unplug, VM migration, PCI replug, shadow recovery, driver uptime, ARP sent.] Total downtime = 3.97 seconds. Significant migration latency (56%) is in driver uptime.
    31. Migration with Shadow Drivers [Backup: same figure as slide 15: shadow driver with taps, tracker, and log in the guest OS kernel; source and destination network drivers and cards.]
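The shadow driver's normal operation (slide 12: intercept calls via taps, track shared objects, log state-changing operations) can be sketched conceptually in Python. This is a toy model, not the paper's C implementation; the `FakeNic` driver, its method names, and the `ShadowDriver.call` dispatch are all hypothetical:

```python
class FakeNic:
    """Hypothetical stand-in for a real network driver."""
    def __init__(self):
        self.speed = None
        self.multicast = []

    def set_speed(self, mbps):
        self.speed = mbps

    def add_multicast(self, addr):
        self.multicast.append(addr)

    def send(self, packet):
        return len(packet)  # pretend the packet went out on the wire


class ShadowDriver:
    """Taps every kernel-to-driver call: tracks objects shared with the
    driver (so they can be released without running driver code) and
    logs state-changing operations (so the driver can be reconfigured
    later by replaying the log)."""
    STATE_CHANGING = {"set_speed", "add_multicast"}

    def __init__(self, driver):
        self.driver = driver
        self.tracker = set()   # shared objects, e.g. in-flight packets
        self.log = []          # replayable state-changing operations

    def call(self, method, *args):
        # Tap: observe the call before forwarding it to the driver.
        if method in self.STATE_CHANGING:
            self.log.append((method, args))
        if method == "send":
            self.tracker.add(args[0])       # packet handed to the driver
        result = getattr(self.driver, method)(*args)
        if method == "send":
            self.tracker.discard(args[0])   # call completed; release
        return result
```

After a couple of configuration calls and a send, the log holds exactly the state-changing operations while the tracker is empty again, mirroring the slide's split between the tracker and the log.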
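The post-migration sequence (slide 13: unload the old driver, start the new driver, replay the log) can be illustrated with a small sketch. The `Nic` class and operation names are hypothetical; the point is that the log is recorded in driver-independent terms, so any driver exposing the same class interface, including a different device's driver, can be configured from it:

```python
class Nic:
    """Hypothetical driver exposing the kernel's network-driver
    interface. Any driver with this interface works: the log is
    driver-independent."""
    def __init__(self):
        self.speed = None
        self.multicast = []

    def set_speed(self, mbps):
        self.speed = mbps

    def add_multicast(self, addr):
        self.multicast.append(addr)


def recover(log, driver_cls):
    """Post-migration: start the destination's driver (a cold start,
    with all its probing), then replay the logged state-changing
    operations in order to restore the pre-migration configuration."""
    driver = driver_cls()
    for op, args in log:
        getattr(driver, op)(*args)
    return driver
```

Because `recover` takes the driver class as a parameter, migrating to a heterogeneous device just means passing a different class with the same interface.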
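The "absolute current state" rule (slide 14: no history, log size proportional to configuration state) can be shown with a toy compacting log. Names are hypothetical, and additive state such as multicast lists would need distinct keys; this sketch models simple scalar settings only:

```python
class CompactingLog:
    """Keeps only the latest value per configuration setting, never the
    history: setting the speed to 100 Mb/s and later to 1 Gb/s leaves a
    single entry, so log size tracks the driver's configuration state."""
    def __init__(self):
        self._state = {}

    def record(self, setting, value):
        self._state[setting] = value   # overwrite: old value forgotten

    def __len__(self):
        return len(self._state)

    def replay(self, driver):
        for setting, value in self._state.items():
            getattr(driver, "set_" + setting)(value)


class SpeedOnlyNic:
    """Hypothetical driver with a single scalar setting."""
    def __init__(self):
        self.speed = None

    def set_speed(self, mbps):
        self.speed = mbps
```

Recording the 100 Mb/s setting and then the 1 Gb/s setting leaves one log entry, and replaying it configures the driver to the final value only.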
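Proxy mode (slide 19: "proxy by temporarily disabling the device") can be sketched as follows. While the real driver is restarting, the shadow answers the kernel itself so applications never see the device vanish; here it pretends the transmit queue is full so the caller retries later. The `TX_BUSY` string is a hypothetical stand-in for the kernel's actual retry return code, and both classes are invented for illustration:

```python
class EchoNic:
    """Hypothetical driver."""
    def send(self, packet):
        return len(packet)


class ProxyingShadow:
    """While recovering, answers kernel requests on the driver's behalf
    (device 'temporarily disabled'); otherwise passes them through."""
    BUSY = "TX_BUSY"   # hypothetical stand-in for the kernel retry code

    def __init__(self, driver):
        self.driver = driver
        self.recovering = False

    def send(self, packet):
        if self.recovering:
            return self.BUSY          # ask the kernel to retry later
        return self.driver.send(packet)
```

Once recovery completes and `recovering` is cleared, calls flow to the real driver again, so from the application's viewpoint the device was merely briefly busy, never gone.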