Xen and Co.: Communication-Aware CPU Management in ...

  • Consolidated hosting platform
  • Virtualization is a broad term that refers to the abstraction of computer resources: VMs, virtual memory, storage, network, desktop, …
  • Co-location: the provision of space, bandwidth, and power in a data center, with the customer required to provide and manage the computer hardware.
  • Overhead: the ongoing operating cost of running a system; here, CPU time spent on work such as I/O virtualization rather than on the application itself.


  • 1. Xen and Co.: Communication-Aware CPU Management in Consolidated Xen-Based Hosting Platforms (presenter: 倪丞頤)
  • 2. Author
    • Sriram Govindan,
    • Jeonghwan Choi,
    • Arjun R. Nath,
    • Amitayu Das,
    • Bhuvan Urgaonkar, Member, IEEE,
    • and Anand Sivasubramaniam, Member, IEEE
  • 3. Abstract (VHPs)
    • VHPs: virtualization-based hosting platforms
    • Use of virtualization technology in the design of consolidated hosting platforms
    • Virtualization enables easier and faster application migration as well as secure co-location of antagonistic applications
  • 4. Abstract (shortcomings in VMMs)
    • VMMs: Virtual Machine Monitors
    • Two shortcomings in VMMs
      • CPU schedulers that are agnostic to the communication behavior of modern, multi-tier applications
      • Inadequate or inaccurate mechanisms for accounting the CPU overheads of I/O virtualization
  • 5. Multi-tier Applications
  • 6. Goal
    • Achieve:
      • Communication-aware CPU scheduling algorithm
      • CPU usage accounting mechanism
    • Implement:
      • In Xen VMM
      • A prototype VHP on a cluster of 36 servers
    • Experiment:
      • Realistic Internet server applications
      • TPC-W benchmark
  • 7. The need for communication-aware CPU scheduling: A TPC-W benchmark
    • Two tiers
    • A JBoss tier
      • Implements the application logic and interacts with the clients
    • A MySQL-based database tier
      • Stores information about items
  • 8. CPU Usage: Isolated Servers
    • Each tier of this application was run on a separate physical server
  • 9. CPU Usage: Consolidated Server
    • Running this application with all its tiers consolidated on a single server running Xen along with five CPU-intensive applications.
  • 10. Client Response Times
    • Comparing the performance experienced by the clients of this application under the two scenarios.
  • 11. Analysis of Response time
    • Question:
      • Why did the performance degrade despite providing the same resource allocations?
    • Reason:
      • For applications with communicating components, providing enough CPU alone is not enough; an equally important consideration is to provide CPU at the right time.
  • 12. Problem 1.
    • Can a server in a VHP schedule hosted VMs in a communication-aware manner to enable satisfactory application performance even under conditions of high consolidation, while still adhering to the high-level resource provisioning goals in a fair manner?
  • 13. The need for accurate accounting of the overheads of virtualization
    • In a VHP, the unit of resource allocation and accounting for applications changes to the VM
    • VMM is responsible for virtualizing I/O devices.
    • Problem:
      • Can a server in a VHP account for the overheads of virtualization and incorporate them into its CPU scheduling to provide fair resource allocations?
  • 14. Research Contributions
    • Develop a CPU scheduling algorithm for a VMM that incorporates the I/O behavior of the overlying VMs into its decision making.
    • Develop an algorithm that accurately and efficiently accounts the CPU overheads resulting from server virtualization and attributes these to the VMs they originate from.
    • Identify ways of implementing these algorithms in the state-of-the-art Xen VMM.
  • 15. Background
    • VMM
      • Virtualization of a server at the operating system level by a software layer
      • Virtualizes the resources of a physical server
      • Supports the execution of multiple VMs
      • Xen VMM
  • 16.  
  • 17. Domain
    • Domain0
      • Privileged VM
      • Implements the real device drivers
      • Does the translation between virtual and real I/O activity
    • Netfront driver
      • Each guest domain implements a driver for its virtual NIC
    • Netback driver
      • Implemented by Domain0
      • Sits between the netfront drivers and the device driver for the physical NIC
  • 18. Network I/O virtualization in Xen. (a) Reception
  • 19. Network I/O virtualization in Xen. (b) Transmission
  • 20. Hosting Model Assumptions
    • Large cluster of high-end servers
    • High-bandwidth network
    • Connected to consolidated, high-capacity storage
  • 21. Enhanced Hypervisor
    • A modified CPU scheduler to achieve communication-aware scheduling of domains
    • Mechanisms that measure and maintain relevant resource usage statistics pertaining to the overlying domains that are used by our scheduler
  • 22. Communication-Aware Scheduling In A VHP
    • The VMM provides a CPU scheduler that allows applications to specify guarantees on the CPU allocations
    • The Xen hypervisor implements an algorithm called Simple Earliest-Deadline-First (SEDF)
  • 23. Simple Earliest-Deadline-First (SEDF)
    • Each domain specifies a pair (slice, period)
    • That is, it asks for slice units of CPU time every period time units
    • Hypervisor ensures that the specified reservation can be provided
    • We assume that a domain is not admitted if its reservation cannot be satisfied
    • The residual CPU capacity is shared among the contending domains (including Domain0) in a round-robin fashion.
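The SEDF behavior described on this slide can be illustrated with a small sketch in Python. It is not Xen's SEDF code: the `Domain` class, `admit`, and `pick_next` are hypothetical names, all domains are assumed to be admitted at time 0, and admission is modeled as a simple utilization test (the sum of slice/period over all domains must not exceed one CPU).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    slice_ms: float            # CPU time requested in every period
    period_ms: float           # length of the reservation period
    deadline_ms: float = 0.0   # end of the current period
    remaining_ms: float = 0.0  # unused part of the slice in this period


def admit(admitted: list, candidate: Domain) -> bool:
    """Admission control: reject a domain whose reservation cannot be satisfied."""
    utilization = sum(d.slice_ms / d.period_ms for d in admitted)
    if utilization + candidate.slice_ms / candidate.period_ms > 1.0:
        return False
    # Assumption of this sketch: every domain starts its first period at time 0.
    candidate.deadline_ms = candidate.period_ms
    candidate.remaining_ms = candidate.slice_ms
    admitted.append(candidate)
    return True


def pick_next(admitted: list, now_ms: float, residual: deque):
    # Roll over to a new period (and refill the slice) once a deadline has passed.
    for d in admitted:
        while now_ms >= d.deadline_ms:
            d.deadline_ms += d.period_ms
            d.remaining_ms = d.slice_ms
    # Earliest deadline first among domains still owed CPU in this period.
    owed = [d for d in admitted if d.remaining_ms > 0]
    if owed:
        return min(owed, key=lambda d: d.deadline_ms)
    # Every reservation is met: hand out residual capacity round-robin.
    if residual:
        d = residual[0]
        residual.rotate(-1)
        return d
    return None
```

For example, with Domain0 holding (15 msec, 20 msec), a guest asking for (3 msec, 10 msec) is rejected (0.75 + 0.3 > 1), while one asking for (2 msec, 10 msec) is admitted and, having the earlier deadline, is picked first.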
  • 24. New CPU Scheduling Algorithm
    • Built on top of SEDF
    • Attempts to preferentially schedule communication-sensitive domains over others
    • Aims to reduce the aggregate scheduling-induced delay for the hosted domains while still providing guarantees on CPU allocations
  • 25. Classifying Scheduling-Induced Delays
    • Delay at the recipient
      • Between when a network packet arrives in the reception-I/O-ring in Domain0 and when the recipient domain is scheduled to copy it into its own address space
    • Delay at the sender
      • Delay before a domain can send a network packet, induced by the hypervisor scheduling other domains in between
  • 26. Classifying Scheduling-Induced Delays
    • Delay associated with the scheduling of Domain0
      • Between the reception of a packet at the physical network interface and when Domain0 copies it into the address space of the recipient domain
      • Between when a domain copies a packet into the transmit-I/O ring of Domain0 and when Domain0 gets scheduled next to actually send it over the network interface.
  • 27. Preferential Scheduling of Recipient
    • Devise a general approach that can choose between multiple recipient domains.
    • Pick the domain that has received the most packets
    • The underlying SEDF reservations keep a domain receiving high-intensity traffic from starving other domains
  • 28. Implementation Considerations
    • Bookkeeping Pages
      • Each domain is given a page that it shares with the hypervisor
      • We use these pages to maintain various I/O-related statistics
    • network-reception-intensity
      • One for each domain
      • Stored in the bookkeeping page of Domain0
      • Keeping track of the number of packets received and waiting within Domain0
  • 29. Algorithm: Bookkeeping at the Recipient
    • The netback driver determines which domains have received packets
    • It uses the page-flip mechanism to transfer the pages holding those packets to the recipient domains
    • The number of pages flipped to each domain serves as an indicator of the number of packets received by that domain
    • The netback driver increments that domain's network-reception-intensity accordingly
    • When the netfront driver of a recipient domain processes some of its received packets, the Hypervisor decrements that domain's network-reception-intensity
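The recipient-side bookkeeping can be modeled as a per-guest counter in Domain0's bookkeeping page. This is a minimal Python sketch, not the paper's kernel code: the class and function names are hypothetical, the shared page is modeled as a dictionary, and clamping the counter at zero is an assumption the slides do not state.

```python
from collections import defaultdict


class Domain0BookkeepingPage:
    """Hypothetical model of the page Domain0 shares with the hypervisor."""

    def __init__(self):
        # guest name -> packets received for that guest and still waiting in Domain0
        self.network_reception_intensity = defaultdict(int)


def netback_after_page_flip(page, pages_flipped):
    """Netback: each page flipped to a guest counts as one received packet."""
    for guest, n_pages in pages_flipped.items():
        page.network_reception_intensity[guest] += n_pages


def hypervisor_after_netfront(page, guest, n_processed):
    """Hypervisor: decrement once the guest's netfront has processed packets."""
    current = page.network_reception_intensity[guest]
    # Clamping at zero is an assumption of this sketch, not taken from the slides.
    page.network_reception_intensity[guest] = max(0, current - n_processed)
```

Using the flipped-page count as the packet count keeps the bookkeeping cheap: netback already knows how many pages it hands to each guest, so no per-packet inspection is needed.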
  • 30. Anticipatory Scheduling of Sender
    • Use a simple last-value-like prediction
    • Let T_tx and T_(tx-1) be the time instants of the last two transmission operations of a domain
    • The duration between them, T_tx - T_(tx-1), predicts the window in which the domain is likely to transmit again: [T_tx, T_tx + (T_tx - T_(tx-1))]
    • Among multiple candidate senders, choose to schedule the one that is expected to transmit the most packets.
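The last-value prediction can be written out directly. In this Python sketch the names `SendHistory`, `predicted_window`, and `pick_sender` are hypothetical, and `expected_packets` stands in for whatever per-domain estimate the scheduler keeps (in the slides, the anticipated transmit intensity).

```python
from typing import NamedTuple


class SendHistory(NamedTuple):
    t_prev: float          # T_(tx-1): time of the second-to-last transmission
    t_last: float          # T_tx: time of the most recent transmission
    expected_packets: int  # packets the domain is anticipated to send (hypothetical input)


def predicted_window(h: SendHistory) -> tuple:
    """Last-value prediction: the next gap is assumed equal to the previous one."""
    gap = h.t_last - h.t_prev
    return h.t_last, h.t_last + gap


def pick_sender(histories: dict, now: float):
    """Among domains predicted to transmit at `now`, pick the one expected to send most."""
    likely = {}
    for name, h in histories.items():
        start, end = predicted_window(h)
        if start <= now <= end:
            likely[name] = h
    if not likely:
        return None
    return max(likely, key=lambda name: likely[name].expected_packets)
```

For instance, a domain that transmitted at times 0 and 10 is predicted to transmit again within [10, 20].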
  • 31. Implementation Considerations
    • actual-network-transmit-intensity
      • In the bookkeeping page of each guest domain
    • anticipated-network-transmit-intensity
      • For each guest domain
      • In the bookkeeping page of Domain0
  • 32. Algorithm: Bookkeeping at the Sender
    • Step1:
      • For each network packet queued inside the guest domain (waiting to be transmitted to the NIC), the netfront driver increments the actual-network-transmit-intensity of that domain:
      • TransmittingDomain->actual-network-transmit-intensity++
    • Step2:
      • When the guest domain is descheduled, the Hypervisor adds its actual transmit intensity to its anticipated transmit intensity:
      • TransmittingDomain->anticipated-network-transmit-intensity += TransmittingDomain->actual-network-transmit-intensity.
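The two sender-side steps map onto two counter updates. This Python sketch models a guest's bookkeeping page as a dataclass with hypothetical names; the slides do not say when the counters are reset or decayed, so this sketch never resets them.

```python
from dataclasses import dataclass


@dataclass
class GuestBookkeepingPage:
    """Hypothetical model of a guest domain's bookkeeping page."""
    actual_network_transmit_intensity: int = 0
    anticipated_network_transmit_intensity: int = 0


def netfront_queue_packet(page: GuestBookkeepingPage) -> None:
    """Step 1: netfront counts each packet queued for transmission to the NIC."""
    page.actual_network_transmit_intensity += 1


def hypervisor_on_deschedule(page: GuestBookkeepingPage) -> None:
    """Step 2: on deschedule, fold the actual count into the anticipated count."""
    page.anticipated_network_transmit_intensity += page.actual_network_transmit_intensity
```

Queuing three packets and then descheduling the guest leaves both counters at 3.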
  • 33. Scheduling of Domain0
    • By default, the Xen scheduler gives Domain0 a high reservation of (15 msec, 20 msec), i.e., 75% of a CPU
    • Domain0 handles two kinds of packets
      • Packets written by guest domains to their virtual NICs
      • Packets received for delivery to domains and waiting in their reception-I/O-ring within Domain0
  • 34. A “Greedy” Aspect
    • Respect reservations:
      • Scheduling a domain D would not violate the CPU reservations of any of the domains.
    • Minimize delays:
      • Scheduling D helps reduce the scheduling-induced delay for the most packets.
  • 35. Algorithm: Bookkeeping at Domain0
    • Step 1:
      • Every time a packet is received by the Network Interface Card, the network interrupt handler inside the Hypervisor increments the network intensity of Domain0:
      • Domain0->network-reception-intensity++.
    • Step 2:
      • When a guest domain is descheduled, the Hypervisor adds the packets that domain queued for transmission to the network intensity of Domain0:
      • Domain0->network-transmit-intensity += TransmittingDomain->actual-network-transmit-intensity.
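Domain0's own counters follow the same pattern. In this Python sketch the class and function names are hypothetical; `guest_actual_tx` stands for the descheduled guest's actual-network-transmit-intensity.

```python
from dataclasses import dataclass


@dataclass
class Domain0Intensity:
    """Hypothetical model of Domain0's own counters in its bookkeeping page."""
    network_reception_intensity: int = 0
    network_transmit_intensity: int = 0


def nic_interrupt_handler(dom0: Domain0Intensity) -> None:
    """Step 1: every packet arriving at the NIC adds pending reception work for Domain0."""
    dom0.network_reception_intensity += 1


def on_guest_deschedule(dom0: Domain0Intensity, guest_actual_tx: int) -> None:
    """Step 2: packets a descheduled guest queued become pending transmit work for Domain0."""
    dom0.network_transmit_intensity += guest_actual_tx
```

Both updates capture work that is waiting in Domain0 and that only Domain0 can move forward: inbound packets still to be delivered, and outbound packets still to be handed to the physical NIC.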
  • 36. Communication-aware Scheduler
    • Picks the domain with the highest network intensity, which is the sum of
      • network-reception-intensity and anticipated-network-transmit-intensity for guest domains
      • network-reception-intensity and network-transmit-intensity for Domain0
  • 37. Algorithm: Communication-aware Scheduler
    • Step1
      • The Hypervisor scheduler computes the network intensity of Domain0 as,
        • Domain0->network-reception-intensity + Domain0->network-transmit-intensity
      • and for the guest domains as,
        • GuestDomain->network-reception-intensity + GuestDomain->anticipated-network-transmit-intensity
    • Step2
      • Among the domains eligible to run within the current period (as determined by the SEDF fairness limits), the scheduler chooses the domain with the highest network intensity, i.e., the most pending network packets.
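The two scheduler steps can be combined into one selection function. In this Python sketch `DomainState` and `choose_domain` are hypothetical, and the `eligible` flag stands in for the SEDF check of whether a domain may run in the current period; ties go to the first domain in the list.

```python
from dataclasses import dataclass


@dataclass
class DomainState:
    name: str
    is_domain0: bool
    network_reception_intensity: int = 0
    network_transmit_intensity: int = 0              # used for Domain0 only
    anticipated_network_transmit_intensity: int = 0  # used for guest domains only
    eligible: bool = True  # stand-in for "SEDF fairness limits allow it to run now"


def network_intensity(d: DomainState) -> int:
    """Step 1: pending packets attributed to the domain."""
    if d.is_domain0:
        return d.network_reception_intensity + d.network_transmit_intensity
    return d.network_reception_intensity + d.anticipated_network_transmit_intensity


def choose_domain(domains: list):
    """Step 2: among SEDF-eligible domains, pick the highest network intensity."""
    eligible = [d for d in domains if d.eligible]
    if not eligible:
        return None
    return max(eligible, key=network_intensity)
```

Filtering on eligibility before maximizing keeps the greedy choice from overriding the reservation guarantees: a very busy domain that has exhausted its share for the period is simply not a candidate.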
  • 38. Salient Features
    • Fairness issues
      • Guarantee that each domain receives at least its reservation (slice, period)
      • Fairness also depends on correctly accounting the CPU costs incurred on each domain's behalf
    • Coordinated scheduling
  • 39. Accounting Of I/O Virtualization Overheads
    • Proportional-share (PS) schedulers
      • The I/O virtualization overheads clearly shrink the CPU capacity available to an individual guest domain
    • Reservation-based schedulers
      • e.g., SEDF
  • 40. Accounting Mechanism
    • Allow each guest domain to specify its CPU reservation including the virtualization costs of its I/O needs
    • Keep an accurate account of the time spent by Domain0 on behalf of each of the guest domains
    • Classify I/O into four kinds:
      • network receptions/transmissions and disk reads/writes
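The accounting idea can be illustrated with a ledger that charges Domain0's time back to the guest it worked for. This is a minimal Python sketch under stated assumptions: the names are hypothetical, and the sketch takes per-operation Domain0 CPU times as given inputs; how the actual mechanism measures those costs is not covered on this slide.

```python
from collections import defaultdict
from enum import Enum


class IOKind(Enum):
    NET_RX = "network reception"
    NET_TX = "network transmission"
    DISK_READ = "disk read"
    DISK_WRITE = "disk write"


class Domain0Ledger:
    """Hypothetical record of Domain0 CPU time spent on behalf of each guest."""

    def __init__(self):
        self.charged_us = defaultdict(float)  # (guest, IOKind) -> microseconds

    def charge(self, guest: str, kind: IOKind, cpu_us: float) -> None:
        self.charged_us[(guest, kind)] += cpu_us

    def total_for(self, guest: str) -> float:
        return sum(t for (g, _), t in self.charged_us.items() if g == guest)


def remaining_budget_us(reservation_us: float, own_usage_us: float,
                        ledger: Domain0Ledger, guest: str) -> float:
    """The guest's reservation covers its own CPU time plus Domain0 work done for it."""
    return reservation_us - own_usage_us - ledger.total_for(guest)
```

Charging by I/O kind lets the reservation a guest specifies include its I/O virtualization costs, so a guest with little I/O is not penalized for Domain0 work done on behalf of an I/O-heavy neighbor.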
  • 41. Implementation
  • 42. Experimental Setup
    • A rack of 36 servers
      • Dual Xeon 3.4 GHz CPUs with 2 MB of L2 cache, 800 MHz Front Side Bus, and 2 GB RAM
      • Gigabit Ethernet
      • Between 8 and 12 domains
    • Between 120 and 300 MB of RAM for each domain
    • Domain0 is given 320 MB of RAM
  • 43. Server Applications
    • TPC-W-NYU
      • A three-tier application based on the TPC-W benchmark
      • JBoss as the middle tier
      • MySQL for the database tier
    • Scalar Pentadiagonal (SP) and Lower-Upper Gauss-Seidel (LU) solver benchmarks from the NAS Parallel Benchmarks
    • Streaming media server
  • 44. Performance improvement for TPC-W
  • 45. Performance Improvement for NAS Parallel Benchmarks
    • The LU benchmark communicates in much smaller packets (only 40 bytes), so a scheduled domain may have to wait for additional packets; for LU, the scheduler causes an increased number of context switches.
  • 46. Performance and scalability improvement for streaming media server
  • 47. Fairness in CPU allocation for CPU-intensive domains consolidated with the streaming media server
  • 48. Evaluation of Accounting
    • Two main problems with the existing mechanism in Xen:
      • Not reconciling the time that Domain0 spends on behalf of each hosted domain can cause a domain without significant I/O activity to suffer unfairly
      • The existing mechanism requires the administrator to explicitly specify the required reservation for Domain0