The motivation of this paper comes from two growing trends. First, the cost and complexity of system management are leading numerous enterprises to offload their IT demands to hosting/data centers. Each hosting center operates thousands of servers and provides service to a large number of applications while meeting performance service level agreements (SLAs).
CSE 598B: Self-* Systems
Memory Resource Management in VMware ESX Server
by Carl A. Waldspurger
Presented by: Arjun R. Nath
(slide material adapted from C. Waldspurger and M. Behar)

Summary of this Presentation
• What is VMware ESX Server?
• Virtualization
• Memory management techniques employed by ESX Server
– Ballooning
– Memory sharing
– Reclaiming idle memory
• Other stuff: similar products, etc.

ESX Server overview
• Thin kernel designed to run VMs
• Multiplexes hardware resources: virtualizes the Intel IA-32 architecture
• Manages system hardware for high-performance I/O
• Runs unmodified commodity operating systems

Virtualization
• Virtualization enables running multiple operating systems on a single machine
• Each virtual machine (VM) is isolated and protected from the others, giving the illusion of a dedicated physical machine
• Allows an abstraction of server workloads
• Motivation:
– Take advantage of idle machine time
– Easy to maintain and upgrade VMs

Memory Virtualization
• Guest OS needs to see a zero-based memory space
• Terms:
– Machine address -> Host hardware memory space
– “Physical” address -> Virtual machine memory space

Memory Virtualization
• Translation from PPN (“physical” page numbers) to MPN (machine page numbers) is done through a pmap data structure for each VM
• Shadow page tables are maintained for virtual-to-machine translations
– Allows fast, direct translation from guest virtual addresses to host machine addresses
• PPN-to-MPN mappings can easily be changed, transparently to the VM
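
The two levels of translation on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ESX code: the class `ToyVM`, its dictionaries, and its method names are hypothetical. It only illustrates that a shadow entry caches the composition of the guest mapping (virtual to PPN) with the pmap (PPN to MPN), and that changing a PPN's backing MPN must invalidate the cached shadow entries.

```python
class ToyVM:
    """Toy model of one VM's address translation (all names are hypothetical)."""

    def __init__(self):
        self.guest_pt = {}   # guest virtual page number -> PPN (owned by the guest OS)
        self.pmap = {}       # PPN -> MPN (owned by the hypervisor)
        self.shadow_pt = {}  # guest virtual page number -> MPN (cached composition)

    def guest_map(self, vpn, ppn):
        """The guest OS maps a virtual page to one of its 'physical' pages."""
        self.guest_pt[vpn] = ppn
        self.shadow_pt.pop(vpn, None)  # cached translation is now stale

    def back(self, ppn, mpn):
        """The hypervisor (re)assigns the machine page backing a PPN."""
        self.pmap[ppn] = mpn
        # Drop every shadow entry that was derived from this PPN.
        for vpn, mapped_ppn in self.guest_pt.items():
            if mapped_ppn == ppn:
                self.shadow_pt.pop(vpn, None)

    def translate(self, vpn):
        """Return the MPN for a guest virtual page, filling the shadow table on a miss."""
        if vpn not in self.shadow_pt:
            ppn = self.guest_pt[vpn]
            self.shadow_pt[vpn] = self.pmap[ppn]
        return self.shadow_pt[vpn]


vm = ToyVM()
vm.back(ppn=0, mpn=1000)
vm.guest_map(vpn=7, ppn=0)
first = vm.translate(7)    # fills the shadow entry
vm.back(ppn=0, mpn=2000)   # remap, invisible to the guest
second = vm.translate(7)   # refilled through the updated pmap
```

The guest never sees MPNs; it keeps using PPN 0 before and after the remap.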

Memory Reclamation
• Each VM gets a configurable maximum size of “physical” memory
• ESX must handle overcommitted memory (the VMs' configured sizes together exceed machine memory)
– ESX must choose which VM to revoke memory from

Memory Reclamation
• Traditional: add transparent swap layer
– Requires meta-level page replacement decisions
– Best data to guide decisions known only by guest OS
– Guest and meta-level policies may clash
• Alternative: implicit cooperation
– Coax guest into doing page replacement

Ballooning – a neat trick!
• ESX must reclaim memory with no information from the guest OS
• ESX uses ballooning to achieve this
– A balloon module or driver is loaded into the guest OS
– The balloon works on pinned “physical” pages in the VM
– “Inflating” the balloon (allocating pinned pages inside the guest) reclaims memory for ESX
– “Deflating” the balloon releases the allocated pages back to the guest
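
The inflate/deflate mechanism above can be sketched as a small Python model. It is a simplified illustration, not the actual balloon driver: `ToyGuest`, `ToyBalloon`, and `set_target` are hypothetical names, and the guest's own page replacement (which a real guest would run when its free list is empty) is reduced to "allocation fails".

```python
class ToyGuest:
    """Hypothetical guest OS with a fixed pool of 'physical' pages."""

    def __init__(self, total_pages):
        self.free = set(range(total_pages))
        self.pinned = set()

    def alloc_pinned(self):
        # A real guest would invoke its own page replacement policy here;
        # this toy version simply fails when no free page is left.
        if not self.free:
            return None
        ppn = self.free.pop()
        self.pinned.add(ppn)
        return ppn

    def free_pinned(self, ppn):
        self.pinned.remove(ppn)
        self.free.add(ppn)


class ToyBalloon:
    """Hypothetical balloon driver running inside the guest."""

    def __init__(self, guest):
        self.guest = guest
        self.pages = []  # PPNs currently held by the balloon

    def set_target(self, target):
        """Inflate or deflate toward `target` pages; return (reclaimed, released) PPNs."""
        reclaimed, released = [], []
        while len(self.pages) < target:          # inflate
            ppn = self.guest.alloc_pinned()
            if ppn is None:
                break                            # guest cannot give up more pages
            self.pages.append(ppn)
            reclaimed.append(ppn)                # hypervisor may reuse the backing MPN
        while len(self.pages) > target:          # deflate
            ppn = self.pages.pop()
            self.guest.free_pinned(ppn)
            released.append(ppn)                 # guest can use the page again
        return reclaimed, released


guest = ToyGuest(total_pages=8)
balloon = ToyBalloon(guest)
reclaimed, _ = balloon.set_target(3)   # inflate by 3 pages
_, released = balloon.set_target(1)    # deflate by 2 pages
```

The point of the design is that the guest, not the hypervisor, decides which of its pages to give up.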

Ballooning – a neat trick!
ESX Server can “coax” a guest OS into releasing some memory.
[Figure: example of how ballooning can be employed]

Ballooning - performance
[Figure] Throughput of a Linux VM running dbench with 40 clients. The black bars plot the performance when the VM is configured with main memory sizes ranging from 128 MB to 256 MB. The gray bars plot the performance of the same VM configured with 256 MB, ballooned down to the specified size.

Ballooning - limitations
• Ballooning is not always available: during OS boot, or when the driver is explicitly disabled
• Ballooning does not respond fast enough in certain situations
• The guest OS might impose an upper bound on the balloon size
ESX Server preferentially uses ballooning to reclaim memory. However, when ballooning is not possible or is insufficient, the system falls back to a paging mechanism. Memory is reclaimed by paging out to an ESX Server swap area on disk, without any guest involvement.
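
The "balloon first, page as a fallback" policy can be written down as a tiny planning function. This is only a sketch of the preference order stated above: the function name `plan_reclaim` and its parameters are hypothetical, and it ignores timing (for example, a balloon that is available but too slow).

```python
def plan_reclaim(pages_needed, balloon_available, balloon_headroom):
    """Split a reclamation request into (ballooned, paged) page counts.

    balloon_available: False during guest boot or when the driver is disabled.
    balloon_headroom:  pages the balloon can still grow by (guest-imposed bound).
    """
    ballooned = min(pages_needed, balloon_headroom) if balloon_available else 0
    paged = pages_needed - ballooned  # remainder goes to the hypervisor swap area
    return ballooned, paged


plan_a = plan_reclaim(100, balloon_available=True, balloon_headroom=60)
plan_b = plan_reclaim(100, balloon_available=False, balloon_headroom=60)
plan_c = plan_reclaim(50, balloon_available=True, balloon_headroom=60)
```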

Sharing Memory - Page Sharing
• ESX Server can exploit the redundancy of data and instructions across several VMs
– Multiple instances of the same guest OS share many of the same applications and data
– Sharing across VMs can reduce total memory usage
– Sharing can also increase the level of over-commitment available for the VMs
Running multiple OSs in VMs on the same machine may result in multiple copies of the same code and data being used in the separate VMs. For example, several VMs may be running the same guest OS and have the same apps or components loaded.

Page Sharing
• ESX uses page content to implement sharing
• ESX does not need to modify the guest OS for this to work
• ESX uses hashing to reduce the cost of comparing scanned pages
– A hash value is used to summarize page content
– A hint entry is used as an optimization for pages that are not yet shared
– Shared pages are marked COW (copy-on-write), so a private copy is made when one is written to
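
A minimal sketch of content-based sharing with copy-on-write follows. It is an illustration under assumptions, not the ESX implementation: the class `ToyFrameTable` and its methods are hypothetical, the 64-bit BLAKE2b digest is a stand-in for whatever hash the real system uses, and the hint-entry optimization is left out. A hash match is only a candidate; the full page contents are compared before a frame is shared.

```python
import hashlib


class ToyFrameTable:
    """Hypothetical machine-frame table with content-based sharing."""

    def __init__(self):
        self.frames = {}    # MPN -> [content, reference count]
        self.by_hash = {}   # content hash -> MPN of a shareable frame
        self.next_mpn = 0

    @staticmethod
    def _hash(content):
        # 64-bit digest used only to find candidates (assumed hash function).
        return hashlib.blake2b(content, digest_size=8).digest()

    def _new_frame(self, content):
        mpn = self.next_mpn
        self.next_mpn += 1
        self.frames[mpn] = [content, 1]
        return mpn

    def insert(self, content):
        """Store a page, sharing an existing frame if the contents are identical."""
        h = self._hash(content)
        mpn = self.by_hash.get(h)
        if mpn is not None and self.frames[mpn][0] == content:  # full compare
            self.frames[mpn][1] += 1
            return mpn
        mpn = self._new_frame(content)
        self.by_hash.setdefault(h, mpn)
        return mpn

    def write(self, mpn, new_content):
        """Write to a page; a shared frame is copied first (copy-on-write)."""
        content, refs = self.frames[mpn]
        if refs > 1:
            self.frames[mpn][1] = refs - 1
            return self._new_frame(new_content)   # private copy for the writer
        h = self._hash(content)
        if self.by_hash.get(h) == mpn:
            del self.by_hash[h]                   # old hash no longer describes it
        self.frames[mpn][0] = new_content
        return mpn


table = ToyFrameTable()
zero_page = b"\x00" * 4096
a = table.insert(zero_page)                  # first VM's page
b = table.insert(zero_page)                  # second VM's identical page, shared
c = table.write(b, b"x" * 4096)              # write triggers a private copy
```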

Page Sharing - performance
• Best-case workload
– Identical Linux VMs
– SPEC95 benchmarks
– Lots of potential sharing
• Metrics
– Total guest PPNs
– Shared PPNs → 67%
– Saved MPNs → 60%
• Effective sharing
• Negligible overhead

Page Sharing - performance
[Figure] This graph plots the metrics shown earlier as a percentage of aggregate VM memory. For large numbers of VMs, sharing approaches 67% and nearly 60% of all VM memory is reclaimed.

Page Sharing - performance
[Table] Real-world page sharing metrics from production deployments of ESX Server.
(A) 10 Win NT VMs serving users at a Fortune 50 company, running a variety of DBs (Oracle, SQL Server), web (IIS, Websphere), development (Java, VB), and other applications.
(B) 9 Linux VMs serving a large user community for a nonprofit organization, executing a mix of web (Apache), mail (Majordomo, Postfix, POP/IMAP, MailArmor), and other servers.
(C) 5 Linux VMs providing web proxy (Squid), mail (Postfix, RAV), and remote access (ssh) services to VMware employees.

Proportional allocation
• ESX allows proportional memory allocation for VMs
– With maintained memory performance
– With VM isolation
– Admin configurable

Proportional allocation
• Resource rights are distributed to clients through shares
– Clients with more shares get more resources relative to the total resources in the system
– In overloaded situations, client allocations degrade gracefully
– Pure proportional-share allocation can be unfair, so ESX uses an “idle memory tax” to overcome this
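
The idea of shares can be illustrated with a plain proportional split. This is a deliberately simplified sketch, not how ESX computes allocations (the paper describes reclaiming from the client with the fewest shares per page rather than a static division); the function name `proportional_targets` and the VM names are hypothetical.

```python
def proportional_targets(total_pages, shares):
    """Divide total_pages among clients in proportion to their shares."""
    total_shares = sum(shares.values())
    return {vm: total_pages * s // total_shares for vm, s in shares.items()}


# vm-a holds twice the shares of vm-b and vm-c, so it gets twice the pages.
before = proportional_targets(1000, {"vm-a": 2000, "vm-b": 1000, "vm-c": 1000})

# Adding a fourth client shrinks every allocation proportionally
# instead of starving any single VM.
after = proportional_targets(
    1000, {"vm-a": 2000, "vm-b": 1000, "vm-c": 1000, "vm-d": 1000}
)
```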

Idle memory tax
• When memory is scarce, clients with idle pages are penalized compared to more active ones
• The tax rate specifies the maximum fraction of idle pages that can be reallocated to active clients
– When an idle client starts increasing its activity, the pages can be reallocated back, up to its full share
– Idle page cost: k = 1/(1 - tax_rate), with 0 <= tax_rate < 1
• ESX statistically samples pages in each VM to estimate active memory usage
• ESX has a default tax rate of 0.75
• ESX by default samples 100 pages every 30 seconds
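
The idle page cost above feeds into the adjusted shares-per-page ratio from Waldspurger's paper, rho = S / (P * (f + k * (1 - f))), where S is the client's shares, P its allocated pages, and f its fraction of active pages; memory is reclaimed from the client with the lowest ratio. The Python below is a small sketch of that formula. The function names and the example numbers (two VMs with equal shares and 65536 pages each, one 10% active and one 90% active) are assumptions chosen for illustration.

```python
def idle_page_cost(tax_rate):
    """k = 1 / (1 - tax_rate), for 0 <= tax_rate < 1."""
    if not 0 <= tax_rate < 1:
        raise ValueError("tax_rate must satisfy 0 <= tax_rate < 1")
    return 1.0 / (1.0 - tax_rate)


def adjusted_shares_per_page(shares, pages, active_fraction, tax_rate):
    """rho = S / (P * (f + k * (1 - f))); idle pages cost k times an active page."""
    k = idle_page_cost(tax_rate)
    return shares / (pages * (active_fraction + k * (1.0 - active_fraction)))


def pick_victim(vms, tax_rate):
    """Return the name of the VM with the lowest adjusted shares-per-page ratio."""
    return min(
        vms,
        key=lambda name: adjusted_shares_per_page(*vms[name], tax_rate=tax_rate),
    )


# name -> (shares, pages, active_fraction); illustrative values only
vms = {"mostly-idle": (1000, 65536, 0.1), "busy": (1000, 65536, 0.9)}

k_default = idle_page_cost(0.75)          # 4.0 at the default tax rate
victim = pick_victim(vms, tax_rate=0.75)  # the mostly idle VM loses pages first
```

With a tax rate of 0 the two ratios are equal, which is the plain proportional-share case; raising the rate shifts reclamation toward the idle VM.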

Idle memory tax
[Figure] Experiment:
• 2 VMs, 256 MB, same shares
• VM1: Windows boot + idle
• VM2: Linux boot + dbench
• Solid: usage; dotted: active
• Change tax rate 0% → 75%
After: high tax
• Redistribute VM1 → VM2
• VM1 reduced to min size
• VM2 throughput improves 30%

Dynamic allocation
• ESX uses free-memory thresholds to dynamically allocate memory to VMs
– ESX has 4 levels: high, soft, hard, and low
– The default levels are 6%, 4%, 2%, and 1%
– ESX can block a VM when free memory is at the low level
– Rapid state fluctuations are prevented by changing back to a higher level only after the higher threshold is significantly exceeded
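
The hysteresis described in the last bullet can be sketched as a small state function. This is one possible reading, stated as an assumption: the system drops to a lower state when free memory falls below that state's threshold, and climbs back one state at a time only after exceeding the higher state's threshold by some margin. The `margin` value and the name `next_state` are hypothetical; the slide only says "significantly exceeded".

```python
STATES = ["high", "soft", "hard", "low"]
THRESHOLD = {"high": 0.06, "soft": 0.04, "hard": 0.02, "low": 0.01}  # slide defaults


def next_state(state, free_fraction, margin=0.005):
    """Return the reclamation state after observing the free-memory fraction."""
    start = STATES.index(state)
    i = start
    # Move down as far as needed: free memory is below a lower state's threshold.
    while i + 1 < len(STATES) and free_fraction < THRESHOLD[STATES[i + 1]]:
        i += 1
    if i != start:
        return STATES[i]
    # Move up one step only after clearing the higher threshold plus the margin.
    if i > 0 and free_fraction > THRESHOLD[STATES[i - 1]] + margin:
        return STATES[i - 1]
    return state


trace = []
state = "high"
for free in [0.05, 0.035, 0.061, 0.07, 0.005, 0.03]:
    state = next_state(state, free)
    trace.append(state)
```

In the trace, a free fraction of 0.061 is above the 6% threshold but not by the margin, so the state stays at soft until 0.07 is observed.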

I/O page remapping
• IA-32 supports PAE to address up to 64 GB of memory over a 36-bit address space
• ESX can transparently remap “hot” pages (those involved in repeated I/O) from high memory to lower machine addresses

Conclusion
• Key features
– Flexible dynamic partitioning
– Efficient support for overcommitted workloads
• Novel mechanisms
– Ballooning leverages guest OS algorithms
– Content-based page sharing
– Proportional sharing with idle memory tax

Similar Products
• VM (IBM): very early, roots in System/360, ’64–’65
• Bochs: open source emulator
• Xen: open source VMM, requires changes to guest OS
• SIMICS: full system simulator
• VirtualPC (Microsoft)

Current status of ESX Server
C. Waldspurger’s paper: 2002. Today: 2005.
• Supports enterprise workloads in multi-processor virtual machines
• Resource controls for virtual machine CPU, memory, disk I/O, and network I/O usage; supports SLA-type guarantees
• Has “VMotion”: migrate a running VM to a different physical server connected to the same storage area network without service interruption