DVMM: A Distributed VMM for Supporting Single System Image on Clusters

Jinbing Peng, Xiang Long, Limin Xiao
School of Computer Science & Engineering, Beihang University, Beijing
pengjinbing@les.buaa.edu.cn, long@les.buaa.edu.cn, xiaolm@buaa.edu.cn

Abstract

Providing a single system image (SSI) on clusters has long been a hot topic in parallel computer architecture research, since SSI makes clusters easier to program and administer. Most current SSI studies target the middleware level of clusters, which leads to problems such as poor transparency and low performance. This paper presents a novel solution that provides SSI on clusters using a distributed virtual machine monitor (DVMM) built on hardware-assisted virtualization technologies. The DVMM consists of symmetrical, cooperative VMMs distributed across the nodes; through their cooperation, the VMMs virtualize the distributed hardware resources to support SSI on a cluster. Thus, the DVMM can run an unmodified legacy operating system (OS) transparently on a cluster. Compared with related work, our solution offers good transparency, high performance, and easy implementation.

Keywords: SSI, virtualization, hardware-assisted virtualization, VMM, DVMM.

1. Introduction

Parallel computer architecture has developed in two directions. One is the shared memory architecture, represented by the SMP (Symmetric Multiprocessor); the other is the distributed memory architecture, represented by the COW (Cluster of Workstations) [1]. The shared memory architecture supports the shared memory programming model and offers good programmability, but it scales poorly because of constraints such as shared memory bandwidth.
The distributed memory architecture uses the message passing programming model; it is harder to program, but it scales well because of its loosely coupled interconnect. Since the advantages of the two architectures are complementary, it is natural to try to obtain both. One way to combine them is to implement the image of a shared memory architecture on top of distributed memory hardware; DSM and SSI on clusters are both typical practices of this approach.

This paper presents a novel solution that provides SSI on clusters using a DVMM with hardware-assisted virtualization technologies. The rest of the paper is organized as follows. Section 2 describes the background of SSI and virtualization technologies and surveys related work. Section 3 describes the implementation of the DVMM. Section 4 compares the DVMM with existing solutions. Section 5 concludes.

2. Background

2.1. Single system image

SSI means that all the distributed resources are organized into a uniform unit, so that users are unaware of their distributed nature. SSI comprises attributes such as a single memory space, a single process space, and a single I/O space [2]. The SSI of a cluster can be implemented at the hardware level, the underware level, the middleware level, or the application level. There are few solutions at the hardware level; they include Enterprise X-Architecture [3], cc-NUMA [4], and DSM [5]. These solutions rely on special chips or hardware, so they are costly and narrowly applicable. Solutions at the underware level are also rare; the representative ones are MOSIX [6] and Sun Solaris-MC [2]. SSI implemented at this level is highly transparent to users but difficult to build, and current solutions at this level realize only part of the SSI attributes.
There are many solutions at the middleware level; typical work includes distributed shared memory systems such as IVY [7], parallel and distributed file systems such as Lustre [8], resource management and load scheduling systems such as LSF [9], and parallel programming environments such as MPI and PVM [10]. SSI implemented at this level has poor transparency. Solutions at the application level are rare; the representative one is LVS [11].

In summary, the SSI of a cluster may be implemented at the application level, the middleware level, the underware level, or the hardware level. From the top down, the difficulty of implementation increases, but so does the transparency for users. Most current studies focus on the middleware level, which leads to problems such as poor transparency. Solutions at the other levels are rare, and each has its own pitfalls: hardware-level solutions are costly, underware-level solutions cannot implement the SSI attributes completely, and application-level solutions lack flexibility.

2.2. Virtualization

Virtualization means that computation and processing are done on a virtual base instead of the real one. Using virtualization techniques, a virtual platform can be constructed between the hardware and the OS to create multiple domains on one hardware platform; the domains are isolated from one another, and each can run its own OS and applications [12]. Virtualization techniques can be classified as full virtualization, para-virtualization, pre-virtualization, and hardware-assisted virtualization [13][14].

Hardware-assisted virtualization is the most advanced of these. VT-x [15] is Intel's hardware-assisted virtualization technology for the IA-32 architecture. It adds a new operating mode, VMX (Virtual Machine Extensions), to the processor; defines two VMX transitions, VM entry and VM exit; and adds a VMCS (Virtual-Machine Control Structure) and ten new instructions for controlling the VM. With VT-x, the design of a VMM can be simplified, and full virtualization can be implemented without binary translation.
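As a concrete illustration of these mechanisms, the following C sketch shows how a VMM built on VT-x might place a virtual CPU into guest mode. This is a minimal sketch under stated assumptions, not the authors' code: the vmxon/vmclear/vmptrld/vmwrite/vmlaunch wrappers around the corresponding VMX instructions are assumed to exist, and only two guest-state fields are shown; a real VMM must initialize the full VMCS control and state areas as specified in [15].

    #include <stdint.h>

    /* Assumed thin wrappers around the VMX instructions of the same name. */
    extern int  vmxon(uint64_t vmxon_region_pa);   /* enter VMX root mode   */
    extern int  vmclear(uint64_t vmcs_pa);         /* initialize the VMCS   */
    extern int  vmptrld(uint64_t vmcs_pa);         /* make the VMCS current */
    extern void vmwrite(uint32_t field, uint64_t value);
    extern int  vmlaunch(void);                    /* first VM entry        */

    /* VMCS field encodings, per the Intel SDM Vol. 3 [15]. */
    #define VMCS_GUEST_RIP 0x681Eu
    #define VMCS_GUEST_RSP 0x681Cu

    /* Put one virtual CPU into guest (non-root) mode at the given entry point. */
    int start_vcpu(uint64_t vmxon_pa, uint64_t vmcs_pa,
                   uint64_t guest_rip, uint64_t guest_rsp)
    {
        if (vmxon(vmxon_pa) || vmclear(vmcs_pa) || vmptrld(vmcs_pa))
            return -1;

        /* Guest-state area: where the guest begins executing. A real VMM
         * also fills in the control fields, segment state, CR0/CR3/CR4, etc. */
        vmwrite(VMCS_GUEST_RIP, guest_rip);
        vmwrite(VMCS_GUEST_RSP, guest_rsp);

        return vmlaunch() ? -1 : 0; /* on success, control returns via VM exit */
    }

From this point on, every sensitive operation in the guest causes a VM exit into the VMM, which handles it and resumes the guest with a VM entry; this is the mechanism the DVMM relies on throughout Section 3.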
2.3. Related work

The essence of virtualization is to separate software from hardware by abstracting the physical resources, and the goal of SSI is to hide the distributed hardware environment of the cluster. SSI can therefore be implemented by virtualization.

2.3.1. Virtual Multiprocessor. Virtual Multiprocessor [16] implements an 8-way shared memory virtual machine on a cluster of 8 PCs. The VMMs run in application space with the support of a host OS, and para-virtualization is applied to the guest OS. The shared memory space is provided by DSM, the virtual processors are emulated by dedicated processes, and I/O virtualization is implemented through cooperation between the VMMs and a dedicated I/O server. The disadvantages of this system are that running the VMMs at the application level leads to low performance and weak flexibility, and that para-virtualization requires modifying the guest OS; moreover, only the devices in the I/O server can be used, so the system has limited applicability and is difficult to implement. Furthermore, Virtual Multiprocessor cannot provide SSI on an SMP cluster.

2.3.2. vNUMA. vNUMA [17] implements a 2-way NUMA virtual machine on a cluster of two workstations, each with an IA64 processor. The VMMs run directly on the hardware without host OS support. Pre-virtualization is used to modify the guest OS, which is compelled to run in ring 1. The shared memory is provided by DSM. One node is the master node from which the system boots. The disadvantages of vNUMA are that pre-virtualization requires modifying the guest OS, and that lowering the guest OS's privilege level can cause privilege confusion. vNUMA also cannot provide SSI on an SMP cluster.

3. Design and implementation of DVMM

3.1. Overview

The goal of the DVMM is to hide the distributed hardware attributes, provide SSI on an SMP cluster, and support a single OS running transparently on the cluster. Three essential problems must therefore be solved. First, the distributed hardware configurations of the cluster must be detected and merged into global information. Second, the global hardware resources must be virtualized and presented to the OS. Third, the OS must be able to manage, schedule, and utilize the global resources just as on a single SMP machine.
To provide SSI on a cluster, a new layer named the DVMM is added between the OS and the cluster hardware. The DVMM contains symmetrical, cooperative VMMs distributed across the cluster, and a single OS supporting the cc-NUMA architecture runs on top of it. Using hardware-assisted virtualization technologies, the DVMM detects and merges the physical resources of the cluster into global information, virtualizes the whole set of physical resources, and presents the virtual resources to the OS. The OS schedules and runs processes and manages and allocates the virtual resources; these actions are transparent to the DVMM. The DVMM intercepts the OS's resource accesses and handles them on its behalf, for example by mapping virtual resources to physical resources and manipulating the physical resources. In this way, the OS can see, manage, and utilize the whole cluster's resources, the distributed nature of the hardware is hidden, and the cluster is presented to the OS as a single cc-NUMA virtual machine.

3.2. Strategies

Providing SSI on a cluster requires detecting, presenting, and utilizing the resources of the cluster. Our strategies are: detect the physical resources of each node during VMM startup and integrate them through communication among the VMMs; virtualize the physical resources and report them to the OS through hardware-assisted virtualization; and manage and utilize the physical resources of the cluster through cooperation between the OS and the DVMM. The details follow.

3.2.1. Resource detection and merger. The BIOS is emulated and extended into the eBIOS (Extended Basic Input/Output System). After the eBIOS acquires the information about the physical resources of the native node, it communicates with the other nodes to collect the information about the physical resources of the whole cluster and merges it into a description of the global physical resources. Based on this, the DVMM reserves some resources and virtualizes the rest. The DVMM then organizes the virtual resources: it builds the various resource mapping tables, implements the mappings from virtual resources to physical resources and from physical resources to nodes, and creates the global virtual resource information table. The OS is then booted on the virtual resources; its BIOS calls are captured, and the global virtual resource information is reported to the OS, so that the OS is aware of the global virtual resources.
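The merge step just described can be illustrated with a short C sketch. It is an illustration under stated assumptions, not the paper's implementation: the struct layouts and the vmm_exchange() all-to-all primitive, which stands in for the dedicated communication module, are hypothetical.

    #include <stdint.h>
    #include <string.h>

    #define MAX_NODES 16

    struct node_resources {
        uint32_t node_id;
        uint32_t num_cpus;     /* logical processors on this node */
        uint64_t mem_bytes;    /* local physical memory           */
    };

    struct global_resources {
        uint32_t num_nodes;
        uint32_t total_cpus;
        uint64_t total_mem;
        uint32_t cpu_to_node[MAX_NODES * 64]; /* global CPU index -> owning node */
        struct node_resources per_node[MAX_NODES];
    };

    /* Assumed all-to-all exchange provided by the communication module;
     * returns the number of nodes that reported their resources. */
    extern int vmm_exchange(const struct node_resources *mine,
                            struct node_resources *all, int max_nodes);

    /* Collect every node's local resource report and merge it into the
     * global information table and the resource-to-node mapping. */
    int ebios_merge(const struct node_resources *local,
                    struct global_resources *g)
    {
        struct node_resources all[MAX_NODES];
        int n = vmm_exchange(local, all, MAX_NODES);
        if (n <= 0)
            return -1;

        memset(g, 0, sizeof(*g));
        g->num_nodes = (uint32_t)n;
        for (int i = 0; i < n; i++) {
            g->per_node[i] = all[i];
            for (uint32_t c = 0; c < all[i].num_cpus; c++)
                g->cpu_to_node[g->total_cpus + c] = all[i].node_id;
            g->total_cpus += all[i].num_cpus;
            g->total_mem  += all[i].mem_bytes;
        }
        return 0;
    }

The cpu_to_node mapping built here is the kind of resource-to-node table that the routing decisions in the following subsections depend on.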
3.2.2. Resource virtualization. Resource virtualization covers ISA virtualization, interrupt mechanism virtualization, memory virtualization, and I/O device virtualization. Unlike existing virtualization techniques, the technique in this paper can virtualize resources across nodes.

The IA-32 ISA is virtualized through VT-x, with techniques similar to those used in Xen's HVM [18]. The interrupt mechanism is virtualized as follows. The DVMM emulates interrupt controllers in software and intercepts the OS's accesses to them. If the target interrupt controller is on the native node, the DVMM manipulates that controller to reflect the guest's operation; if it is on a remote node, the DVMM sends the access request to the target node, and the target VMM manipulates its virtual interrupt controller accordingly. The DVMM also catches hardware interrupts, and the contents of the virtual interrupt controller are modified by the native VMM or by a remote VMM depending on the node where the interrupted object resides, so that the interrupt can be presented to the OS.

The distributed memory resources are virtualized by combining Shadow Page Tables (SPT) with software DSM: the memory of the cluster is first merged into a distributed shared memory by the software DSM, and the distributed shared memory is then virtualized with the SPT. I/O operations are intercepted via VT-x. If an I/O operation targets the native node, the native VMM executes the intercepted instruction and returns the result to the OS; if it targets a remote node, the I/O instruction is sent to the target VMM for execution, and the result is sent back to the native VMM and then to the OS.
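The native/remote routing rule for intercepted interrupt-controller accesses can be sketched in C as follows. All names (apic_owner_node, apply_apic_write, send_apic_write) are hypothetical stand-ins for the DVMM's mapping tables, interrupt virtualization module, and communication module.

    #include <stdint.h>

    extern uint32_t this_node_id;
    extern uint32_t apic_owner_node(uint64_t mmio_addr);       /* mapping table  */
    extern void apply_apic_write(uint64_t addr, uint32_t val); /* native device  */
    extern void send_apic_write(uint32_t node, uint64_t addr,  /* forward to the */
                                uint32_t val);                 /* remote VMM     */

    /* Handle an intercepted guest write to a virtual interrupt controller. */
    void route_apic_write(uint64_t addr, uint32_t val)
    {
        uint32_t owner = apic_owner_node(addr);

        if (owner == this_node_id)
            apply_apic_write(addr, val);       /* reflect it on the local controller */
        else
            send_apic_write(owner, addr, val); /* the remote VMM applies it to its
                                                  virtual interrupt controller */
    }

The remote I/O forwarding described above follows the same pattern: execute locally when the target device is native, otherwise ship the instruction to the owning node and return the result.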
3.2.3. Resource management and utilization. The OS manages and utilizes the virtual resources, while the DVMM manages and utilizes the physical resources. The OS interacts with the DVMM through VM entry and VM exit [15]. On top of the virtual resources, the OS schedules and runs processes and manages and allocates the virtual resources independently; this is transparent to the DVMM. When the OS executes a sensitive instruction, or a trap or interrupt occurs, control is switched to the DVMM by a VM exit. The DVMM handles the event according to the exit reason, for example by allocating or manipulating physical devices. After the DVMM handles the event that triggered the VM exit, the results and control are returned to the OS through a VM entry. Through these interactions between the OS and the DVMM, the management and utilization of the global physical resources are implemented.
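A minimal C sketch of this interaction loop follows: the exit reason is read from the VMCS, the event is dispatched to a handler, and the guest is resumed with a VM entry. The field encoding and exit-reason numbers follow the Intel SDM [15]; the handler and wrapper names are hypothetical stand-ins for the DVMM modules described in the next subsection.

    #include <stdint.h>

    #define VMCS_EXIT_REASON 0x4402u /* basic VM-exit information field */

    extern uint64_t vmread(uint32_t field);       /* assumed VMREAD wrapper */
    extern void handle_external_interrupt(void);  /* interrupt module       */
    extern void handle_cr_access(void);           /* MMU module             */
    extern void handle_io_instruction(void);      /* I/O module             */
    extern void vmresume_guest(void);             /* VM entry               */

    /* Called whenever the OS causes a VM exit. */
    void dvmm_on_vmexit(void)
    {
        switch ((uint32_t)vmread(VMCS_EXIT_REASON) & 0xFFFFu) {
        case 1:  handle_external_interrupt(); break; /* external interrupt */
        case 28: handle_cr_access();          break; /* CR access          */
        case 30: handle_io_instruction();     break; /* IN/OUT instruction */
        default: break;                              /* other exit reasons */
        }
        vmresume_guest(); /* return the results and control to the OS */
    }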
3.3. Design and implementation

3.3.1. System architecture. The system architecture is shown in figure 1. From the bottom up, the system consists of the hardware level, the DVMM level, and the OS level. The hardware level contains SMP nodes interconnected by gigabit Ethernet, whose CPUs support VT-x. The DVMM level contains the symmetrical, cooperative VMMs distributed on the nodes, which communicate through dedicated communication software. The OS can be any OS that supports cc-NUMA. The key element in implementing this system is constructing the DVMM.

Figure 1. System architecture

3.3.2. DVMM structure. The DVMM is composed of the VMMs distributed on the nodes and runs on the bare machines. The functions of each VMM are detecting, integrating, and virtualizing the physical resources, reporting the virtual resources to the OS, and cooperating across nodes. The structure of the DVMM is shown in figure 2.

Figure 2. DVMM structure

The initialization module loads and runs the VMM. The eBIOS module detects and integrates the resource information of the cluster and reports it to the OS. The ISA virtualization module virtualizes the IA-32 ISA and cooperates with the interrupt virtualization module so that the OS can manage and schedule the virtual computing resources. The I/O virtualization module virtualizes the global I/O resources. The interrupt virtualization module virtualizes the interrupt control mechanism and notifies the OS of interrupt events. The MMU virtualization module virtualizes the memory resources and ensures that the OS runs correctly in the virtual physical address space. The DSM module implements a distributed shared memory transparently. The communication module provides the communication service for the cooperating VMMs.
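One plausible way to express this module structure in C, purely as an illustration (the paper gives no code), is a per-node VMM object holding a small operations table for each module:

    #include <stdint.h>

    struct isa_virt_ops  { void (*on_vmexit)(void); };
    struct io_virt_ops   { void (*handle_io)(uint16_t port, int is_write); };
    struct intr_virt_ops { void (*inject)(uint8_t vector); };
    struct mmu_virt_ops  { int  (*handle_page_fault)(uint64_t guest_pa); };
    struct dsm_ops       { int  (*fetch_page)(uint64_t guest_pa); };
    struct comm_ops      { int  (*send)(uint32_t node, const void *msg, int len); };

    /* One VMM instance per node; the instances are symmetrical across nodes. */
    struct vmm {
        uint32_t             node_id;
        struct isa_virt_ops  isa;  /* entry and exit point of the DVMM */
        struct io_virt_ops   io;   /* global I/O virtualization        */
        struct intr_virt_ops intr; /* virtual interrupt controllers    */
        struct mmu_virt_ops  mmu;  /* virtual physical address space   */
        struct dsm_ops       dsm;  /* distributed shared memory        */
        struct comm_ops      comm; /* inter-VMM communication          */
    };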
3.3.3. DVMM mechanism. The DVMM mechanism is shown in figure 3.

Figure 3. DVMM mechanism

The ISA virtualization module is both the entry point and the exit point of the DVMM. It may invoke every other module of the VMM except the communication module, and vice versa. When a VM exit occurs, this module analyzes the exit reason and invokes the appropriate module to handle it; when a module completes its duties, it invokes this module to return to the guest system. The communication module is the basis of cooperation among the VMMs; it may invoke every other module of the VMM except the ISA virtualization module, and vice versa. The eBIOS module is used only during the initialization of the DVMM and the booting of the OS. First, the eBIOS module invokes the interrupt virtualization module, the I/O virtualization module, and the communication module to detect and build the resource information of the whole system. Second, when the ISA virtualization module captures BIOS calls while the OS is booting, the eBIOS module returns the information about the virtual resources of the whole system to the OS. The I/O virtualization module receives instructions from the ISA virtualization module; depending on the node on which the I/O request should be performed, it either executes the I/O instruction or invokes the communication module to send the request to the target node. When the I/O virtualization module receives an I/O request from a remote node, it manipulates the native I/O device and sends the result back to the source node. The interrupt virtualization module is invoked by the ISA virtualization module to emulate the OS's operations on the virtual interrupt controller; it also converts external interrupt vectors into virtual interrupt vectors and injects virtual interrupts into the OS. When the ISA virtualization module captures a sensitive instruction or an MMU-related trap, it invokes the MMU virtualization module to handle it. When the MMU virtualization module finds that the requested page is not on the native node, it invokes the DSM module to get the page. Invoked by the MMU virtualization module, the DSM module requests the page from the remote node; invoked by the communication module, it serves such a request and sends the result to the remote node.

Through the cooperation among the modules of the DVMM, built on resource virtualization, the SSI of the SMP cluster is implemented.
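The cooperation between the MMU virtualization module and the DSM module can be sketched in C as follows; every function name here is a hypothetical stand-in for the modules described above.

    #include <stdint.h>

    extern uint32_t this_node_id;
    extern uint32_t page_owner_node(uint64_t guest_pa);  /* mapping table      */
    extern void    *local_page(uint64_t guest_pa);       /* native memory      */
    extern void    *dsm_fetch_remote(uint32_t node,      /* DSM module, via    */
                                     uint64_t guest_pa); /* the comm module    */
    extern void     spt_map(uint64_t guest_pa, void *host_page); /* fill SPT   */

    /* Invoked by the ISA virtualization module on an MMU-related trap. */
    int mmu_handle_page_fault(uint64_t guest_pa)
    {
        uint32_t owner = page_owner_node(guest_pa);
        void *page = (owner == this_node_id)
                       ? local_page(guest_pa)
                       : dsm_fetch_remote(owner, guest_pa);
        if (!page)
            return -1;

        spt_map(guest_pa, page); /* install the shadow mapping; the guest retries */
        return 0;
    }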
4. Discussion

There are many existing solutions for providing SSI on clusters; a few are based on virtualization techniques, while the others are not. To distinguish the features of our solution, we compare it with the existing solutions as follows.

Table 1. Comparison among DVMM, Virtual Multiprocessor and vNUMA

System                  Level        Technique                         Difficulty  Transparency  Symmetry  Performance  SMP support  ISA
Virtual Multiprocessor  Application  Para-virtualization               High        Poor          No        Low          No           IA-32
vNUMA                   Underware    Pre-virtualization                High        Good          No        Moderate     No           IA64
DVMM                    Underware    Hardware-assisted virtualization  Moderate    Good          Yes       High         Yes          IA-32

As table 1 shows, the DVMM has advantages over both Virtual Multiprocessor and vNUMA. First, the DVMM can implement SSI on SMP clusters, while Virtual Multiprocessor and vNUMA cannot. Second, the DVMM uses hardware-assisted virtualization to implement full virtualization and does not need to modify the guest OS, so its design and implementation are of moderate difficulty; Virtual Multiprocessor and vNUMA adopt para-virtualization and pre-virtualization respectively, both of which require modifying the guest OS, so they are difficult to implement and have limited applicability. Third, the DVMM is built on hardware assistance and runs on the bare metal, so it has high performance, whereas Virtual Multiprocessor and vNUMA are implemented purely in software and thus perform worse; Virtual Multiprocessor in particular runs at the application level and must pass through several software layers, lowering its performance further. Finally, the nodes of the DVMM are fully symmetrical, while the nodes of Virtual Multiprocessor and vNUMA are not: one of them is a master node. Besides, the DVMM is implemented at the underware level while Virtual Multiprocessor is implemented at the application level, so the DVMM is more transparent than Virtual Multiprocessor; and because IA-32 is more widely used than IA64, the DVMM has wider applicability and higher utilization value than vNUMA.

Compared with the existing solutions discussed in section 2.1, the DVMM also has advantages. First, the DVMM requires no special hardware, so it has lower cost and wider applicability than the hardware-level solutions. Second, the DVMM can implement the full set of SSI attributes, while the underware-level solutions implement only part of them, so the DVMM has higher utilization value. Third, the DVMM has better transparency and higher performance than the middleware-level solutions. Finally, the DVMM has better flexibility and higher performance than the application-level solutions.

5. Conclusions and future work

The DVMM implements the SSI of clusters at the underware level based on hardware-assisted virtualization technologies, so it can support an unmodified legacy OS running transparently on a cluster. Compared with the existing solutions for implementing SSI on clusters, the DVMM has clear advantages.

Further improvements remain to be made: first, using the newer VT-d [19] and EPT (Extended Page Tables) [20] techniques to reduce the implementation difficulty, and adopting the processor consistency model instead of the sequential consistency model for higher performance; second, detecting the physical resources dynamically to support changes in the number of nodes; third, adding resource management and load scheduling functions to the DVMM to support multiple guest OSes running transparently and separately on a cluster.

Acknowledgment

This work is supported by the Hi-tech Research and Development Program of China (863 Program, No. 2006AA01Z108).
References

[1] D. E. Culler, J. P. Singh, and A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. China Machine Press, 1999.
[2] Rajkumar Buyya, Toni Cortes, and Hai Jin. Single System Image (SSI). The International Journal of High Performance Computing Applications, Vol. 15, No. 2, Summer 2001, pp. 124-135.
[3] IBM. Enterprise X-Architecture Technology [OL]. http://www.unitech-ie.com/ole/doc_library/xArchitecture%20technology%202.pdf
[4] B. C. Brock, G. D. Carpenter, et al. Experience with building a commodity Intel-based ccNUMA system. IBM Journal of Research & Development, Vol. 45, No. 2, March 2001.
[5] Ayal Itzkovitz and Assaf Schuster. Distributed Shared Memory: Bridging the Granularity Gap. In Proceedings of the 1st Workshop on Software Distributed Shared Memory, 1999.
[6] L. Amar, A. Barak, and A. Shiloh. The MOSIX Direct File System Access Method for Supporting Scalable Cluster File Systems. Cluster Computing, 7(2), pp. 141-150, 2004.
[7] Kai Li and Paul Hudak. Memory Coherence in Shared Virtual Memory Systems. ACM Transactions on Computer Systems (TOCS), Vol. 7, pp. 321-359, 1989.
[8] Sun Microsystems. Lustre File System [OL]. http://www.sun.com/software/products/lustre/
[9] Platform Computing. LSF Reference [OL]. http://support.sas.com/rnd/scalability/platform/lsf_ref_6.0.pdf
[10] A. Geist and V. Sunderam. PVM: A framework for parallel distributed computing. Concurrency: Practice and Experience, 1990 [OL]. http://www.epm.ornl.gov/pvm/
[11] W. Zhang. Linux virtual servers for scalable network services. Ottawa Linux Symposium 2000, Canada [OL]. http://www.LinuxVirtualServer.org/
[12] James E. Smith and Ravi Nair. Virtual Machines: Versatile Platforms for Systems and Processes. Elsevier, 2006.
[13] VMware. Understanding Full Virtualization, Paravirtualization, and Hardware Assist. 2007 [OL]. http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf
[14] Joshua LeVasseur, et al. Pre-Virtualization: Slashing the Cost of Virtualization. 2005 [OL]. http://l4ka.org/publications/2005/previrtualization-techreport.pdf
[15] Intel. Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 3: System Programming Guide. 2007.
[16] Kenji Kaneda, Yoshihiro Oyama, and Akinori Yonezawa. A Virtual Machine Monitor for Providing a Single System Image (in Japanese). In Proceedings of the 17th IPSJ Computer System Symposium (ComSys '05), pp. 3-12, November 2005.
[17] M. Chapman and G. Heiser. Implementing transparent shared memory on clusters using virtual machines. In USENIX Annual Technical Conference, Anaheim, CA, USA, April 2005.
[18] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the Art of Virtualization. In Proceedings of the 19th ACM SOSP, pp. 164-177, October 2003.
[19] Intel. Intel® Virtualization Technology for Directed I/O [OL]. http://www.intel.com/technology/itj/2006/v10i3/2-io/7-conclusion.htm
[20] Gil Neiger, et al. Intel Virtualization Technology: Hardware Support for Efficient Processor Virtualization. Intel Technology Journal, Vol. 10, Issue 3, 2006.
