Exploring Virtual Workspace Concepts in a Dynamic Universe for Condor

Quinn Lewis

ABSTRACT

Virtualization offers a cost-effective and flexible way to use and manage computing resources. Such an abstraction is appealing in grid computing for better matching jobs (applications) to computational resources. This paper applies the virtual workspace concept introduced in the Globus Toolkit to the Condor workload management system. It allows existing computing resources to be dynamically provisioned at run-time by users based on application requirements, instead of statically at design-time.

INTRODUCTION

A common goal of computer systems is to minimize cost while maximizing other criteria, such as performance, reliability, and scalability, to achieve the objectives of the user(s). In Grid computing, a scalable way to harness large amounts of computing power across various organizations is to amass several relatively inexpensive computing resources together. Coordinating these distributed and heterogeneous computing resources on behalf of several users can be difficult. In such an environment, resource consumers have varying, specific, and demanding requirements and preferences for how they would like their applications and services to leverage the resources made available by resource providers. Resource providers must ensure the resources meet a certain quality of service (e.g., making resources securely and consistently available to several concurrent users). In the past, control over the availability, quantity, and software configuration of resources has rested with the resource provider. With virtualization, it becomes possible for resource providers to offer more control of the resources to a user without sacrificing quality of service to other resource consumers.
Users (resource consumers) can more easily create execution environments that meet the needs of their applications and jobs within the policies defined by the resource providers. Such a relationship, enabled by virtualization, is both cost-effective and flexible for the resource provider and consumer [1].

The virtual workspace term, initially coined in [2] for use with the Globus Toolkit, "is an abstraction of an execution environment that can be made dynamically available to authorized clients by using well-defined protocols". This execution environment can encompass several physical resources. Generically, this concept could be implemented in various ways; however, virtualization has proven itself to be a practicable implementation [3].

Condor is "a specialized workload management system for compute-intensive jobs" [4]. Condor currently abstracts the resources of a single physical machine into virtual machines which can run multiple jobs at the same time [5]. A "universe" is used to statically describe the execution environment in which the jobs are expected to run. This approach assumes that all of the resources (whether real or virtual) are allocated in advance. While there is support for adding more resources to an existing pool via the Glide-in mechanism, the user still has to dedicate the use of these other physical resources.
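For illustration, a conventional Condor submit description names the universe and the machine requirements directly; the executable, arguments, and attribute values below are hypothetical and only sketch the standard submit-file syntax:

    universe     = vanilla
    executable   = meme.exe
    arguments    = sequences.fasta -dna -mod zoops
    requirements = (OpSys == "WINNT50" || OpSys == "LINUX") && (Memory >= 128)
    output       = meme.out
    error        = meme.err
    log          = meme.log
    queue

With this static approach, the job simply waits until a machine that already advertises matching attributes becomes available in the pool.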
The purpose of this paper is to describe how a Condor execution environment (universe) can be dynamically created at run-time by users to more flexibly and cost-effectively use and manage existing resources using virtualization. Two of the unique implementation details described in this paper are the use of Microsoft Windows and Microsoft Virtual Server 2005 R2 for the virtual machine manager (VMM) on the host operating system (instead of being Linux-based using Xen or VMware) and the use of differencing virtual hard disks. More details about virtual workspaces and similar attempts to virtualize Condor are described in Related Work. The implementation details of the work performed for a dynamic Condor universe are provided along with performance test results. Future enhancements are included for making this work-in-progress more robust.

RELATED WORK

While virtualization has a number of applications in business computing and in software development and testing, the work outlined in this paper most directly applies to technical computing, including Grid computing, clusters, and resource-scavenging systems.

Grid Computing

The use of virtualization in Grid computing has been proposed before, touting the benefits of legacy application support, improved security, and the ability to deploy computation independently of site administration. The challenges of dynamically creating and managing virtual machines are also described [6]. The virtual workspace concept [7] extended [6] to present "a unified abstraction" and to address additional issues associated with the complexities of managing such an environment in the Grid. Two key differences between the Grid-related work mentioned and this paper are the emphasis on dynamically creating the execution environment at run-time and the (Microsoft) virtualization software employed.
As mentioned previously, the Condor Glide-in mechanism works in conjunction with the Globus Toolkit to temporarily make Globus resources available to a user's Condor pool. This has the advantage of being able to submit Condor jobs using Condor capabilities (matchmaking and scheduling) on Globus-managed resources [8]. However, the user is expected to acquire these remote resources before the jobs are executed. Using virtualization allows the existing "local" Condor resources to be leveraged as the jobs require.

Clusters

Many of the same motivations behind this work have also been applied to clusters [9, 10], though that work focuses more on dynamically provisioning homogeneous execution environments on resources. Although perhaps accommodated in the design of Cluster-on-Demand [9], virtualization technology is not used in the implementation of that system; the resources are assumed to physically exist, and the software is deployed by re-imaging the machine. In [10], virtualization is used to provision the software on the cluster(s), but the time required to stage in the virtual image(s) is costly. The use of the "differencing" virtual hard disk image type in this work offers a mitigating solution to this problem [11].

Condor

Additional work with virtualization and Condor focuses on exploiting Condor's "cycle stealing" capability at the University of Nebraska-Lincoln to transform typical Windows campus machines into the Unix-based machines required by researchers [12]. That solution leveraged coLinux to run a Condor compute node through a Windows device driver [13]. While some of the same motivation exists for this work, using a virtualization technology such as Virtual Server 2005 R2 allows other operating systems and versions to be used and provides more flexible ways to programmatically control the dynamic environment.
IMPLEMENTATION

We leverage Condor's existing ability to schedule jobs, advertise resource availability, and match jobs to resources, and introduce a flexible extension for dynamically describing, deploying, and using virtual execution resources in the Condor universe.

In Condor, one or more machines (resources) along with jobs (resource requests) are part of a collection known as a pool. The resources in the pool have one or more of the following roles: Central Manager, Execute, and/or Submit. The Central Manager collects information and negotiates how jobs are matched to available resources. Submit resources allow jobs to be submitted to the Condor pool through a description of the job and its requirements. Execute resources run jobs submitted by users after having been matched and negotiated by the Central Manager [14].

We extend the responsibilities of each of these three roles to incorporate virtualization into Condor. Each Execute resource describes the extent to which it can be virtualized (to the Central Manager) and is responsible for hosting additional (virtual) resources. The Submit resource(s) takes a workflow of jobs and requirements, initiates the deployment of the virtual resources, and signals its usage (start/stop) to the host Execute machine. The Central Manager is responsible for storing virtual machine metadata used for scheduling. For this implementation, a single machine is used for the Central Manager, Submit, and Execute roles.
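As a rough sketch of how one machine can fill all three roles, the Condor configuration simply lists the daemons for each role (collector and negotiator for the Central Manager, schedd for Submit, startd for Execute); the exact configuration used for this work is not reproduced here:

    # condor_config excerpt (illustrative): one machine acting as
    # Central Manager (COLLECTOR, NEGOTIATOR), Submit (SCHEDD), and Execute (STARTD)
    CONDOR_HOST = $(FULL_HOSTNAME)
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD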
The virtualization capabilities of a particular Execute resource can be published to the Central Manager via authorized use of condor_advertise. Attributes about the virtual Execute resources, such as the operating system (and version), available memory and disk space, and more specific data about the status of the virtual machine, are included. Currently, the "host" Execute resource invokes condor_advertise for each "guest" or virtual Execute resource it anticipates hosting at start-up. This approach allows virtual resources to appear almost indistinguishable from real physical resources and to be included in Condor's resource scheduling. Note that at this point the real resources are running while the virtual resources are not; they have only been described.

Using the standard Condor tools, such as condor_status, users can view the resources (real and virtual) available in the pool. Users can then create workflows (using Windows Workflow Foundation [16]) for one or more jobs intended to run on the provided resources. Since the virtual resource(s) may not be running when a job is submitted, the initial scheduling will fail. Fortunately, Condor provides a SOAP-based API for submitting and querying jobs [15]. Using this Condor API via workflows, unsuccessful job submissions can be checked against the attributes of the advertised machine to determine whether the resource is a virtual machine and whether it needs to be deployed and/or started.

The user can indicate specific job requirements in the workflow. These requirements can optionally specify the location of the files required to run the virtual machine, giving the consumer flexibility (assuming the provider has allowed it). These files provide the operating system and necessary configuration (including Condor) for executing the job. The workflow is invoked by the Submit machine.
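To make the advertisement step concrete, the host Execute resource could publish a machine ClassAd along the following lines; the standard attributes are abbreviated, and the attributes marking the resource as virtual (IsVirtualResource, VirtualMachineImage) are hypothetical names invented for this sketch:

    MyType     = "Machine"
    TargetType = "Job"
    Name       = "vm-debian31@host.example.org"
    Machine    = "host.example.org"
    OpSys      = "LINUX"
    Arch       = "INTEL"
    Memory     = 128
    Disk       = 2000000
    State      = "Unclaimed"
    IsVirtualResource   = True
    VirtualMachineImage = "http://host.example.org/images/debian31-base.vhd"

Saved to a file (say, vm-slot.ad), the ad would be pushed to the Central Manager with "condor_advertise UPDATE_STARTD_AD vm-slot.ad". Condor makes attributes it does not itself recognize available for matchmaking, so job requirements and the workflow logic can refer to them when deciding whether a matched resource is virtual and needs to be deployed or started.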
If the virtual resource is specified by the workflow, the workflow manager on the Submit machine either transfers the virtual machine files to the Execute resource or provides the Execute resource with the location and protocol for retrieving the virtual machine files. (The automatic copying of virtual images was not completely implemented for this paper.) For performance, it is expected that host Execute machines keep base virtual images, which provide the operating system and Condor, local to the resource. Additional software and configuration can be added in a separate file, called a differencing virtual hard disk, which stores only the blocks modified relative to a parent hard disk (file). This provides a flexible balance, allowing resource providers to supply base images and giving resource consumers the ability to extend them.

The workflow, running on the Submit machine, also provides the logic for starting the virtual resource on the host. Microsoft Virtual Server 2005 R2 provides an API for managing local and remote virtual machines, and the workflow leverages this API for starting the virtual resources. For this paper, it is assumed that virtual resources are started from a "cold" state. The result is that startup times are as long as a normal boot time for the respective operating system.
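The deployment and startup logic was implemented with Windows Workflow Foundation activities, which are not reproduced here. The following is a minimal sketch of the equivalent calls, assuming the Virtual Server 2005 R2 COM API is driven from Python via pywin32; the method names (CreateDifferencingVirtualHardDisk, FindVirtualMachine, Startup) follow the Virtual Server COM documentation as recalled and should be verified against the SDK, and the paths and virtual machine name are hypothetical:

    # Illustrative sketch only; the actual work used Windows Workflow Foundation.
    # Connecting to the Virtual Server service may additionally require COM
    # security/impersonation settings that are not shown here.
    import win32com.client

    vs = win32com.client.Dispatch("VirtualServer.Application")

    # Layer job-specific software on a provider-supplied base image by creating
    # a differencing disk whose parent is the base VHD.
    task = vs.CreateDifferencingVirtualHardDisk(
        r"C:\VMs\condor-execute-diff.vhd",   # child (differencing) disk
        r"C:\VMs\debian31-base.vhd")         # parent (base) image
    task.WaitForCompletion(-1)               # block until the asynchronous task finishes

    # Start the previously advertised virtual Execute resource from its cold state.
    vm = vs.FindVirtualMachine("condor-execute-vm")
    vm.Startup().WaitForCompletion(-1)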
PERFORMANCE TESTS AND MEASUREMENTS

To test performance, a machine with a 2 GHz AMD Athlon 64 processor and 1 GB of RAM running Windows XP was used in the Central Manager, Execute, and Submit roles. Two virtual Execute machines, running Debian Linux 3.1 and Windows 2000, each with 128 MB of RAM, were created. A virtual network was created to allow communication between the three different operating systems, each running Condor.

The MEME [17] bioinformatics application was used as the test job. Initially, a MEME job was submitted to the Condor pool using the standard Condor command-line tools (e.g., condor_submit). With the test input and configuration options used, job submission, execution, and results took less than one minute. Using Windows Workflow Foundation and Visual Studio, a graphical workflow was constructed that submitted the same MEME job to the cluster, specifically requesting a Windows 2000 or Linux resource. The same test input and configuration options took 6 to 8 minutes on average. Since the virtual machines are programmatically started only after an initial job schedule fails, and are currently starting from a cold state, the start times include the setup and also reflect the time for the operating system to boot. There is also an unresolved issue with the (5 minute) cycle time between scheduling attempts when using the Condor SOAP API [18].

Additionally, the Windows 2000 virtual machine was created as a base image (932 MB) with a differencing virtual disk that included Condor and other support software (684 MB). Since the differencing disks use a sector bitmap to indicate which sectors are within the current disk (1's) or on the parent (0's), the specification [11] suggests it may be possible to achieve performance improvements. The differencing disk also lent itself well to compression: the 684 MB difference disk was compressed to 116 MB using standard ZIP compression. This file could be transferred over a standard broadband Internet connection in 3.7 minutes (at 511.88 KB/s) as opposed to 30 minutes.
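As a rough check of those transfer figures, assuming the quoted rate is in kilobytes per second, the 3.7 minute figure corresponds to the compressed differencing disk and the 30 minute figure roughly to moving the full 932 MB base image at the same rate:

    116,000 KB / 511.88 KB/s ≈   227 s ≈  3.8 minutes   (compressed differencing disk)
    932,000 KB / 511.88 KB/s ≈ 1,821 s ≈ 30.3 minutes   (full base image)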
CONCLUSION AND FUTURE WORK

A number of additional modifications are required for this solution to become more robust. For example, security was not considered, and the current times for executing short-running jobs are not acceptable. Another improvement would be to start the virtual machines from a "hot" or paused state. Since the virtual machines used in this exercise obtained their addresses via DHCP, they would need to be given static IPs, or additional knowledge of when the virtual machines are un-paused would be required. The virtual hard disk(s) may be further compressed using a compression algorithm that takes the disk format into account. Performance consideration could also be given to differencing hard disks that are chained together for application-extensibility purposes.

This paper describes a mechanism for extending Condor to take advantage of virtualization to more flexibly (and cost-effectively) create an execution environment at run-time that balances the interests of the resource providers and consumers.
REFERENCES

1. Keahey, K., Foster, I., Freeman, T., Zhang, X. Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid. CCGrid 2006, Singapore, May 2006.
2. Keahey, K., Foster, I., Freeman, T., Zhang, X., Galron, D. Virtual Workspaces in the Grid. Euro-Par 2005, Lisbon, Portugal, September 2005.
3. http://workspace.globus.org/vm
4. http://www.cs.wisc.edu/condor/description.html
5. http://www.bo.infn.it/alice/alice-doc/mll-doc/condor/node4.html
6. Figueiredo, R., Dinda, P., Fortes, J. A Case for Grid Computing on Virtual Machines.
7. Keahey, K., Ripeanu, M., Doering, K. Dynamic Creation and Management of Runtime Environments in the Grid.
8. http://www.cs.wisc.edu/condor/CondorWeek2005/presentations/user_tutorial.ppt
9. Chase, J., Irwin, D., Grit, L., Moore, J., Sprenkle, S. Dynamic Virtual Clusters in a Grid Site Manager.
10. Zhang, X., Keahey, K., Foster, I., Freeman, T. Virtual Cluster Workspaces for Grid Applications.
11. Virtual Hard Disk Image Format Specification, Version 1.0. Microsoft, October 11, 2006.
12. Sumanth, J. Running Condor in a Virtual Environment with coLinux. http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/sumanth_condor_colinux.ppt
13. Santosa, M., Schaefer, A. Build a heterogeneous cluster with coLinux and openMosix. http://www-128.ibm.com/developerworks/linux/library/l-colinux/index.html
14. Condor Version 6.9.2 Manual. http://www.cs.wisc.edu/condor/manual/v6.9/
15. http://www.cs.wisc.edu/condor/birdbath/
16. http://wf.netfx3.com/content/WFIntro.aspx
17. MEME. http://meme.sdsc.edu
18. https://lists.cs.wisc.edu/archive/condor-users/2006-May/msg00296.shtml
