Optimizing the Virtual Data Center



DATA CENTER ENVIRONMENT

The ideal virtual data center dynamically balances workloads across a computing cluster and redistributes hardware resources among clusters in response to changing needs. The challenge is to implement these load-balancing and resource-balancing features so that they are transparent to client applications.

BY J. CRAIG LOWERY, PH.D.

Administrators have greatly improved the enterprise infrastructure by making use of high-availability and high-performance computing (HPC) cluster systems that provide fault tolerance and load balancing. However, these improvements in robustness generally increase cost, so data center managers are now beginning to focus on reducing those costs. Because clustered systems are sized to accommodate peak rather than average demand, data center managers have decided that underutilized nodes in lightly loaded clusters are wasteful, and suspect they are the key to improving the bottom line.

Typically, several application clusters exist in a data center. Instead of permanently sizing clusters for peak loads, managers should be able to move servers and storage among application clusters in response to the current demand for each application. Because all applications are unlikely to experience peak demand simultaneously, managers can save money by reducing the total number of units deployed, moving idle units from cluster to cluster as demand dictates. Should total demand exceed available resources, applications can be prioritized to ensure that critical systems do not starve. If increased demand persists, administrators can boost capacity simply by connecting new units to the existing infrastructure. These concepts form the basis for the virtual data center.[1]

[1] For more information, see "Building the Virtual Data Center" by J. Craig Lowery, Ph.D., in Dell Power Solutions, February 2003; and "Managing the Virtual Data Center" by J. Craig Lowery, Ph.D., in Dell Power Solutions, August 2003.

© Copyright 2003 by Dell Inc. All Rights Reserved. Preprinted from the November 2003 issue of Dell Power Solutions.
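To make the reallocation idea concrete, the policy described above (move idle units among clusters as demand shifts, and let priorities decide who wins when total demand exceeds supply) can be sketched in a few lines. This is a toy model, not any actual data center software; the Cluster fields and the rebalance policy are invented for illustration.

```python
# Illustrative sketch only: a toy controller that reallocates server units
# among application clusters as demand changes. All names (Cluster,
# rebalance) are hypothetical; the article does not define a specific API.

from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    units: int          # servers currently assigned to this cluster
    demand: int         # units of capacity the workload currently needs
    priority: int = 0   # higher value = more critical application

def rebalance(clusters):
    """Move spare units from overprovisioned clusters to underprovisioned
    ones; when total demand exceeds supply, higher-priority clusters win."""
    donors = [c for c in clusters if c.units > c.demand]
    needy = sorted((c for c in clusters if c.units < c.demand),
                   key=lambda c: c.priority, reverse=True)
    for cluster in needy:
        for donor in donors:
            while donor.units > donor.demand and cluster.units < cluster.demand:
                donor.units -= 1      # "remove node" from the donor cluster
                cluster.units += 1    # "add node" to the needy cluster

a = Cluster("A", units=2, demand=4, priority=1)
b = Cluster("B", units=6, demand=3)
rebalance([a, b])
print(a.units, b.units)  # -> 4 4
```

Running the example moves two of Cluster B's three spare units to Cluster A, leaving both clusters sized to their demand.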
Figure 1. Load balancing and resource reallocation

The statistical multiplexing of hardware across application clusters is at the core of the virtual data center concept. In the virtual data center, administrators physically configure hardware only once; software creates logical associations among hardware components as needed. For example, virtual LANs (VLANs) can be configured through software that resides on a network switch. By managing hardware as pools of similar, easily "relocated" components, the virtual data center can optimize real-time, dynamic resource allocation purely through software.

Balancing loads and resources

Two mechanisms work in tandem to achieve maximum performance and optimal resource utilization in the virtual data center:

  • Load balancing: The ability to redistribute work across the nodes in a cluster, commensurate with each node's processing capacity
  • Resource balancing: The ability to move nodes among clusters in order to increase and decrease cluster sizes, and thus cluster processing capacities

Load balancing occurs within clusters; resource balancing occurs between clusters. Achieving synergy in both workload distribution and resource utilization is the main goal of the virtual data center. Although simple in concept, the implementation is nontrivial.

A centrally acting controller that orchestrates operations across the entire data center could ideally make use of three load-balancing and resource-balancing mechanisms: redistribute work, remove node, and add node.[2] For example, Figure 1 shows how the virtual data center could balance loads and resources between two clusters running different clustering software, as follows:

  1. Before reallocation. Cluster A is experiencing heavy demand and is nearing the saturation point; it is underprovisioned because it requires additional hardware resources. Cluster B has spare capacity; it is overprovisioned because it has an abundance of hardware resources.
  2. Preparing to remove. An identification algorithm targets a node in Cluster B for transfer to Cluster A. System administrators can program the logic in the identification algorithm to align with business objectives, such as minimizing impact to clients or minimizing time to yield of the target node (that is, the time until the cluster makes the node available). Cluster B uses workload redistribution methods (discussed in the next section) to vacate this node.

[2] For more information about virtual data center management software and the role of the global engine, see "Managing the Virtual Data Center" by J. Craig Lowery, Ph.D., in Dell Power Solutions, August 2003.
  3. Remove. The vacant target node is removed from Cluster B.
  4. Add. The target node is added to Cluster A, which begins to redistribute its work across the new cluster member.
  5. After reallocation. A steady-state workload exists in Cluster A; both Clusters A and B are provisioned appropriately.

As shown in Figure 1, the redistribute, add, and remove operations enable the reallocation of hardware resources according to changes in demand. The add operation is easy to implement because it does not require an immediate reaction from the affected cluster; new nodes are integrated in a nondisruptive fashion during the redistribute operation. The remove operation also is simple because work can be redistributed to vacate a target node before removing that node. Clearly, of the three operations, redistribute is the most critical.

Redistributing workloads

Whereas load-balancing mechanisms determine the appropriate allocation of work across nodes, workload redistribution is the means of achieving that allocation. That is, workload redistribution techniques move a thread of execution—a job, a process, or a session—among nodes.

Assigning new jobs to nodes, or job scheduling, is a primary function of cluster operating systems. Usually the scheduler chooses the least utilized node for a new job, as shown in Figure 2, but it can employ other criteria as well. Once a job starts on a node, it runs to completion on that node.

Figure 2. Job scheduling workload redistribution

A job is usually defined as a process group spanning a time from creation to completion. Alternatively, a job may be defined as the unit of work performed by an always-resident server process in response to a particular client request; in this instance, the request represents the job. Generally, for workloads characterized by uniformly short job lengths, utilization is nearly equivalent across the nodes in steady state. However, when job lengths are generally unknown or highly variable, job completion times are impossible for the job scheduler to predict. Consequently, utilization across nodes in the cluster becomes skewed over time as the scheduler makes inefficient assignments.

In addition, unpredictable job completion times can diminish the value of the remove operation: When the cluster must vacate a node, the job scheduler excludes that node from new assignments. However, because the completion times for existing jobs on the node are unpredictable, determining when the node will be ready is impossible.

Managing process migration

The workload redistribution method most difficult to implement is process migration. In this approach, the job scheduler initially assigns a process (that is, a job) to one node. The cluster management software subsequently suspends the process, moves it to a different node, and resumes execution (see Figure 3).

To appreciate the difficulties of process migration, consider the types and sizes of state information that general processes own; this state information must be copied from one system to another to effect a migration. In the absence of overhead, process migration provides the greatest flexibility and the fastest response to configuration changes for jobs that possess very little state information. Sometimes migrating processes with sizable memory structures—such as arrays, large local files, temporary working files, network connections, and user interface I/O paths—is more time-consuming than simply letting the processes finish on the node to which they were initially assigned.

Figure 3. Process migration workload redistribution
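The scheduling behavior described above, where a least-utilized-node policy balances uniformly short jobs but skews utilization when job lengths vary, can be sketched as follows. The node names, job costs, and schedule function are invented for illustration; a real scheduler would not know each job's cost in advance.

```python
# Toy sketch of job scheduling as described above: each new job goes to the
# least utilized node and runs to completion there. Node names and the
# utilization model are invented for illustration.

def schedule(jobs, nodes):
    """Assign each job (a cost in arbitrary work units) to the node with the
    lowest current load; a job never moves once assigned."""
    load = {n: 0 for n in nodes}
    placement = {}
    for job_id, cost in jobs:
        target = min(load, key=load.get)   # least utilized node wins
        placement[job_id] = target
        load[target] += cost               # cost is unknown to a real scheduler
    return placement, load

# Uniform short jobs would spread evenly; one long job skews utilization,
# because the scheduler cannot predict completion times when it assigns work.
placement, load = schedule([("j1", 1), ("j2", 10), ("j3", 1), ("j4", 1)],
                           ["node1", "node2"])
print(load)  # -> {'node1': 3, 'node2': 10}
```

The single long job leaves node2 loaded long after node1 drains, which is also why the remove operation cannot predict when such a node will be vacant.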
If migration overhead were uniformly low, process migration would be the ideal workload redistribution mechanism. A refinement of the process migration concept is session or transaction migration, whereby a vacating process on one node hands off an in-progress transaction to a new or idle process on another node (see Figure 4). A combination of shared storage and network communication typically facilitates the hand-off.

Figure 4. Session migration workload redistribution

Several options exist for implementing the session migration mechanism. For example, processes could keep state information in local storage, copying it to shared storage only when they must vacate the node. However, this approach offers little improvement over process migration.

Conversely, processes could work directly from shared storage all the time, allowing a process to vacate almost immediately if necessary. In this scenario, another process on a different node could pick up where the vacating process left off simply by accessing the share. Although long possible in shared-memory multiprocessor systems, this approach was considered impractical in the general case of cooperating servers on a LAN. However, relatively inexpensive, high-performance interconnects such as InfiniBand™ and Gigabit Ethernet[3] fabrics are enabling bandwidth-intensive cooperative activities, such as working directly from a storage area network (SAN) and remote direct memory access (RDMA).

Given environments in which low-cost, commodity hardware components can communicate at performance levels approaching that of the system bus, system designers finally can consider workload redistribution methods such as session migration for mass deployment. Because workload redistribution is the key to load and resource balancing, high-performance interconnects are critical to the success of the virtual data center.

One interesting variation of process migration that also can leverage new interconnect technologies is virtual machine–hosted operating systems. Products such as VMware™ GSX Server™ and ESX Server™ software and Microsoft® Virtual Server run several guest operating systems concurrently on a host operating system by simulating a reference implementation of node hardware. All state information (the image) for the guest operating system resides in a large file, normally located in the local file system of the host. This method makes it possible to suspend the execution of the guest operating system with its current state fixed in the image file, move the file to another host system, and resume execution. The transfer time for the image file introduces problems similar to those of process migration. However, by keeping the image file in shared storage and enabling multiple host systems to access the image file across a high-performance interconnect, nearly instant migration of an entire execution environment is attainable.

The drawbacks to this method include the overhead for supporting the virtual machine and the large-size granularity at the operating system level rather than at the process level. Administrators could assign one process per virtual operating system to achieve finer granularity, but the virtualization overhead would become prohibitive. Even so, the combination of virtual operating systems, shared storage, and high-performance interconnects can be considered a step forward in achieving the goals of the virtual data center.

[3] This term indicates compliance with IEEE standard 802.3ab for Gigabit Ethernet, and does not connote actual operating speed of 1 Gbps. For high-speed transmission, connection to a Gigabit Ethernet server and network infrastructure is required.
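The shared-storage variant of session migration described above can be sketched as follows. A Python dict stands in for shared storage (in practice a SAN volume reached over a high-performance interconnect), and the Worker class, session layout, and checkpoint policy are all invented for illustration.

```python
# Sketch of session migration through shared storage, as described above.
# A dict stands in for the shared store (a SAN volume in practice); the
# Worker class and state layout are invented for illustration.

shared_store = {}   # session_id -> checkpointed session state

class Worker:
    def __init__(self, node):
        self.node = node

    def process(self, session_id, items):
        """Work through items, checkpointing progress to shared storage so a
        worker on another node can take over at any point."""
        state = shared_store.setdefault(session_id, {"done": 0, "log": []})
        for item in items[state["done"]:]:
            state["log"].append((self.node, item))
            state["done"] += 1          # checkpoint after every item
            if self.node == "node1" and state["done"] == 2:
                return False            # node1 is told to vacate mid-session
        return True

items = ["a", "b", "c", "d"]
finished = Worker("node1").process("s1", items)
if not finished:                        # hand-off: node2 resumes from the share
    Worker("node2").process("s1", items)
print(shared_store["s1"]["log"])
```

Because every checkpoint lives in the share, the vacating worker can stop almost immediately and its successor resumes mid-session; this is what makes the approach more predictable than copying process state only at migration time.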
Achieving the primary goal of transparency

For at least 20 years, the concepts of load balancing, resource balancing, and workload redistribution have appeared in academic literature describing distributed operating systems.[4] Much research into creating such systems has led to many of the advances now incorporated into the virtual data center concept. One key differentiating characteristic of distributed operating systems, as opposed to traditional operating systems, is transparency. That is, users of the system neither know, nor need to know, which network components are cooperating to service their requests or how the components accomplish those tasks.

Because the virtual data center is a form of the distributed operating system, transparency is a primary goal. To be truly successful, the virtual data center must be able to host applications that are indifferent to underlying hardware details and location, and are unaware that they may be subject to migration. Application developers should not have to worry about synchronizing with reconfiguration events. Ideally, the virtual data center presents applications with a virtual machine that provides continuity of execution at all times, making no special demands on applications to accommodate reconfiguration below the virtualization layer.

Achieving cluster reconfiguration and load balancing that is both predictable and transparent is extremely difficult and presents several trade-offs. Job scheduling is transparent because no migration occurs, but it is not predictable. Process migration also is transparent but not predictable, because a potentially large amount of state information must be transferred. Session migration is more predictable when processes work directly from shared storage across high-speed network interconnects, but this predictability often is achieved at the expense of transparency. Products coming to market implement each of these approaches, some in ways that may eventually overcome their traditional shortcomings.

Some data center management software uses network boot capabilities to make a node execute entirely from shared storage, similar to diskless workstations. This method allows a node to change "personalities" by rebooting to a different image under the direction of some controlling agent, thereby providing location transparency. Although certainly useful, this method is of limited benefit in the virtual data center because it is essentially coarse-grained job scheduling; administrators must wait for all executing jobs to complete before rebooting the node to a new image.

One way to achieve finer time granularity (that is, to move applications around the data center more quickly) without sacrificing transparency is to construct environments specifically designed to relieve applications of the burdens associated with load balancing and workload redistribution. For example, many Oracle® products—most notably Oracle9i™ Real Application Clusters databases—include this capability, which Oracle refers to as grid computing.[5] Application programming interfaces (APIs) exist for location-transparent data access and processing, while the Oracle software stack provides a consistent virtualization layer above the hardware and operating system. Session migration is a natural extension of the grid features currently available in Oracle products.

Moving closer to the ideal virtual data center

Despite these challenges, much progress has been made in moving toward the ideal virtual data center, and the pace of development is quickening. Technologies described years ago in academic journals are finally taking shape as tangible products. High-performance interconnects, virtualization-ready execution environments, and standard components and protocols are paving the way to the eventual realization of this long-pursued computing model.

[4] "Distributed Operating Systems" by Andrew S. Tanenbaum and Robbert Van Renesse in ACM Computing Surveys, vol. 17, no. 4, December 1985.

[5] The Oracle definition of grid computing differs somewhat from other connotations of grid computing. For more information about grid computing, visit http://www.globus.org.

J. Craig Lowery, Ph.D. (craig_lowery@dell.com) is chief security architect and a software architect and strategist in the Dell™ Product Group–Software Engineering. Craig has an M.S. and a Ph.D. in Computer Science from Vanderbilt University and a B.S. in Computing Science and Mathematics from Mississippi College. His primary areas of interest include computer networking, security, and performance modeling.

FOR MORE INFORMATION

Microsoft Virtual Server: http://www.microsoft.com/windowsserver2003/evaluation/trial/virtualserver.mspx
Oracle Real Application Clusters: http://www.oracle.com/ip/rac_home.html
VMware: http://www.vmware.com