
Adaptive load balancing techniques in global scale grid environment


International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online), Volume 1, Number 2, Sept – Oct (2010), pp. 85-96, © IAEME, http://www.iaeme.com/ijcet.html

ADAPTIVE LOAD BALANCING TECHNIQUES IN GLOBAL SCALE GRID ENVIRONMENT

D.Asir, PG Scholar, Department of Computer Science and Engineering, Karunya University, Coimbatore. E-Mail: asird@karunya.edu.in
Shamila Ebenezer, Assistant Professor, Department of Computer Science and Engineering, Karunya University, Coimbatore. E-Mail: shamila_cse@karunya.edu
Daniel.D, PG Scholar, Department of Computer Science and Engineering, Karunya University, Coimbatore. E-Mail: Daniel_joen@yahoo.com

ABSTRACT

Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. This paper surveys adaptive techniques that address the load-balancing problem for partial differential systems.

Index Terms: Dynamic load balancing; Performance characterization; Adaptive mesh refinement.
I. INTRODUCTION

Adaptive load balancing lets a grid operate smoothly and scale reliably when facing spikes in data volumes or unexpected utilization loads. It also selects the best node for session execution based on resource requirements and availability. This survey draws on application-centric performance characterizations of dynamic partitioning and load-balancing techniques for the distributed adaptive grid hierarchies that underlie parallel adaptive mesh refinement (AMR) techniques [1,14] for the solution of partial differential equations.

Early adaptive techniques of mesh motion (r-refinement) have been giving way to methods that combine mesh refinement/coarsening (h-refinement) with order variation (p-refinement) [3]. As advances in computer architecture enable the solution of complex three-dimensional problems, the efficiency, reliability, and robustness provided by adaptivity will make its use even more advantageous. Parallel computation will be essential in these simulations. Processor load balancing must be dynamic, since frequent adaptive enrichment will upset a balanced computation. Applications such as adaptive finite element methods have workloads that are unpredictable or change during the computation; such applications require dynamic load balancers that adjust the decomposition as the computation proceeds.

Numerous strategies for static and dynamic load balancing have been developed, including recursive bisection (RB) methods, space-filling curve (SFC) partitioning, and graph partitioning, including multilevel and diffusive methods [7,10]. These methods provide effective partitioning for many applications, perhaps suggesting that the load-balancing problem is solved.

Efficient parallel execution of these irregular grid applications requires the partitioning of the associated graph into p parts with the following two objectives: (i) each partition has an equal amount of total vertex weight; (ii) the total weight of the edges cut by the partitions is minimized [2].

Simulation of three-dimensional flow with chemical reactions and plasma discharge in complex geometries is one of the most resource-demanding problems in computational science, requiring both high-performance and high-throughput computing. Grid computing technologies opened up new opportunities to access virtually unlimited computational resources, and inspired many researchers to develop new methodologies and algorithms for parallel distributed applications on the Grid.
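The two partitioning objectives stated in this introduction, equal vertex weight per part and minimal weight of cut edges, can be made concrete with a small measurement routine. The following Python sketch is illustrative only: the function name `partition_quality`, the dictionary-based graph layout, and the four-vertex example are assumptions made here, not part of the surveyed work.

```python
def partition_quality(vertex_weight, edges, part, num_parts):
    """Return (imbalance, edge_cut) for a given partition.

    vertex_weight: dict mapping vertex -> weight
    edges: iterable of (u, v, weight) tuples for an undirected graph
    part: dict mapping vertex -> partition index in range(num_parts)
    """
    # Objective (i): total vertex weight per partition.
    load = [0.0] * num_parts
    for v, w in vertex_weight.items():
        load[part[v]] += w
    average = sum(load) / num_parts
    imbalance = max(load) / average  # 1.0 means perfectly balanced

    # Objective (ii): total weight of edges whose endpoints are separated.
    edge_cut = sum(w for u, v, w in edges if part[u] != part[v])
    return imbalance, edge_cut


# Hypothetical example: a path a-b-c-d with unit vertex and edge weights.
weights = {"a": 1, "b": 1, "c": 1, "d": 1}
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
contiguous = {"a": 0, "b": 0, "c": 1, "d": 1}
alternating = {"a": 0, "b": 1, "c": 0, "d": 1}
print(partition_quality(weights, edges, contiguous, 2))
print(partition_quality(weights, edges, alternating, 2))
```

Both example partitions are perfectly balanced, but the contiguous one cuts a single edge while the alternating one cuts all three, which is why balance alone is not a sufficient criterion.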
II. ALB ALGORITHMS

A. Adaptive mesh-refinement algorithms (AMR)

1) Space-Filling Curves: Space-filling curves (SFC) [1] are a class of locality-preserving mappings from d-dimensional space to 1-dimensional space. The self-similar or recursive nature of the mappings can be exploited to represent a hierarchical structure and to maintain locality across different levels of the hierarchy. The SFC representation of the adaptive grid hierarchy is a 1-D ordered list of composite grid blocks, where each composite block represents a block of the entire grid hierarchy and may contain more than one grid level.

2) Independent Grid Distribution: Distributes the grids independently across the processors. This distribution leads to balanced loads, and no redistribution is required when grids are created or deleted. In the adaptive grid hierarchy, a fine grid typically corresponds to a small region of the underlying coarse grid. If both the fine and the coarse grid are distributed over the entire set of processors, all the processors will communicate with the small set of processors corresponding to the associated coarse grid region, causing a serialization bottleneck.

3) Combined Grid Distribution: Distributes the total workload in the grid hierarchy by first forming a simple linear structure by abutting grids at a level and then decomposing this structure into partitions of equal load. Regridding operations involving the creation or deletion of a grid are extremely expensive, as they require an almost complete redistribution of the grid hierarchy [4]. The combined grid decomposition does not exploit the parallelism available within a level of the hierarchy.

4) Independent Level Distribution: Each level of the grid hierarchy is distributed by partitioning the combined load of all component grids at the level among the processors. This scheme overcomes some of the drawbacks of the independent grid distribution. Parallelism within a level of the hierarchy is exploited. Although the inter-grid communication bottleneck is reduced in this case, the required scatter communications can be expensive. Creation or deletion of component grids at any level requires a redistribution of the entire level.
5) Iterative Tree Balancing: A table is created from the grids at each time step, which keeps pointers to neighboring and parent grids. For every grid, immediate neighbors and children are also considered along with load distribution. Thus load balancing, inter-level communication and intra-level communication are addressed together. This scheme is used for distributing finite-element meshes and is promising, as it deals with all the constraints to some extent.

6) Weighted Distribution: First, a weight is assigned to each of the overheads. This weight defines the significance and contribution of the overhead to the overall application performance. The next step uses these weights to compute the affinity of each component grid to the different processors. Initially, grids have no affinity for any processor.

B. Dynamic Load Balancing via Tiling

The tiling load-balancing system [3] is a modification of a global load-balancing technique that is applicable to a wide class of two-dimensional, uniform-grid applications. Global balance is achieved by performing local balancing within overlapping processor neighborhoods, where each processor is defined to be the center of a neighborhood. Local balance involves element migrations to processors in the same neighborhood that have elements sharing edges. The tiling system is required by the adaptive refinement algorithm: because elemental workloads may vary due to refinement, the tiling algorithm must account for elemental workloads when performing local load balancing.

C. Multi-criteria Geometric Partitioning

Crash simulations are "multiphase" applications consisting of two separate phases: computation of forces and contact detection. Obtaining a single decomposition that is good with respect to both phases would remove the need for communication between phases. Each object would have multiple loads, corresponding to its workload in each phase. The challenge would be computing a single decomposition that is balanced with respect to all loads. Such a multi-criteria partitioner could be used in other situations as well, such as balancing both computational work and memory usage. Most geometric partitioners reduce the partitioning problem [6] to a one-dimensional problem.

Multi-criteria load balancing can be formulated as either a multi-constraint or a multi-objective problem. Often, the balance of each load is considered a constraint and has to satisfy a certain tolerance. Such a formulation fits the standard form, where, in this case, there is no objective, only constraints. Unfortunately, there is no guarantee that a solution exists to this problem. In practice, we want a "best possible" decomposition [7], even if the desired balance criteria cannot be satisfied. Thus, an alternative is to make the constraints objectives; that is, we want to achieve as good a balance as possible with respect to all loads.

D. Repartitioning Algorithms Based on Multilevel Diffusion

The multilevel graph partitioning algorithm [2] implemented in METIS has three phases: a coarsening phase, a partitioning phase, and a refinement phase. During the coarsening phase, a sequence of smaller graphs is constructed from an input graph by collapsing vertices together. When enough vertices have been collapsed together so that the coarsest graph is sufficiently small, a k-way partition is found. Finally, the partition of the coarsest graph is projected back to the original graph by refining it at each uncoarsening level using a k-way partitioning refinement algorithm.

When the algorithm is used for repartitioning, only pairs of nodes that belong to the same partition are considered for merging in the coarsening phase. Hence, the initial partition of the coarsest-level graph is identical to the input partition of the graph that is being repartitioned and thus does not need to be computed. This makes the coarsening phase completely parallelizable, as coarsening is local to each processor. The uncoarsening phase of MLD contains two subphases: multilevel diffusion and multilevel refinement. In the multilevel diffusion phase, balance is sought on the coarsest graph in a process similar to multilevel refinement. This is accomplished by forcing the migration of vertices out of overbalanced partitions.
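The coarsening rule described above, in which only vertices of the same partition are merged so that the coarse graph inherits the input partition, can be sketched as a single matching pass. This is a simplified greedy matching written for illustration; it is not the matching heuristic used by METIS, and the function name `coarsen_within_partitions` and the adjacency-dictionary layout are assumptions.

```python
def coarsen_within_partitions(adjacency, part):
    """One coarsening pass restricted to same-partition pairs.

    adjacency: dict mapping vertex -> list of neighbor vertices
    part: dict mapping vertex -> partition index
    Returns (coarse_of, coarse_part):
      coarse_of maps each fine vertex to its coarse vertex id,
      coarse_part maps each coarse vertex id to its partition index.
    """
    coarse_of = {}
    coarse_part = {}
    next_id = 0
    for v in adjacency:
        if v in coarse_of:
            continue  # already collapsed into an earlier coarse vertex
        coarse_of[v] = next_id
        coarse_part[next_id] = part[v]  # partition is inherited, not computed
        for u in adjacency[v]:
            # Merge with the first unmatched neighbor in the same partition.
            if u not in coarse_of and part[u] == part[v]:
                coarse_of[u] = next_id
                break
        next_id += 1
    return coarse_of, coarse_part


# Hypothetical example: path a-b-c-d split into partitions {a, b} and {c, d}.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
part = {"a": 0, "b": 0, "c": 1, "d": 1}
print(coarsen_within_partitions(adjacency, part))
```

Because no merged pair crosses a partition boundary, every coarse vertex has a well-defined partition, which is the property that makes the initial partition of the coarsest graph available for free.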
Figure 2.1 Multilevel diffusion repartitioning

Multilevel diffusion repartitioning algorithms are made up of three phases: graph coarsening, multilevel diffusion, and multilevel refinement. The coarsening phase results in a series of contracted graphs. The multilevel diffusion phase balances the graph using the very coarsest graphs. The multilevel refinement phase seeks to improve the edge-cut disturbed by the balancing process. Optionally, the multilevel diffusion can be guided by a diffusion solution. We will refer to our multilevel undirected diffusion repartitioning algorithm as MLD and to our multilevel directed diffusion repartitioning algorithm as MLDD. Single-level directed diffusion (SLDD) will be used to provide a comparison with our multilevel diffusion schemes. In SLDD, diffusion and refinement are performed only on the original input graph, and thus no graph contraction is performed.

E. SAMR (Structured Adaptive Mesh Refinement)

The adaptive characteristics of SAMR applications [14] are analyzed from four aspects: granularity, dynamicity, imbalance and dispersion.
1) Granularity: The basic entity for data movement is a grid. Each grid consists of a computational interior and a ghost zone. The computational interior is the region of interest that has been refined from the immediately coarser level; the ghost zone is the part added to the exterior of the computational interior in order to obtain boundary information. For the computational interior, there is a requirement for a minimum number of cells, which is equal to the refinement ratio raised to the power of the number of dimensions.

2) Dynamicity: After each time step of every level, the adaptation process is invoked based on one or more refinement criteria defined at the beginning of the simulation. The local regions satisfying the criteria will be refined. A high frequency of adaptation requires the underlying DLB method to execute very fast, as well as to maintain a high quality of load balancing.

3) Load Imbalance: The ideal balanced load is calculated. The standard deviation is small compared to the average load, which means that the average load reflects the entire load distribution.

4) Dispersion: The loads of a few processors increase dramatically, while most processors see little or no change. All the processors can be grouped into four subgroups, and each subgroup has similar characteristics, with the percentage of refinement ranging from zero to 86%.

These calculations indicate that different datasets exhibit different load distributions, and the underlying DLB scheme should provide a high quality of load balancing for all these datasets. After taking into consideration the adaptive characteristics of the SAMR application, we developed an improved DLB scheme. DLB is composed of two steps: a moving-grid phase and a splitting-grid phase.

Moving Grid Phase:
Step 1: Set Moveflag and Splitflag to one, and Lastmin and Lastmax to zero.
Step 2: The load is considered imbalanced when Maxload/Avgload > threshold.
Step 3: Maxproc then moves a grid to Minproc (using global information), provided the load of that grid is no more than (threshold * Avgload - Minload).
Step 4: This phase continues until all grids residing on Maxproc are too large to be moved.
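The moving-grid steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the implementation from [14]: each processor is modeled as a list of positive grid loads, the threshold value 1.2 is hypothetical, the flags of Step 1 are omitted, and choosing the largest movable grid is a choice made here because the text does not say which grid is picked.

```python
def moving_grid_phase(grids, threshold=1.2):
    """Move whole grids from the most loaded to the least loaded processor.

    grids: list with one entry per processor; each entry is a list of
           positive grid loads. The list is modified in place and returned.
    threshold: hypothetical imbalance tolerance (Maxload / Avgload).
    """
    avg_load = sum(sum(g) for g in grids) / len(grids)
    procs = range(len(grids))
    while True:
        loads = [sum(g) for g in grids]
        max_proc = max(procs, key=loads.__getitem__)
        min_proc = min(procs, key=loads.__getitem__)
        # Step 2: stop once the imbalance is within the threshold.
        if loads[max_proc] / avg_load <= threshold:
            break
        # Step 3: a grid may move only if it fits under this limit.
        limit = threshold * avg_load - loads[min_proc]
        movable = [g for g in grids[max_proc] if g <= limit]
        # Step 4: stop when every remaining grid is too large to move.
        if not movable:
            break
        grid = max(movable)  # assumption: pick the largest movable grid
        grids[max_proc].remove(grid)
        grids[min_proc].append(grid)
    return grids


# Hypothetical loads: one overloaded processor and two nearly idle ones.
print(moving_grid_phase([[4, 3, 3], [1], [1]]))
# A single oversized grid cannot be moved; the splitting phase must handle it.
print(moving_grid_phase([[10], [1], [1]]))
```

In the first example the two grids of load 3 are moved and every processor ends with load 4. In the second, the lone grid of load 10 exceeds the limit, so the loop stops without moving anything.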
Splitting Grid Phase:
Step 1: Maxproc finds Maxgrid, its largest grid.
Step 2: If the size of Maxgrid is no more than (Avgload - Minload), the grid is moved from Maxproc to Minproc.
Step 3: Otherwise, Maxproc splits the grid into two smaller grids.
Step 4: The piece whose size is about (Avgload - Minload) is redistributed to Minproc.

F. Adaptive workload balancing (AWLB) on heterogeneous resources

One of the factors that determine the performance of parallel applications on heterogeneous resources is the quality of the workload distribution, e.g. through functional decomposition or domain decomposition. Optimal load distribution is characterized by two things: (1) all processors have a workload proportional to their computational capacity, and (2) communications between the processors are minimized. These goals conflict: communication is minimized when all the workload is processed by a single processor and no communication takes place, while distributing the workload inevitably incurs communication overheads. Thus, it is necessary to find a balance and define a metric [15] that characterizes the quality of workload distribution for a parallel problem.

1. Benchmark the resources dynamically assigned to the parallel application; measure the resource characteristics that constitute the set of resource parameters µ (available processing power, memory and link bandwidth).
2. Estimate the range of possible values of the application parameter fc. The minimal value is fmin = 0, which corresponds to the case when no communications occur between the parallel processes of the application. The upper bound can be calculated based on the following reasoning: for the parallel processing to make sense, that is, to ensure that running a parallel program on several processors is faster than sequential execution, the calculation time should exceed the communication time. For homogeneous resources this condition gives the upper bound fc max.
3. Search through the range of possible values of fc in [0 . . . fc max] to find the optimal value fc* minimizing the application execution time. For each value of fc, calculate the corresponding load distribution based on the resource parameters µ. With this distribution, perform one time step and measure the execution time, which is the target optimization function. Selection of the next value of fc can be done by any optimization method for unimodal smooth functions; for instance, a simple line-search method can be used.
4. Execute further calculations using the discovered fc*.
5. In the case of dynamic resources where performance is influenced by other factors (which is generally the case on the Grid), a periodic re-estimation of the resource parameters µ and load redistribution shall be performed during the run-time of the application. Re-balancing shall be invoked if the application performance over the last step drops by more than a certain user-defined threshold.
6. If the application is dynamically changing, then fc* must be periodically re-estimated on the same set of resources.

G. The Path Algorithm

There are two steps to implement the PATH algorithm:

First Step: We use the simple single-packet algorithm (SMSP) to check the network structure and to get the bottleneck link Lk. Compared with the standard single-packet algorithm (SDSP) [12], the SMSP algorithm does not have to measure the bandwidth of each link of the whole network.

Second Step: Use a packet train with a header probe to measure the bandwidth of the link Lk. The source sends out a header packet H and a packet train T1, T2, … Tn. Both the header and the packet train are UDP packets. All the packets Ti of the packet train are of the same size. Sh, the size of the header packet H, is much larger than St, the size of Ti. Each packet Ti contains only 8 bytes, used for identifying the packet. We denote the time-to-live (TTL) of a packet by tj if the packet expires after reaching router Rj. The TTL of all the packet-train packets Ti is tj.
So the Ti packets will stop at router Rj. Rj responds to the source with ICMP time-exceeded packets.

III. EVALUATION

Efficient data structures used for adaptive refinement and tiling include trees of grids, with finer grids regarded as offspring of coarser ones. Within each grid, AVL tree structures [3] permit easy insertion and deletion of elements as they migrate between processors. Similar tree structures at inter-processor boundaries facilitate the transfer of data between neighboring processors.

Most previous work focuses on incorporating environment information into preselected partitioning algorithms [6,7,10]. As an alternative, such information could be used to select appropriate partitioning strategies. The work assigned to these nodes is then recursively partitioned among the nodes in their subtrees. Different partitioning methods can be used in each level and subtree to produce effective partitions with respect to the network; for example, graph or hypergraph partitioners could minimize communication between nodes connected by slow networks, while fast geometric partitioners operate within each node.

A repartitioning of a dynamic graph can be computed by simply partitioning the new graph from scratch. However, since no concern is given to the existing partition, most vertices are not likely to be assigned to their initial partitions with this method. Intelligent remapping of the resulting partition can reduce the required movement of vertices, but vertex migration can still be quite high. The second strategy is to use the existing partitioning as input for a repartitioning algorithm and to attempt to minimize the difference between the original partition and the output partition. This strategy can result in much smaller vertex migration compared to schemes that partition the modified graph from scratch.

Our multilevel diffusion repartitioning algorithms are made up of three phases: graph coarsening, multilevel diffusion, and multilevel refinement. The coarsening phase results in a series of contracted graphs. The multilevel diffusion phase balances the graph using the very coarsest graphs. The multilevel refinement phase [3] seeks to improve the edge-cut disturbed by the balancing process. Optionally, the multilevel diffusion can be guided by a diffusion solution.

DLB is not a scratch-remap scheme, because it takes into consideration the previous load distribution during the current redistribution process. Compared with the diffusion scheme, our DLB scheme differs in two ways. First, our DLB scheme addresses the issue of coarse granularity of SAMR applications [14]: it splits large grids located on overloaded processors if the movement of grids alone is not enough to handle the load imbalance. Second, our DLB scheme moves data directly between overloaded and underloaded processors instead of just between neighboring processors.
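The splitting behavior discussed here, where a large grid on an overloaded processor is split and one piece is sent directly to the least loaded processor, can be sketched as follows. This is an illustration only: loads are treated as continuous numbers, whereas a real SAMR grid can be split only along cell boundaries and must respect the minimum grid size, and the function name `splitting_grid_step` and the list-of-lists layout are assumptions.

```python
def splitting_grid_step(grids):
    """Perform one splitting-grid step and return the updated grids.

    grids: list with one entry per processor; each entry is a list of
           positive grid loads. The list is modified in place.
    """
    loads = [sum(g) for g in grids]
    avg_load = sum(loads) / len(grids)
    procs = range(len(grids))
    max_proc = max(procs, key=loads.__getitem__)
    min_proc = min(procs, key=loads.__getitem__)

    # Room available on the least loaded processor (Avgload - Minload).
    target = avg_load - loads[min_proc]
    if target <= 0:
        return grids  # every processor already carries the average load

    max_grid = max(grids[max_proc])
    grids[max_proc].remove(max_grid)
    if max_grid <= target:
        # The largest grid fits as a whole: move it directly.
        grids[min_proc].append(max_grid)
    else:
        # Split: keep the remainder, send a piece of size `target`.
        grids[max_proc].append(max_grid - target)
        grids[min_proc].append(target)
    return grids


# Hypothetical case that moving whole grids cannot fix.
state = splitting_grid_step([[10], [1], [1]])
print(state)
print(splitting_grid_step(state))
```

Applied twice to the example, the step first sends a piece of load 3 to the second processor and then another piece of load 3 to the third, leaving every processor at the average load of 4. The transfer goes straight from the most loaded to the least loaded processor, with no requirement that they be neighbors.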
IV. CONCLUSION

In this paper we surveyed various adaptive techniques for balancing the load in a global-scale grid environment. By using the DLB scheme, including the moving-grid phase and the splitting-grid phase, the total execution time of SAMR applications was reduced by up to 47%, and the quality of load balancing more than doubled, especially when the number of processors is larger than 16. For the multilevel diffusion technique, the results on a variety of synthetic and application meshes show that it is a robust scheme for repartitioning a wide variety of adaptive meshes. For adaptive finite element methods, data movement from an old decomposition to a new one can consume orders of magnitude more time than the actual computation of a new decomposition; highly incremental partitioning strategies that minimize data movement are important for high performance of adaptive simulations.

REFERENCES

[1] Characterizing the Performance of Dynamic Distribution and Load-Balancing Techniques for Adaptive Grid Hierarchies, Mausumi Shee, Samip Bhavsar, and Manish Parashar, Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems, November 3-6, 1999, Cambridge, Massachusetts, USA.
[2] Multilevel Diffusion Schemes for Repartitioning of Adaptive Meshes, Kirk Schloegl, George Karypis, and Vipin Kumar, Journal of Parallel and Distributed Computing 47, 109–124 (1997), Article No. PC971410.
[3] Parallel Adaptive hp-Refinement Techniques for Conservation Laws, Karen D. Devine and Joseph E. Flaherty, Applied Numerical Mathematics 20 (1996) 367-386, Sandia National Laboratories Tech. Rep. SAND95-1142J.
[4] Adaptive Performance Modeling on Hierarchical Grid Computing Environments, Wahid Nasri, Luiz Angelo Steffenel and Denis Trystram, Laboratoire ID-IMAG, INPG, Grenoble, France, author manuscript (2007).
[5] Object-Based Adaptive Load Balancing for MPI Programs, Milind Bhandarkar, L. V. Kalé, Eric de Sturler, and Jay Hoeflinger, research funded by the U.S. Department of Energy through the University of California under Subcontract number B341494, October 6, 2000.
[6] Parallel Dynamic Graph Partitioning for Adaptive Unstructured Meshes, C. Walshaw, M. Cross, and M. G. Everett, Journal of Parallel and Distributed Computing 47, 102–108 (1997), Article No. PC971407.
[7] New Challenges in Dynamic Load Balancing, Karen D. Devine, Erik G. Boman, Robert T. Heaphy, Bruce A. Hendrickson, Sandia contract PO15162 and the Computer Science Research Institute at Sandia National Laboratories.
[8] H. Casanova, "Simgrid: A Toolkit for the Simulation of Application Scheduling," in Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid'01), May 2001, pp. 430–437.
[9] G. Shao, Adaptive Scheduling of Master/Worker Applications on Distributed Computational Resources, Ph.D. thesis, University of California, San Diego, May 2001.
[10] On Partitioning Dynamic Adaptive Grid Hierarchies, Manish Parashar and James C. Browne, Binary Black-Hole NSF Grand Challenge (NSF ACS/PHY 9318152), January 1996.
[11] Hash-Storage Techniques for Adaptive Multilevel Solvers and their Domain Decomposition Parallelization, Contemporary Mathematics, volume 218, 1998.
[12] A. B. Downey, "Using Pathchar to Estimate Internet Link Characteristics," ACM SIGCOMM 99, pp. 241-250.
[13] Adaptive Load Balancing for Divide-and-Conquer Grid Applications, Rob V. van Nieuwpoort, Jason Maassen, Gosia Wrzesińska, Thilo Kielmann, Henri E. Bal, 2004, Kluwer Academic Publishers.
[14] Dynamic Load Balancing for Structured Adaptive Mesh Refinement Applications, Zhiling Lan, Valerie E. Taylor, Greg Bryan, National Computational Science Alliance (ACI-9619019).
[15] V.V. Korkhov, et al., A Grid-based Virtual Reactor: Parallel performance and adaptive load balancing, J. Parallel Distrib. Comput. (2007), doi: 10.1016/j.jpdc.2007.08.010.
