Probabilistic Adaptive Load Balancing For Parallel Queries - Presentation Transcript
Probabilistic Adaptive Load Balancing for Parallel Queries
Daniel M. Yellin*, Jorge Buenabad-Chávez**, Norman W. Paton***
* IBM Israel Software Lab
** Centro de Investigacion y de Estudios Avanzados del IPN, Mexico
*** University of Manchester
Autonomic Computing
Autonomic computing provides a general framework for adaptive systems: MAPE (Monitor, Analyze, Plan, Execute)
… but getting the details right is tough
When a trend is sensed, when to adapt?
Not too early, not too late
What to adapt to?
Adapting in the wrong way can make things worse!
Our problem
Given:
A computational system S that can operate in one of several (possibly infinitely many) modes, m_1, m_2, ...
Each mode is optimized for a particular workload
Goal:
Monitor the existing workload and decide when to adapt S to a different mode, optimized for the current (or predicted) workload
Considerations:
Risk: No promise that future workload will be similar to current workload
Cost: Each time we change the mode of the system S from m_i to m_j, we incur a cost. Switching modes can be expensive!
[Figure: modes m1 to m5, shown before and after a mode change; the change entails a cost.]
Example: Pub-sub systems
Given:
S = pub-sub system, including a server and a set of clients
Modes = {cache a particular data item on a client, store a particular data item on the server}
Goal:
Monitor the access patterns of clients and decide when to move a data item from the server to a client, or from a client back to the server
Daniel M. Yellin: Competitive algorithms for the dynamic selection of component implementations. IBM Systems Journal 42 (1): 85-97 (2003).
[Figure: Server, Client 1 and Client 2 with data items d1, d2, d3; request streams "read d3, read d3, ..." and "write d3, write d3, ..."]
Example: Data type implementation
Given:
S = abstract data type with multiple implementations, each optimized for specific sorts of operations
Goal:
Monitor the operations on S and decide when to switch from one implementation to another
[Figure: component data with two implementations, impl 1 and impl 2. Requests of type X are faster using key K1; requests of type Y are faster using key K2.]
Our approach
Monitor existing workload and response times
Determine (a finite number of) modes to consider for adaptation
Determine likelihood of (a finite number of) workloads in the immediate future
For each relevant mode, compute the expected cost of switching to that mode, based upon probability of different workloads and cost of processing workload in that mode
Note: cost of adaptation (SwitchCost) is included in EC
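The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the flat `switch_cost`, and the pub-sub cost table are hypothetical and do not come from the talk.

```python
from typing import Callable, Dict, Sequence

def expected_cost(mode: str,
                  current_mode: str,
                  workload_probs: Dict[str, float],
                  cost: Callable[[str, str], float],
                  switch_cost: float) -> float:
    """Expected cost of running in `mode`: probability-weighted processing
    cost, plus the switch cost when `mode` differs from the current mode."""
    processing = sum(p * cost(mode, w) for w, p in workload_probs.items())
    switching = 0.0 if mode == current_mode else switch_cost
    return processing + switching

def choose_mode(modes: Sequence[str],
                current_mode: str,
                workload_probs: Dict[str, float],
                cost: Callable[[str, str], float],
                switch_cost: float) -> str:
    """Pick the mode with the smallest expected cost."""
    return min(modes, key=lambda m: expected_cost(
        m, current_mode, workload_probs, cost, switch_cost))

# Hypothetical costs for the pub-sub example: where should item d3 live?
COSTS = {
    ("server", "read_heavy"): 10.0, ("server", "write_heavy"): 2.0,
    ("client", "read_heavy"): 1.0,  ("client", "write_heavy"): 8.0,
}

def table_cost(mode: str, workload: str) -> float:
    return COSTS[(mode, workload)]

probs = {"read_heavy": 0.7, "write_heavy": 0.3}
best = choose_mode(["server", "client"], "server", probs, table_cost, 3.0)
print(best)  # client
```

With these made-up numbers, staying on the server costs 7.6 in expectation, while moving to the client costs about 6.1 including the switch, so the switch is worthwhile. If writes become likely (for example 0.2 / 0.8), the same computation keeps the item on the server.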
Adaptive Query Processing
A query optimiser, given a query and information on the data involved and the environment in which the query is to be run, proposes an execution plan for that query that is predicted to yield the best response time.
If the information used by the optimiser is misleading (e.g. partial, incorrect, out-of-date or subject to change during query evaluation), the execution plan chosen by the optimiser may be inappropriate.
In Adaptive Query Processing, the execution plan is modified at query runtime, on the basis of feedback received from the environment.
Adaptation for Load Balancing
In partitioned parallelism, a task is divided into subtasks that are run in parallel on different nodes.
For a join, A ⋈ B is represented as the union of the results of plan fragments F_i = A_i ⋈ B_i, for i = 1..P, where P is the level of parallelism.
The time taken to evaluate the join is max(evaluation_time(F_i)), for i = 1..P.
As a result, any delay in completing a fragment F_i delays the completion of the operator, so it is crucial to match fragment size to node capabilities.
Most join algorithms have state, so changing the size of a fragment allocated to a machine involves replicating or relocating operator state.
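To make the "slowest fragment decides" point concrete, here is a small Python sketch. It assumes a simple linear cost model (a node processes its share of the work at a fixed rate); the names `shares`, `speeds` and `total_work`, and the numbers used, are illustrative assumptions rather than part of the talk.

```python
from typing import List, Sequence

def fragment_times(shares: Sequence[float],
                   speeds: Sequence[float],
                   total_work: float) -> List[float]:
    """Evaluation time of each fragment F_i: its work divided by node speed."""
    return [total_work * share / speed for share, speed in zip(shares, speeds)]

def join_time(shares: Sequence[float],
              speeds: Sequence[float],
              total_work: float) -> float:
    """The join finishes only when its slowest fragment finishes."""
    return max(fragment_times(shares, speeds, total_work))

def balanced_shares(speeds: Sequence[float]) -> List[float]:
    """Shares proportional to node speed, so all fragments finish together."""
    total = sum(speeds)
    return [speed / total for speed in speeds]

# Three nodes (P = 3); the third runs at half speed, e.g. due to external load.
speeds = [100.0, 100.0, 50.0]   # tuples per second (hypothetical)
work = 1200.0                   # tuples to process (hypothetical)

uniform = [1 / 3, 1 / 3, 1 / 3]
print(join_time(uniform, speeds, work))                  # slow node dominates
print(join_time(balanced_shares(speeds), speeds, work))  # all nodes finish together
```

With a uniform split the slow node needs about 8 seconds while the others are done in 4; with shares proportional to speed every fragment takes about 4.8 seconds. The sketch ignores the cost of moving operator state, which is exactly what makes the adaptation decision non-trivial.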
Flux*
When load imbalance is detected:
Halt query execution.
Compute new distribution policy (dp).
Update hash tables by transferring data between nodes.
Update dp in parent exchange nodes.
Resume query execution
Many variations of this technique exist and have been compared.**
[Figure: Scan(A); distribution policy dp; Join(A_1, B_1) with Hash table A_1; Join(A_2, B_2) with Hash table A_2.]
* M. Shah, J.M. Hellerstein, S. Chandrasekaran, M.J. Franklin, Flux: An Adaptive Partitioning Operator for Continuous Query Systems. ICDE 2003.
** Paton, N.W., Raman, V., Swart, G. and Narang, I., Autonomic Query Parallelization using Non-dedicated Computers: An Evaluation of Adaptivity Options, Proc. 3rd IEEE Intl. Conf. on Autonomic Computing, 2006.
Heuristics used by Flux
Units of adaptation : table divided into partitions, and each node can gain/lose at most one partition during an adaptation
Scale of adaptation : at most half the partitions can be moved at any adaptation
Frequency of adaptation : once an adaptation takes place and takes time s, no further adaptation is allowed until a further time s has elapsed
Timing of adaptation : applies heuristics to determine when to transfer partition from over-utilized processor to under-utilized processor
A brief review …
Our algorithm is based upon the concept of mathematical expectation .
If the probabilities of obtaining the amounts a_1, a_2, ..., a_k
are p_1, p_2, ..., p_k, where p_1 + p_2 + ... + p_k = 1, then the mathematical expectation is:
E = a_1 * p_1 + a_2 * p_2 + ... + a_k * p_k
For example, if we win $10 when a die comes up 1 or 6, and lose $5 when it comes up 2, 3, 4 or 5, our mathematical expectation is:
E = 10*(2/6) + (-5)*(4/6) = 0
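The same calculation can be checked mechanically. The helper below is a minimal sketch (the name `expectation` is ours); it uses exact fractions so that the result is exactly zero rather than a rounding artefact.

```python
from fractions import Fraction
from typing import Iterable, Tuple

def expectation(outcomes: Iterable[Tuple[int, Fraction]]) -> Fraction:
    """Sum of amount * probability over (amount, probability) pairs."""
    pairs = list(outcomes)
    if sum(p for _, p in pairs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum((a * p for a, p in pairs), Fraction(0))

# Win $10 on a 1 or 6 (2 faces out of 6), lose $5 on 2, 3, 4 or 5 (4 out of 6).
die_game = [(10, Fraction(2, 6)), (-5, Fraction(4, 6))]
print(expectation(die_game))  # 0
```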
Moving from heuristics to evidence-based decision making
Define the notion of expected cost (EC) of using a particular distribution policy dp
EC of dp is cost of processing the parallel query using dp, given that in the future we will have actual workloads w1,w2,… with probabilities of p1,p2,…
In practice, we only consider two workloads & two distribution policies: the currently used dp and the “optimal” one obtained from monitored workloads
Cost(dp, w1) measures how much longer dp would take to finish processing w1 than the optimal distribution policy would. See the paper for details. SwitchCost is not included if dp is the currently used distribution policy.
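Read together, the points above suggest the following form for the expected cost when only two workloads are considered. This is a sketch inferred from the description, following the expectation formula from the review slide; the paper's exact definition may differ.

```latex
\[
EC(dp) \;=\; p_1 \cdot \mathrm{Cost}(dp, w_1) \;+\; p_2 \cdot \mathrm{Cost}(dp, w_2) \;+\; \mathrm{SwitchCost}(dp),
\qquad p_1 + p_2 = 1,
\]
\[
\mathrm{SwitchCost}(dp) = 0 \quad \text{if } dp \text{ is the currently used distribution policy.}
\]
```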
Probabilistic Delta Algorithm
    Initialize current_dp                // initially distribute uniformly
    TimeToSwitch = False
    while (not TimeToSwitch)
        Process next portion of query
        Compute preferred_dp             // “ideal” distribution
        ecNoChange = EC_NoChange(current_dp, preferred_dp, count)   // does not include SwitchCost
        ecChange = EC_Change(preferred_dp, current_dp, count)       // includes SwitchCost
        if ecNoChange >= ecChange
            TimeToSwitch = True
    endwhile
    current_dp = preferred_dp
    Adapt to preferred_dp
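The decision step of the algorithm can be sketched in runnable Python. Everything below is an assumption made for illustration: the cost model (extra time relative to the balanced policy, with time scaled so that a balanced system finishes `remaining_work` in `remaining_work` units), the function signatures, and the numbers. The talk defers the real Cost function to the paper.

```python
from typing import Sequence

def cost(dp: Sequence[float], ideal_dp: Sequence[float],
         remaining_work: float) -> float:
    """Assumed cost model: extra time `dp` needs on a workload whose balanced
    policy is `ideal_dp`. All entries of `ideal_dp` must be positive."""
    slowdown = max(share / ideal for share, ideal in zip(dp, ideal_dp))
    return remaining_work * (slowdown - 1.0)

def ec_no_change(current_dp, preferred_dp, p_current, p_preferred,
                 remaining_work) -> float:
    """Expected cost of keeping current_dp (no SwitchCost)."""
    return (p_current * cost(current_dp, current_dp, remaining_work)
            + p_preferred * cost(current_dp, preferred_dp, remaining_work))

def ec_change(current_dp, preferred_dp, p_current, p_preferred,
              remaining_work, switch_cost) -> float:
    """Expected cost of adapting to preferred_dp (includes SwitchCost)."""
    return (p_current * cost(preferred_dp, current_dp, remaining_work)
            + p_preferred * cost(preferred_dp, preferred_dp, remaining_work)
            + switch_cost)

def time_to_switch(current_dp, preferred_dp, p_current, p_preferred,
                   remaining_work, switch_cost) -> bool:
    """Adapt only when staying put is expected to cost at least as much."""
    return (ec_no_change(current_dp, preferred_dp, p_current, p_preferred,
                         remaining_work)
            >= ec_change(current_dp, preferred_dp, p_current, p_preferred,
                         remaining_work, switch_cost))

current = [1 / 3, 1 / 3, 1 / 3]   # uniform initial distribution
preferred = [0.2, 0.4, 0.4]       # node 1 is currently loaded

# Load on node 1 seen only rarely so far: stay with the current policy.
print(time_to_switch(current, preferred, 0.9, 0.1, 100.0, 5.0))  # False
# Load on node 1 seen half the time: adapting is now the cheaper bet.
print(time_to_switch(current, preferred, 0.5, 0.5, 100.0, 5.0))  # True
```

In the first call the expected penalty of staying (about 6.7) is well below the expected cost of switching (23, including the switch cost of 5), so no adaptation happens. In the second the balance flips (about 33.3 versus 15). This is the sense in which the approach damps reactions to short-lived spikes.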
Computing probabilities of future workloads
Let n_c be the number of workloads in the window that are most similar to current_dp
Let n_p be the number of workloads in the window that are most similar to preferred_dp
Let n_w be the total number of time units in the window.
prob(preferred_dp) = n_p / n_w and prob(current_dp)= n_c / n_w
Note: can use more sophisticated techniques; e.g., weight the workloads based on “proximity” to current time
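A possible implementation of this window-based estimate is sketched below. It assumes that each time unit in the window is summarised by the distribution policy that would have been ideal for it, that "most similar" means smallest L1 distance, and that ties count toward current_dp. The optional `decay` parameter is one hypothetical way to weight recent workloads more heavily; with `decay=1.0` the result is exactly n_c / n_w and n_p / n_w.

```python
from typing import Sequence, Tuple

def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))

def workload_probabilities(window: Sequence[Sequence[float]],
                           current_dp: Sequence[float],
                           preferred_dp: Sequence[float],
                           decay: float = 1.0) -> Tuple[float, float]:
    """Return (prob(current_dp), prob(preferred_dp)) from a window of
    observed workloads, ordered oldest first. decay < 1 down-weights
    older observations."""
    if not window:
        raise ValueError("window must contain at least one observation")
    # The newest observation has age 0 and therefore weight 1.
    weights = [decay ** age for age in range(len(window) - 1, -1, -1)]
    total = sum(weights)
    preferred_weight = sum(
        wt for wt, w in zip(weights, window)
        if l1_distance(w, preferred_dp) < l1_distance(w, current_dp))
    return (total - preferred_weight) / total, preferred_weight / total

current = [1 / 3, 1 / 3, 1 / 3]
preferred = [0.2, 0.4, 0.4]
window = [current, current, [0.2, 0.4, 0.4], [0.22, 0.39, 0.39]]

print(workload_probabilities(window, current, preferred))             # (0.5, 0.5)
print(workload_probabilities(window, current, preferred, decay=0.5))
```

Unweighted, two of the four time units resemble preferred_dp, so each probability is 0.5. With `decay=0.5` the two most recent observations dominate and prob(preferred_dp) rises to 0.8.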
Experiment Setup (Simulator)
Cost model parameters: drawn from micro benchmarks
Database from TPC-H benchmark.
As number of nodes grows, the data is assumed to be striped over the available machines.
All machines are assumed to have the same capabilities, and to be sharing the same network.
Experiments use Q1: P ⋈ PS (P has 200,000 tuples, PS has 800,000 tuples).
Same as in: N. W. Paton, J. Buenabad-Chavez, M. Chen, V. Raman, G. Swart, I. Narang, D. M. Yellin, and A.A.A. Fernandes, Autonomic Query Parallelization using Non-dedicated Computers: An Evaluation of Adaptivity Options, The Very Large Data Bases Journal.
Experiments
Periodic imbalance : The load on one or more of the machines comes and goes during the experiment. The level , duration , and repeat duration of the external load are varied.
Poisson imbalance : The arrival rate of jobs follows a Poisson distribution in which the average number of jobs starting per second varies.
“Cyclic Poisson” imbalance : Like a Poisson distribution, except that the average workload is not constant but changes over time in a cyclic fashion (like a sine wave). This tries to mimic more realistic workloads that change over time.
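The three kinds of external load could be generated along the following lines. This is a sketch, not the simulator used in the experiments: the one-second time step, the interpretation of "repeat duration" as the gap between spikes, the sine-modulated rate, and all function names are assumptions. Poisson samples are drawn with Knuth's multiplication method, which is adequate for small rates.

```python
import math
import random
from typing import List

def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) using Knuth's method (suitable for small lam)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def periodic_load(t: int, level: int, duration: int, repeat: int) -> int:
    """`level` external jobs for `duration` seconds, then none for `repeat`."""
    return level if (t % (duration + repeat)) < duration else 0

def cyclic_rate(t: float, mean_rate: float, period: float) -> float:
    """Arrival rate that oscillates like a sine wave around mean_rate."""
    return mean_rate * (1.0 + math.sin(2.0 * math.pi * t / period))

def poisson_load(rng: random.Random, seconds: int, mean_rate: float) -> List[int]:
    """Jobs starting in each second, with a constant average rate."""
    return [poisson_sample(rng, mean_rate) for _ in range(seconds)]

def cyclic_poisson_load(rng: random.Random, seconds: int,
                        mean_rate: float, period: float) -> List[int]:
    """Jobs starting in each second, with a rate that varies cyclically."""
    return [poisson_sample(rng, cyclic_rate(t, mean_rate, period))
            for t in range(seconds)]

rng = random.Random(42)  # fixed seed so that runs are repeatable
print([periodic_load(t, level=6, duration=1, repeat=1) for t in range(6)])
print(poisson_load(rng, 10, mean_rate=2.0))
print(cyclic_poisson_load(rng, 10, mean_rate=2.0, period=8.0))
```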
Periodic load imbalance
Parallelism level = 3
Single node affected
Duration and repeat duration of load spike = 1 s
Level of imbalance = average number of external jobs introduced
PD is more conservative in deciding to adapt
For the previous experiment with level of imbalance = 6
current dp = 0 means an adaptation is taking place
PD adjusts only once to the periodic increased load on node 1
Expected cost of adaptation is greater than expected cost of sticking with the current distribution
Each node starts with 1/3 of the workload, but nodes 2 and 3 gain workload over time
Poisson load imbalance
Parallelism level = 3
Single node affected
Duration of load spike = 1 s
Our approach is sensitive to window size. What is the best window size to use?
Investigate better techniques for computing the probability of future workloads
Use more than just two alternative distribution policies; e.g., can we infer a trend and use a “predicted distribution policy”?
Test the algorithm on a real system, not only with a simulator
Conclusions
We investigated replacing heuristics with a more fundamental approach for determining when to adapt the system
Initial experiments showed that using the Probabilistic Delta (expected cost) algorithm for determining when to adapt usually improved on existing approaches, sometimes significantly
The gain of this approach is due to inhibiting specious adaptations while still encouraging necessary adaptations
Backup slides
Adaptive Parallel Queries
A distribution policy describes how we partition work between processors