Dynamic Fractional Resource Scheduling For HPC Workloads -- 2009, Lyon

  1. 1. Dynamic Fractional Resource Scheduling for HPC Workloads
     Mark Stillwell¹, Frédéric Vivien²,¹, Henri Casanova¹
     ¹ Department of Information and Computer Sciences, University of Hawai’i at Mānoa
     ² INRIA, France
     Invited Talk, October 8, 2009
  2. 2. Formalization: HPC Job Scheduling Problem
     - 0 < N homogeneous nodes
     - 0 < J jobs; each job j has:
       - arrival time 0 ≤ rj
       - 0 < tj ≤ N tasks
       - compute time 0 < cj
     - J is not known; rj and tj are not known before rj; cj is not known until j completes
  3. 3. Formalization: Schedule Evaluation
     - makespan is not relevant for unrelated jobs
     - flow time over-emphasizes very long jobs
     - stretch re-balances in favor of short jobs
     - average stretch is prone to starvation
     - max stretch helps the average while bounding the worst case
  4. 4. Current Approaches
     Batch Scheduling, which no one likes:
     - usually FCFS with backfilling
     - backfilling needs (unreliable) compute time estimates
     - unbounded wait times
     - poor resource utilization
     - no particular objective
     Gang Scheduling, which no one uses:
     - globally coordinated time sharing
     - complicated and slow
     - memory pressure is a concern
  5. 5. Dynamic Fractional Resource Scheduling: VM Technology
     - basically, time sharing
     - pooling of discrete resources (e.g., multiple CPUs)
     - hard limits on resource consumption
     - job preemption and task migration
  6. 6. Dynamic Fractional Resource Scheduling: Problem Formulation
     - extends the basic HPC problem
     - jobs now have a per-task CPU need αj and memory requirement mj
     - multiple tasks can run on one node if their total memory requirement is ≤ 100%
     - a job's tasks must be assigned equal amounts of CPU resource
     - assigning less than the need results in proportional slowdown
     - assigned allocations can change over time
     - there are no run-time estimates, so we need another metric to optimize (a sketch of this job model follows below)
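A minimal sketch of this job model in Python; the class and field names (Job, cpu_need, mem_requirement, tasks_fit_on_node) are my own illustrative choices, not identifiers from the talk:

```python
from dataclasses import dataclass

@dataclass
class Job:
    """One job in the DFRS model (hypothetical field names)."""
    arrival: float          # r_j, arrival time (>= 0)
    num_tasks: int          # t_j, number of tasks (0 < t_j <= N)
    compute_time: float     # c_j, only known once the job completes
    cpu_need: float         # alpha_j, per-task CPU need as a fraction of one node
    mem_requirement: float  # m_j, per-task memory requirement as a fraction of one node

def tasks_fit_on_node(mem_requirements: list[float]) -> bool:
    """Tasks may share a node only if their memory requirements sum to at most 100%."""
    return sum(mem_requirements) <= 1.0
```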
  7. 7. Dynamic Fractional Resource Scheduling: Yield
     Definition: The yield yj(t) of job j at time t is the ratio of the CPU allocation given to the job to the job's CPU need (a small worked example follows below).
     - requires no knowledge of flow or compute times
     - can be optimized at each scheduling event
     - maximizing the minimum yield is related to minimizing the maximum stretch
     How do we keep track of job progress when the yield can vary?
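As a concrete illustration (the numbers are mine, not from the talk), a task that needs 60% of a CPU but is allocated only 45% has a yield of 0.75:

```python
def yield_at(allocation: float, cpu_need: float) -> float:
    """Yield y_j(t): the fraction of its CPU need a job actually receives."""
    return allocation / cpu_need

print(yield_at(0.45, 0.60))  # 0.75 -> the job runs at 75% of its full speed
```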
  8. 8. Dynamic Fractional Resource Scheduling: Virtual Time
     Definition: The virtual time vj(t) of job j at time t is the subjective time experienced by the job:
         vj(t) = ∫_{rj}^{t} yj(τ) dτ
     - the job completes when vj(t) = cj (a bookkeeping sketch follows below)
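A minimal sketch of how a simulator might track virtual time when the yield is piecewise constant between scheduling events; the function name and the segment representation are illustrative assumptions, not the talk's code:

```python
def virtual_time(arrival: float, t: float,
                 yield_segments: list[tuple[float, float, float]]) -> float:
    """Integrate the yield from the arrival time r_j up to t.

    yield_segments is a list of (start, end, yield) intervals, each with a
    constant yield, covering [arrival, t].
    """
    v = 0.0
    for start, end, y in yield_segments:
        lo, hi = max(start, arrival), min(end, t)
        if lo < hi:
            v += y * (hi - lo)
    return v

# A job with compute time c_j = 100 that ran at yield 1.0 for 60s and 0.5 for 80s:
v = virtual_time(0.0, 140.0, [(0.0, 60.0, 1.0), (60.0, 140.0, 0.5)])
print(v, v >= 100.0)  # 100.0 True -> the job has just completed
```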
  9. 9. Dynamic Fractional Resource Scheduling: The Need for Preemption
     - the final goal is to minimize the maximum stretch
     - without preemption, the stretch of non-clairvoyant on-line algorithms is unbounded (a worked calculation follows below)
       - consider 2 jobs that both require all of the system's resources
       - one has cj = 1, the other has cj = ∆
     - we need criteria to decide which jobs should be preempted
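To see why the bound fails, here is a small worked calculation under assumptions I am supplying (the ∆-length job arrives first and jobs run to completion once started); it is my reconstruction of the standard argument, not a slide from the talk:

```python
# Without preemption: the Delta-length job arrives at time 0 and occupies the
# whole system; the unit job arrives just afterwards and must wait.
def max_stretch_no_preemption(delta: float) -> float:
    long_stretch = delta / delta          # finishes at Delta, stretch 1
    short_stretch = (delta + 1.0) / 1.0   # waits ~Delta, runs 1, stretch ~Delta + 1
    return max(long_stretch, short_stretch)

for delta in (10, 1_000, 1_000_000):
    print(delta, max_stretch_no_preemption(delta))
# The maximum stretch grows linearly with Delta, i.e., it is unbounded.
```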
  10. 10. Dynamic Fractional Resource Scheduling: Priority
     Jobs should be preempted in order of increasing priority.
     - newly arrived jobs may have infinite priority
     - 1/vj(t) performs well, but is subject to starvation
     - (t − rj)/vj(t) avoids starvation, but does not perform well
     - (t − rj)/vj(t)² seems a reasonable compromise
     - other possibilities exist (the three candidates are sketched below)
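A minimal sketch of the three candidate priority functions, assuming the job's current virtual time is already tracked; the function names are mine:

```python
import math

def priority_inverse_vtime(v: float) -> float:
    """1 / v_j(t): favors jobs that have made little progress; can starve others."""
    return math.inf if v == 0.0 else 1.0 / v

def priority_wait_over_vtime(t: float, arrival: float, v: float) -> float:
    """(t - r_j) / v_j(t): avoids starvation but does not perform well."""
    return math.inf if v == 0.0 else (t - arrival) / v

def priority_compromise(t: float, arrival: float, v: float) -> float:
    """(t - r_j) / v_j(t)^2: the compromise the talk suggests."""
    return math.inf if v == 0.0 else (t - arrival) / (v * v)

# Newly arrived jobs have v_j(t) = 0 and therefore infinite priority in all three.
```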
  11. 11. Greedy Scheduling Heuristics
     - GREEDY: put each task on the host with the lowest CPU demand on which it fits in memory; new jobs may have to be resubmitted, using bounded exponential backoff (a placement sketch follows below)
     - GREEDY-PMTN: like GREEDY, but older tasks may be preempted
     - GREEDY-PMTN-MIGR: like GREEDY-PMTN, but older tasks may be migrated as well as preempted
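A minimal sketch of the GREEDY placement rule, assuming simple host bookkeeping (cpu_demand, free_mem) that I have invented for illustration; the resubmission/backoff logic is omitted:

```python
from typing import Optional

class Host:
    def __init__(self) -> None:
        self.cpu_demand = 0.0  # sum of CPU needs of the tasks placed here
        self.free_mem = 1.0    # remaining memory, as a fraction of the node

def greedy_place(hosts: list[Host], cpu_need: float, mem_requirement: float) -> Optional[Host]:
    """Pick the host with the lowest current CPU demand where the task fits in memory."""
    candidates = [h for h in hosts if h.free_mem >= mem_requirement]
    if not candidates:
        return None  # the job would be resubmitted later (bounded exponential backoff)
    best = min(candidates, key=lambda h: h.cpu_demand)
    best.cpu_demand += cpu_need
    best.free_mem -= mem_requirement
    return best
```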
  12. 12. MCB Heuristics: Connection to Multi-Capacity Bin Packing
     For each discrete scheduling event:
     - the problem is similar to multi-capacity (vector) bin packing, but has an optimization target and variable CPU allocations
     - it can be formulated as an MILP [Stillwell et al., 2009] (NP-complete)
     - relaxed LP heuristics are slow and give low-quality solutions
  13. 13. MCB Heuristics: Applying MCB Heuristics
     - yield is continuous, so choose a granularity (0.01)
     - perform a binary search on the yield, seeking to maximize it (sketched below)
     - for each fixed yield, set the CPU requirement and apply the packing heuristic
     - the yield found is the maximized minimum; leftover CPU is used to improve the average
     - if a solution cannot be found at any yield, remove the lowest-priority job and try again
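A minimal sketch of that binary search, assuming a caller-supplied can_pack(jobs, yield_value) predicate that scales each task's CPU requirement to yield_value times its CPU need and runs the packing heuristic; the predicate and names are placeholders, not the talk's code:

```python
from typing import Callable, Sequence

def maximize_min_yield(jobs: Sequence[object],
                       can_pack: Callable[[Sequence[object], float], bool],
                       granularity: float = 0.01) -> float:
    """Binary search over the yield for the largest value at which all tasks still pack."""
    lo, hi = 0.0, 1.0
    best = 0.0
    while hi - lo > granularity:
        mid = (lo + hi) / 2.0
        if can_pack(jobs, mid):
            best, lo = mid, mid   # feasible: try a higher minimum yield
        else:
            hi = mid              # infeasible: lower the target
    return best
```

If even the smallest yield is infeasible, the lowest-priority job would be removed and the search re-run, as the slide describes.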
  14. 14. MCB8 Heuristic
     Based on [Leinberger et al., 1999], simplified to the 2-dimensional case (a packing sketch follows after the steps):
     1. Put job tasks in two lists: CPU-intensive and memory-intensive.
     2. Sort the lists by "some criterion" (MCB8: descending order by maximum).
     3. Starting with the first host, pick tasks that fit, in order, from the list that goes against the current imbalance.
        Example: the current host's tasks total 50% CPU and 60% memory, so assign the next task that fits from the list of CPU-intensive tasks.
     4. When no tasks can fit on a host, go to the next host.
     5. If all tasks can be placed, then success; otherwise, failure.
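A minimal sketch of steps 1 through 5 in Python; details such as falling back to the other list when the preferred one has no fitting task, and tie-breaking, are my assumptions rather than details given on the slide:

```python
def mcb8_pack(tasks: list[tuple[float, float]], num_hosts: int) -> bool:
    """2-D MCB8-style packing sketch: tasks are (cpu, mem) fractions of one node.

    Returns True if every task is placed, False otherwise. Illustrative only.
    """
    cpu_heavy = sorted([t for t in tasks if t[0] >= t[1]], key=lambda t: max(t), reverse=True)
    mem_heavy = sorted([t for t in tasks if t[0] < t[1]], key=lambda t: max(t), reverse=True)

    for _ in range(num_hosts):
        used_cpu, used_mem = 0.0, 0.0
        placed_something = True
        while placed_something:
            placed_something = False
            # Pick from the list that goes against the host's current imbalance.
            preferred = cpu_heavy if used_cpu <= used_mem else mem_heavy
            other = mem_heavy if preferred is cpu_heavy else cpu_heavy
            for source in (preferred, other):
                for i, (c, m) in enumerate(source):
                    if used_cpu + c <= 1.0 and used_mem + m <= 1.0:
                        used_cpu += c
                        used_mem += m
                        source.pop(i)
                        placed_something = True
                        break
                if placed_something:
                    break
    return not cpu_heavy and not mem_heavy
```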
  21. 21. MCB8 Scheduling Heuristics
     - DYNMCB8: apply the heuristic on every scheduling event
     - DYNMCB8-PER: apply the heuristic periodically
     - DYNMCB8-ASAP-PER: like DYNMCB8-PER, but try to greedily schedule incoming jobs
     - DYNMCB8-STRETCH-PER: like DYNMCB8-PER, but try to optimize worst-case max stretch
  22. 22. Methodology
     - a discrete event simulator takes a list of jobs and returns stretch values
     - workloads are based on synthetic and real traces
     - synthetic workload arrival times are scaled to show performance under different load conditions
     - algorithms are evaluated by a per-trace degradation factor (one assumed reading of this metric is sketched below)
     - one experiment uses "free" preemption/migration, another charges each preemption/migration a constant amount of wall-clock time
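The slides do not define the degradation factor; one common convention, which I am assuming here purely for illustration, is the ratio of an algorithm's result on a trace to the best result achieved on that same trace, so a factor of 1 means the algorithm was (jointly) best:

```python
def degradation_factors(results_by_alg: dict[str, float]) -> dict[str, float]:
    """Per-trace degradation: each algorithm's max stretch divided by the best
    (lowest) max stretch obtained on the same trace. Assumed definition."""
    best = min(results_by_alg.values())
    return {alg: value / best for alg, value in results_by_alg.items()}

# Hypothetical per-trace max-stretch values:
print(degradation_factors({"FCFS": 480.0, "GREEDY": 300.0, "DYNMCB8": 12.0}))
# {'FCFS': 40.0, 'GREEDY': 25.0, 'DYNMCB8': 1.0}
```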
  23. 23. Results: Average Maximum Yield, No Preemption/Migration Penalty
     [plot: degradation factor (log scale, 1 to 10000) vs. load (0.1 to 0.9)]
  26. 26. Results: Average Maximum Yield, 5 Minute Preemption/Migration Penalty
     [plot: degradation factor (log scale, 1 to 10000) vs. load (0.1 to 0.9)]
  29. 29. Results: Comparison of Synthetic vs. Real Workload Results
     Degradation factors:
                 Scaled synth.    Unscaled synth.   Real-world
     Algs        avg.    max.     avg.    max.      avg.    max.
     EASY         167     560      139     443        94    1476
     FCFS         186     569      154     476       118    2219
     greedy       294    1093      249    1050       153    1527
     greedyp       41     875       35     785         9     147
     greedypm      62     835       37     773        17     759
     mcb           32     162       11     162        11     231
     mcbp           1      12        2      21         3      20
     gmcbp          1       9        2      22         2      20
     mcbsp          1      12        2      21         3      23
  30. 30. Results: Computation Times
     - most scheduling events involve 10 or fewer jobs and require negligible time for all schedulers
     - even when there are about 100 jobs, the time for MCB8 is under 5 seconds on a 3.2 GHz machine
  31. 31. Results: Costs
     - greedy approaches use significantly less bandwidth than MCB approaches (< 1 GB/s in the worst case)
     - MCB approaches cause jobs to be preempted around 5 times on average
     - DYNMCB8 uses 1.3 GB/s on average, 5.1 GB/s maximum
     - the periodic algorithms use 0.6 GB/s on average, 2.1 GB/s maximum
  32. 32. Conclusions
     - DFRS is potentially much better than batch scheduling
     - multi-capacity bin packing heuristics perform best
     - targeting yield does as well as targeting worst-case max stretch
     - periodic MCB approaches perform nearly as well as aggressive ones when there is no migration cost, and much better when there is a fixed migration cost
     - adding an opportunistic greedy scheduling heuristic to DYNMCB8-PER gives no real benefit to max stretch
     - MCB approaches can calculate resource allocations reasonably quickly
     - MCB approaches need to try to mitigate migration/preemption costs
  33. 33. Appendix: For Further Reading
     References I:
     - Leinberger, W., Karypis, G., and Kumar, V. (1999). Multi-capacity bin packing algorithms with applications to job scheduling under multiple constraints. In Proc. of the Intl. Conf. on Parallel Processing, pages 404–412. IEEE.
     - Stillwell, M., Schanzenbach, D., Vivien, F., and Casanova, H. (2009). Resource Allocation using Virtual Clusters. In Proc. of CCGrid 2009, pages 260–267. IEEE.