Dynamic Fractional Resource Scheduling: Practical Issues and Future Directions -- 2011, Cetraro


Transcript

  • 1. Introduction | Implementation | Concluding Remarks | Appendix
    Dynamic Fractional Resource Scheduling: Practical Issues and Future Directions
    Mark Stillwell, Frédéric Vivien, Henri Casanova
    INRIA, Université de Lyon, LIP
    International Research Workshop on Advanced High Performance Computing Systems
    Cetraro, Italy, June 27–29, 2011
    M. Stillwell, F. Vivien, H. Casanova (INRIA), DFRS in Practice
  • 2. Outline
      • Introduction
      • Implementation
      • Concluding Remarks
  • 4. Dynamic Fractional Resource Scheduling
      • HPC Scheduling Problem:
          • homogeneous nodes with high-speed interconnect
          • network file system
          • release date and number of tasks known at submission
          • computation time known only after completion
          • goal: minimize maximum stretch
      • approach:
          • based on virtual machine technology
          • fractional resource allocations
          • preemption / migration
          • on-line scheduling problem → sequence of off-line resource allocation problems
          • computable, on-line performance metric related to max stretch
          • heuristics for task placement/resource allocation
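As a minimal sketch of the objective on this slide, the snippet below computes the maximum stretch of a set of finished jobs. It assumes the usual definition of stretch (flow time, i.e. completion minus release, divided by the computation time on a dedicated platform); the `Job` class and its field names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float     # submission time, in seconds
    completion: float  # time at which the job finished, in seconds
    compute: float     # dedicated run time, known only after completion


def stretch(job):
    """Flow time divided by dedicated computation time (assumed definition)."""
    return (job.completion - job.release) / job.compute


def max_stretch(jobs):
    """The quantity the scheduler tries to minimize."""
    return max(stretch(j) for j in jobs)


jobs = [
    Job(release=0.0, completion=100.0, compute=100.0),  # never delayed
    Job(release=10.0, completion=70.0, compute=20.0),   # short job that waited
]
worst = max_stretch(jobs)
```

Under this definition a short job that waits behind long ones is penalized heavily, which is why max stretch is sensitive to how short jobs are treated.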
  • 5. Task Placement and Resource Allocation Problem
      • nodes comprise a number of resources (CPU cores, memory, bandwidth, etc.)
      • tasks have rigid requirements and fluid needs
      • all tasks of a job have the same requirements and needs
      • rigid requirements specify minimum resource allocations
      • fluid needs specify minimum allocations without performance degradation
      • yield of a job is a value between 0 and 1 correlated with performance
      • jobs are allocated the product of yield and need in each resource
      • Goal: find an allocation that maximizes the minimum yield
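To make the yield definition concrete, here is a small sketch that computes the best achievable minimum yield for a placement that is already fixed. It assumes CPU is the only fluid need, that each node has CPU capacity 1.0, and that rigid (memory) requirements are already satisfied by the placement; the function names and the dictionary layout are illustrative only.

```python
def node_yield(cpu_needs, capacity=1.0):
    """Largest common yield y such that sum(y * need) <= capacity, capped at 1."""
    total = sum(cpu_needs)
    if total <= capacity:
        return 1.0  # every task gets its full need
    return capacity / total


def minimum_yield(placement):
    """Minimum over nodes of the per-node yield, for a fixed task placement."""
    return min(node_yield(needs) for needs in placement.values())


# CPU needs of the tasks placed on each node, as fractions of one node.
placement = {
    "node0": [0.5, 0.25],  # total need 0.75, not oversubscribed
    "node1": [1.0, 1.0],   # total need 2.0, each task gets half of its need
}
result = minimum_yield(placement)
```

Giving every task on a node the same yield is what maximizes the minimum on that node, so the most loaded node determines the overall value.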
  • 6. Task Placement Heuristics
      • Greedy Task Placement – incremental, puts each task on the node with the lowest computational load on which it can fit without violating memory constraints
      • MCB Task Placement – global, iteratively applies multi-capacity (vector) bin-packing heuristics during a binary search for the maximized minimum yield
          • achieves higher minimum yield values than Greedy
          • can potentially cause lots of migration
      • what if the system is oversubscribed?
          • use a priority function to decide which jobs to run
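The Greedy rule described above can be sketched as follows. This is an illustration under simple assumptions (one CPU load value and one free-memory value per node, memory as the only rigid constraint), not the authors' implementation; the node dictionaries and the `greedy_place` name are hypothetical.

```python
def greedy_place(task_cpu, task_mem, nodes):
    """Place one task on the least CPU-loaded node with enough free memory.

    nodes is a list of dicts with keys 'cpu_load' and 'mem_free'.
    Returns the index of the chosen node, or None if the task fits nowhere.
    """
    candidates = [i for i, n in enumerate(nodes) if n["mem_free"] >= task_mem]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: nodes[i]["cpu_load"])
    nodes[best]["cpu_load"] += task_cpu
    nodes[best]["mem_free"] -= task_mem
    return best


nodes = [
    {"cpu_load": 0.2, "mem_free": 0.1},  # lightly loaded but short on memory
    {"cpu_load": 0.5, "mem_free": 0.6},
    {"cpu_load": 0.9, "mem_free": 0.9},
]
first = greedy_place(0.5, 0.5, nodes)   # node 0 lacks memory, node 1 is next
second = greedy_place(0.5, 0.5, nodes)  # only node 2 still has enough memory
third = greedy_place(0.5, 0.5, nodes)   # no node has 0.5 memory left
```

Because the rule only looks at the task being placed, it never moves running tasks, which is the trade-off against the global MCB approach.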
  • 7. Priority Function?
      • Virtual Time: subjective time experienced by a job
      • first idea: 1 / VIRTUAL TIME
          • informed by ideas about fairness
          • leads to good results
          • theoretically prone to starvation
      • second idea: FLOW TIME / VIRTUAL TIME
          • addresses starvation problem
          • leads to poor performance
      • third idea: FLOW TIME / (VIRTUAL TIME)^2
          • combines idea #1 and idea #2
          • addresses starvation
          • performs about the same as first priority function
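The three candidate priority functions can be compared with a short sketch. It assumes that a larger value means the job is scheduled first and that a job with zero virtual time (one that has never run) gets infinite priority; both conventions, and the function names, are assumptions made for illustration.

```python
import math


def priority_first(flow_time, virtual_time):
    """First idea: 1 / virtual time."""
    return math.inf if virtual_time == 0 else 1.0 / virtual_time


def priority_second(flow_time, virtual_time):
    """Second idea: flow time / virtual time."""
    return math.inf if virtual_time == 0 else flow_time / virtual_time


def priority_third(flow_time, virtual_time):
    """Third idea: flow time / (virtual time)^2."""
    return math.inf if virtual_time == 0 else flow_time / virtual_time ** 2


# Job A arrived recently; job B has been in the system ten times longer.
a = {"flow_time": 100.0, "virtual_time": 50.0}
b = {"flow_time": 1000.0, "virtual_time": 100.0}

first_prefers_a = priority_first(**a) > priority_first(**b)
third_prefers_b = priority_third(**b) > priority_third(**a)
```

The first function ignores how long a job has been waiting, so job B can keep losing to newer arrivals; including flow time in the numerator raises B's priority as it ages.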
  • 8. When to Apply Heuristics
      • We consider a number of different options:
          • Job Submission – heuristics can use greedy or bin packing approaches
          • Job Completion – as above, can help with throughput when there are lots of short running jobs
          • Periodically – some heuristics periodically apply vector packing to improve overall job placement
  • 9. Methodology
      • experiments conducted using a discrete event simulator
      • mix of synthetic and real trace data
      • considered only memory requirements and CPU needs
      • preemption/migration of a job adds a 300-second penalty to its runtime
      • periodic approaches use a 600-second (10-minute) period
      • absolute bound on max stretch computed for each instance
      • performance comparison based on degradation from max stretch bound
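The two numeric conventions in this methodology can be written down as a short sketch. It assumes that the penalty is simply added to a job's run time once per preemption or migration, and that "degradation from bound" is the ratio of the achieved max stretch to the computed bound; the slides do not spell out either formula, so both are assumptions, as are the names used here.

```python
PENALTY_SECONDS = 300.0  # per preemption/migration, as stated in the methodology


def effective_runtime(compute_seconds, moves):
    """Run time after charging the fixed penalty for each preemption/migration."""
    return compute_seconds + PENALTY_SECONDS * moves


def degradation_from_bound(achieved_max_stretch, bound):
    """Assumed metric: achieved max stretch divided by the max stretch bound."""
    return achieved_max_stretch / bound


runtime = effective_runtime(3600.0, moves=2)          # one-hour job moved twice
degradation = degradation_from_bound(12.0, bound=4.0)
```

With this reading, a degradation of 1 means the algorithm matched the bound on that instance.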
  • 10. Simulation Algorithms
      • FCFS
      • EASY
      • Greedy*
      • GreedyPM*
      • /per
      • GreedyPM*/per
      • MCB*/per
      • GreedyPM*/per/mvt
      • MCB*/per/mvt
      • /stretch-per
  • 11. Max Stretch Degradation vs. Load, 5 minute penalty
    [Figure: max stretch degradation from bound (y-axis, ticks 1 to 10000) vs. load (x-axis, 0.1 to 0.9). Series: FCFS, EASY, Greedy*, GreedyPM*, /per, GreedyPM*/per, MCB*/per, /stretch-per, MCB*/per/mvt, GreedyPM*/per/mvt]
  • 12. Bandwidth vs. Period
    [Figure: average bandwidth in GB/s (y-axis, 0.05 to 0.55) vs. period in seconds (x-axis, 2000 to 12000)]
  • 13. Max Stretch Degradation vs. Period
    [Figure: average max stretch degradation (y-axis, 5 to 14) vs. period in seconds (x-axis, 2000 to 12000)]
  • 15. Development Platform
      • Grid5000 (http://www.grid5000.fr)
          • Kadeploy images
          • Genepi nodes: quad-core Xeon 2.5 GHz with 8 GB RAM
      • Open-Source Software
          • Xen 4.0
          • Debian Squeeze / Linux 2.6.32
  • 16. Goals
      • “small” experiment topics:
          • VM overhead
          • resource requirement/needs estimation
          • time sharing performance impact
          • preemption/migration costs
          • bandwidth consumption
          • performance impact
          • migration planning strategies
      • “large” experiment topic:
          • DFRS in practice
  • 21. Estimating Resource Needs
      • Techniques:
          • repeated runs for benchmarking / model building
          • online model building
          • introspection / configuration variation
      • Models:
          • constant values
          • time-varying values
          • random variables / probability distributions
      • Questions:
          • how accurate can we be?
          • how accurate do we need to be?
  • 22. Benchmark Task VM Memory Requirements
    bench/tasks      1        2        3        4
    CG            477M     251M        –     147M
    EP            <64M     <64M     <64M     <64M
    FT           1930M     977M        –     499M
    IS            637M     329M        –     176M
    LU            218M     125M        –      80M
    MG            485M     259M        –     147M
    SP            369M        –        –     123M
  • 23. Time Sharing Performance Impact
      • Process thrashing!!!
      • Gang Scheduling
      • Flexible Co-scheduling
      • Uncoordinated might not be so bad...
  • 24. VM Consolidation
    [Figure: slowdown (y-axis, 2 to 14) vs. consolidation factor, i.e. maximum tasks/core (x-axis, 1 to 8). Series: y = x, CG, EP, FT, IS, LU, MG]
  • 25. Parallel Task Preemption/Migration
      • preemption of parallel distributed jobs is theoretically difficult
      • in the general case this may require coordinated shutdown
      • on the target platform everything happens quickly enough that it isn’t an issue
      • migration can be done by preempt/restore or “live”
  • 26. Preemption/Migration Timing Results
    [Figure: time in seconds (y-axis, 0 to 100) vs. memory allocation in MB (x-axis, 400 to 2000). Series: Preempt, Restore, Preempt+Restore, Migrate]
  • 27. Migration Planning
      • using only live migration
          • is NP-hard
          • could fail (circular dependencies)
      • heuristic:
          • try to move the highest priority job
          • if it can’t move, try next highest
          • if none can move, preempt the lowest priority job scheduled for migration and try again
          • restore any preempted jobs at the end
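The heuristic on this slide can be sketched roughly as follows. The sketch assumes that memory is the only constraint deciding whether a job can move, that the target placement is feasible once all moves are done, and that a larger `priority` value means higher priority; the job dictionaries and the `plan_migrations` name are hypothetical.

```python
def plan_migrations(jobs, free_mem):
    """Order live migrations, preempting low-priority jobs to break deadlocks.

    jobs: list of dicts with keys name, priority, mem, src, dst.
    free_mem: dict mapping node name to currently free memory.
    Returns a list of (action, job name) steps.
    """
    free = dict(free_mem)  # work on a copy
    pending = sorted(jobs, key=lambda j: j["priority"], reverse=True)
    preempted, plan = [], []
    while pending:
        # Highest-priority pending job whose destination has room right now.
        movable = next((j for j in pending if free[j["dst"]] >= j["mem"]), None)
        if movable is not None:
            free[movable["dst"]] -= movable["mem"]
            free[movable["src"]] += movable["mem"]
            pending.remove(movable)
            plan.append(("migrate", movable["name"]))
        else:
            # Nothing can move: preempt the lowest-priority pending job.
            victim = pending.pop()
            free[victim["src"]] += victim["mem"]
            preempted.append(victim)
            plan.append(("preempt", victim["name"]))
    for job in preempted:
        # Restore preempted jobs on their destinations at the end.
        free[job["dst"]] -= job["mem"]
        plan.append(("restore", job["name"]))
    return plan


# Two full nodes whose jobs must swap places: a circular dependency.
jobs = [
    {"name": "x", "priority": 2, "mem": 4, "src": "A", "dst": "B"},
    {"name": "y", "priority": 1, "mem": 4, "src": "B", "dst": "A"},
]
plan = plan_migrations(jobs, {"A": 0, "B": 0})
```

In the swap example neither job can move live, so the lower-priority job is preempted to free its node, the other job migrates, and the preempted job is restored last.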
  • 28. Planned Implementation Validation Experiment
      • generate traces using an accepted model
      • annotate them with CPU/memory values
      • “fill in” the traces using benchmarks...
          • NAS parallel benchmarks (HPC)
          • TCPP benchmarks (Service Hosting)
  • 29. Future Work
      • complete development of practical implementation
      • extend the model
          • nonlocal resources
          • heterogeneous nodes
          • varying communications delays
      • new or arbitrary optimization targets
          • fault tolerance
          • energy or other costs
      • formal definitions of fairness
  • 30. Summary
      • DFRS performs well in simulation
          • orders of magnitude better than batch
          • close to a theoretical bound
      • prototype in development
          • preliminary results are promising