Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
I am Sugree Phatanapherom from Kasetsart University.
This research is joint work with Asst. Prof. Putchong Uthayopas.
Ready, steady, go.
What is a batch scheduler?
A batch scheduler is responsible for scheduling jobs to execute on resources at the right time.
Why do we need a batch scheduler?
To utilize resources efficiently.
To finish all jobs as fast as possible.
To minimize power consumption.
In general, this is called the "resource scheduling problem".
[Figure: Jobs, Resources and Time; jobs placed on a chart with axes "time" and "resources"]
In this research, the main criterion is to minimize the cost of running the resources.
In the past, most work focused on improving algorithms.
To simplify the problem, this research limits the scope of job characteristics to independent sequential jobs.
In short, a job contains one and only one task.
In other words, job = task.
Scheduling Algorithms: On-line (RR, OLB, MET, MCT); Batch (MinMin, MaxMin, Sufferage, XSufferage, CMinMin, CMaxMin, CSufferage)
Scheduling can be either on-line or batch.
The simplest algorithm is "Round Robin" (RR), which assigns jobs to machines in turn.
"Opportunistic Load Balancing" (OLB) assigns a job to the next available machine.
"Minimum Execution Time" (MET) assigns a job to the machine that executes it fastest.
"Minimum Completion Time" (MCT) assigns a job to the machine with the minimum completion time for that job.
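To make the four on-line rules concrete, here is a minimal Python sketch. It assumes (this is not in the slides) an expected-time-to-compute matrix etc[j][m] giving the execution time of job j on machine m, with jobs arriving in list order; all function names are hypothetical. OLB ignores execution time when picking a machine, MET ignores machine load, and MCT combines both.

```python
from typing import Callable, List

Matrix = List[List[float]]  # etc[j][m]: expected execution time of job j on machine m


def round_robin(etc: Matrix) -> List[int]:
    """RR: job j goes to machine j mod M, ignoring times and load."""
    n_machines = len(etc[0])
    return [j % n_machines for j in range(len(etc))]


def met(etc: Matrix) -> List[int]:
    """MET: each job goes to the machine that runs it fastest, ignoring load."""
    return [min(range(len(row)), key=lambda m: row[m]) for row in etc]


def _greedy(etc: Matrix, score: Callable[[float, float], float]) -> List[int]:
    """Assign jobs in arrival order to the machine minimizing score(ready, exec_time)."""
    ready = [0.0] * len(etc[0])  # time at which each machine becomes free
    mapping = []
    for row in etc:
        m = min(range(len(ready)), key=lambda k: score(ready[k], row[k]))
        ready[m] += row[m]
        mapping.append(m)
    return mapping


def olb(etc: Matrix) -> List[int]:
    """OLB: the next available (earliest-free) machine, ignoring execution time."""
    return _greedy(etc, lambda ready, exec_time: ready)


def mct(etc: Matrix) -> List[int]:
    """MCT: the machine giving the earliest completion time for this job."""
    return _greedy(etc, lambda ready, exec_time: ready + exec_time)
```

On etc = [[3, 1], [3, 1], [3, 1]], MET piles every job onto machine 1, while MCT spills the third job onto machine 0 once machine 1's queue makes it no faster.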
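Next are the batch scheduling algorithms.
"MinMin" repeatedly maps the job with the smallest minimum completion time to the machine achieving it; roughly, the shortest job goes to the fastest machine.
"MaxMin" does the same for the job with the largest minimum completion time; roughly, the longest job goes to the fastest machine.
"Sufferage" is a reassignable MaxMin: a job that has claimed a machine can be displaced by a job that would suffer more without it, where sufferage is the gap between a job's best and second-best completion times.
"XSufferage" is Sufferage with data locality.
CMinMin, CMaxMin and CSufferage are cost-aware derivatives of MinMin, MaxMin and Sufferage.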
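The batch heuristics can be sketched in Python as follows, assuming a hypothetical matrix etc[j][m] of expected execution times with the whole batch known up front. MinMin and MaxMin share one loop and differ only in which job they commit next. The Sufferage function follows the classic heuristic (each machine is offered to at most one job per round, and a displaced job waits for the next round); it is an illustration, not the authors' code, and the data-locality and cost terms of XSufferage and the C variants are left out.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # etc[j][m]: expected execution time of job j on machine m


def min_max_min(etc: Matrix, use_max: bool = False) -> Dict[int, int]:
    """MinMin (use_max=False) or MaxMin (use_max=True); returns {job: machine}."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines
    unmapped = set(range(len(etc)))
    mapping: Dict[int, int] = {}
    while unmapped:
        # Minimum completion time (and the machine giving it) for every unmapped job.
        best = {}
        for j in unmapped:
            m = min(range(n_machines), key=lambda k: ready[k] + etc[j][k])
            best[j] = (ready[m] + etc[j][m], m)
        # MinMin commits the smallest of these; MaxMin commits the largest.
        pick = max if use_max else min
        j = pick(best, key=lambda job: best[job][0])
        completion, m = best[j]
        mapping[j] = m
        ready[m] = completion
        unmapped.remove(j)
    return mapping


def sufferage(etc: Matrix) -> Dict[int, int]:
    """Each round, a machine goes to the claimant that would suffer most without it."""
    n_machines = len(etc[0])
    ready = [0.0] * n_machines
    unmapped = list(range(len(etc)))
    mapping: Dict[int, int] = {}
    while unmapped:
        claims: Dict[int, Tuple[int, float]] = {}  # machine -> (job, sufferage value)
        for j in unmapped:
            cts = sorted((ready[m] + etc[j][m], m) for m in range(n_machines))
            best_ct, best_m = cts[0]
            # Sufferage: loss if j is denied its best machine.
            suff = cts[1][0] - best_ct if n_machines > 1 else 0.0
            holder = claims.get(best_m)
            if holder is None or holder[1] < suff:
                # A displaced job simply stays unmapped until the next round.
                claims[best_m] = (j, suff)
        for m, (j, _) in claims.items():
            mapping[j] = m
            ready[m] += etc[j][m]
            unmapped.remove(j)
    return mapping
```

On etc = [[2, 4], [6, 7], [3, 6]], MinMin and Sufferage both produce {0: 0, 1: 1, 2: 0} (makespan 7), while MaxMin produces {0: 0, 1: 0, 2: 1} (makespan 8).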
How to verify? How to evaluate?
The answer is simulation. Why?
Closed. Controllable. Reproducible.
Simulation relies on assumptions and modeling.
A grid is modeled as a meta-scheduler over underlying cluster schedulers, each managing its own hosts.
[Figure: Grid model; a Grid Scheduler dispatches jobs to several Cluster Schedulers, each managing Hosts]
The interconnections between the scheduler and the processors are dedicated.
[Figure: Cluster model; Scheduler, Storage and Processors connected by a Network]
A job consists of inputs, outputs and an executable.
[Figure: Job model; a Job with its Executable, Input and Output, running on a Machine]
The operation has two steps: mapping and scheduling.
Mapping assigns a "job" to a "machine".
Scheduling assigns the "job" an exact start time.
In short, the result is a generic priority index.
 
[Figure: Time; timeline marking the ready time, execution time, deadline and the period before the deadline]
[Figure: Cost; cumulative cost]
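The slides do not give the formula for the priority index, so the following is a hypothetical sketch only: a weighted sum of five factors suggested by the Time and Cost figures (ready time, execution time, slack before the deadline, job cost and the owner's cumulative cost). It also assumes that a lower index means "dispatch earlier". Mapping picks, for each job, the machine with the lowest index; scheduling then dispatches jobs in index order. In this sketch the five weights play the role of the adjustable factors discussed on the later slides.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class JobOnMachine:
    """Hypothetical view of one job on one candidate machine."""
    ready_time: float       # earliest time the job can start on this machine
    execution_time: float   # expected run time on this machine
    deadline: float
    cost_rate: float        # hypothetical: price per time unit on this machine
    cumulative_cost: float  # hypothetical: cost already spent by the job's owner


def priority_index(x: JobOnMachine, w: Sequence[float]) -> float:
    """Hypothetical 5-factor weighted sum; lower means dispatch earlier."""
    slack = x.deadline - (x.ready_time + x.execution_time)  # period before deadline
    factors = (x.ready_time, x.execution_time, slack,
               x.execution_time * x.cost_rate, x.cumulative_cost)
    return sum(wi * fi for wi, fi in zip(w, factors))


def map_and_schedule(options: List[List[JobOnMachine]],
                     w: Sequence[float]) -> List[Tuple[int, int]]:
    """options[j][m] describes job j on machine m; returns (job, machine) in dispatch order."""
    chosen = []
    for j, per_machine in enumerate(options):
        # Mapping: the machine with the lowest index for this job.
        m = min(range(len(per_machine)), key=lambda k: priority_index(per_machine[k], w))
        chosen.append((priority_index(per_machine[m], w), j, m))
    # Scheduling: dispatch jobs in increasing index order.
    chosen.sort()
    return [(j, m) for _, j, m in chosen]
```

With weights (0, 1, 0, 0, 0) the index reduces to execution time, so this two-step procedure behaves like a simple shortest-job-first mapping.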
The experiments were based on the GAMESS job log from ThaiGrid, used to model a small and a big system named KUGrid and ThaiGrid, respectively.
Makespan and cost were observed.
Makespan is the period from when the first job is submitted until the last job finishes.
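The makespan definition translates directly into code. This sketch assumes each job record is a hypothetical (submit_time, finish_time) pair.

```python
from typing import Iterable, Tuple


def makespan(jobs: Iterable[Tuple[float, float]]) -> float:
    """Time from the first submission to the last completion.

    jobs: hypothetical (submit_time, finish_time) pairs.
    """
    jobs = list(jobs)
    if not jobs:
        raise ValueError("makespan of an empty job set is undefined")
    first_submit = min(submit for submit, _ in jobs)
    last_finish = max(finish for _, finish in jobs)
    return last_finish - first_submit
```

For example, makespan([(0, 5), (1, 9), (2, 4)]) returns 9: the last job to finish is not necessarily the last one submitted, so both ends are taken over the whole set.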
[Charts: Price-Performance; Cost; Makespan]
Looks great! Any problems? Yes!
The priority index contains 5 factors. What are the right values?
What do those factors depend on?
There are many dependencies: job characteristics, resource characteristics and user characteristics.
This is a "multi-variate optimization" problem.
Plus, it is a bit more complex because each evaluation runs in a simulator.
How to solve?
[Figure: Optimization Architecture; components: Optimizer, several Simulators, Batch Scheduler, Monitoring System, Accounting System]
Optimization Algorithm?
Particle Swarm Optimization (PSO) was selected as the first algorithm to try.
The position of each particle in an n-dimensional space represents a solution.
PSO models social influence at several scopes: local, neighbor and global.
Loosely, one trusts oneself, one's friends and the world, respectively; each scope has its own level of trust.
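A minimal PSO sketch in Python that keeps the three scopes as separate terms in the velocity update: the particle's own best (local), the best among its ring neighbors (neighbor) and the swarm's best (global). The coefficients c_self, c_neighbor and c_global stand in for the levels of trust; their values, the ring topology and the inertia weight are assumptions, and this is one common PSO variant rather than the authors' implementation. Each position would be a candidate vector of scheduler factors.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(f: Callable[[Sequence[float]], float], dim: int, bounds: Tuple[float, float],
        n_particles: int = 20, iters: int = 100, inertia: float = 0.7,
        c_self: float = 1.5, c_neighbor: float = 1.0, c_global: float = 0.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize f over the box [lo, hi]^dim; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]          # each particle's own best position (local)
    pbest_val = [f(x) for x in xs]
    for _ in range(iters):
        g = min(range(n_particles), key=lambda i: pbest_val[i])  # global best
        for i in range(n_particles):
            # Ring neighborhood: particle i listens to i-1, i and i+1.
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            nb = min(ring, key=lambda k: pbest_val[k])
            for d in range(dim):
                vs[i][d] = (inertia * vs[i][d]
                            + c_self * rng.random() * (pbest[i][d] - xs[i][d])
                            + c_neighbor * rng.random() * (pbest[nb][d] - xs[i][d])
                            + c_global * rng.random() * (pbest[g][d] - xs[i][d]))
                # Keep the particle inside the search box.
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    return pbest[g], pbest_val[g]
```

As a smoke test, pso(lambda x: sum(v * v for v in x), dim=3, bounds=(-5.0, 5.0)) should return a point close to the origin; in the self-tuning setting, f would instead be a simulator-based score of a candidate parameter vector.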
PSO
How to fully automate the self-tuning process?
Historical data are the key.
The quality of the solution depends on the optimizer.
Running the optimizer longer may return a better solution.
The precision of using historical data depends on the data period and the amount of data.
How to use historical data? Log replay or estimation.
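Log replay can be sketched as a fitness function for the optimizer: replay historical jobs through a simple simulator under a candidate weight vector and score the resulting makespan and cost. Everything here is hypothetical (the LoggedJob and Machine records, the two-weight machine-selection rule, and the alpha/beta blend of makespan and cost); a real setup would plug in the actual batch scheduler together with the monitoring and accounting data.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LoggedJob:
    """Hypothetical record from a historical job log."""
    submit: float
    work: float  # runtime on a reference machine of speed 1


@dataclass
class Machine:
    """Hypothetical machine model used by the toy simulator."""
    speed: float  # work units per time unit
    price: float  # cost per time unit while busy


def replay(log: List[LoggedJob], machines: List[Machine],
           weights: Sequence[float]) -> Tuple[float, float]:
    """Replay the log in submit order and return (makespan, cost).

    Each job goes to the machine minimizing
    weights[0] * completion_time + weights[1] * job_cost.
    """
    free_at = [0.0] * len(machines)
    finishes: List[float] = []
    cost = 0.0
    for job in sorted(log, key=lambda j: j.submit):
        options = []
        for m, mach in enumerate(machines):
            start = max(free_at[m], job.submit)
            run = job.work / mach.speed
            score = weights[0] * (start + run) + weights[1] * run * mach.price
            options.append((score, m, start + run, run * mach.price))
        _, m, finish, job_cost = min(options)  # ties go to the lower machine index
        free_at[m] = finish
        finishes.append(finish)
        cost += job_cost
    makespan = max(finishes) - min(j.submit for j in log)
    return makespan, cost


def fitness(weights: Sequence[float], log: List[LoggedJob], machines: List[Machine],
            alpha: float = 1.0, beta: float = 1.0) -> float:
    """Lower is better: a weighted blend of replayed makespan and cost."""
    makespan, cost = replay(log, machines, weights)
    return alpha * makespan + beta * cost
```

With two jobs of work 4 and 2 on a fast, expensive machine (speed 2, price 3) and a slow, cheap one (speed 1, price 1), time-only weights (1, 0) give makespan 2 at cost 8, while cost-only weights (0, 1) give makespan 6 at cost 6; an optimizer would search the weights for the best trade-off under the chosen alpha and beta.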
How to push solution quality close to optimal?
Just run more simulations, using the whole grid system to optimize itself at night!
Results? Please accept my apologies. They have not been published yet.
Conclusion.
Flexible algorithms introduce more adjustable factors.
The factors vary from time to time.
From another point of view, these algorithms are periodically improved by external optimization.
Particle swarm optimization was selected to solve the multi-variate optimization problem.
Improve the scheduler with the scheduler itself.
Any questions?
Presentation of academic research with 140-char limit

Published in: Technology, Business
1 Comment
1 Like
Statistics
Notes
  • Batch
    Online
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
2,307
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
49
Comments
1
Likes
1
Embeds 0
No embeds

No notes for slide

Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System

  1. 1. Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
  2. 2. I am Sugree Phatanapherom from Kasetsart University.
  3. 3. This research is a co-work with Asst. Prof. Putchong Uthayopas.
  4. 4. Ready, steady, go.
  5. 5. What is batch scheduler?
  6. 6. Batch scheduler is responsible to schedule jobs to execute on resources at the right time.
  7. 7. Why do we need batch scheduler?
  8. 8. To utilize resources efficiently.
  9. 9. To finish all jobs as fast as possible.
  10. 10. To minimize power consumption.
  11. 11. In general, it is so called "resource scheduling problem".
  12. 12. Jobs, Resources and Time time resources
  13. 13. In this research, main criteria is to minimize cost to run the resources.
  14. 14. Back to the past, most works focused on improving algorithms.
  15. 15. To simplify the problem, this research limits scope job characteristics to independent sequential jobs.
  16. 16. In short, a job contains the one and only one task.
  17. 17. In other words, job = task.
  18. 18. Scheduling Algorithms Scheduling On-line Batch RR OLB MET MCT MinMin MaxMin Sufferage XSufferage CMinMin CMaxMin CSufferage
  19. 19. There are on-line and batch scheduling.
  20. 20. The most simple algorithm is "Round Robin".
  21. 21. "Opportunistic Load Balancing" assigns job to the next available machine.
  22. 22. "Minimum Execution Time" assigns job to the fastest machine.
  23. 23. "Minimum Completion Time" assigns job to the machine with minimum completion time for that job.
  24. 24. Next are batch scheduling algorithms.
  25. 25. "MinMin" assigns shortest job to the fastest machine.
  26. 26. "MaxMin" assign longest job to the fastest machine.
  27. 27. "Sufferage" is reassignable MaxMin.
  28. 28. "XSufferage" is Sufferage with data locality.
  29. 29. CMinMin, CMaxMin and CSufferage are derivative with costing.
  30. 30. How to verify? How to evaluate?
  31. 31. The answer is simulation. Why?
  32. 32. Closed. Controllable. Reproducible.
  33. 33. Simulation is assumption and modeling.
  34. 34. Grid is a meta-scheduler and underlying cluster schedulers managing hosts.
  35. 35. Grid Grid Scheduler Cluster Scheduler Host Cluster Scheduler Cluster Scheduler jobs Host
  36. 36. Interconnection between scheduler and processors are dedicated.
  37. 37. Network Scheduler Processor Storage Processor Processor Processor
  38. 38. Job consists of inputs, outputs and executable.
  39. 39. Job Executable Input Output Machine
  40. 40. Operations are 2 steps; mapping and scheduling.
  41. 41. Mapping "job" to "machine".
  42. 42. Schedule "job" to the exact time.
  43. 43. In short, the result is generic priority index.
  44. 45. Time ready time execution time deadline period before deadline time
  45. 46. Cost cumulative cost cost cost
  46. 47. Experimented based on GAMESS job log in ThaiGrid to assume a small and a big system and named them, KUGrid and ThaiGrid, respectively.
  47. 48. Makespan and cost are observed.
  48. 49. Makespan is the period of time from when the first job submitted to the last job finished.
  49. 50. Price-Performance
  50. 51. Cost
  51. 52. Makespan
  52. 53. Looks great! Any problems? Yes!
  53. 54. Priority index contains 5 factors. What are the right values?
  54. 55. What are the factors of those factors?
  55. 56. There are so many dependencies. Job characteristics. Resource characteristics. User characteristics.
  56. 57. This problem is so called "Multi-variate Optimization".
  57. 58. Plus, a bit more complex with evaluation in simulator.
  58. 59. How to solve?
  59. 60. Optimization Architecture Optimizer Simulator Simulator Simulator Simulator Batch Scheduler Monitoring System Accounting System
  60. 61. Optimization Algorithm?
  61. 62. Particle Swarm Optimization is selected as the first one to try.
  62. 63. The position of each particle in n-dimension plane represents solution.
  63. 64. PSO is social influence in various scopes.
  64. 65. Local, neighbor and global.
  65. 66. Usually, one trust oneself, friends and the world, respectively. The level of trust.
  66. 67. PSO
  67. 68. How to fully automate self-tuning process?
  68. 69. Historical data are the key.
  69. 70. The quality of solution depends on optimizer.
  70. 71. Running optimizer longer may return better solution.
  71. 72. Precision of using historical data depends on data period and amount of data.
  72. 73. How to use historical data? Log replay or estimation.
  73. 74. How to maximize solution quality to near optimal?
  74. 75. Just run more simulations using the whole grid system to optimize itself at night!
  75. 76. Results? Please accept my apologize. They are not published yet.
  76. 77. Conclusion.
  77. 78. Flexible algorithms introduce more adjustable factors.
  78. 79. The factors are vary from time to time.
  79. 80. In other view, these algorithms are improved by external optimization periodically.
  80. 81. Particle swarm optimization is selected to solve multi-variate optimization.
  81. 82. Improve scheduler by scheduler itself.
  82. 83. Any questions?

×