Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
Presentation of academic research with 140-char limit


  • 1. Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
  • 2. I am Sugree Phatanapherom from Kasetsart University.
  • 3. This research is joint work with Asst. Prof. Putchong Uthayopas.
  • 4. Ready, steady, go.
  • 5. What is a batch scheduler?
  • 6. A batch scheduler is responsible for scheduling jobs to execute on resources at the right time.
  • 7. Why do we need a batch scheduler?
  • 8. To utilize resources efficiently.
  • 9. To finish all jobs as fast as possible.
  • 10. To minimize power consumption.
  • 11. In general, this is the so-called "resource scheduling problem".
  • 12. Jobs, Resources and Time (diagram; axes: time, resources)
  • 13. In this research, the main criterion is to minimize the cost of running the resources.
  • 14. In the past, most work focused on improving algorithms.
  • 15. To simplify the problem, this research limits the scope of job characteristics to independent sequential jobs.
  • 16. In short, a job contains one and only one task.
  • 17. In other words, job = task.
  • 18. Scheduling Algorithms (diagram). On-line: RR, OLB, MET, MCT. Batch: MinMin, MaxMin, Sufferage, XSufferage, CMinMin, CMaxMin, CSufferage.
  • 19. There are on-line and batch scheduling algorithms.
  • 20. The simplest algorithm is "Round Robin".
  • 21. "Opportunistic Load Balancing" assigns a job to the next available machine.
  • 22. "Minimum Execution Time" assigns a job to the fastest machine.
  • 23. "Minimum Completion Time" assigns a job to the machine with the minimum completion time for that job.
  • 24. Next are batch scheduling algorithms.
  • 25. "MinMin" assigns the shortest job to the fastest machine.
  • 26. "MaxMin" assigns the longest job to the fastest machine.
  • 27. "Sufferage" is a reassignable MaxMin.
  • 28. "XSufferage" is Sufferage with data locality.
  • 29. CMinMin, CMaxMin and CSufferage are derivatives that take cost into account.
  • 30. How to verify? How to evaluate?
  • 31. The answer is simulation. Why?
  • 32. Closed. Controllable. Reproducible.
  • 33. Simulation relies on assumptions and modeling.
  • 34. A grid consists of a meta-scheduler and underlying cluster schedulers that manage hosts.
  • 35. Grid (diagram): jobs, Grid Scheduler, Cluster Schedulers, Hosts
  • 36. Interconnections between the scheduler and the processors are dedicated.
  • 37. Network (diagram): Scheduler, Storage, Processors
  • 38. A job consists of inputs, outputs and an executable.
  • 39. Job (diagram): Executable, Input, Output, Machine
  • 40. Operation takes 2 steps: mapping and scheduling.
  • 41. Mapping assigns a "job" to a "machine".
  • 42. Scheduling assigns a "job" to an exact time.
  • 43. In short, the result is a generic priority index.
  • 44.  
  • 45. Time (diagram): ready time, execution time, deadline, period before deadline; axis: time
  • 46. Cost (diagram): cumulative cost; axis: cost
  • 47. Experiments are based on the GAMESS job log from ThaiGrid, assuming a small system and a big system, named KUGrid and ThaiGrid, respectively.
  • 48. Makespan and cost are observed.
  • 49. Makespan is the period of time from when the first job is submitted to when the last job finishes.
  • 50. Price-Performance
  • 51. Cost
  • 52. Makespan
  • 53. Looks great! Any problems? Yes!
  • 54. Priority index contains 5 factors. What are the right values?
  • 55. What are the factors of those factors?
  • 56. There are so many dependencies. Job characteristics. Resource characteristics. User characteristics.
  • 57. This problem is the so-called "multi-variate optimization".
  • 58. Plus, it is a bit more complex because evaluation happens in a simulator.
  • 59. How to solve?
  • 60. Optimization Architecture (diagram): Optimizer, Simulators, Batch Scheduler, Monitoring System, Accounting System
  • 61. Optimization Algorithm?
  • 62. Particle Swarm Optimization is selected as the first one to try.
  • 63. The position of each particle in n-dimensional space represents a solution.
  • 64. PSO models social influence at various scopes.
  • 65. Local, neighbor and global.
  • 66. Usually, one trusts oneself, friends and the world, respectively, each with its own level of trust.
  • 67. PSO
  • 68. How to fully automate the self-tuning process?
  • 69. Historical data are the key.
  • 70. The quality of the solution depends on the optimizer.
  • 71. Running the optimizer longer may return a better solution.
  • 72. The precision gained from historical data depends on the data period and the amount of data.
  • 73. How to use historical data? Log replay or estimation.
  • 74. How to maximize solution quality to near optimal?
  • 75. Just run more simulations using the whole grid system to optimize itself at night!
  • 76. Results? Please accept my apologies. They are not published yet.
  • 77. Conclusion.
  • 78. Flexible algorithms introduce more adjustable factors.
  • 79. The factors vary from time to time.
  • 80. From another view, these algorithms are improved periodically by external optimization.
  • 81. Particle swarm optimization is selected to solve the multi-variate optimization.
  • 82. Improve the scheduler by the scheduler itself.
  • 83. Any questions?
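
The on-line and batch heuristics named in slides 23 and 25 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a hypothetical table `etc[j][m]` holding the expected execution time of job `j` on machine `m`, a list `ready` of the times at which each machine becomes free, and the commonly described form of MinMin (repeatedly commit the job and machine pair with the smallest completion time). The function names `mct` and `min_min` are illustrative only.

```python
from typing import Dict, List


def mct(etc: List[List[float]], ready: List[float]) -> List[int]:
    """On-line Minimum Completion Time: handle jobs one by one, in arrival order."""
    ready = list(ready)  # copy so the caller's list is not modified
    assignment = []
    for job_times in etc:
        # Completion time of this job on every machine.
        completion = [ready[m] + job_times[m] for m in range(len(ready))]
        best = min(range(len(ready)), key=completion.__getitem__)
        ready[best] = completion[best]
        assignment.append(best)
    return assignment


def min_min(etc: List[List[float]], ready: List[float]) -> Dict[int, int]:
    """Batch MinMin: look at all waiting jobs at once (assumed textbook form)."""
    ready = list(ready)
    unassigned = set(range(len(etc)))
    assignment: Dict[int, int] = {}
    while unassigned:
        # Smallest completion time over every (job, machine) pair still open.
        best_ct, job, machine = min(
            (ready[m] + etc[j][m], j, m)
            for j in unassigned
            for m in range(len(ready))
        )
        ready[machine] = best_ct
        assignment[job] = machine
        unassigned.remove(job)
    return assignment
```

For the made-up table `[[4, 6], [2, 3], [8, 5]]` with both machines free at time 0, `mct` places the jobs on machines `[0, 1, 1]`, while `min_min` returns `{1: 0, 2: 1, 0: 0}`, because the batch variant is free to reorder the jobs before committing them.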
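
The two observed metrics from slides 48 and 49 can be computed from a job log as sketched below. Makespan follows the definition on slide 49. The cost model is an assumption, since the slides do not spell one out: each machine is given a hypothetical price per time unit and a job is charged for the time it occupies the machine. The record layout `JobRecord` and its field names are also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    """One line of a (hypothetical) job log."""
    submit: float   # time the job was submitted
    start: float    # time the job started running
    finish: float   # time the job finished
    machine: str    # name of the machine that ran it


def makespan(jobs: List[JobRecord]) -> float:
    """Time from the first submission to the last completion (slide 49)."""
    return max(j.finish for j in jobs) - min(j.submit for j in jobs)


def total_cost(jobs: List[JobRecord], price: Dict[str, float]) -> float:
    """Assumed cost model: run time multiplied by the machine's price per time unit."""
    return sum((j.finish - j.start) * price[j.machine] for j in jobs)
```

A simulator run would produce such a list of records, and both numbers can then be compared across scheduling algorithms or across settings of the priority index.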
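
The optimizer described in slides 62 to 66 can be sketched as a basic particle swarm. This is a minimal illustration under stated assumptions, not the architecture from the talk: it uses only the "oneself" and "world" influences (the neighbor scope is omitted), the coefficients `w`, `c_self` and `c_world` are conventional illustrative values, and `objective` is a hypothetical stand-in for a simulator run that returns a value to minimize for one setting of the priority-index factors.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float] = (0.0, 1.0),
    particles: int = 20,
    iterations: int = 100,
    w: float = 0.7,        # inertia: how much of the old velocity is kept
    c_self: float = 1.5,   # trust in the particle's own best position
    c_world: float = 1.5,  # trust in the swarm's best position
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]             # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c_self * r1 * (best_pos[i][d] - pos[i][d])
                    + c_world * r2 * (g_pos[d] - pos[i][d])
                )
                # Move, then clamp the coordinate back into the box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

With `dim=5`, each particle's position would correspond to one candidate setting of the five priority-index factors from slide 54. Because every evaluation of `objective` would be a full simulation in the setting of the talk, the particle and iteration counts directly control how many simulator runs are needed.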