Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
Presentation of academic research with 140-char limit

Presentation Transcript

  • Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
  • I am Sugree Phatanapherom from Kasetsart University.
  • This research is joint work with Asst. Prof. Putchong Uthayopas.
  • Ready, steady, go.
  • What is a batch scheduler?
  • A batch scheduler is responsible for scheduling jobs to execute on resources at the right time.
  • Why do we need a batch scheduler?
  • To utilize resources efficiently.
  • To finish all jobs as fast as possible.
  • To minimize power consumption.
  • In general, this is the so-called "resource scheduling problem".
  • Jobs, Resources and Time (figure; axes: time, resources)
  • In this research, the main criterion is to minimize the cost of running the resources.
  • In the past, most work focused on improving algorithms.
  • To simplify the problem, this research limits the scope of job characteristics to independent sequential jobs.
  • In short, a job contains one and only one task.
  • In other words, job = task.
  • Scheduling Algorithms (diagram). On-line: RR, OLB, MET, MCT. Batch: MinMin, MaxMin, Sufferage, XSufferage, CMinMin, CMaxMin, CSufferage.
  • There are on-line and batch scheduling algorithms.
  • The simplest algorithm is "Round Robin".
  • "Opportunistic Load Balancing" assigns each job to the next available machine.
  • "Minimum Execution Time" assigns job to the fastest machine.
  • "Minimum Completion Time" assigns job to the machine with minimum completion time for that job.
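
The four on-line rules above can be sketched as follows. This is a minimal illustration, not the implementation used in this research. It assumes a hypothetical table `etc[j][m]` holding the estimated run time of job j on machine m, and the function name `schedule_online` is also made up for the example.

```python
from itertools import cycle

def schedule_online(etc, policy):
    """Assign jobs one at a time, in arrival order.

    etc[j][m] is the assumed run time of job j on machine m.
    Returns (assignment, ready), where ready[m] is when machine m becomes free.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines
    rotation = cycle(range(n_machines))
    assignment = []
    for times in etc:
        if policy == "RR":     # next machine in a fixed rotation
            m = next(rotation)
        elif policy == "OLB":  # machine that becomes available first
            m = min(range(n_machines), key=lambda i: ready[i])
        elif policy == "MET":  # machine with the smallest execution time
            m = min(range(n_machines), key=lambda i: times[i])
        elif policy == "MCT":  # machine with the smallest completion time
            m = min(range(n_machines), key=lambda i: ready[i] + times[i])
        else:
            raise ValueError(f"unknown policy: {policy}")
        ready[m] += times[m]
        assignment.append(m)
    return assignment, ready
```

With three identical jobs and a second machine that is twice as fast (`etc = [[4, 2]] * 3`), MET sends every job to the fast machine, while OLB and MCT spread the load and finish earlier.
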
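  • Next are batch scheduling algorithms.
  • "MinMin" assigns the shortest job (the one with the smallest minimum completion time) to the machine that finishes it earliest.
  • "MaxMin" assigns the longest job (the one with the largest minimum completion time) to the machine that finishes it earliest.
  • "Sufferage" is a reassignable MaxMin: a machine goes to the job that would suffer most without it.
  • "XSufferage" is Sufferage with data locality.
  • CMinMin, CMaxMin and CSufferage are derivatives that take cost into account.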
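
A rough sketch of the three basic batch heuristics, following their commonly published formulation, where "shortest" and "longest" are measured by a job's minimum completion time. The table `etc[j][m]` (estimated run time of job j on machine m) and the name `schedule_batch` are assumptions for illustration. The Sufferage branch is a simplified variant that commits one job per round and does not replace tentative assignments. The data-locality and cost-aware variants are not covered.

```python
def schedule_batch(etc, policy="MinMin"):
    """Map a whole batch of jobs to machines.

    etc[j][m] is the assumed run time of job j on machine m.
    Returns (mapping, makespan), where mapping[j] is the chosen machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines
    unassigned = set(range(len(etc)))
    mapping = {}
    while unassigned:
        candidates = []
        for j in sorted(unassigned):
            # Completion time of job j on every machine, best first.
            ct = sorted((ready[m] + etc[j][m], m) for m in range(n_machines))
            best_ct, best_m = ct[0]
            second_ct = ct[1][0] if len(ct) > 1 else best_ct
            # Sufferage: how much the job loses if it misses its best machine.
            candidates.append((j, best_m, best_ct, second_ct - best_ct))
        if policy == "MinMin":
            j, m, finish, _ = min(candidates, key=lambda c: c[2])
        elif policy == "MaxMin":
            j, m, finish, _ = max(candidates, key=lambda c: c[2])
        elif policy == "Sufferage":
            j, m, finish, _ = max(candidates, key=lambda c: c[3])
        else:
            raise ValueError(f"unknown policy: {policy}")
        mapping[j] = m
        ready[m] = finish
        unassigned.remove(j)
    return mapping, max(ready)
```

On a batch with one long job and two short ones, MinMin tends to pile everything onto the fastest machine, while MaxMin and Sufferage place the long job first and balance the rest around it.
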
  • How to verify? How to evaluate?
  • The answer is simulation. Why?
  • Closed. Controllable. Reproducible.
  • Simulation relies on assumptions and modeling.
  • A grid consists of a meta-scheduler and underlying cluster schedulers that manage hosts.
  • Grid (diagram): jobs, Grid Scheduler, Cluster Schedulers, Hosts
  • Interconnections between the scheduler and processors are dedicated.
  • Network (diagram): Scheduler, Processors, Storage
  • A job consists of inputs, outputs and an executable.
  • Job (diagram): Executable, Input, Output, Machine
  • Operation takes 2 steps: mapping and scheduling.
  • Mapping assigns a "job" to a "machine".
  • Scheduling places the "job" at an exact time.
  • In short, the result is a generic priority index.
  • Time (figure): ready time, execution time, deadline, period before deadline; axis: time
  • Cost (figure): cumulative cost, cost
  • Experiments are based on the GAMESS job log from ThaiGrid, assuming a small and a big system named KUGrid and ThaiGrid, respectively.
  • Makespan and cost are observed.
  • Makespan is the period of time from when the first job is submitted to when the last job finishes.
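
As a small illustration of the makespan definition, assuming each job is recorded as a hypothetical `(submit_time, finish_time)` pair in the same time unit:

```python
def makespan(jobs):
    """Time from the earliest submission to the latest finish.

    jobs: iterable of (submit_time, finish_time) pairs (assumed record layout).
    """
    jobs = list(jobs)
    first_submit = min(submit for submit, _ in jobs)
    last_finish = max(finish for _, finish in jobs)
    return last_finish - first_submit
```

For example, jobs recorded as (0, 5), (2, 9) and (1, 4) give a makespan of 9 - 0 = 9.
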
  • Price-Performance
  • Cost
  • Makespan
  • Looks great! Any problems? Yes!
  • Priority index contains 5 factors. What are the right values?
  • What are the factors of those factors?
  • There are so many dependencies. Job characteristics. Resource characteristics. User characteristics.
  • This problem is the so-called "multivariate optimization".
  • Plus, it is a bit more complex because each evaluation runs in a simulator.
  • How to solve?
  • Optimization Architecture (diagram): Optimizer, four Simulators, Batch Scheduler, Monitoring System, Accounting System
  • Optimization Algorithm?
  • Particle Swarm Optimization is selected as the first one to try.
  • The position of each particle in n-dimensional space represents a solution.
  • PSO models social influence at various scopes.
  • Local, neighbor and global.
  • Usually, one trusts oneself, friends and the world, in that order. These are the levels of trust.
  • PSO
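
A minimal sketch of particle swarm optimization with the three scopes of influence described above. Everything named here is an assumption for illustration: the function `pso`, the trust coefficients `c_self`, `c_friends` and `c_world`, the ring-shaped neighborhood, the bounds, and `simulated_cost`, which merely stands in for "run the simulator with these factor values and return the observed cost". It is not the optimizer used in this research.

```python
import random

def simulated_cost(weights):
    # Stand-in objective (assumption): a real setup would run a simulation.
    return sum((x - 0.3) ** 2 for x in weights)

def pso(objective, dim, n_particles=20, iters=100, bounds=(0.0, 1.0),
        w=0.7, c_self=1.4, c_friends=0.7, c_world=0.7, seed=0):
    """Minimize objective over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # each particle's own best position
    best_val = [objective(p) for p in pos]
    for _ in range(iters):
        # "The world": the best particle in the whole swarm.
        g = min(range(n_particles), key=lambda i: best_val[i])
        for i in range(n_particles):
            # "Friends": the particle and its two ring neighbors.
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            n = min(ring, key=lambda k: best_val[k])
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c_self * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_friends * rng.random() * (best_pos[n][d] - pos[i][d])
                             + c_world * rng.random() * (best_pos[g][d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
    g = min(range(n_particles), key=lambda i: best_val[i])
    return best_pos[g], best_val[g]
```

Calling `pso(simulated_cost, dim=5)` searches a 5-dimensional space, one dimension per priority-index factor. The larger `c_self` relative to the other two coefficients mirrors the idea of trusting oneself first, then friends, then the world.
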
  • How to fully automate the self-tuning process?
  • Historical data are the key.
  • The quality of the solution depends on the optimizer.
  • Running the optimizer longer may return a better solution.
  • The precision gained from historical data depends on the period covered and the amount of data.
  • How to use historical data? Log replay or estimation.
  • How to push solution quality to near-optimal?
  • Just run more simulations using the whole grid system to optimize itself at night!
  • Results? Please accept my apologies. They are not published yet.
  • Conclusion.
  • Flexible algorithms introduce more adjustable factors.
  • The factors vary from time to time.
  • From another view, these algorithms are improved periodically by external optimization.
  • Particle swarm optimization is selected to solve the multivariate optimization.
  • Improve the scheduler by the scheduler itself.
  • Any questions?