Automatic Self-Tuning Architecture for Batch Scheduler on Large Scale Computing System
Presentation of academic research with 140-char limit
Slideshow Transcript
- Slide 1: Automatic Self-Tuning Architecture for Batch
Scheduler on Large Scale Computing System
sugree
June 9, 2008 1
- Slide 2: I am Sugree Phatanapherom from Kasetsart
University.
- Slide 3: This research is joint work with Asst. Prof.
Putchong Uthayopas.
- Slide 4: Ready, steady, go.
- Slide 5: What is a batch scheduler?
- Slide 6: A batch scheduler is responsible for scheduling
jobs to execute on resources at the right time.
- Slide 7: Why do we need a batch scheduler?
- Slide 8: To utilize resources efficiently.
- Slide 9: To finish all jobs as fast as possible.
- Slide 10: To minimize power consumption.
- Slide 11: In general, this is the so-called "resource scheduling
problem".
- Slide 12: Jobs, Resources and Time [Figure: axes labeled "resources" and "time"]
- Slide 13: In this research, the main criterion is to minimize
the cost of running the resources.
- Slide 14: In the past, most work focused on
improving algorithms.
- Slide 15: To simplify the problem, this research limits
the scope of job characteristics to independent
sequential jobs.
- Slide 16: In short, a job contains one and only one
task.
- Slide 17: In other words, job = task.
- Slide 18: Scheduling Algorithms
On-line: RR, OLB, MET, MCT
Batch: MinMin, MaxMin, Sufferage, XSufferage, CMinMin, CMaxMin, CSufferage
- Slide 19: There are on-line and batch scheduling algorithms.
- Slide 20: The simplest algorithm is "Round Robin".
- Slide 21: "Opportunistic Load Balancing" assigns each job to
the next available machine.
- Slide 22: "Minimum Execution Time" assigns each job to the
fastest machine.
- Slide 23: "Minimum Completion Time" assigns each job to the
machine with the minimum completion time for
that job.
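The four on-line rules on slides 20 to 23 can be sketched as follows. This is only an illustration under stated assumptions, not the code used in this research: the `etc` matrix (estimated execution time of each job on each machine), the `ready` list (time at which each machine becomes free) and every function name are introduced here for the example.

```python
def make_round_robin():
    """RR: cycle through the machines, ignoring load and speed."""
    state = {"next": 0}

    def choose(etc_row, ready):
        m = state["next"] % len(ready)
        state["next"] += 1
        return m

    return choose


def olb(etc_row, ready):
    """OLB: the machine that becomes available first."""
    return min(range(len(ready)), key=lambda m: ready[m])


def met(etc_row, ready):
    """MET: the machine that executes this job fastest, ignoring its queue."""
    return min(range(len(ready)), key=lambda m: etc_row[m])


def mct(etc_row, ready):
    """MCT: the machine on which this job would complete earliest."""
    return min(range(len(ready)), key=lambda m: ready[m] + etc_row[m])


def schedule_online(etc, n_machines, choose):
    """Assign jobs one at a time, in arrival order; return (assignment, makespan)."""
    ready = [0.0] * n_machines
    assignment = []
    for etc_row in etc:
        m = choose(etc_row, ready)
        ready[m] += etc_row[m]  # the job occupies the chosen machine
        assignment.append(m)
    return assignment, max(ready)


# Made-up example: 3 jobs x 2 machines (hypothetical execution times).
etc = [[4, 2], [3, 6], [5, 1]]
for name, rule in [("rr", make_round_robin()), ("olb", olb),
                   ("met", met), ("mct", mct)]:
    print(name, schedule_online(etc, 2, rule))
```

On this toy input, RR and OLB ignore execution times and finish at time 9, while MET and MCT both finish at time 3.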
- Slide 24: Next are batch scheduling algorithms.
- Slide 25: "MinMin" assigns the shortest job to the fastest
machine.
- Slide 26: "MaxMin" assigns the longest job to the fastest
machine.
- Slide 27: "Sufferage" is a reassignable MaxMin.
- Slide 28: "XSufferage" is Sufferage with data locality.
- Slide 29: CMinMin, CMaxMin and CSufferage are
cost-aware derivatives.
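The batch heuristics above differ mainly in which job of the batch they commit next. The sketch below follows the commonly published definitions of MinMin, MaxMin and Sufferage, with Sufferage reduced to a simple greedy form without tentative reassignment. The `etc` matrix, the function names and the tie-breaking rule are assumptions made for this example. The data-locality and cost-aware variants (XSufferage, CMinMin, CMaxMin, CSufferage) are not covered.

```python
def batch_schedule(etc, n_machines, pick):
    """Map a whole batch; `pick` decides which unmapped job is committed next."""
    ready = [0.0] * n_machines
    unmapped = set(range(len(etc)))
    assignment = {}
    while unmapped:
        candidates = {}
        for j in unmapped:
            # Completion time of job j on every machine, best first.
            ct = sorted((ready[m] + etc[j][m], m) for m in range(n_machines))
            best_ct, best_m = ct[0]
            second_ct = ct[1][0] if len(ct) > 1 else best_ct
            # Sufferage value: loss if the job misses its best machine.
            candidates[j] = (best_ct, best_m, second_ct - best_ct)
        job = pick(candidates)
        best_ct, best_m, _ = candidates[job]
        ready[best_m] = best_ct
        assignment[job] = best_m
        unmapped.remove(job)
    return assignment, max(ready)


def min_min(c):
    """Commit the job with the smallest minimum completion time."""
    return min(c, key=lambda j: (c[j][0], j))


def max_min(c):
    """Commit the job with the largest minimum completion time."""
    return max(c, key=lambda j: (c[j][0], -j))


def sufferage(c):
    """Commit the job that would suffer most without its best machine."""
    return max(c, key=lambda j: (c[j][2], -j))


# Made-up example: 3 jobs x 2 machines (hypothetical execution times).
etc = [[2, 4], [3, 6], [10, 12]]
for name, rule in [("minmin", min_min), ("maxmin", max_min),
                   ("sufferage", sufferage)]:
    print(name, batch_schedule(etc, 2, rule))
```

On this toy batch the three rules produce different schedules, with makespans of 12, 10 and 13 respectively, which shows how much the commit order alone can matter.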
- Slide 30: How to verify? How to evaluate?
- Slide 31: The answer is simulation. Why?
- Slide 32: Closed. Controllable. Reproducible.
- Slide 33: Simulation is about assumptions and modeling.
- Slide 34: A grid is a meta-scheduler plus underlying
cluster schedulers that manage hosts.
- Slide 35: Grid [Figure: jobs enter the Grid Scheduler, which sits above three Cluster Schedulers that manage Hosts]
- Slide 36: The interconnections between the scheduler and
the processors are dedicated.
- Slide 37: Network [Figure: Storage, a Scheduler and four Processors]
- Slide 38: A job consists of inputs, outputs and
an executable.
- Slide 39: Job [Figure: a Job with Input, Executable and Output, placed on a Machine]
- Slide 40: Operation takes two steps: mapping and
scheduling.
- Slide 41: Mapping a "job" to a "machine".
- Slide 42: Scheduling a "job" at an exact time.
- Slide 43: In short, the result is a generic priority index.
- Slide 44: p_ij = e_ij r_j c_ij g_j d_ij
- Slide 45: Time
e_ij: execution time
d_ij: period before deadline
r_j: ready time
D_i: deadline
- Slide 46: Cost
c_ij: cost
g_j: cumulative cost
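To make the role of the five terms concrete, here is a small sketch of a priority index used to pick a machine. The transcript does not preserve how the terms are combined, so the weighted sum below is purely an assumption for illustration, as are the `Terms` class, the weight tuples, the sample numbers and the convention that a lower index is better.

```python
from dataclasses import dataclass


@dataclass
class Terms:
    e: float  # e_ij: execution time
    r: float  # r_j: ready time
    c: float  # c_ij: cost
    g: float  # g_j: cumulative cost
    d: float  # d_ij: period before deadline


def priority_index(t, w):
    # ASSUMPTION: a weighted sum with one weight per term. The real
    # combination of the five terms is not recoverable from the slides.
    return w[0] * t.e + w[1] * t.r + w[2] * t.c + w[3] * t.g + w[4] * t.d


def pick_machine(options, w):
    """Return the machine name with the lowest index (assumed: lower is better)."""
    return min(options, key=lambda name: priority_index(options[name], w))


# Hypothetical candidates for one job: a fast expensive machine and a slow cheap one.
options = {
    "fast": Terms(e=2.0, r=0.0, c=10.0, g=0.0, d=5.0),
    "cheap": Terms(e=5.0, r=0.0, c=3.0, g=0.0, d=2.0),
}
time_only = (1, 1, 0, 0, 0)
cost_only = (0, 0, 1, 1, 0)
print(pick_machine(options, time_only), pick_machine(options, cost_only))
```

The same job lands on a different machine depending on the weights, which is why the choice of values matters.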
- Slide 47: Experiments are based on the GAMESS job log from
ThaiGrid, used to model a small and a large system
named KUGrid and ThaiGrid,
respectively.
- Slide 48: Makespan and cost are observed.
- Slide 49: Makespan is the period from when the
first job is submitted to when the last job finishes.
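The makespan definition translates directly into code. In this minimal sketch the `(submit_time, finish_time)` tuple layout and the sample values are assumptions made for the example.

```python
def makespan(jobs):
    """jobs: iterable of (submit_time, finish_time) pairs in the same time unit."""
    jobs = list(jobs)
    first_submit = min(submit for submit, _ in jobs)
    last_finish = max(finish for _, finish in jobs)
    return last_finish - first_submit


# Hypothetical log: three jobs, times in hours.
log = [(0.0, 5.0), (2.0, 9.0), (1.0, 4.0)]
print(makespan(log))
```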
- Slide 50: Price-Performance [Bar chart: Cost-Time Ratio ($/h), axis 0 to 30000, for KU Grid and Thai Grid; series: rr, olb, mct, met, minmin, maxmin, sufferage, cminmin, cmaxmin, csufferage]
- Slide 51: Cost [Bar chart: Cost ($, in thousands), axis 0 to 4500, for KU Grid and Thai Grid; series: rr, olb, mct, met, minmin, maxmin, sufferage, cminmin, cmaxmin, csufferage]
- Slide 52: Makespan [Bar chart: Makespan (hours), axis 138.800 to 139.150, for KU Grid and Thai Grid; series: rr, olb, mct, met, minmin, maxmin, sufferage, cminmin, cmaxmin, csufferage]
- Slide 53: Looks great! Any problems? Yes!
- Slide 54: The priority index contains 5 factors. What are the
right values?
- Slide 55: What are the factors of those factors?
- Slide 56: There are so many dependencies. Job
characteristics. Resource characteristics. User
characteristics.
- Slide 57: This is a so-called "multi-variate
optimization" problem.
- Slide 58: Plus, it is a bit more complex because each
evaluation runs in a simulator.
- Slide 59: How to solve?
- Slide 60: Optimization Architecture [Figure: Monitoring System, Accounting System, Batch Scheduler, an Optimizer and four Simulators]
- Slide 61: Optimization Algorithm?
- Slide 62: Particle Swarm Optimization is selected as the
first one to try.
- Slide 63: The position of each particle in n-dimensional
space represents a solution.
- Slide 64: PSO models social influence at various scopes.
- Slide 65: Local, neighbor and global.
- Slide 66: Usually, one trusts oneself, friends and the
world, respectively. Each has its own level of trust.
- Slide 67: PSO
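A particle swarm with the three scopes of influence described above (oneself, friends, the world) can be sketched as follows. This is a generic textbook-style PSO, not the optimizer built in this research. The ring neighborhood, the coefficient values, the search range and the sphere function standing in for a simulator run are all assumptions made for the example.

```python
import random


def pso(objective, dim, n_particles=20, iters=100,
        w=0.7, c_self=1.0, c_neigh=1.0, c_global=1.0, seed=0):
    """Minimize `objective` over `dim` dimensions; return (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # each particle's own best position
    best_val = [objective(p) for p in pos]
    for _ in range(iters):
        # "The world": the best position found by any particle so far.
        g = min(range(n_particles), key=lambda i: best_val[i])
        for i in range(n_particles):
            # "Friends": the particle and its two ring neighbors.
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            n = min(ring, key=lambda k: best_val[k])
            for d in range(dim):
                # Inertia plus three pulls, each weighted by its level of trust.
                vel[i][d] = (w * vel[i][d]
                             + c_self * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_neigh * rng.random() * (best_pos[n][d] - pos[i][d])
                             + c_global * rng.random() * (best_pos[g][d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
    g = min(range(n_particles), key=lambda i: best_val[i])
    return best_pos[g], best_val[g]


def sphere(x):
    """Stand-in objective; a real setup would run a scheduling simulation here."""
    return sum(v * v for v in x)


# Five dimensions, one per priority-index factor.
best, value = pso(sphere, dim=5, iters=200)
print(best, value)
```

The three coefficients `c_self`, `c_neigh` and `c_global` play the role of the levels of trust: raising one makes particles follow that source more closely.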
- Slide 68: How to fully automate the self-tuning process?
- Slide 69: Historical data are the key.
- Slide 70: The quality of the solution depends on the optimizer.
- Slide 71: Running the optimizer longer may return a better
solution.
- Slide 72: The precision gained from historical data depends on
the period the data covers and the amount of data.
- Slide 73: How to use historical data? Log replay or
estimation.
- Slide 74: How to raise solution quality to near
optimal?
- Slide 75: Just run more simulations using the whole grid
system to optimize itself at night!
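The idea of one optimizer feeding many simulators can be sketched as a parallel evaluation of candidate settings against a replayed job log. Everything here is a stand-in chosen for the example: the thread pool replaces dispatching simulations as grid jobs, and `toy_replay`, the log layout and the two-weight candidates are hypothetical rather than part of the original system.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_replay(weights, job_log):
    """Hypothetical simulator: replay the log and return the total cost.

    Each log entry is (time_a, cost_a, time_b, cost_b) for two machines;
    each job goes to the machine with the lower weighted score.
    """
    w_time, w_cost = weights
    total_cost = 0.0
    for time_a, cost_a, time_b, cost_b in job_log:
        score_a = w_time * time_a + w_cost * cost_a
        score_b = w_time * time_b + w_cost * cost_b
        total_cost += cost_a if score_a <= score_b else cost_b
    return total_cost


def evaluate_candidates(candidates, simulate, job_log, workers=4):
    """Run one simulation per candidate in parallel; return (best, best_score)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: simulate(c, job_log), candidates))
    best = min(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


# Made-up historical log and candidate settings.
job_log = [(2, 10, 5, 3), (1, 8, 4, 2)]
candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
print(evaluate_candidates(candidates, toy_replay, job_log))
```

In an optimizer loop, each iteration would generate a new set of candidates, evaluate them this way, and keep the best ones for the next round.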
- Slide 76: Results? Please accept my apologies. They
are not published yet.
- Slide 77: Conclusion.
- Slide 78: Flexible algorithms introduce more adjustable
factors.
- Slide 79: The factors vary from time to time.
- Slide 80: From another view, these algorithms are periodically
improved by external optimization.
- Slide 81: Particle swarm optimization is selected to
solve the multi-variate optimization problem.
- Slide 82: Improve the scheduler by the scheduler itself.
- Slide 83: Any questions?