GRID COMPUTING
Grid Scheduling & Resource Management

Sandeep Kumar Poonia
Head of Dept. CS/IT, Jagan Nath University, Jaipur
B.E., M. Tech., UGC-NET
LM-IAENG, LM-IACSIT,LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE
11/9/2013

OUTLINE
Introduction
Scheduling Paradigms
How Scheduling Works
A Review of Condor, SGE, PBS and LSF
Grid Scheduling with QoS
Introduction
Grid scheduling is a process of mapping Grid jobs to resources over multiple administrative domains.
 A Grid job can be split into many small tasks.
 The scheduler has the responsibility of selecting resources and scheduling jobs in such a way that the user and application requirements are met, in terms of overall execution time (throughput) and cost of the resources utilized.
Introduction

Jobs, via Globus, can be submitted to systems managed by Condor, the Sun Grid Engine (SGE), the Portable Batch System (PBS) and the Load Sharing Facility (LSF).
Scheduling Paradigms

 Centralized Scheduling
 Hierarchical Scheduling
 Distributed Scheduling
Centralized Scheduling
In a centralized scheduling environment, a central machine (node) acts as a resource manager to schedule jobs to all the surrounding nodes that are part of the environment.
This scheduling paradigm is often used in situations like a computing centre where resources have similar characteristics and usage policies.
Centralized Scheduling

Here, jobs are first submitted to the central scheduler, which then dispatches the jobs to the appropriate nodes. Those jobs that cannot be started on a node are normally stored in a central job queue for a later start.
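To make the dispatch-or-queue behaviour concrete, here is a minimal sketch in Python. All names (Job, Node, CentralScheduler) are illustrative, not part of any particular Grid middleware, and the placement rule (first node with enough free CPUs) is an assumption chosen for brevity.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cpus_needed: int

@dataclass
class Node:
    name: str
    free_cpus: int

class CentralScheduler:
    """Single resource manager that schedules jobs on all participating nodes."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.queue = deque()          # central job queue for jobs that cannot start yet

    def submit(self, job):
        self.queue.append(job)
        self.dispatch()

    def dispatch(self):
        waiting = deque()
        while self.queue:
            job = self.queue.popleft()
            node = next((n for n in self.nodes if n.free_cpus >= job.cpus_needed), None)
            if node:
                node.free_cpus -= job.cpus_needed
                print(f"{job.name} -> {node.name}")
            else:
                waiting.append(job)   # stored centrally for a later start
        self.queue = waiting

sched = CentralScheduler([Node("node1", 4), Node("node2", 2)])
sched.submit(Job("jobA", 2))
sched.submit(Job("jobB", 8))          # stays queued: no node can run it yet
```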
Centralized Scheduling: Advantages & Disadvantages
 A centralized scheduling system may produce better scheduling decisions because it has all the necessary, up-to-date information about the available resources.
 Centralized scheduling does not scale well with the increasing size of the environment that it manages.
 The scheduler itself may well become a bottleneck, and if there is a problem with the hardware or software of the scheduler's server, i.e. a failure, it presents a single point of failure in the environment.
Distributed Scheduling
 No central scheduler is responsible for managing all the jobs.
 It involves multiple localized schedulers, which interact with each other in order to dispatch jobs to the participating nodes.
 There are two mechanisms for a scheduler to communicate with other schedulers:
 Direct Communication
 Indirect Communication
Distributed Scheduling: Direct Communication

 Each local scheduler can directly communicate with
other schedulers for job dispatching.
Distributed Scheduling: Direct Communication
 Each scheduler has a list of remote schedulers that it can interact with, or there may exist a central directory that maintains all the information related to each scheduler.
 If a job cannot be dispatched to its local resources, its scheduler will communicate with other remote schedulers to find appropriate and available resources for executing the job.
 Each scheduler may maintain one or more local job queues for job management.
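The sketch below (hypothetical class and method names, not taken from any real scheduler) illustrates direct communication: a local scheduler first tries its own resources, then asks each peer scheduler it knows about, and only queues the job locally if no peer can run it.

```python
class LocalScheduler:
    """Local scheduler that forwards jobs it cannot place to peer schedulers."""

    def __init__(self, name, free_cpus):
        self.name = name
        self.free_cpus = free_cpus
        self.peers = []               # remote schedulers this one can talk to directly
        self.queue = []               # local job queue

    def submit(self, job_name, cpus_needed):
        if self.try_run(job_name, cpus_needed):
            return
        for peer in self.peers:       # ask each known remote scheduler in turn
            if peer.try_run(job_name, cpus_needed):
                return
        self.queue.append((job_name, cpus_needed))   # nobody can run it right now

    def try_run(self, job_name, cpus_needed):
        if self.free_cpus >= cpus_needed:
            self.free_cpus -= cpus_needed
            print(f"{job_name} runs on {self.name}")
            return True
        return False

a, b = LocalScheduler("siteA", 2), LocalScheduler("siteB", 8)
a.peers = [b]
a.submit("bigJob", 4)                 # siteA is full, so the job is dispatched to siteB
```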
Distributed Scheduling: Indirect Communication
Communication via a central job pool

In this scenario, jobs that cannot be executed immediately
are sent to a central job pool.
Distributed Scheduling: Indirect Communication
Communication via a central job pool
 Compared with direct communication, the local schedulers can potentially choose suitable jobs to schedule on their own resources.
 Policies are required so that all the jobs in the pool are executed at some time.
 This method can be modified so that all jobs are pushed directly into the job pool after submission.
 In this way, small jobs requiring few resources can be used to utilize free resources on all machines.
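A minimal sketch of indirect communication via a central job pool, assuming the modified scheme in which every job is pushed into the pool at submission. The in-process queue.Queue here is only a stand-in for a shared, Grid-wide pool service.

```python
import queue

job_pool = queue.Queue()              # central pool shared by all local schedulers

def submit(job_name, cpus_needed):
    """All jobs go straight into the pool after submission."""
    job_pool.put((job_name, cpus_needed))

def pull_suitable_jobs(site_name, free_cpus):
    """A local scheduler picks jobs from the pool that fit its idle resources."""
    skipped = []
    while not job_pool.empty():
        job_name, cpus_needed = job_pool.get()
        if cpus_needed <= free_cpus:
            free_cpus -= cpus_needed
            print(f"{site_name} takes {job_name}")
        else:
            skipped.append((job_name, cpus_needed))   # leave it for another site
    for job in skipped:
        job_pool.put(job)

submit("small1", 1)
submit("huge", 32)
submit("small2", 2)
pull_suitable_jobs("siteA", 4)        # siteA grabs the small jobs; "huge" stays pooled
```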
Hierarchical scheduling

In hierarchical scheduling, a centralized scheduler interacts with
local schedulers for job submission. The centralized scheduler is a
kind of a meta-scheduler that dispatches submitted jobs to local
schedulers.
Hierarchical scheduling
 Similar to the centralized scheduling paradigm, hierarchical scheduling can have scalability and communication bottlenecks.
 However, compared with centralized scheduling, one advantage of hierarchical scheduling is that the global scheduler and local schedulers can have different policies in scheduling jobs.
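As a sketch of the hierarchical paradigm (illustrative names and policies only), the meta-scheduler below applies one global policy, dispatching to the site with the most free CPUs, while each site scheduler applies its own acceptance rule. The separation of global and local policy is the point, not the specific rules chosen here.

```python
class SiteScheduler:
    """Local scheduler; each site can apply its own placement policy."""

    def __init__(self, name, free_cpus):
        self.name, self.free_cpus = name, free_cpus

    def accept(self, job_name, cpus_needed):
        if cpus_needed > self.free_cpus:
            return False
        self.free_cpus -= cpus_needed
        print(f"{self.name} runs {job_name}")
        return True

class MetaScheduler:
    """Global scheduler: users submit here; it only dispatches to site schedulers."""

    def __init__(self, sites):
        self.sites = sites

    def submit(self, job_name, cpus_needed):
        # Global policy: try the site with the most free CPUs first.
        for site in sorted(self.sites, key=lambda s: s.free_cpus, reverse=True):
            if site.accept(job_name, cpus_needed):
                return True
        print(f"{job_name} queued at the meta-scheduler")
        return False

meta = MetaScheduler([SiteScheduler("siteA", 2), SiteScheduler("siteB", 16)])
meta.submit("render", 8)              # dispatched to siteB by the global policy
```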
HOW SCHEDULING WORKS
Grid scheduling involves four main stages:
 resource discovery,
 resource selection,
 schedule generation and
 job execution
Resource discovery
Goal: identify a list of authenticated resources that are available for job submission.
In order to cope with the dynamic nature of the Grid, a scheduler needs to have some way of incorporating dynamic state information about the available resources into its decision-making process.
A Grid environment typically uses a pull model, a push model or a push–pull model for resource discovery.
Resource discovery : The pull model

A single daemon associated with the scheduler can query
Grid resources and collect state information such as CPU
loads or the available memory.
Resource discovery : The pull model
 The pull model for gathering resource information incurs relatively small communication overhead, but unless it requests resource information frequently, it tends to provide fairly stale information which is likely to be constantly out-of-date and potentially misleading.
 In centralized scheduling, the resource discovery/query process could be rather intrusive and begin to take significant amounts of time as the environment being monitored gets larger and larger.
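A toy illustration of the pull model: one scheduler-side routine polls every resource for its state on demand. query_resource is a stand-in that fabricates random values; in a real deployment it would contact an information provider on each resource.

```python
import random

RESOURCES = ["node1", "node2", "node3"]

def query_resource(name):
    """Stand-in for querying one Grid resource for its current state."""
    return {"name": name,
            "cpu_load": round(random.random(), 2),
            "free_ram_mb": random.randint(256, 4096)}

def pull_state(resources):
    """A single scheduler-side daemon polls every resource on demand."""
    return [query_resource(r) for r in resources]

# The snapshot is only as fresh as the last pull; anything that changes
# between pulls is invisible to the scheduler (the staleness problem above).
print(pull_state(RESOURCES))
```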
Resource discovery : The push model
 Each resource in the environment has a daemon for gathering local state information, which will be sent to a centralized scheduler that maintains a database to record each resource's activity.
 If the updates are frequent, an accurate view of the system state can be maintained over time; obviously, frequent updates to the database are intrusive and consume network bandwidth.
Resource discovery : The push–pull model

The push–pull model lies somewhere between the pull model and
the push model.
Resource discovery : The push–pull model
 Each resource in the environment runs a daemon that collects state information.
 Instead of directly sending this information to a central scheduler, there exist some intermediate nodes running daemons that aggregate state information from different sub-resources and respond to queries from the scheduler.
 A challenge of this model is to find out what information is most useful, how often it should be collected and how long this information should be kept around.
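The aggregator sketch below (a hypothetical API, not any real monitoring system) shows the push–pull idea: sub-resources push their state to an intermediate node, and the scheduler pulls only the aggregated view by querying that node.

```python
class Aggregator:
    """Intermediate node: caches state pushed by sub-resources and answers queries."""

    def __init__(self):
        self.cache = {}

    def push(self, resource_name, state):
        self.cache[resource_name] = state          # resources push their updates here

    def query(self, min_free_ram_mb):
        # The scheduler pulls only this aggregated view, not every resource.
        return [name for name, s in self.cache.items()
                if s["free_ram_mb"] >= min_free_ram_mb]

agg = Aggregator()
agg.push("node1", {"cpu_load": 0.2, "free_ram_mb": 2048})
agg.push("node2", {"cpu_load": 0.9, "free_ram_mb": 128})
print(agg.query(min_free_ram_mb=512))             # scheduler query -> ['node1']
```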
Resource Selection
The second phase of the scheduling process:
 Select those resources that best suit the constraints and conditions imposed by the user, such as CPU usage, RAM available or disk storage.
 The result of resource selection is to identify a resource list Rselected in which all resources can meet the minimum requirements for a submitted job or a job list.
 The relationship between the resources available, Ravailable, and the resources selected, Rselected, is: Rselected ⊆ Ravailable
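A small sketch of this selection step, using made-up resource records: Rselected is simply the subset of Ravailable whose members meet the job's minimum CPU and RAM requirements.

```python
def select_resources(available, min_cpu_ghz, min_ram_mb):
    """Return R_selected: every resource meeting the job's minimum requirements."""
    return [r for r in available
            if r["cpu_ghz"] >= min_cpu_ghz and r["ram_mb"] >= min_ram_mb]

R_available = [
    {"name": "r1", "cpu_ghz": 1.0, "ram_mb": 256},
    {"name": "r2", "cpu_ghz": 2.4, "ram_mb": 512},
    {"name": "r3", "cpu_ghz": 0.8, "ram_mb": 1024},
]
R_selected = select_resources(R_available, min_cpu_ghz=1.0, min_ram_mb=256)
print([r["name"] for r in R_selected])   # R_selected ⊆ R_available -> ['r1', 'r2']
```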
Schedule Generation

The generation of schedules involves two steps:
 selecting jobs and
 producing resource selection strategies.
Schedule Generation : Resource Selection
 The resource selection process is used to choose
resource(s) from the resource list Rselected for a given
job.
 Since all resources in the list Rselected could meet the
minimum requirements imposed by the job, an
algorithm is needed to choose the best resource(s) to
execute the job.
 Although random selection is a choice, it is not an
ideal resource selection policy.
 The resource selection algorithm should take into
account the current state of resources and choose the
best one based on a quantitative evaluation.
Schedule Generation : Resource Selection
A resource selection algorithm that only takes CPU and RAM into account could be designed as follows:

EV = WCPU × (1 − CPUload) × CPUspeed / CPUmin + WRAM × (1 − RAMusage) × RAMsize / RAMmin
where:
 WCPU – the weight allocated to CPU speed;
 CPUload – the current CPU load;
 CPUspeed – real CPU speed;
 CPUmin – minimum CPU speed;
 WRAM – the weight allocated to RAM;
 RAMusage – the current RAM usage;
 RAMsize – original RAM size; and
 RAMmin – minimum RAM size.
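Since the formula itself appears only as an image in the original slides, the sketch below assumes a weighted form implied by the definitions above: each resource is scored by its idle CPU and idle RAM capacity relative to the job's minimums, weighted by WCPU and WRAM. The example weights (6 and 4) and minimums (1 GHz, 256 MB) are taken from the example that follows.

```python
def evaluation_value(cpu_speed_ghz, cpu_load, ram_size_mb, ram_usage,
                     w_cpu=6, w_ram=4, cpu_min_ghz=1.0, ram_min_mb=256):
    """Score a resource by its unused CPU and RAM capacity relative to the
    job's minimum requirements (assumed form of the slide's formula)."""
    cpu_term = w_cpu * (1 - cpu_load) * cpu_speed_ghz / cpu_min_ghz
    ram_term = w_ram * (1 - ram_usage) * ram_size_mb / ram_min_mb
    return cpu_term + ram_term

# A resource with a 2 GHz CPU at 50% load and 512 MB RAM at 50% usage
# scores exactly 10 (the total weight), since its idle capacity equals the minimums.
print(evaluation_value(cpu_speed_ghz=2.0, cpu_load=0.5,
                       ram_size_mb=512, ram_usage=0.5))
# The resource with the largest evaluation value is chosen for the job.
```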
Schedule Generation : Resource Selection
Example: Suppose that the total weighting used in the algorithm is 10, where the CPU weight is 6 and the RAM weight is 4. The minimum CPU speed is 1 GHz and the minimum RAM size is 256 MB. The resource information matrix is as follows:

Find the best resource for the submitted job.
Schedule Generation : Resource Selection
The evaluation values for the three resources can then be calculated by applying the formula to each resource:

From the results we know that Resource3 is the best choice for the submitted job.
Schedule Generation : Job Selection
The goal of job selection is to select a job from a job queue for execution. Four strategies that can be used to select a job are given below:
 First come first serve
 Random selection
 Priority-based selection
 Backfilling selection
Schedule Generation : Job Selection
First come first serve:
 The scheduler selects jobs for execution in the order of their submission.
 If there is no resource available for the selected job, the scheduler will wait until the job can be started.
 The other jobs in the job queue have to wait.
There are two main drawbacks with this type of job selection:
1. It may waste resources when, for example, the selected job needs more resources to become available before it can start, which results in a long waiting time.
2. Jobs with high priorities cannot be dispatched immediately if a job with a low priority needs more time to complete.
Schedule Generation : Job Selection
Random selection:
 The next job to be scheduled is randomly selected from the job queue.
 Apart from the two drawbacks of the first-come-first-serve strategy, job selection is not fair and a job submitted earlier may not be scheduled until much later.
Schedule Generation : Job Selection
Priority-based selection:
 Jobs submitted to the scheduler have different priorities.
 The next job to be scheduled is the job with the highest priority in the job queue.
 A job priority can be set when the job is submitted.
 One drawback of this strategy is that it is hard to set an optimal criterion for a job priority.
 A job with the highest priority may need more resources than are available, which may also result in a long waiting time and an inability to make good use of the available resources.
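A priority queue reduces this strategy to one line of selection logic. The sketch below uses Python's heapq (a min-heap, so priorities are negated) with made-up job names and priorities.

```python
import heapq

# heapq is a min-heap, so store the negated priority: highest priority pops first.
job_queue = []
heapq.heappush(job_queue, (-5, "analysis"))
heapq.heappush(job_queue, (-1, "cleanup"))
heapq.heappush(job_queue, (-9, "urgent-run"))

priority, job = heapq.heappop(job_queue)    # next job to schedule
print(job, -priority)                       # urgent-run 9
```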
Schedule Generation : Job Selection
Backfilling selection:
 The backfilling strategy requires knowledge of the expected execution time of a job to be scheduled.
 If the next job in the job queue cannot be started due to a lack of available resources, backfilling tries to find another job in the queue that can use the idle resources.
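A simplified sketch of backfilling with illustrative data: if the head job cannot start, a later job is picked only when it both fits the idle CPUs and, based on its expected runtime, will finish before the head job's reserved start time (an EASY-style check; real schedulers track reservations in more detail).

```python
def backfill(job_queue, free_cpus, head_wait_h):
    """Select the next job to run. If the head of the queue cannot start,
    backfill a later job that fits the idle CPUs and whose expected runtime
    ends before the head job's reserved start (head_wait_h hours from now)."""
    if not job_queue:
        return None
    if job_queue[0]["cpus"] <= free_cpus:
        return job_queue.pop(0)                       # head job can start normally
    for i, job in enumerate(job_queue[1:], start=1):
        if job["cpus"] <= free_cpus and job["runtime_h"] <= head_wait_h:
            return job_queue.pop(i)                   # uses idle CPUs without delaying the head job
    return None

queue = [{"name": "big", "cpus": 16, "runtime_h": 4},
         {"name": "small", "cpus": 2, "runtime_h": 1}]
# 16 CPUs for "big" become free in 2 hours; 4 CPUs are idle right now.
print(backfill(queue, free_cpus=4, head_wait_h=2))    # "small" (1 h) is backfilled
```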
Job execution
 Once a job and a resource are selected, the next step is to submit the job to the resource for execution.
 Job execution may be as easy as running a single command or as complicated as running a series of scripts that may, or may not, include set-up or staging.
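As a local stand-in for the execution step (in a real Grid the command would be handed to the selected resource's job manager, for example via Globus to Condor, SGE, PBS or LSF, rather than run locally), the sketch below simply launches the job command and captures its output.

```python
import subprocess

def execute_job(command):
    """Run a job command and capture its exit status and output.
    Here it runs on the local machine; a real submission would go through
    the remote resource's local resource manager."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout

rc, out = execute_job("echo staging done && hostname")
print(rc, out.strip())
```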
