Fault Tolerant Grid Scheduling System

A Fault-Tolerant Scheduling System for Computational Grids
Department of Computer Science (DCS)
COMSATS Institute of Information Technology
Presented to: Dr. Babar Nazir
Presented by: Ghulam Asfia

Grid

Points to be discussed
Introduction to grid computing
Scheduling
General Discussion
Problem Statement
Proposed Solution
Conclusion
Questions

Grid Computing
• What is Grid Computing?
• Grid computing uses the resources of many
computers from different domains, connected
by a network, to achieve a common goal.
• Explanation

Scheduling
Scheduling is the process of allocating jobs to
available resources over time. This process has to
respect the constraints imposed by the jobs and
by the grid.

Terms of Grid Scheduling
A task is an atomic unit to be scheduled by the
scheduler and assigned to a resource.
A job (or metatask, or application) is a set of
atomic tasks that will be carried out on a set of
resources. A job can have a recursive structure,
meaning that jobs are composed of sub-jobs and/or
tasks, and sub-jobs can themselves be decomposed
further into atomic tasks.
A resource is something that is required to carry out
an operation, for example: a processor for data
processing, a data storage device, or a network link
for data transport.
A site (or node) is an autonomous entity composed of
one or multiple resources.
Task scheduling is the mapping of tasks to a
selected group of resources, which may be
distributed across administrative domains.
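The terms above can be sketched as a small data model. This is only an illustration: the class names, field names, and the example job are assumptions chosen for readability and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Atomic unit that the scheduler assigns to a resource."""
    name: str


@dataclass
class Job:
    """A set of tasks and/or sub-jobs (recursive structure)."""
    name: str
    parts: List[Union["Job", Task]] = field(default_factory=list)

    def atomic_tasks(self) -> List[Task]:
        """Flatten the job into its atomic tasks, depth first."""
        tasks: List[Task] = []
        for part in self.parts:
            if isinstance(part, Job):
                tasks.extend(part.atomic_tasks())
            else:
                tasks.append(part)
        return tasks


@dataclass
class Resource:
    """Something needed to carry out an operation."""
    name: str
    kind: str  # e.g. "processor", "storage", "network"


@dataclass
class Site:
    """Autonomous entity composed of one or more resources."""
    name: str
    resources: List[Resource] = field(default_factory=list)


# Hypothetical job: one direct task plus a sub-job holding two tasks.
job = Job("render", [Task("load"), Job("frames", [Task("f1"), Task("f2")])])
task_names = [t.name for t in job.atomic_tasks()]
```

Flattening the recursive job yields the atomic tasks that a scheduler would actually map to resources.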

Problem description
• Users submit jobs with their QoS requirements.
The grid scheduler schedules these jobs on the
most suitable resources according to the
resource response time and the fault index. The
resource executes the job and the result is
returned to the user.
• Major drawback: resources that satisfy the
response-time criterion may still have a
tendency to fail. Also, the fault index is not a
suitable indicator of a resource's failure
history. As a result, the scheduler may select
resources with a higher tendency to fail.

Flow Chart of Problem Statement

Major Contribution
• To introduce a fault-tolerant system with a
scheduling strategy that depends on a new
factor called the scheduling indicator (SI). This
indicator combines the response time and
the fault rate of resources in the grid. The
main idea behind the proposed system is to
avoid resources that fail frequently.
• Comparison with a recent fault-tolerant
scheduling system.
• Improved grid reliability.

Components of Proposed System
Five main components:
• Grid portal.
• Scheduler.
• Resource information server.
• Fault handler.
• Grid resources.

Flow chart of proposed Solution

Continued…
Grid Portal: provides an interface for users to
submit their jobs for execution.
Scheduler: selects the optimal resources to
execute the task.
Resource Information Server (RIS): contains
information on all the resources of the grid.
The information may include computing
speed, the available load, and memory.
Fault handler: responsible for detecting
resource failures and estimating the
fault rate of resources.

Architecture of Proposed system

The scheduler’s operation
• The scheduler receives user jobs and their
information from the grid portal. Job
information includes job number, job type, and
job size.
• It assigns each job to the most reliable, suitable,
and available resource. The most reliable
resource is the one with the lowest fault rate,
which is obtained from the history of resource
failures stored in the RIS. The RIS stores the
fault rate of each resource in the grid.
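The selection rule described for the scheduler can be sketched as follows. This is a minimal illustration under stated assumptions: the `ResourceRecord` fields, the `min_speed` suitability test, and the fault-rate estimate (failed executions divided by all past executions) are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ResourceRecord:
    """Hypothetical RIS entry; field names are illustrative only."""
    name: str
    speed: float      # computing speed, arbitrary units
    available: bool
    failures: int     # jobs the resource failed to complete
    successes: int    # jobs the resource completed

    def fault_rate(self) -> float:
        # Assumed estimate: share of past executions that failed.
        total = self.failures + self.successes
        return self.failures / total if total else 0.0


def pick_resource(records: Iterable[ResourceRecord],
                  min_speed: float) -> Optional[ResourceRecord]:
    """Return the available, suitable record with the lowest fault rate."""
    candidates = [r for r in records if r.available and r.speed >= min_speed]
    return min(candidates, key=lambda r: r.fault_rate(), default=None)


ris = [
    ResourceRecord("r1", speed=2.0, available=True, failures=3, successes=7),
    ResourceRecord("r2", speed=3.0, available=True, failures=1, successes=9),
    ResourceRecord("r3", speed=5.0, available=False, failures=0, successes=10),
]
chosen = pick_resource(ris, min_speed=1.5)
```

Here `r3` has the best history but is unavailable, so the sketch picks `r2`, the available resource with the lowest assumed fault rate.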

Continued…
• The fault rate Pfj of resource j is defined by:
• To achieve its purpose, the scheduler creates
an SI matrix. Each entry in the matrix
represents the scheduling indicator of a job
for a suitable resource in the grid.
Assuming there are m resources and n
jobs, the SI matrix will be as follows:
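The formulas for the fault rate and the SI entries are not preserved in this transcript, so the sketch below only illustrates the shape of the computation: an n-by-m matrix with one row per job and one column per resource. The fault-rate estimate and the `penalised` combination of response time and fault rate are placeholders assumed for illustration, not the presenter's definitions.

```python
from typing import Callable, List


def fault_rate(failures: int, successes: int) -> float:
    # Assumption: fraction of past executions that failed.
    total = failures + successes
    return failures / total if total else 0.0


def build_si_matrix(response_time: List[List[float]],
                    fault_rates: List[float],
                    combine: Callable[[float, float], float]) -> List[List[float]]:
    """Row i is job i, column j is resource j."""
    return [[combine(rt, fault_rates[j]) for j, rt in enumerate(row)]
            for row in response_time]


def assign(si: List[List[float]]) -> List[int]:
    """For each job, the index of the resource with the smallest SI."""
    return [min(range(len(row)), key=row.__getitem__) for row in si]


def penalised(rt: float, pf: float) -> float:
    # Placeholder combination (not from the slides):
    # inflate the response time by the fault rate.
    return rt * (1.0 + pf)


rates = [fault_rate(5, 5), fault_rate(0, 10)]  # resources 0 and 1
rt = [[10.0, 12.0],   # job 0 on resources 0 and 1
      [4.0, 9.0]]     # job 1 on resources 0 and 1
si = build_si_matrix(rt, rates, penalised)
plan = assign(si)
```

With these made-up numbers, job 0 moves to the slower but more reliable resource 1, while job 1 stays on resource 0 because its response-time advantage outweighs the penalty. The per-job minimum ignores resource availability and load, which a real scheduler would also have to consider.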


Inside Scheduler
• The agent’s role
• Scheduler agent (SA)
• Job agent (JA)
• Result agent (RA)
• Fault Handler Agent (FHA)

Scheduling Algorithm

Result Agent Algorithm

Fault Handler Algorithm

Conclusion
• In this paper, a fault-tolerant scheduling system
for computational grids is proposed. The
system's performance is evaluated under
different conditions against a recent
fault-tolerant scheduling system that depends
on the response time and the fault index. The
parameters used for the evaluation are
throughput, turnaround time, availability, and
the tendency of failure.

Thank you
