This document presents a fault tolerant scheduling system for computational grids. It introduces a new factor called the scheduling indicator (SI) that considers both a resource's response time and fault rate. The system aims to improve grid reliability by avoiding resources that frequently fail. It consists of five main components: a grid portal, scheduler, resource information server, fault handler, and grid resources. The scheduler calculates the SI for each job-resource pair to select the most reliable resource for task execution.
1. A Fault Tolerant Scheduling System for Computational Grids
Department of Computer Science (DCS)
COMSATS Institute of Information Technology
Presented to: Dr. Babar Nazir
Presented by: Ghulam Asfia
3. Points to be discussed
Introduction to grid computing
Scheduling
General Discussion
Problem Statement
Proposed Solution
Conclusion
Questions
4. Grid Computing
• What is Grid Computing?
• Grid computing is the use of the resources of
many computers from different domains,
connected by a network, to achieve a
common goal.
• Explanation
5. Scheduling
Scheduling is the process of allocating jobs to
available resources over time. This process must
respect the constraints imposed by the jobs and
by the grid.
6. Terms of Grid Scheduling
A task is an atomic unit to be scheduled by the
scheduler and assigned to a resource.
A job (or metatask, or application) is a set of
atomic tasks that will be carried out on a set of
resources. A job can have a recursive structure,
meaning that jobs are composed of sub-jobs and/or
tasks, and sub-jobs can themselves be decomposed
further into atomic tasks.
A resource is something that is required to carry out
an operation, for example: a processor for data
processing, a data storage device, or a network link
for data transporting.
A site (or node) is an autonomous entity composed of
one or multiple resources.
Task scheduling is the mapping of tasks to a
selected group of resources, which may be
distributed across administrative domains.
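The terms above can be sketched as a small data model. This is only an illustrative sketch: the class and field names (Task, Job, Resource, Site, parts, kind) are assumptions chosen for the example and do not come from the slides. It shows the recursive job structure, where a job holds tasks and/or sub-jobs that eventually break down into atomic tasks.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Atomic unit of scheduling; assigned to a resource by the scheduler."""
    task_id: str


@dataclass
class Job:
    """A job is a set of tasks and/or sub-jobs (recursive structure)."""
    job_id: str
    parts: List[Union["Job", Task]] = field(default_factory=list)

    def atomic_tasks(self) -> List[Task]:
        """Flatten the job tree into its atomic tasks, depth first."""
        tasks: List[Task] = []
        for part in self.parts:
            if isinstance(part, Job):
                tasks.extend(part.atomic_tasks())
            else:
                tasks.append(part)
        return tasks


@dataclass
class Resource:
    """Something required to carry out an operation."""
    resource_id: str
    kind: str  # illustrative values: "processor", "storage", "network"


@dataclass
class Site:
    """Autonomous entity (node) composed of one or more resources."""
    site_id: str
    resources: List[Resource] = field(default_factory=list)


# A job with one direct task and one sub-job holding two tasks.
job = Job("j1", [Task("t1"), Job("j1.1", [Task("t2"), Task("t3")])])
print([t.task_id for t in job.atomic_tasks()])  # ['t1', 't2', 't3']
```

The scheduler only ever sees the flattened list of atomic tasks, which is why the recursion is resolved inside the job rather than in the scheduler.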
7. Problem description
• Users submit jobs with their QoS requirements.
The grid scheduler schedules these jobs on the
most suitable resources according to the
resource response time and the fault index. The
resource executes the job and the result is
returned to the user.
• Major drawback: some resources that satisfy
the response time criterion have a tendency to
fail. Also, the fault index is not a suitable
indicator of a resource's failure history. This
results in selecting resources that may have a
higher tendency to fail.
8. Flow Chart of Problem Statement
9. Major Contribution
• To introduce a fault-tolerant system with a
scheduling strategy that depends on a new
factor called the scheduling indicator (SI). This
indicator combines the response time and
the fault rate of resources in the grid. The
main idea behind the proposed system is to
avoid resources that frequently fail.
• The proposed system is compared with the most
recent fault-tolerant scheduling system.
• Improved grid reliability.
10. Components of Proposed System
Five main components:
• Grid portal.
• Scheduler.
• Resource information server.
• Fault handler.
• Grid resources.
11. Flow chart of proposed Solution
12. Components of Proposed System (continued)
Grid portal: provides an interface for users to
submit their jobs for execution.
Scheduler: selects the optimal resources to
execute the task.
Resource Information Server (RIS): contains
information on all the resources of the grid. The
information may include computing speed,
available load, and memory.
Fault handler: responsible for detecting
faults in resources and estimating the
fault rate of each resource.
14. The scheduler’s operation
• The scheduler receives user jobs and their
information from the grid portal. Job
information includes job number, job type, and
job size.
• It assigns each job to the most reliable, suitable,
and available resource. The most reliable
resource is the one with the lowest fault rate,
which is known from the history of resource
failures stored in the RIS. The RIS stores the
fault rate of each resource in the grid.
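The selection rule described above (pick the available resource with the lowest fault rate from the failure history in the RIS) can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout, the field names, and in particular the fault rate estimate (failed jobs divided by all jobs run on the resource) are assumptions for illustration, since the slides do not reproduce the exact definition. The suitability check is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical RIS entry; field names are illustrative only."""
    resource_id: str
    available: bool
    jobs_completed: int  # jobs that finished successfully on this resource
    jobs_failed: int     # jobs that failed on this resource

    def fault_rate(self) -> float:
        # Assumed definition: failures / total executions (0.0 if no history).
        total = self.jobs_completed + self.jobs_failed
        return self.jobs_failed / total if total else 0.0


def most_reliable(records: List[ResourceRecord]) -> Optional[ResourceRecord]:
    """Return the available resource with the lowest fault rate, if any."""
    candidates = [r for r in records if r.available]
    return min(candidates, key=lambda r: r.fault_rate(), default=None)


ris = [
    ResourceRecord("r1", True, 90, 10),   # fault rate 0.10
    ResourceRecord("r2", True, 45, 15),   # fault rate 0.25
    ResourceRecord("r3", False, 100, 0),  # unavailable, so it is skipped
]
best = most_reliable(ris)
print(best.resource_id if best else None)  # r1
```

Returning None when no resource is available leaves the decision of what to do next (queue the job, retry later) to the caller.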
15. The scheduler’s operation (continued)
• The fault rate Pfj of resource j is defined by:
• To achieve its purpose, the scheduler creates
an SI matrix. Each entry in the matrix
represents the scheduling indicator of each job
for each suitable resource in the grid.
Assuming there are m resources and n
jobs, the SI matrix will be as follows:
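As a rough illustration of the SI matrix, the sketch below builds one entry per job-resource pair and then picks a resource for each job. Several things here are assumptions and not taken from the slides: the matrix is laid out with jobs as rows and resources as columns, a lower SI is treated as better, and the function si_placeholder is a made-up stand-in for the way response time and fault rate are combined.

```python
from typing import Callable, List


def si_placeholder(response_time: float, fault_rate: float) -> float:
    # Hypothetical combination, NOT the formula from the slides:
    # the response time is inflated by the fault rate, so lower is better.
    return response_time * (1.0 + fault_rate)


def build_si_matrix(
    response_times: List[List[float]],  # response_times[i][j]: job i on resource j
    fault_rates: List[float],           # fault_rates[j]: fault rate of resource j
    combine: Callable[[float, float], float] = si_placeholder,
) -> List[List[float]]:
    """Return an n x m matrix with one SI entry per job-resource pair."""
    return [
        [combine(rt, fault_rates[j]) for j, rt in enumerate(row)]
        for row in response_times
    ]


def pick_resources(si: List[List[float]]) -> List[int]:
    """For each job (row), return the column index with the smallest SI."""
    return [min(range(len(row)), key=row.__getitem__) for row in si]


response_times = [
    [10.0, 12.0, 9.0],   # job 0 on resources 0, 1, 2
    [20.0, 18.0, 25.0],  # job 1 on resources 0, 1, 2
]
fault_rates = [0.5, 0.0, 0.5]  # resource 1 has never failed

si = build_si_matrix(response_times, fault_rates)
print(pick_resources(si))  # [1, 1]
```

In this example resource 2 has the best response time for job 0, but its fault rate pushes its SI above that of resource 1, which is the kind of trade-off the indicator is meant to capture. Passing a different combine function swaps in another way of mixing the two factors without touching the rest of the code.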
21. Conclusion
• In this paper, a fault-tolerant scheduling system
for computational grids is proposed. The system's
performance is evaluated under different
conditions against a recent fault-tolerant scheduling
system that depends on the response time
and the fault index. The parameters used for the
evaluation are throughput, turnaround time,
availability, and the tendency to fail.