ON FAULT TOLERANCE OF
RESOURCES IN
COMPUTATIONAL GRIDS
Prepared By:
Dave Maurvi Y.
ME CSE
Agenda
• Introduction
• Grid Computing
• Faults & Failure in Grid
• Fault Tolerance Techniques
• Future Enhancement
Introduction
• Grid computing is the collection of computer resources from
multiple locations to reach a common goal.
• One of the main strategies of grid computing is to use middleware
to divide and apportion pieces of a program among several
computers, sometimes up to many thousands.
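The divide-and-apportion idea above can be sketched as follows. This is a minimal illustration, not real grid middleware: local threads stand in for remote nodes, and the names split_into_chunks, partial_sum_of_squares and run_on_grid are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks nearly equal contiguous pieces."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    # Stand-in for the piece of the program a single node would run.
    return sum(x * x for x in chunk)


def run_on_grid(items, n_nodes):
    # Assumption: local threads play the role of remote grid nodes.
    chunks = split_into_chunks(items, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)
```

Calling run_on_grid(list(range(10)), 3) splits the ten inputs into three pieces, evaluates each piece independently, and adds the partial sums.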
GRID COMPUTING
• Grids are a form of distributed computing whereby
a “super virtual computer” is composed of many
networked loosely coupled computers acting together to
perform large tasks.
• Grid size varies considerably.
• Grid computing forms a virtual organization with
geographically distributed hardware and software
infrastructure.
• This infrastructure provides flexible, secure and
coordinated sharing of vast amounts of
heterogeneous resources from multiple
administrative domains.
• The nodes in a grid are easily combined to
produce a computing resource similar to a
multiprocessor supercomputer, but at a lower
cost.
• Because of network unavailability, development
difficulties or faulty resources, faults may occur in the
results or performance may be degraded.
FAULTS AND FAILURES IN GRID
• Important terms
• Fault: A fault is a violation of a system’s
underlying assumptions.
• Error: An error is an internal data state that
reflects a fault.
• Failure: A failure is an externally visible
deviation from specifications.
• Fault tolerance is an important property: the system
detects errors and recovers from them without the
participation of any external agents, such as
humans.
• The more resources and components are involved,
the more complicated and error-prone the system
becomes.
• Physical faults: faulty storage, faulty CPUs, faulty memory.
• Unconditional termination: typically the user pressed Ctrl+C.
• Network faults: packet corruption, faults due to network
partition, packet loss.
• Lifecycle faults: Legacy or versioning faults.
• Processor faults: Machine or operating system crashes.
• Media faults: Disk head crashes.
• Service expiry fault: The service time of a resource may
expire while an application is still using that resource in the grid.
• Process faults: software bug, resource shortage.
• Interaction faults: timing overhead, protocol
incompatibilities, security incompatibilities, policy problems.
FAULTS
Three types of behaviors are possible in systems
after a failure:
• Failstop system: The system does not output any data once it
has failed. It immediately stops sending any events or
messages and does not respond to any messages.
• Failfast system: The system behaves like a Byzantine system
for some time but moves into a failstop mode after a short
period of time. It does not matter what type of fault or failure
has caused this behavior but it is necessary that the system
does not perform any operation once it has failed.
• Byzantine system: The system does not stop after a failure,
but instead behaves in an inconsistent way. It may send out
wrong results of the application.
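One common way to cope with the behaviors above is to run the same task on several nodes and vote on the replies. The sketch below is a simplification under stated assumptions: a silent (failstop) node is modeled as None, a Byzantine node as a wrong value, and the function name vote is hypothetical. It masks wrong results only while faulty nodes are a minority of those that answered, and it is not a full Byzantine agreement protocol.

```python
from collections import Counter


def vote(replies):
    """Return the majority value among replies, ignoring silent (None) nodes.

    Returns None when no value is reported by a strict majority of the
    nodes that answered.
    """
    # Failstop nodes produce no output, so they are simply left out.
    answered = [r for r in replies if r is not None]
    if not answered:
        return None
    value, count = Counter(answered).most_common(1)[0]
    # Accept the value only if more than half of the answers agree.
    return value if count > len(answered) / 2 else None
```

For example, vote([42, 42, None, 7]) accepts 42: one node is silent, one returns a wrong result, and the two agreeing nodes form a majority of the three answers.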
FAULT TOLERANCE TECHNIQUES
Job and Data Replication
• In a grid environment, jobs/tasks or data are replicated to
tolerate faults.
• Algorithms used for this: the Adaptive Job Replication (AJR)
algorithm and the Backup Resources Selection (BRS) algorithm.
• Fault Tolerance using Adaptive Replication in Grid
Computing (FTARG) is an adaptive replication
middleware which addresses the fault tolerance of
grid-based applications by providing data replication at
different sites.
• FTARG enables data synchronization between multiple
heterogeneous databases located in the grid by
supporting a variety of synchronization modes.
• The resubmission technique combines task replication
with task resubmission, using a resubmission impact
metric which measures the impact of repeated task
resubmission on the execution time of a workflow.
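The combination of replication and resubmission can be sketched as below. This is an illustrative toy, not the AJR, BRS or FTARG algorithms, and it does not model the resubmission impact metric. Resources are simulated as plain callables, replicas are tried one after another rather than concurrently, and every name here (ResourceFailure, run_with_replication, healthy, broken) is hypothetical.

```python
class ResourceFailure(Exception):
    """Raised by a (simulated) resource that fails while running a task."""


def run_with_replication(task, resources, replicas=2, max_resubmissions=1):
    """Run task on `replicas` resources per round; resubmit on total failure.

    Returns (result, rounds_used). Raises ResourceFailure if every
    attempt fails.
    """
    pool = list(resources)
    for round_no in range(1 + max_resubmissions):
        # Take the next `replicas` resources for this round.
        batch, pool = pool[:replicas], pool[replicas:]
        if not batch:
            break  # no resources left to resubmit to
        for resource in batch:
            try:
                return resource(task), round_no + 1
            except ResourceFailure:
                continue  # this replica failed; try the next one
    raise ResourceFailure("all replicas and resubmissions failed")


def healthy(task):
    # Simulated resource that runs the task successfully.
    return task()


def broken(task):
    # Simulated resource that always fails.
    raise ResourceFailure("resource down")
```

With resources [broken, broken, healthy] and two replicas per round, the first round fails on both replicas and the task succeeds after one resubmission.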
Checkpointing:
• A Fault Tolerance and Recovery component that extends
the Active BPEL workflow engine has been proposed to
develop mechanisms for building an autonomic
workflow management system that effectively detects,
diagnoses, notifies, reacts and recovers automatically
from failures during workflow execution.
• The default behavior of Active BPEL can be modified in
order to recover a process from a faulty state, using a
non-intrusive checkpointing mechanism.
• The Resource Fault Occurrence History (RFOH) strategy
maintains the fault occurrence history of resources in
the Grid Information Server (GIS).
• A resource broker with jobs to schedule uses this GIS
information in a genetic algorithm and looks for a
near-optimal solution to the scheduling problem.
• It has also been proposed to add fault tolerance
support to parallel executions on grids by integrating
the ComPiler for Portable Checkpointing (CPPC), a
checkpointing tool for parallel applications, with
GridWay, a meta-scheduler provided with the Globus
Toolkit.
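The basic checkpoint and restart cycle behind these tools can be sketched at application level as follows. This is a minimal sketch under stated assumptions and does not reproduce how ActiveBPEL, CPPC or GridWay work: the state is a small dictionary saved with pickle, a fault is simulated by raising an exception, and the names save_checkpoint, load_checkpoint and sum_range are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    # Rename over the old checkpoint only once the new one is complete.
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or default when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def sum_range(n, path, every=10, crash_at=None):
    """Sum 0..n-1, checkpointing every `every` steps.

    crash_at simulates a fault by raising RuntimeError at that step.
    """
    # Resume from the last checkpoint if one exists.
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < n:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated resource fault")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If a run crashes at step 55, a second call with the same path resumes from the checkpoint taken at step 50 instead of starting over, so only five steps are repeated.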
Scheduling/Agent based migration
• Scheduling policies for grid systems can be classified into
space sharing and time sharing policies.
• In fault-tolerant scheduling, the primary-backup approach is a
common method in which each task has a primary copy and a
backup copy submitted to two different processors.
• Two algorithms, the Minimum Replication Cost with Early
Completion Time (MRC-ECT) algorithm and the Minimum
Completion Time with Less Replication Cost (MCT-LRC)
algorithm, have been proposed to schedule backups of
independent jobs and dependent jobs, respectively.
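The primary-backup idea can be sketched with a simple greedy placement. This is not MRC-ECT or MCT-LRC, only an illustration of the constraint that the two copies of a task land on different processors. It pessimistically charges the full cost of each backup to its processor, and the name schedule_primary_backup is hypothetical.

```python
def schedule_primary_backup(task_costs, n_processors):
    """Greedily place a primary and a backup copy of each task.

    task_costs maps task name -> execution cost. Returns
    {task: (primary_processor, backup_processor)}; the two always differ.
    """
    if n_processors < 2:
        raise ValueError("primary-backup needs at least two processors")
    load = [0] * n_processors
    placement = {}
    # Place expensive tasks first; ties are broken by task name.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Primary copy goes to the least-loaded processor.
        primary = min(range(n_processors), key=lambda p: (load[p], p))
        load[primary] += cost
        # Backup copy goes to the least-loaded *other* processor.
        backup = min((p for p in range(n_processors) if p != primary),
                     key=lambda p: (load[p], p))
        load[backup] += cost
        placement[task] = (primary, backup)
    return placement
```

Because the backup is chosen only among processors other than the primary, a single processor crash can never take out both copies of a task.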
Load balancing
• A fault-tolerant policy, named the Fault Tolerant policy
on Dynamic Load Balancing (FTDLB), has been proposed to
balance loads dynamically in the P2P grid system.
• Load balancing strategies are classified as dynamic and
static. In general, the static load balancing strategy
needs prior information, such as the execution rate of
each node, to decide how to distribute the load.
• On the other hand, the dynamic load balancing strategy
exploits the system information to make decisions at run
time.
• Load balancing strategies could also be categorized as
centralized or decentralized.
• The centralized strategy selects a single processor to
handle load scheduling, while the decentralized strategy
lets each participating node handle load
balancing.
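The difference between static and dynamic strategies can be sketched as below. Both functions are centralized toy examples, not the FTDLB policy: static_balance decides everything up front from known execution rates, while dynamic_balance looks at how busy each node currently is when a task arrives. The function names and the cost/rate model are assumptions made for illustration.

```python
def static_balance(n_tasks, rates):
    """Split n_tasks across nodes in proportion to known execution rates."""
    total = sum(rates)
    shares = [n_tasks * r // total for r in rates]
    # Hand out the remainder left by integer division to the fastest nodes.
    leftover = n_tasks - sum(shares)
    fastest_first = sorted(range(len(rates)), key=lambda i: (-rates[i], i))
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares


def dynamic_balance(task_costs, rates):
    """Assign each arriving task to the node that would finish it earliest."""
    busy_until = [0.0] * len(rates)
    assignment = []
    for cost in task_costs:
        # Decision made at run time from the current state of every node.
        node = min(range(len(rates)),
                   key=lambda i: (busy_until[i] + cost / rates[i], i))
        busy_until[node] += cost / rates[node]
        assignment.append(node)
    return assignment
```

The static version needs the rates before any task runs; the dynamic version needs no task count in advance and adapts as the queue on each node grows.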
Global behaviour modelling
• Theoretically they are considered as single elements but,
when it comes to practice, especially in management
related issues, they are considered as a set of
independent, loosely related elements.
• Elements like CPUs, memory and its controllers, video
cards, hard drives, network interfaces, etc. have distinctive
functionalities and are technologically complex.
• Fault tolerance techniques in grid systems can be split
into two categories:
1. resource-level (focused on every machine)
2. service-level (focused on global behavior).
CONCLUSION AND FUTURE WORK
• Checkpoint level: either at system level (i.e. at OS or
middleware level) or at application level.
• In-transit and orphan message management with checkpointing:
the latency and resources held up by such messages would be freed
if a suitable policy is applied.
• Scope of checkpoint: local (for each process instance) or global
(for each parallel program in execution).
• Storage space requirement for checkpointing: light (only the
first/top-level assignment is stored, giving less storage and
communication overhead) or heavy (in addition to the light state,
newly learnt clauses are saved atop the decision stack).
• Granularity of checkpointing: full (the entire state of the
application is saved) or incremental (only the application state
changed since the previous checkpoint is saved).
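The full versus incremental distinction in the last point can be sketched as follows. This is a minimal sketch assuming the application state is a flat dictionary of comparable values; copies are shallow, and the names full_checkpoint, incremental_checkpoint and restore are hypothetical.

```python
_DELETED = object()  # marker for keys removed since the previous checkpoint


def full_checkpoint(state):
    """Copy the entire application state."""
    return dict(state)


def incremental_checkpoint(previous, state):
    """Record only the keys that changed or disappeared since `previous`."""
    delta = {k: v for k, v in state.items()
             if k not in previous or previous[k] != v}
    delta.update({k: _DELETED for k in previous if k not in state})
    return delta


def restore(base, deltas):
    """Rebuild the latest state from a full checkpoint plus ordered deltas."""
    state = dict(base)
    for delta in deltas:
        for k, v in delta.items():
            if v is _DELETED:
                state.pop(k, None)
            else:
                state[k] = v
    return state
```

The trade-off is visible in the sketch: incremental checkpoints store less per save, but recovery has to replay every delta on top of the last full checkpoint.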
THANK YOU

Latest trends in computer networking.pptx
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptx
 

FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS

  • 1. ON FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS Prepared by: Dave Maurvi Y., ME CSE
  • 2. Agenda • Introduction • Grid Computing • Faults & Failure in Grid • Fault Tolerance Techniques • Future Enhancement
  • 3. Introduction • Grid computing is the collection of computer resources from multiple locations to reach a common goal. • One of the main strategies of grid computing is to use middleware to divide and apportion pieces of a program among several computers, sometimes up to many thousands.
  • 4. GRID COMPUTING • Grids are a form of distributed computing whereby a “super virtual computer” is composed of many networked, loosely coupled computers acting together to perform large tasks. • Grid size varies considerably. • Grid computing forms a virtual organization with geographically distributed hardware and software infrastructure.
  • 5. • This infrastructure provides flexible, secure and coordinated sharing of vast amounts of heterogeneous resources from multiple administrative domains. • The nodes in a grid are easily combined to produce a computing resource similar to a multiprocessor supercomputer, but at a lower cost.
  • 6. • Network unavailability, development difficulties or faulty resources can cause faults in the results or degrade performance.
  • 7. FAULTS AND FAILURES IN GRID • Important terms • Fault: A fault is a violation of a system’s underlying assumptions. • Error: An error is an internal data state that reflects a fault. • Failure: A failure is an externally visible deviation from specifications.
  • 8. FAULTS AND FAILURES IN GRID • Fault tolerance is an important property: the system detects errors and recovers from them without the participation of external agents, such as humans. • The more resources and components are involved, the more complicated and error-prone the system becomes.
  • 9. FAULTS • Physical faults: faulty storage, faulty CPUs, faulty memory. • Unconditional termination: typically, the user pressed Ctrl+C. • Network faults: packet corruption, faults due to network partition, packet loss. • Lifecycle faults: legacy or versioning faults. • Processor faults: machine or operating system crashes. • Media faults: disk head crashes. • Service expiry fault: the service time of a resource may expire while an application is using that resource in the grid. • Process faults: software bugs, resource shortage. • Interaction faults: timing overhead, protocol incompatibilities, security incompatibilities, policy problems.
  • 10. Three types of behavior are possible in a system after a failure: • Failstop system: The system does not output any data once it has failed. It immediately stops sending events or messages and does not respond to any messages. • Failfast system: The system behaves like a Byzantine system for a short period of time and then moves into failstop mode. It does not matter what type of fault or failure caused this behavior, but the system must not perform any operation once it has failed. • Byzantine system: The system does not stop after a failure; instead it behaves in an inconsistent way. It may send out wrong results for the application.
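The three behaviors can be told apart by when a node stops emitting output after it has failed. The sketch below is only an illustration under assumptions not in the slides: the observer already knows the failure time, sees one boolean per time step (whether the node emitted anything), and uses a hypothetical `grace_period` to decide how long "a short period of time" is. Detecting real Byzantine behavior would also require checking whether the outputs are correct.

```python
def classify_failure_behavior(outputs_after_failure, grace_period):
    """Classify a failed node from its output trace.

    outputs_after_failure: list of bools, one per time step after the
        failure; True means the node emitted some output in that step.
    grace_period: hypothetical number of steps a failfast node may keep
        emitting output before it must fall silent.
    """
    # Failstop: no output at all once the node has failed.
    if not any(outputs_after_failure):
        return "failstop"

    # Index of the last step in which the node still emitted output.
    last_output_step = max(
        step for step, emitted in enumerate(outputs_after_failure) if emitted
    )

    # Failfast: some erratic output, but silence within the grace period.
    if last_output_step < grace_period:
        return "failfast"

    # Byzantine: the node keeps emitting output after the grace period.
    return "byzantine"
```

For example, a trace of `[True, True, False, False]` with `grace_period=2` is classified as failfast, because the node falls silent after its second step.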
  • 11. FAULT TOLERANCE TECHNIQUES Job and Data Replication • In a grid environment, jobs/tasks or data are replicated to tackle faults. • Algorithms used for this: the Adaptive Job Replication (AJR) algorithm and the Backup Resources Selection (BRS) algorithm. • Fault Tolerance using Adaptive Replication in Grid Computing (FTARG) is an adaptive replication middleware that addresses the fault tolerance of grid-based applications by providing data replication at different sites.
  • 12. • FTARG enables data synchronization between multiple heterogeneous databases located in the grid by supporting a variety of synchronization modes. • The resubmission technique is based on a combination of task replication and task resubmission, using a resubmission impact metric that measures the impact of repeated task resubmission on the execution time of a workflow.
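The combination of task replication and task resubmission can be sketched roughly as follows. This is not AJR, BRS or FTARG, and it does not compute a resubmission impact metric; it is a minimal illustration that assumes a task is a Python callable taking a resource name and raising `RuntimeError` on a fault. The function names and the `max_resubmissions` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_replicated(task, resources):
    """Run one copy of the task on every resource at the same time and
    return the first successful result. Raise if every replica fails."""
    with ThreadPoolExecutor(max_workers=len(resources)) as pool:
        futures = [pool.submit(task, resource) for resource in resources]
        for future in as_completed(futures):
            try:
                return future.result()
            except RuntimeError:
                # This replica hit a fault; wait for the others.
                continue
    raise RuntimeError(f"all {len(resources)} replicas failed")


def run_with_resubmission(task, resources, max_resubmissions=2):
    """Resubmit the replicated task when a whole round of replicas fails.

    Returns (result, number_of_resubmissions_used).
    """
    for resubmission in range(max_resubmissions + 1):
        try:
            return run_replicated(task, resources), resubmission
        except RuntimeError:
            continue
    raise RuntimeError("task failed after all resubmissions")
```

Replication hides the failure of individual resources within one round, while resubmission covers the case where every replica in a round fails. Each resubmission adds a full round to the execution time, which is the cost a resubmission impact metric would have to weigh.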
  • 13. Checkpointing: • A Fault Tolerance and Recovery component that extends the Active BPEL workflow engine has been proposed to develop mechanisms for building an autonomic workflow management system that effectively detects, diagnoses, notifies, reacts and recovers automatically from failures during workflow execution. • The default behavior of Active BPEL can be modified in order to recover a process from a faulty state, using a non-intrusive checkpointing mechanism.
  • 14. • Resource Fault Occurrence History (RFOH) is a strategy that maintains the fault occurrence history of resources in the Grid Information Server (GIS). • A resource broker with jobs to schedule uses this GIS information in a genetic algorithm to look for a near-optimal solution to the scheduling problem. • Another proposal adds fault tolerance support to parallel executions on grids by integrating ComPiler for Portable Checkpointing (CPPC), a checkpointing tool for parallel applications, with GridWay, a meta-scheduler provided with the Globus Toolkit.
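The basic idea behind checkpointing can be shown with a small application-level sketch. It is not the Active BPEL, RFOH or CPPC mechanism; it only assumes a job whose state fits in a small JSON file. The function name, the `fail_at` fault-injection parameter and the toy computation (summing step numbers) are all hypothetical.

```python
import json
import os


def run_with_checkpoints(total_steps, checkpoint_path, fail_at=None):
    """Sum the step numbers 0..total_steps-1, saving a checkpoint after
    every step and resuming from the last checkpoint if one exists."""
    state = {"step": 0, "total": 0}

    # Recovery: resume from the last saved state instead of step 0.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as checkpoint_file:
            state = json.load(checkpoint_file)

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated resource failure")

        state["total"] += state["step"]
        state["step"] += 1

        # Write to a temporary file and rename it, so that a crash in the
        # middle of a write never leaves a half-written checkpoint.
        temporary_path = checkpoint_path + ".tmp"
        with open(temporary_path, "w") as checkpoint_file:
            json.dump(state, checkpoint_file)
        os.replace(temporary_path, checkpoint_path)

    return state["total"]
```

A run that fails at step 5 leaves a checkpoint for steps 0 to 4, so a second call with the same `checkpoint_path` repeats none of the finished work.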
  • 15. Scheduling/Agent based migration • Scheduling policies for grid systems can be classified into space-sharing and time-sharing policies. • In fault-tolerant scheduling, the primary-backup approach is a common methodology in which each task has a primary copy and a backup copy submitted to two different processors. • Two algorithms, the Minimum Replication Cost with Early Completion Time (MRC-ECT) algorithm and the Minimum Completion Time with Less Replication Cost (MCT-LRC) algorithm, have been proposed to schedule backups of independent jobs and dependent jobs, respectively.
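The primary-backup approach can be illustrated with a simple greedy placement. This is not MRC-ECT or MCT-LRC; it is a sketch that assumes independent tasks with known costs and reserves full capacity for every backup copy. The function names and the least-loaded placement rule are hypothetical.

```python
def schedule_primary_backup(task_costs, processors):
    """Place a primary and a backup copy of each task on two different
    processors, always choosing the currently least-loaded one."""
    if len(processors) < 2:
        raise ValueError("primary-backup needs at least two processors")

    load = {processor: 0 for processor in processors}
    placement = {}
    for task, cost in task_costs.items():
        primary = min(processors, key=lambda p: load[p])
        load[primary] += cost

        # The backup must never share a processor with its primary.
        backup = min(
            (p for p in processors if p != primary), key=lambda p: load[p]
        )
        load[backup] += cost  # pessimistic: reserve room for the backup

        placement[task] = (primary, backup)
    return placement


def active_processor(placement, task, failed_processors):
    """Return the processor that runs the task given the failed ones."""
    primary, backup = placement[task]
    if primary not in failed_processors:
        return primary
    if backup not in failed_processors:
        return backup
    raise RuntimeError(f"both copies of {task} are lost")
```

Because the two copies sit on different processors, a single processor failure never loses a task. The price is the extra capacity reserved for backups, which is the replication cost that the algorithms named above try to minimize.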
  • 16. Load balancing • A fault-tolerant policy, named the Fault Tolerant policy on Dynamic Load Balancing (FTDLB), has been proposed to balance loads dynamically in P2P grid systems. • Load balancing strategies are classified as dynamic or static. In general, a static load balancing strategy needs prior information, such as the execution rate of each node, to make load distribution decisions.
  • 17. • On the other hand, a dynamic load balancing strategy uses system information available at run time to make decisions. • Load balancing strategies can also be categorized as centralized or decentralized. • The centralized strategy selects a single processor to handle load scheduling, while the decentralized strategy lets each participating node handle load balancing.
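The difference between static and dynamic load balancing can be sketched as follows. This is not FTDLB; both functions are centralized toy examples under stated assumptions: the static version knows integer execution rates in advance, and the dynamic version learns each task's cost only when it is dispatched. All names are hypothetical.

```python
import heapq


def static_balance(task_count, execution_rates):
    """Static: split the tasks before the run, in proportion to each
    node's known execution rate (assumed to be a positive integer)."""
    total_rate = sum(execution_rates.values())
    shares = {
        node: task_count * rate // total_rate
        for node, rate in execution_rates.items()
    }
    # Hand the tasks lost to rounding down to the fastest nodes.
    leftover = task_count - sum(shares.values())
    fastest_first = sorted(execution_rates, key=execution_rates.get, reverse=True)
    for node in fastest_first[:leftover]:
        shares[node] += 1
    return shares


def dynamic_balance(task_costs, nodes):
    """Dynamic: dispatch each task at run time to the node that currently
    has the least accumulated load."""
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    for task_index, cost in enumerate(task_costs):
        load, node = heapq.heappop(heap)
        assignment[node].append(task_index)
        heapq.heappush(heap, (load + cost, node))
    return assignment
```

With task costs `[5, 1, 1, 1, 1, 1]` and two nodes, the dynamic version gives the expensive first task to one node and the five cheap tasks to the other, which a static split by task count would not do.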
  • 18. Global behaviour modelling • In theory, machines are considered single elements, but in practice, especially in management-related issues, they are considered a set of independent, loosely related elements. • Elements such as CPUs, memory and its controllers, video cards, hard drives and network interfaces have distinctive functionalities and are technologically complex. • Fault tolerance techniques in grid systems can be split into two categories: 1. resource-level (focused on every machine) 2. service-level (focused on global behavior).
  • 19. CONCLUSION AND FUTURE WORK • Checkpoint fixation level: either at system level (i.e. at OS or middleware level) or at application level. • In-transit and orphan message management with checkpointing: the latency and the resources held up by such messages could be freed by applying a suitable policy. • Scope of checkpoint: local (for each process instance) or global (for each parallel program in execution). • Storage space requirement for checkpointing: light (only the first/top-level assignment is stored, giving less storage and communication overhead) or heavy (in addition to the light content, newly learnt clauses are saved atop the decision stack). • Granularity of checkpointing: full (the entire state of the application is saved) or incremental (only the application state changed since the previous checkpoint is saved).
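The difference between full and incremental granularity can be made concrete with a small sketch. It assumes the application state is a flat Python dictionary, which is a simplification made here for illustration; the function names are hypothetical.

```python
def full_checkpoint(state):
    """Full: save a copy of the entire application state."""
    return dict(state)


def incremental_checkpoint(state, previous_state):
    """Incremental: save only what changed since the previous checkpoint."""
    changed = {
        key: value
        for key, value in state.items()
        if key not in previous_state or previous_state[key] != value
    }
    removed = [key for key in previous_state if key not in state]
    return {"changed": changed, "removed": removed}


def restore(base_checkpoint, increments):
    """Rebuild the latest state from one full checkpoint plus the
    increments taken after it, applied in order."""
    state = dict(base_checkpoint)
    for increment in increments:
        state.update(increment["changed"])
        for key in increment["removed"]:
            state.pop(key, None)
    return state
```

Incremental checkpoints are smaller to store and send, but recovery needs the last full checkpoint and every increment after it, so restoring takes more work than with full checkpoints alone.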
  • 20. REFERENCES • Asgarali Bouyer, Abdul Hanan Abdullah, Hasan Ebrahimpour and Firouz Nasrollahi, "Fault-Tolerance Scheduling," 2009, IEEE Computer Society. • Elvin Sindrilaru, Alexandru Costan and Valentin Cristea, "Fault Tolerance and Recovery in Grid Workflow Management Systems," International Conference on Complex, Intelligent and Software Intensive Systems, 2010, IEEE Computer Society. • Ian Foster, "What is the Grid? A Three Point Checklist," Argonne National Laboratory & University of Chicago, July 20, 2002.