Fault Tolerant and Distributed System

This presentation covers the architecture of fault tolerance in distributed systems and how faults in such systems can be mitigated or resolved.

Published in: Education

  1. Paradigms in Fault Tolerant Checkpointing Protocols in Distributed Mobile Systems
  2. Abstract
     • Distributed mobile systems are ubiquitous nowadays.
     • Distributed mobile systems are not inherently fault tolerant; they introduce new challenges in the area of fault-tolerant computing.
     • Mobile computing has many issues, such as low throughput and high latency, low bandwidth of wireless channels, lack of stable storage on mobile hosts, connection breakdowns, and inadequate battery life.
     • This paper surveys algorithms that restore the system to a consistent state after a failure.
  3. • Various techniques and algorithms have been devised in this regard. One commonly applied solution to these failures is the Checkpoint/Restart scheme.
     • The problem with this technique is that it rolls back all processors to an earlier state, even if only a single processor crashes.
     • The idea behind most fault tolerance protocols is to roll back only the crashed processor instead of rolling back all processors.
     • In such cases, processors that do not depend on the results of the crashed processor can continue their tasks without waiting.
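The selective rollback idea above can be illustrated with a minimal sketch. It is not the protocol from the surveyed paper: the `Process` class, the `recover` function, and the explicit `depends_on` map are hypothetical names introduced here, and the sketch assumes that dependencies between processes are known in advance. Processes that consumed results from the crashed process are rolled back with it; independent ones keep their progress.

```python
import copy


class Process:
    """Toy process that can checkpoint and restore its own state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.state = {"counter": 0}
        self._checkpoint = copy.deepcopy(self.state)

    def work(self):
        # Stand-in for one unit of real computation.
        self.state["counter"] += 1

    def take_checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._checkpoint)


def recover(processes, crashed, depends_on):
    """Roll back the crashed process and its transitive dependents only.

    depends_on maps a process name to the set of names whose results it used
    (an assumption of this sketch). Returns the set of rolled-back names.
    """
    to_roll = {crashed}
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            if name not in to_roll and deps & to_roll:
                to_roll.add(name)
                changed = True
    for name in to_roll:
        processes[name].rollback()
    return to_roll


# Demo: each process works, checkpoints, then works once more.
procs = {n: Process(n) for n in ("p1", "p2", "p3")}
for p in procs.values():
    p.work()
    p.take_checkpoint()
    p.work()

# p2 used results from p1; p3 is independent of both.
rolled = recover(procs, "p1", {"p2": {"p1"}, "p3": set()})
```

In this run `p1` and `p2` return to their checkpoints while `p3` keeps the work done after its checkpoint, which is the saving that selective rollback aims for compared with restarting every processor.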
  4. Distributed Transactions
  5. • A “distributed transaction” is a group of several sub-transactions, each running and updating data on a different computer system.
     • Each system has a local “transaction manager” whose purpose is to enlist, prepare, commit, and abort the calls made by the distributed transactions.
     • Before a distributed transaction can commit, each participating transaction manager must agree to commit its action, such as an update.
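A rough sketch of what a local transaction manager with the four operations named above (enlist, prepare, commit, abort) might look like. The class name `LocalTransactionManager`, the `update` helper, and the in-memory dictionary used as a data store are assumptions made for illustration; a real manager would also need durable logging and locking.

```python
class LocalTransactionManager:
    """Hypothetical in-memory transaction manager for one computer system."""

    def __init__(self):
        self.data = {}          # committed data
        self._staged = {}       # tx_id -> pending updates, not yet visible
        self._prepared = set()  # tx_ids that have agreed to commit

    def enlist(self, tx_id):
        # Register this node as a participant in the transaction.
        self._staged.setdefault(tx_id, {})

    def update(self, tx_id, key, value):
        # Stage a change; it only becomes visible on commit.
        self._staged[tx_id][key] = value

    def prepare(self, tx_id):
        # Agree to commit only if the transaction was enlisted here.
        if tx_id in self._staged:
            self._prepared.add(tx_id)
            return True
        return False

    def commit(self, tx_id):
        if tx_id not in self._prepared:
            raise RuntimeError("commit requested before prepare")
        self.data.update(self._staged.pop(tx_id))
        self._prepared.discard(tx_id)

    def abort(self, tx_id):
        # Discard staged changes; committed data is untouched.
        self._staged.pop(tx_id, None)
        self._prepared.discard(tx_id)


tm = LocalTransactionManager()

tm.enlist("t1")
tm.update("t1", "balance", 90)
agreed = tm.prepare("t1")
tm.commit("t1")

tm.enlist("t2")
tm.update("t2", "balance", 0)
tm.abort("t2")
```

After this run only the update from `t1` is visible in `tm.data`; the aborted `t2` leaves no trace.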
  6. Failure Models in Mobile Distributed Systems
     1) Timing faults: occur when a module does not complete its services in time.
     2) Omission faults: occur when a module completely fails to accomplish its services.
     3) Crash faults: occur when a module either stops operating completely or never returns to a valid state.
     4) Byzantine faults: faults that are arbitrary (seemingly random) in nature.
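The four failure models above can be made concrete with a small classifier. This is only an illustrative sketch: the `Fault` enum, the `classify` function, and its parameters are hypothetical, and it assumes an observer that already knows whether the module is alive and what the correct reply should be. In a real system a crash and an omission are often hard to tell apart from the outside.

```python
from enum import Enum
from typing import Optional


class Fault(Enum):
    NONE = "none"
    TIMING = "timing"
    OMISSION = "omission"
    CRASH = "crash"
    BYZANTINE = "byzantine"


def classify(alive: bool, reply: Optional[int], elapsed: Optional[float],
             deadline: float, expected: int) -> Fault:
    """Map one observation of a module onto the failure models listed above."""
    if not alive:
        return Fault.CRASH        # module stopped operating
    if reply is None:
        return Fault.OMISSION     # module is up but delivered no service
    if reply != expected:
        return Fault.BYZANTINE    # arbitrary or incorrect result
    if elapsed is not None and elapsed > deadline:
        return Fault.TIMING       # correct result, delivered too late
    return Fault.NONE


late = classify(alive=True, reply=42, elapsed=2.5, deadline=1.0, expected=42)
silent = classify(alive=True, reply=None, elapsed=None, deadline=1.0, expected=42)
dead = classify(alive=False, reply=None, elapsed=None, deadline=1.0, expected=42)
wrong = classify(alive=True, reply=7, elapsed=0.1, deadline=1.0, expected=42)
```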
  7. Fault Tolerance Protocols
     The two-phase commit (2PC) protocol: a distributed algorithm that assures the reliable termination of a transaction in a distributed environment.
  8. Phase-I Protocol for the coordinator:
     Start
     i) Send transaction to the participating nodes.
     ii) Wait for signal (YES/NO) from all participating nodes.
     Stop
     Phase-I Protocol for the participating nodes:
     Start
     i) Receive transaction from the coordinator.
     ii) Do local processing.
     iii) Send signal (YES/NO) to the coordinator node.
     Stop
  9. Decision-making phase (YES)
     Phase-II Agreement Protocol for the coordinator:
     Start
     i) Send commit signal to the participating nodes.
     ii) Receive acknowledgement from all participating nodes.
     iii) Commit or complete the transaction.
     Stop
     Phase-II Agreement Protocol for the participating nodes:
     Start
     i) Receive commit signal from the coordinator.
     ii) Commit the transaction.
     iii) Release the resources.
     iv) Send acknowledgement to the coordinator node.
     Stop
  10. In case of (NO)
     Phase-II Failure Protocol for the coordinator:
     Start
     i) Send switchback signal to the participating nodes.
     ii) Receive acknowledgement from all participating nodes.
     iii) Undo transaction.
     Stop
     Phase-II Failure Protocol for the participating nodes:
     Start
     i) Receive switchback signal from the coordinator.
     ii) Undo transaction.
     iii) Release the resources.
     iv) Send acknowledgement to the coordinator node.
     Stop
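The coordinator and participant steps of the two phases can be sketched as a single-process simulation. This is a simplified illustration under several assumptions: messages are plain method calls, there are no timeouts, crashes, or logs, and the names `Participant`, `two_phase_commit`, and `can_commit` are hypothetical. The "switchback" signal is modelled as a `rollback` call.

```python
class Participant:
    """Hypothetical participating node; can_commit decides its Phase-I vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"
        self.holding_resources = False

    def prepare(self, transaction):
        # Phase I: receive the transaction, do local processing, then vote.
        self.holding_resources = True
        self.state = "ready" if self.can_commit else "refused"
        return "YES" if self.can_commit else "NO"

    def commit(self):
        # Phase II (agreement): commit, release resources, acknowledge.
        self.state = "committed"
        self.holding_resources = False
        return "ACK"

    def rollback(self):
        # Phase II (failure): undo, release resources, acknowledge.
        self.state = "aborted"
        self.holding_resources = False
        return "ACK"


def two_phase_commit(transaction, participants):
    """Coordinator logic: collect votes, then commit only if all voted YES."""
    votes = [p.prepare(transaction) for p in participants]
    if all(v == "YES" for v in votes):
        acks = [p.commit() for p in participants]
        outcome = "committed"
    else:
        acks = [p.rollback() for p in participants]
        outcome = "aborted"
    if not all(a == "ACK" for a in acks):
        raise RuntimeError("missing acknowledgement")
    return outcome


ok_nodes = [Participant("a"), Participant("b")]
ok_result = two_phase_commit("update x", ok_nodes)

mixed_nodes = [Participant("a"), Participant("b", can_commit=False)]
mixed_result = two_phase_commit("update x", mixed_nodes)
```

A single NO vote is enough to abort the whole transaction, and in both outcomes every participant releases its resources before acknowledging.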
  11. Conclusion
     • Reliability in mobile distributed systems can be restored using the techniques described above.
     • However, new challenges continue to arise, so such protocols are not yet fully suitable.
     • Further protocols can be developed to add reliability to such systems.
     • This paper provides a further step toward restoring the system to a consistent state even in the presence of a failure.
  12. Thank you!!
