Fault Tolerance in Distributed System


Published in: Education

  1. A Seminar Presentation on FAULT TOLERANCE IN DISTRIBUTED SYSTEM
     Coordinator: Mr. Jitendra Yadav, Lecturer
     Submitted by: Pankaj Mehra, Final Yr. I.T.
  2. Distributed System
     A distributed system consists of autonomous computing modules that interact with each other using messages.
     Physical separation and the use of heterogeneous computers complicate interprocessor communication, management of resources, synchronization of cooperating activities, and maintenance of consistency among multiple copies of information.
  3. Why Distributed System
     A distributed system is a program that runs on several processing units at the same time. Partitioning a program across several processors and hosts may be necessary for the following reasons:
     • Processing throughput
     • CPU specialization
     • Fault tolerance
     • Distribution of the application across various sites
  4. Distributed System
     A distributed system is modeled as a graph with directed edges. Vertices are called processes. Directed edges are called communication channels (or simply channels), as shown in the figure.
     Examples of systems that have fault-tolerant distributed implementations are databases, operating systems, communication buses, file systems, and server groups.
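The graph model on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `ProcessGraph`, its methods, and the choice of FIFO queues for channels are assumptions made for the example and do not come from the slides.

```python
from collections import deque


class ProcessGraph:
    """Directed graph: vertices are processes, edges are one-way channels."""

    def __init__(self):
        self.processes = set()
        # channels[(src, dst)] holds messages in transit (FIFO is an assumption)
        self.channels = {}

    def add_channel(self, src, dst):
        """Add a directed edge src -> dst, registering both processes."""
        self.processes.update((src, dst))
        self.channels[(src, dst)] = deque()

    def send(self, src, dst, message):
        """Put a message on the channel src -> dst, if that channel exists."""
        if (src, dst) not in self.channels:
            raise ValueError(f"no channel from {src} to {dst}")
        self.channels[(src, dst)].append(message)

    def receive(self, src, dst):
        """Deliver the oldest message on the channel, or None if it is empty."""
        queue = self.channels[(src, dst)]
        return queue.popleft() if queue else None


graph = ProcessGraph()
graph.add_channel("p1", "p2")    # p1 -> p2 only; there is no reverse channel
graph.send("p1", "p2", "hello")
```

Because edges are directed, `graph.send("p2", "p1", ...)` raises an error here until a separate channel from `p2` to `p1` is added.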
  5. Scenario of Distributed System
  6. FAULT TOLERANCE BASICS
     A fault-tolerant system is one that continues to provide the required functionality in the presence of faults.
     One way to implement a fault-tolerant system is to use a distributed collection of processes.
  7. Definition of Faults
     • A fault is a violation of a system’s underlying assumptions.
     • An error is an internal data state that reflects a fault.
     • A failure is an externally visible deviation from specifications.
  8. Experienced Faults in Distributed Systems
     • Data corruption
     • Hanging processes
     • Misleading return values
     • Misbehaving machines
     • Hardware/software/network outages
     • Overcommitment of resources
     • Insufficient disk space
  9. Other Sources of Faults in Distributed Systems
  10. Fault Classifications
      • Crash faults
      • Omission faults
      • Timing faults
      • Byzantine faults
  11. RECOVERY
      In many cases, once a failure has occurred, it is important to recover critical processes to a known state in order to resume processing.
      There are two approaches to recovery in distributed environments:
      • Backward recovery
      • Forward recovery
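As a rough illustration of backward recovery, the sketch below saves a checkpoint of a process's state and rolls back to it when a step fails. It assumes the usual reading of the term (return to a previously saved known state); forward recovery, which moves the process to a new correct state instead of an earlier one, is not shown. The class `CheckpointedProcess` and the two step functions are hypothetical names invented for this example.

```python
import copy


class CheckpointedProcess:
    """Hypothetical process that can roll back to its last checkpoint."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        """Save a copy of the current state as the known-good state."""
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        """Backward recovery: discard current state, restore the checkpoint."""
        self.state = copy.deepcopy(self._checkpoint)

    def run_step(self, step):
        """Run one step; on any exception, roll back and report failure."""
        try:
            step(self.state)
        except Exception:
            self.rollback()
            return False
        return True


def good_step(state):
    state["balance"] += 10


def faulty_step(state):
    state["balance"] -= 999              # partial update is applied first...
    raise RuntimeError("simulated fault")  # ...then the step fails


proc = CheckpointedProcess({"balance": 100})
proc.run_step(good_step)         # balance becomes 110
proc.checkpoint()                # 110 is now the known state
ok = proc.run_step(faulty_step)  # fails, so the partial update is undone
```

After the faulty step, `proc.state` is back at the checkpointed value and processing can resume from there.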
  13. FAILURE MODELS IN DISTRIBUTED SYSTEMS
      • Series model
      • Parallel model
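The slide names the two models without defining them. Assuming they refer to the standard reliability models, a series system works only if every component works, while a parallel system fails only if every component fails. With independent component failures (also an assumption), the overall reliability can be computed as below; the function names are chosen for this example.

```python
from math import prod


def series_reliability(reliabilities):
    """Series model: the system works only if all components work."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Parallel model: the system fails only if all components fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


components = [0.9, 0.9]                      # illustrative values, not measured data
series = series_reliability(components)      # about 0.81
parallel = parallel_reliability(components)  # about 0.99
```

With the same two components, the parallel arrangement is more reliable than either component alone, which is why redundancy is the basis of most fault-tolerance techniques.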
  14. REPLICATION OF DATA
      Replication of data in distributed systems means maintaining copies of the data on multiple computers.
      The main benefits of replication can be classified as follows:
      • Performance enhancement
      • Reliability enhancement
      • Data closer to client
      • Shared workload
      • Increased availability
      • Increased fault tolerance
      The main constraints are:
      • How to keep data consistent (clients must be given a satisfactorily consistent image)
      • Where to place replicas and how updates are propagated
      • Scalability
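The availability benefit and the consistency constraint can both be seen in a small sketch. The scheme below (write to every live replica, read the highest-versioned copy among live replicas, with a single version counter held by the store) is one simple strategy picked for illustration, not a method described in the slides; `Replica` and `ReplicatedStore` are hypothetical names.

```python
class Replica:
    """One copy of the data; `alive` simulates a crash when set to False."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}  # key -> (version, value)


class ReplicatedStore:
    """Writes go to all live replicas; reads return the newest live copy."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.version = 0  # single counter: a simplifying assumption

    def write(self, key, value):
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no live replica")
        self.version += 1
        for replica in live:
            replica.data[key] = (self.version, value)

    def read(self, key):
        copies = [r.data[key] for r in self.replicas
                  if r.alive and key in r.data]
        if not copies:
            raise KeyError(key)
        # Pick the copy with the highest version number.
        return max(copies, key=lambda copy: copy[0])[1]


a, b, c = Replica("a"), Replica("b"), Replica("c")
store = ReplicatedStore([a, b, c])
store.write("x", "old")   # reaches a, b and c
b.alive = False
store.write("x", "new")   # reaches a and c only; b is now stale
b.alive = True            # b recovers with its old copy
a.alive = False           # a different replica fails
latest = store.read("x")  # served by b and c; c holds the newer version
```

The read still succeeds with one replica down (availability), but only because version numbers let the store ignore the stale copy on `b`; without them a client could see `"old"`, which is the consistency problem the slide refers to.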
  15. Any Queries?
      Thank You