
Distributed Systems-Unit 4

Failure Recovery in Distributed Systems: Concepts in Backward and Forward recovery,
Recovery in Concurrent systems, Obtaining consistent Checkpoints, Recovery in Distributed
Database Systems.
Fault Tolerance: Issues in Fault Tolerance, Commit Protocols, Voting protocols, Dynamic
voting protocols.

Published in: Engineering

LECTURE NOTES: DISTRIBUTED SYSTEM (ECS-701)
MUKESH KUMAR
DEPARTMENT OF INFORMATION TECHNOLOGY
I.T.S ENGINEERING COLLEGE, GREATER NOIDA
PLOT NO: 46, KNOWLEDGE PARK 3, GREATER NOIDA

UNIT- 4
Failure Recovery and Fault Tolerance

Basic Concept

1. A system consists of a set of hardware and software components and is designed to provide a specified service.
2. Failure of a system occurs when the system does not perform its services in the manner specified.
3. An erroneous state of the system is a state which could lead to a system failure by a sequence of valid state transitions.
4. A fault is an anomalous physical condition.
5. An error is a manifestation of a fault in a system, which can lead to system failure.
6. Failure recovery is a process that involves restoring an erroneous state to an error-free state.

Failure

A system is said to “fail” when it cannot meet its promises. A failure is brought about by the existence of “errors” in the system. The cause of an error is a “fault”. A fault may occur due to manufacturing problems, design errors, external disturbances, fatigue and deterioration, or any combination of them.
Type of Failure

Classification of Failure

1. Process failure: In a process failure, the computation results in an incorrect outcome. The system state deviates from the specified result and may cause failure. A process failure may occur due to deadlocks, timeouts, protection violations, wrong input from the user, or consistency violations.
2. System failure: A system failure occurs when the processor fails to execute. It is caused by software errors and hardware errors such as CPU failure, main memory failure, bus failure, power failure, etc.
3. Secondary storage failure: A secondary storage failure occurs when the stored data cannot be accessed. This failure is usually caused by a parity error, a head crash, or dust particles on the medium.
4. Communication medium failure: A communication medium failure occurs when a site cannot communicate with another operational site in the network. It is usually caused by the failure of switching nodes or links of the communication system.
Backward and Forward Error Recovery

There are two approaches for restoring an erroneous state to an error-free state.

1. Forward Error Recovery: If the nature of the errors and damage caused by the fault can be completely and accurately assessed, then it is possible to remove those errors from the process’s state or system’s state and enable the process or system to move forward. This technique is known as forward error recovery.
2. Backward Error Recovery: If the nature of the errors and damage caused by the fault cannot be completely and accurately assessed, then it is not possible to remove those errors from the process’s state or system’s state. Instead, the system state is restored to a previous error-free stable state. This technique is known as backward error recovery.

Backward Error Recovery Vs Forward Error Recovery

Advantages of backward error recovery:
1. Backward error recovery is simpler than forward error recovery.
2. Backward error recovery does not depend on the nature of the fault, so it can recover from an arbitrary fault.

Disadvantages of backward error recovery:
1. Performance penalty: The overhead to restore a process or system state to a prior state can be quite high.
2. There is no guarantee that the fault will not occur again when processing resumes.
3. Some components of the process or system may not be recoverable.

Backward Error Recovery - System Model

The system is assumed to consist of a single machine connected to a secondary storage system and a stable storage system. Stable storage does not lose information in the event of a system failure and is used to keep logs and recovery points.
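The two recovery styles, together with the stable-storage system model above, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `StableStorage` class, the two-account state, and the invariant in `is_consistent` are hypothetical and do not come from the notes, and "stable storage" is simulated with an in-memory copy.

```python
import copy


class StableStorage:
    """Stand-in for stable storage: holds one recovery point (assumption)."""

    def __init__(self):
        self._recovery_point = None

    def save(self, state):
        # Store a private copy so later damage to `state` cannot reach it.
        self._recovery_point = copy.deepcopy(state)

    def load(self):
        return copy.deepcopy(self._recovery_point)


def is_consistent(state):
    # Hypothetical invariant: the two balances always add up to the total.
    return state["a"] + state["b"] == state["total"]


def backward_recover(stable):
    """Backward recovery: discard the erroneous state, restore the recovery point."""
    return stable.load()


def forward_recover(state):
    """Forward recovery: repair the damage in place.

    Only possible because we assume the damage is fully known:
    account 'b' is corrupted, 'a' and 'total' are still correct.
    """
    repaired = dict(state)
    repaired["b"] = repaired["total"] - repaired["a"]
    return repaired


stable = StableStorage()
state = {"a": 60, "b": 40, "total": 100}
stable.save(state)            # establish a recovery point

state["a"] -= 10              # first half of a transfer is applied ...
state["b"] = -999             # ... then a fault corrupts account 'b'
assert not is_consistent(state)

rolled_back = backward_recover(stable)   # work since the recovery point is lost
rolled_forward = forward_recover(state)  # the transfer is completed
```

Backward recovery returns to the saved state and loses the partial transfer, while forward recovery keeps it but relies on knowing exactly what was damaged.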
There are two ways to implement backward error recovery:
1. The operation-based approach
2. The state-based approach

The Operation Based Approach

In this approach, sufficient details of all the modifications made to the system state are recorded, so that the previous stable state of the system can be restored by reversing all the changes made since that state. Each log record contains the name of the object, the old state of the object, and the new state of the object. This can be done in two ways:

1. Updating-in-place: In updating-in-place, every update or write operation to an object updates the object and creates a log record in stable storage. The recoverable update operation can be implemented as a collection of operations as follows:
   a. A ‘do’ operation, which performs the action or update and writes a log record.
   b. An ‘undo’ operation, which undoes the action performed by the ‘do’ operation.
   c. A ‘redo’ operation, which redoes the action specified by the ‘do’ operation.
   d. An optional ‘display’ operation, which displays the log record.
   The major problem with updating-in-place is that a ‘do’ operation cannot be undone if the system crashes after the update operation but before the log record is stored.
2. The Write-Ahead-Log Protocol: In the write-ahead-log protocol, the recoverable update operation is implemented according to the following rules:
   a. Update an object only after the ‘undo’ log is recorded.
   b. Before committing the update, the ‘redo’ and ‘undo’ logs are recorded.

State Based Approach
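For comparison with the state-based approach, the operation-based write-ahead-log rule described above can be sketched in Python. This is a minimal illustration, not the notes' own implementation: the class name `RecoverableStore`, its method names, and the in-memory list standing in for the log on stable storage are all assumptions, and stored values are assumed never to be `None`.

```python
class RecoverableStore:
    """Toy object store with 'do', 'undo', 'redo' and 'display' operations."""

    def __init__(self):
        self.objects = {}  # volatile object state, lost on a crash
        self.log = []      # stand-in for the log kept in stable storage

    def do(self, name, new_value):
        """Record the log entry first, then update the object."""
        old_value = self.objects.get(name)  # None means "did not exist"
        # Write-ahead rule: the undo/redo information is logged
        # before the object itself is modified.
        self.log.append({"name": name, "old": old_value, "new": new_value})
        self.objects[name] = new_value

    def undo(self):
        """Reverse the most recent logged update using its old state."""
        record = self.log.pop()
        if record["old"] is None:
            self.objects.pop(record["name"], None)
        else:
            self.objects[record["name"]] = record["old"]

    def redo_all(self):
        """Rebuild the object state after a crash by replaying the log."""
        self.objects = {}
        for record in self.log:
            self.objects[record["name"]] = record["new"]

    def display(self):
        """Optional operation: return a readable view of the log records."""
        return [f'{r["name"]}: {r["old"]!r} -> {r["new"]!r}' for r in self.log]


store = RecoverableStore()
store.do("x", 1)
store.do("x", 2)
store.do("y", 5)

store.undo()                      # reverses the update to 'y'
after_undo = dict(store.objects)

store.objects = {}                # simulate a crash: volatile state is lost
store.redo_all()                  # the log survives and is replayed
after_redo = dict(store.objects)
```

Because each log record is appended before the object changes, a crash between the two steps leaves a record that can still be undone. This is the failure case that plain updating-in-place cannot handle.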
