Dr S Palaniappan , Associate Professor , Department of ADS
Unit IV
Recovery & Consensus
Chapter 13 and 14
Planned Hour | Recovery & Consensus | Relevant CO Nos | Highest Cognitive Level**
1 | Checkpointing and rollback recovery: Introduction | CO5 | K2
1 | Checkpoint-based recovery | CO5 | K2
1 | Log-based rollback recovery | CO5 | K2
1 | Coordinated checkpointing algorithm | CO5 | K2
1 | Algorithm for asynchronous checkpointing and recovery | CO5 | K2
1 | Consensus and agreement algorithms: Problem definition | CO5 | K2
1 | Overview of results | CO5 | K2
1 | Agreement in a failure-free system | CO5 | K2
1 | Agreement in synchronous systems with failures | CO5 | K2
Introduction
• Distributed systems are subject to failures.
• The vast computing potential of these systems is often hampered by their susceptibility to failures.
• Fault tolerance
• Offers the abstraction of a reliable system that tolerates faults
• Fault recovery
• Once a fault occurs, we want to recover to a previous correct state
Treat a distributed system application as a
collection of processes that communicate
over a network
Rollback Recovery
• Restore the system back to a consistent state after a failure
• Achieve fault tolerance by periodically saving the state of a process during failure-free execution
Checkpointing
• Checkpointing is a technique that provides fault tolerance for computing systems.
• It basically consists of saving a snapshot of the application's state, so that
applications can restart from that point in case of failure
• The saved states of a process are called checkpoints
• Why is rollback recovery of distributed systems complicated? – messages induce
inter-process dependencies during failure-free operation
• Rollback propagation
– The dependencies may force some of the processes that did not fail to roll back
– This phenomenon is called “domino effect”
Rollback Scenario
[Figure: processes P1 and P2 exchanging message M1]
The dependencies may force some of the processes that did not fail to roll back as well.
This phenomenon of cascaded rollback is called the domino effect.
Propagation may extend back to the initial state of the computation, losing all the work performed before the failure.
Domino effect
[Figure: processes Pi, Pj, and Pk exchanging messages m1 to m6, with the recovery line marked]
Domino effect
[Figure: processes Pi, Pj, and Pk exchanging messages m1 to m8, with the recovery line marked]
• If each process takes its checkpoints independently, then the system cannot avoid the domino effect
– This scheme is called independent or uncoordinated checkpointing
• Techniques that avoid the domino effect
• Coordinated checkpointing rollback recovery
Processes coordinate their checkpoints to form a system-wide consistent state
• Communication-induced checkpointing rollback recovery
Forces each process to take checkpoints based on information piggybacked on the application messages
• Log-based rollback recovery
• Combines checkpointing with logging of non-deterministic events.
• Relies on the piecewise deterministic (PWD) assumption
• In general, it enables a system to recover beyond the most recent set of consistent checkpoints.
• Particularly attractive for applications that frequently interact with the outside world, which cannot roll back
Taxonomy
Background and definitions
• System model
• Fixed number N of processes
• Processes cooperate to execute a distributed application and interact with the outside world by receiving and sending input and output messages, respectively
• The outside world is modeled as a special process, the Outside World Process (OWP)
A local checkpoint
• All processes save their local states at certain instants of time
• A local checkpoint is a snapshot of the state of the process at a given instant
• Assumptions
– A process stores all local checkpoints on stable storage
– A process is able to roll back to any of its existing local checkpoints
Consistent states
• A global state of a distributed system
• A collection of the individual states of all participating processes and the states of the communication channels
• Consistent global state
• A global state that may occur during a failure-free execution of a distributed computation
• If a process's state reflects a message receipt, then the state of the corresponding sender must reflect the sending of the message
• A global checkpoint
• A set of local checkpoints, one from each process
• A consistent global checkpoint
• A global checkpoint such that no message is sent by a process after taking its local checkpoint that is received by another process before taking its local checkpoint
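The definition of a consistent global checkpoint can be checked mechanically. The following is a minimal sketch, assuming each message is recorded with the local times at which it was sent and received; the `Message` record, its field names, and the sample values are hypothetical and only serve the illustration.

```python
from typing import NamedTuple

class Message(NamedTuple):
    sender: str
    send_time: int   # sender's local time when the message was sent
    receiver: str
    recv_time: int   # receiver's local time when the message was received

def orphan_messages(checkpoint, messages):
    """Messages received before the receiver's checkpoint
    but sent after the sender's checkpoint."""
    return [
        m for m in messages
        if m.recv_time <= checkpoint[m.receiver]
        and m.send_time > checkpoint[m.sender]
    ]

def is_consistent(checkpoint, messages):
    # A global checkpoint is consistent when it contains no orphan message.
    return not orphan_messages(checkpoint, messages)

messages = [Message("P1", 2, "P2", 3), Message("P2", 5, "P1", 6)]
good = {"P1": 4, "P2": 4}   # one local checkpoint time per process
bad = {"P1": 1, "P2": 4}    # P2 has recorded a receive that P1 has not sent yet

print(is_consistent(good, messages))
print(is_consistent(bad, messages))
```

In the second global checkpoint, P2's state reflects the receipt of the first message while P1's state does not reflect its sending, which is exactly the situation the definition rules out.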
The dotted line - recovery line
The parallel lines - Checkpoints
Types of messages
• A process failure and subsequent recovery may leave messages that were correctly received (and processed) before the failure in abnormal states
• A rollback of processes for recovery may have to roll back the send and receive operations of several messages
In-transit messages
• Messages that have been sent but not yet received at the recovery line
• They cause no problem with global state consistency
• They can be lost or delayed
[Figure: processes Pi, Pj, and Pk exchanging messages m1 to m5 across the recovery line]
Lost messages
• Messages whose send is done but whose receive is undone due to the rollback. After recovery, they must be replayed
• Who handles them depends on the underlying communication:
• over unreliable communication, the application is responsible
• over reliable communication, the recovery algorithm is responsible
Delayed messages
• Messages whose receive is not recorded because the receiving process was either down or the message arrived after the rollback of the receiving process
Orphan messages
• Messages with the receive recorded but the send not recorded
• Do not arise if processes roll back to a consistent global state
Duplicate messages
• Arise due to message logging and replaying during process recovery
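The message types above can be told apart by asking two questions for each message: is its send recorded in the restored state of the sender, and is its receive recorded in the restored state of the receiver? The sketch below is only an illustration under that assumption; the function name, the use of integer local times, and the label "undone" (for a message whose send and receive are both rolled back) are hypothetical.

```python
from typing import Optional

def classify(send_time: int, recv_time: Optional[int],
             sender_ckpt: int, receiver_ckpt: int) -> str:
    """Classify one message relative to the checkpoints the processes restart from.

    recv_time is None when the message had not been received before the failure.
    """
    send_recorded = send_time <= sender_ckpt
    recv_recorded = recv_time is not None and recv_time <= receiver_ckpt
    if send_recorded and recv_recorded:
        return "normal"          # both events survive the rollback
    if send_recorded:
        # The send survives; the receive either never happened or was undone.
        return "in-transit" if recv_time is None else "lost"
    if recv_recorded:
        return "orphan"          # receive recorded, send undone
    return "undone"              # both events rolled back; the sender will resend

print(classify(2, 3, 4, 4))
print(classify(2, None, 4, 4))
print(classify(2, 6, 4, 4))
print(classify(5, 3, 4, 4))
```

Duplicate messages are not covered by this classification, since they arise from replaying the log during recovery rather than from the position of the recovery line.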
Issues in Failure recovery
• We must not only restore the system to a consistent state, but also appropriately
handle messages that are left in an abnormal state due to the failure and recovery.
• The example computation comprises three processes Pi, Pj, and Pk, connected through a communication network.
• The processes communicate solely by exchanging messages over fault-free, FIFO
communication channels.
[Figure: processes 𝑃𝑖, 𝑃𝑗, and 𝑃𝑘 with their checkpoints, exchanging messages m1 to m6]
Checkpoints: {𝐶𝑖,0, 𝐶𝑖,1}, {𝐶𝑗,0, 𝐶𝑗,1, 𝐶𝑗,2}, and {𝐶𝑘,0, 𝐶𝑘,1, 𝐶𝑘,2}
• The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1 created an orphan message m4
• Orphan message m5 is created due to the rollback of process 𝑃𝑗 to checkpoint 𝐶j,2
• Message m6: a lost message, since the send event for m6 is recorded in the restored state for 𝑃𝑗, but the receive event has been undone at process 𝑃𝑖.
[Figure: the same execution with messages m7 and m8 added]
• Messages m7 and m8: delayed orphan messages. After resuming execution from their checkpoints, the processes will generate both of these messages again
• Solution: keep a log of all the sent messages
Uncoordinated Checkpointing
• Each process has autonomy in deciding when to take checkpoints
• Advantage
• No synchronization overhead and it allows processes to take checkpoints when it is
most convenient or efficient
• Lower runtime overhead during normal execution
• Disadvantages
• Domino effect during a recovery
• Possible useless checkpoints (checkpoints that never become part of a consistent global state)
• Recovery from a failure is slow, because processes need to iterate to find a consistent set of checkpoints
• Each process maintains multiple checkpoints and must periodically invoke a garbage collection algorithm
• Not suitable for applications with outside world interaction (output commit)
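Direct Dependency Tracking Technique
• Ci,x denotes the xth checkpoint of process Pi, where i is the process ID and x is the checkpoint index
• Ii,x denotes the checkpoint interval between checkpoints Ci,x-1 and Ci,x
[Figure: processes Pi and Pj with checkpoints Ci,0, Ci,1, ..., Ci,x-1, Ci,x and Cj,0, Cj,1, ..., Cj,y-1, Cj,y, a message m between them, and the checkpoint interval marked]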
Coordinated Checkpointing
• Coordinated Checkpointing requires processes to orchestrate their checkpoints in order
to form a consistent global state.
• Advantages
• No domino effect
• Each process is required to maintain only one checkpoint (the last)
• No computation of recovery line is required
• Disadvantages
• Large latency in committing output
• Two approaches
• Blocking
• Non-blocking
2 Phase coordinated Checkpointing protocol
The computation is blocked during the Checkpointing
Non-blocking checkpoint coordination
• Processes need not stop their execution while taking checkpoints.
• Fundamental problem: prevent a process from receiving application messages that could make the checkpoint inconsistent
[Figure: initiator Pi with checkpoint Ci,x, process Pj with checkpoint Cj,x, and message m]
• Idea: the Chandy–Lamport snapshot algorithm, in which markers play the role of the checkpoint request messages
• The initiator takes a checkpoint and sends a marker (a checkpoint request) on all outgoing channels
• Each process takes a checkpoint upon receiving the first marker, and sends the marker on all outgoing channels before sending any application message
[Figure: initiator Pi with checkpoint Ci,x, process Pj with checkpoint Cj,x, and message m]
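The marker rule can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: channels are FIFO queues held in one `Network` object, the application state of each process is a single counter, and the recording of channel states is left out. The class and method names are hypothetical.

```python
from collections import deque

MARKER = "MARKER"

class Network:
    def __init__(self, names):
        self.state = {p: 0 for p in names}   # application state: a counter
        self.checkpoint = {}                 # process -> saved state
        self.channels = {(a, b): deque() for a in names for b in names if a != b}

    def send(self, src, dst, value):
        self.channels[(src, dst)].append(value)

    def take_checkpoint(self, p):
        # Save the state, then put a marker on every outgoing channel
        # before any further application message is sent.
        self.checkpoint[p] = self.state[p]
        for (src, _dst), channel in self.channels.items():
            if src == p:
                channel.append(MARKER)

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        if msg == MARKER:
            if dst not in self.checkpoint:   # first marker: checkpoint now
                self.take_checkpoint(dst)
        else:
            self.state[dst] += msg           # process an application message

net = Network(["Pi", "Pj"])
net.state["Pi"] = 10
net.take_checkpoint("Pi")      # Pi is the initiator
net.state["Pi"] -= 3
net.send("Pi", "Pj", 3)        # application message m, sent after Ci,x
net.deliver("Pi", "Pj")        # the marker arrives first (FIFO): Pj checkpoints
net.deliver("Pi", "Pj")        # m is processed only after Cj,x
print(net.checkpoint, net.state)
```

Because the channel is FIFO, the marker reaches Pj before m, so the receipt of m is not part of Pj's checkpoint and m cannot become an orphan message.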
Communication-induced checkpointing
• Processes may be forced to take additional checkpoints and thus process
independence is constrained to guarantee the eventual progress of the recovery line.
• Two kinds of checkpoints: autonomous (local) checkpoints and forced checkpoints
• Communication-induced checkpointing piggybacks protocol-related information on each application message
• The receiver uses this information to determine whether it has to take a forced checkpoint to advance the global recovery line.
• The forced checkpoint must be taken before the application may process the contents
of the message
• In contrast with coordinated checkpointing, no special coordination messages are
exchanged
• Two types of communication-induced checkpointing
– model-based checkpointing
– index-based checkpointing
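One simple index-based rule can be sketched as follows: each process numbers its checkpoints, piggybacks the current number on every application message, and takes a forced checkpoint before processing a message that carries a larger number than its own. The class below is a hypothetical illustration of that rule, not a full protocol; checkpoints are recorded only as (index, kind) pairs.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.index = 0                        # index of the latest checkpoint
        self.checkpoints = [(0, "initial")]
        self.delivered = []

    def basic_checkpoint(self):
        # Autonomous checkpoint, taken whenever the process chooses.
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self, payload):
        # Piggyback the current checkpoint index on the application message.
        return (self.index, payload)

    def receive(self, message):
        index, payload = message
        if index > self.index:
            # Forced checkpoint, taken BEFORE the message is processed.
            self.index = index
            self.checkpoints.append((index, "forced"))
        self.delivered.append(payload)

p, q = Process("P"), Process("Q")
p.basic_checkpoint()
q.receive(p.send("m1"))   # carries index 1 > 0: Q takes a forced checkpoint
q.receive(p.send("m2"))   # carries index 1, not larger: no checkpoint
print(q.checkpoints)
```

No coordination messages are exchanged here; the only protocol information is the index that travels with the application messages.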
Log-based Rollback Recovery
• Log-based rollback recovery makes use of deterministic and non-deterministic events in a computation.
• A non-deterministic event can be the receipt of a message from another process (for example m0, m3, and m7) or an event internal to the process.
• A message send event is not a non-deterministic event.
• This assumes that all non-deterministic events can be identified and their
corresponding determinants can be logged into the stable storage.
• How it works
• Each process logs on stable storage the determinants of all non-deterministic
events
• Additionally, checkpoints are taken to reduce the extent of rollback during
recovery
The no-orphans consistency condition
• Let e be a non-deterministic event that occurs at process p. We define the following:
• Depend(e): the set of processes affected by event e
• Log(e): the set of processes that have logged e's determinant in volatile memory
• Stable(e): holds once e's determinant is logged on stable storage
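With these definitions, the always-no-orphans condition is usually stated as: for every event e, if e's determinant is not on stable storage, then Depend(e) must be a subset of Log(e). A minimal sketch of that check, treating Stable(e) as a boolean flag; the dictionary layout and the sample events are hypothetical.

```python
def always_no_orphans(events):
    """events: name -> {'depend': set, 'log': set, 'stable': bool}.

    Holds when every event whose determinant is not yet on stable storage
    is logged (in volatile memory) by every process that depends on it.
    """
    return all(e["stable"] or e["depend"] <= e["log"] for e in events.values())

events = {
    "e1": {"depend": {"p", "q"}, "log": {"p"}, "stable": False},
    "e2": {"depend": {"p"}, "log": set(), "stable": True},
}
print(always_no_orphans(events))   # q depends on e1 but has not logged it

events["e1"]["log"].add("q")
print(always_no_orphans(events))
```

In the first call, a crash of p would leave q depending on an event whose determinant is lost, so q would become an orphan process.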
Pessimistic logging
• Assume that a failure can occur after any non-deterministic event in the computation.
• Determinant is logged to stable storage before message is delivered
• Pessimistic protocols implement the following property, often referred to as
synchronous logging.
• If an event has not been logged on the stable storage, then no process can
depend on it.
• This property is a stronger version of the always-no-orphans condition
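Synchronous logging can be sketched in a few lines: the determinant of each message receipt is written to the log before the message is delivered to the application, so recovery can replay the same deliveries from the last checkpoint. This is only an illustration; a Python list stands in for stable storage, the state transition is an arbitrary deterministic function, and all names are hypothetical.

```python
class PessimisticProcess:
    def __init__(self):
        self.stable_log = []   # stands in for stable storage (survives a crash)
        self.state = 0         # volatile application state

    def _apply(self, value):
        # Any deterministic state transition will do for the illustration.
        self.state = self.state * 2 + value

    def deliver(self, sender, value):
        # Log the determinant first, deliver to the application afterwards.
        self.stable_log.append((sender, value))
        self._apply(value)

    def crash_and_recover(self, checkpoint_state=0):
        # Volatile state is lost; restart from the checkpoint and replay
        # the logged deliveries in their original order.
        self.state = checkpoint_state
        for _sender, value in self.stable_log:
            self._apply(value)

proc = PessimisticProcess()
for sender, value in [("P1", 1), ("P2", 2), ("P1", 3)]:
    proc.deliver(sender, value)
before = proc.state
proc.crash_and_recover()
print(before, proc.state)
```

The sketch assumes the checkpoint was taken before the first logged delivery; in practice the log would be truncated each time a new checkpoint is taken.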
Suppose process 𝑃2 fails as shown, restarts from its checkpoint, and rolls forward using its determinant log to deliver again the same sequence of messages as in the pre-failure execution.
Once the recovery is complete, process 𝑃2 will be consistent with the state of 𝑃1 that includes the receipt of message 𝑚2 from 𝑃1.
Optimistic Logging
• Processes log determinants asynchronously to the stable storage
• Optimistically assume that logging will be complete before a failure occurs.
• Determinants are kept in a volatile log, and are periodically flushed to the stable
storage
• Optimistic logging does not implement the always-no-orphans condition.
• To perform rollbacks correctly, optimistic logging protocols track causal dependencies
during failure free execution
• Pessimistic protocols need only keep the most recent checkpoint of each process,
whereas optimistic protocols may need to keep multiple checkpoints for each process
Causal logging
• Causal logging combines the advantages of both pessimistic and optimistic logging
• Like optimistic logging, it does not require synchronous access to the stable storage
except during output commit
• Like pessimistic logging, it allows each process to commit output independently and
never creates orphans, thus isolating processes from the effects of failures at other
processes
• Make sure that the always-no-orphans property holds
• Each process maintains information about all the events that have causally affected
its state
Process P0 at state X will have logged the determinants of the non-deterministic events that causally
precede its state according to Lamport’s happened-before relation.
Koo–Toueg coordinated checkpointing algorithm
• Assumptions made:
• Processes communicate by exchanging messages through FIFO communication channels
• End-to-end protocols cope with message loss due to rollback recovery and communication failure
• Communication failures do not partition the network.
• Two kinds of checkpoints: permanent and tentative
• Permanent checkpoint: a local checkpoint that is part of a consistent global checkpoint
• Tentative checkpoint: a temporary checkpoint that becomes a permanent checkpoint when the algorithm terminates successfully
Two phases of the algorithm
First Phase:
Second Phase:
• Pi informs all the processes of the decision it reached at the end of the first phase.
• A process, on receiving the message from Pi, will act accordingly.
• A set of permanent checkpoints taken by this algorithm is consistent for the following two reasons:
• Either all or none of the processes take a permanent checkpoint
• No process sends a message after taking a permanent checkpoint
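The two phases can be sketched as a single function. The sketch assumes the usual shape of the first phase (the initiator asks every process to take a tentative checkpoint and collects their replies) and leaves out messaging and the blocking of application sends between the phases. The dictionary layout and the `can_checkpoint` argument are hypothetical.

```python
def coordinated_checkpoint(processes, can_checkpoint):
    """processes: name -> {'state', 'permanent', 'tentative'}.
    can_checkpoint: name -> bool, whether that process is able to checkpoint.
    Returns the initiator's decision (True = make checkpoints permanent)."""
    # Phase 1: every process tries to take a tentative checkpoint and replies.
    votes = {}
    for name, proc in processes.items():
        votes[name] = can_checkpoint[name]
        if votes[name]:
            proc["tentative"] = proc["state"]
    decision = all(votes.values())

    # Phase 2: the initiator's decision is applied by every process.
    for proc in processes.values():
        if decision:
            proc["permanent"] = proc["tentative"]
        proc["tentative"] = None   # tentative checkpoints are committed or discarded
    return decision

def fresh():
    return {n: {"state": s, "permanent": 0, "tentative": None}
            for n, s in [("Pi", 5), ("Pj", 7), ("Pk", 9)]}

ok = fresh()
print(coordinated_checkpoint(ok, {"Pi": True, "Pj": True, "Pk": True}))
failed = fresh()
print(coordinated_checkpoint(failed, {"Pi": True, "Pj": False, "Pk": True}))
```

Either every process ends up with a new permanent checkpoint or none does, which is the first of the two consistency reasons listed above.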
An optimization
Consistent Checkpoint
X decides to initiate the checkpointing
algorithm after receiving message m
The rollback recovery algorithm
• The rollback recovery algorithm restores the system state to a consistent state after a
failure
• 2 phases
• The initiating process sends a message to all other processes and asks whether they are willing to restart from their previous checkpoints. All processes must agree on whether to restart or not.
• The initiating process sends the final decision to all processes, and all the processes act accordingly after receiving it.
Juang–Venkatesan algorithm for asynchronous checkpointing and recovery
• Assumptions:
• Communication channels are reliable
• Messages are delivered in FIFO order, and channels have infinite buffers
• Message transmission delay is arbitrary but finite
• The underlying computation/application is event-driven:
• Process P is at state s, receives message m, processes the message, moves to state s’, and sends messages out.
• So the triplet (s, m, msgs_sent) represents the state of P
Two types of log storage are maintained:
• Volatile log: short access time, but lost if the processor crashes. Its contents are moved to the stable log periodically.
• Stable log: longer access time, but its contents survive a crash
Asynchronous checkpointing:
• After executing an event, the triplet is recorded without any synchronization with other processes.
• A local checkpoint consists of a set of records, which are first stored in the volatile log and then moved to the stable log.
Recovery algorithm
–Notations:
• 𝑅𝐶𝑉𝐷𝑖←𝑗(𝐶𝑘𝑃𝑡𝑖): number of messages received by 𝑝𝑖 from 𝑝𝑗, from the beginning of the computation to checkpoint 𝐶𝑘𝑃𝑡𝑖
• 𝑆𝐸𝑁𝑇𝑖→𝑗(𝐶𝑘𝑃𝑡𝑖): number of messages sent by 𝑝𝑖 to 𝑝𝑗, from the beginning of the computation to checkpoint 𝐶𝑘𝑃𝑡𝑖
–Idea:
• From the set of checkpoints, find a set of consistent checkpoints
• This is done based on the number of messages sent and received
• An orphan message exists if 𝑅𝐶𝑉𝐷𝑖←𝑗(𝐶𝑘𝑃𝑡𝑖) > 𝑆𝐸𝑁𝑇𝑗→𝑖(𝐶𝑘𝑃𝑡𝑗)
Orphan Message
RCVDx←y(CkPtx) > SENTy→x(CkPty)
[Figure: Y crashes; the example shows 2 > 1]
Note that if X and Z roll back to checkpoints ex1 and ez1, respectively, then their states will be consistent with Y’s state, ey1.
Juang–Venkatesan algorithm
[Figure: Y crashes]
If event ey2 is the latest checkpointed event at Y, then Y will restart from the state corresponding to ey2.
Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is also initiated at processors X and Z.
Initially X, Y , and Z set
CkPtX ← ex3,
CkPtY ← ey2 and
CkPtZ ← ez2,
• The following messages are sent during the first iteration:
• Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z;
• X sends ROLLBACK(X, 2) to Y and ROLLBACK(X, 0) to Z; and
• Z sends ROLLBACK(Z, 0) to X and ROLLBACK(Z, 1) to Y.
• Since RCVDX←Y(CkPtX) = 3 > 2 (2 is the value received in the ROLLBACK(Y, 2) message from Y), X will set CkPtX to ex2, satisfying RCVDX←Y(ex2) = 2 ≤ 2.
• Since RCVDZ←Y(CkPtZ) = 2 > 1, Z will set CkPtZ to ez1, satisfying RCVDZ←Y(ez1) = 1 ≤ 1.
In the second iteration:
• Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z;
• Z sends ROLLBACK(Z, 1) to Y and ROLLBACK(Z, 0) to X;
• X sends ROLLBACK(X, 0) to Z and ROLLBACK(X, 1) to Y.
The resulting set of checkpoints {ex2, ey2, ez1} is consistent, and no further rollback occurs.
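The iterations above can be replayed in code. The sketch below is a simplification: instead of exchanging ROLLBACK messages, it recomputes the SENT values centrally and repeats until no process rolls back further. The SENT and RCVD counts for ex2, ex3, ey2, ez1, and ez2 are taken from the example; the counts marked as assumed (including what Y has received) are hypothetical fill-ins needed to make the data complete.

```python
def ck(sent, rcvd):
    return {"sent": sent, "rcvd": rcvd}

# history[p][k]: cumulative message counts at the k-th checkpointed event of p.
history = {
    "X": [ck({"Y": 0, "Z": 0}, {"Y": 0, "Z": 0}),   # ex0
          ck({"Y": 1, "Z": 0}, {"Y": 1, "Z": 0}),   # ex1 (assumed)
          ck({"Y": 1, "Z": 0}, {"Y": 2, "Z": 0}),   # ex2
          ck({"Y": 2, "Z": 0}, {"Y": 3, "Z": 0})],  # ex3
    "Y": [ck({"X": 0, "Z": 0}, {"X": 0, "Z": 0}),   # ey0
          ck({"X": 1, "Z": 1}, {"X": 1, "Z": 0}),   # ey1 (assumed)
          ck({"X": 2, "Z": 1}, {"X": 1, "Z": 1})],  # ey2 (rcvd counts assumed)
    "Z": [ck({"X": 0, "Y": 0}, {"X": 0, "Y": 0}),   # ez0
          ck({"X": 0, "Y": 1}, {"X": 0, "Y": 1}),   # ez1
          ck({"X": 0, "Y": 1}, {"X": 0, "Y": 2})],  # ez2
}

def recover(history, start):
    ckpt = dict(start)                # process -> index of its current checkpoint
    changed = True
    while changed:
        changed = False
        # ROLLBACK(p, c) to q carries c = SENT_{p->q}(CkPt_p).
        rollback = {(p, q): history[p][ckpt[p]]["sent"][q]
                    for p in history for q in history if p != q}
        for (p, q), c in rollback.items():
            # q rolls back while it has received more from p than p has sent.
            while history[q][ckpt[q]]["rcvd"][p] > c:
                ckpt[q] -= 1
                changed = True
    return ckpt

result = recover(history, {"X": 3, "Y": 2, "Z": 2})
print(result)
```

Starting from {ex3, ey2, ez2}, the first pass moves X to ex2 and Z to ez1, and the second pass changes nothing, which matches the recovery line {ex2, ey2, ez1} of the example.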
ALGORITHM
Consensus and Agreement
• Agreement among the processes in a distributed system is a fundamental requirement for a wide range of applications.
• Example: the commit decision in database systems
• Example: a distributed key-value store with multiple nodes
• Sys 1: "Transfer 1000 from A to B"
• Sys 2: "Transfer 3000 from A to B"
Classification of Faults
• Based on the components that fail
• Program / Process
• Processor / machine
• Link
• Storage
• Based on the behaviour of faulty components
• Crash – Just halts
• Fail-stop – crash with additional conditions
• Omission – process or channel fails to do something that it is expected
• Byzantine – behaves arbitrarily - imperfect information
• Timing – correct output, but provided outside a specified time interval
• Some assumptions underlying our study of agreement algorithms:
Failure models
• Among the n processes in the system, at most f processes can be faulty. A faulty
process can behave in any manner allowed by the failure model assumed.
• The various failure models – fail-stop, send omission and receive omission, and
Byzantine failures
• Synchronous communication
• Processes run in a lock-step manner: in each step, a process receives the messages sent to it earlier, performs computation, and sends messages to other processes
• One step of synchronous computation is called a round
• Asynchronous communication
• Computation does not proceed in lock step
• A process can send or receive messages and perform computation at any time
• Network connectivity
• The system has full logical connectivity, i.e., each process can communicate with
any other by direct message passing
• Sender identification
• A process that receives a message always knows the identity of the sender process.
• Channel reliability
• The channels are reliable, and only the processes may fail
• Authenticated vs. non-authenticated messages
• With authentication via techniques such as digital signatures, it is easier to solve the agreement problem
• Without authentication, a faulty process can forge a message, or tamper with the contents of a received message before relaying it
• Agreement variable
• The agreement variable may be boolean or multivalued, and need not be an integer
Byzantine Generals Problem
Four camps of the attacking army, each commanded by a general, are camped around the
fort of Byzantium
They can succeed in attacking only if they attack simultaneously.
Generals should reach a consensus on the plan
● It could be ATTACK
● Or RETREAT
• The only way they can communicate is to send messengers among themselves.
• An asynchronous system is modeled by messengers taking an unbounded time to
travel between two camps.
• A lost message is modeled by a messenger being captured by the enemy
• But there might be traitors, and traitors can act arbitrarily
• All loyal generals should still reach a consensus
• The traitor will attempt to subvert the agreement-reaching mechanism, by giving
misleading information to the other generals
The Solution
A commanding general (a designated process, called the source process) sends an order to his n-1 lieutenant generals such that
IC1 (Consistency/Agreement): All loyal lieutenants obey the same order.
IC2 (Validity): If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
Stated in terms of processes:
Agreement: All non-faulty processes must agree on the same value.
Validity: If the source process is non-faulty, then the agreed upon value by all the non-faulty processes must be the same as the initial value of the source.
Termination (Liveness): Each non-faulty process must eventually decide on a value.
The consensus problem
• The consensus problem differs from the Byzantine agreement problem in that each process has an initial value and all the correct processes must agree on a single value.
• Agreement: All non-faulty processes must agree on the same (single) value.
• Validity: If all the non-faulty processes have the same initial value, then the agreed upon value by all the non-faulty processes must be that same value.
• Termination: Each non-faulty process must eventually decide on a value.
The interactive consistency problem
• The interactive consistency problem differs from the Byzantine agreement problem in that each process has an initial value, and all the correct processes must agree upon a set of values, with one value for each process.
• Agreement: All non-faulty processes must agree on the same array of values A[v1…vn].
• Validity: If process i is non-faulty and its initial value is vi, then all non-faulty processes agree on vi as the ith element of the array A. If process j is faulty, then the non-faulty processes can agree on any value for A[j].
• Termination: Each non-faulty process must eventually decide on the array A.
Overview of results
• Overview of results on agreement. f denotes number of failure-prone processes. n is
the total number of processes.
• Consensus is not solvable in asynchronous systems even if one process can fail by
crashing
• Work on both asynchronous message-passing systems and asynchronous shared-memory systems has to deal with this difficulty when trying to solve consensus
Impossibility of achieving Byzantine agreement
with n = 3 processes and f = 1 malicious process.
• With n = 3 processes, the Byzantine agreement problem cannot be solved if the
number of Byzantine processes f = 1.
Agreement in a failure-free system
• The consensus can be reached by collecting information - arriving at a “decision,” and
distributing this decision.
• The decision can be reached by - majority, max, and min functions
• In a synchronous system, this can be done simply in a constant number of rounds
• In an asynchronous system, consensus can similarly be reached in a constant number
of message hops
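In the failure-free case the whole procedure is a broadcast followed by a local decision. A minimal sketch, assuming every process ends up with the same multiset of initial values; the function names and the sample values are hypothetical.

```python
from collections import Counter

def decide(values, rule="majority"):
    # Apply the agreed decision function to the collected values.
    if rule == "min":
        return min(values)
    if rule == "max":
        return max(values)
    return Counter(values).most_common(1)[0][0]

def failure_free_consensus(initial, rule="majority"):
    """initial: process -> initial value. Returns process -> decided value."""
    # Collect: with no failures, every process receives every broadcast value.
    collected = {p: sorted(initial.values()) for p in initial}
    # Decide: each process applies the same function to the same multiset.
    return {p: decide(vals, rule) for p, vals in collected.items()}

initial = {"P1": 1, "P2": 0, "P3": 1}
print(failure_free_consensus(initial))
print(failure_free_consensus(initial, rule="min"))
```

Because all processes apply the same deterministic function to the same input, they necessarily decide the same value.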
Agreement in (message-passing) synchronous
systems with failures
• Up to f (< n) crash failures are possible.
• In f + 1 rounds, at least one round has no failures.
• This observation is used to justify that the agreement, validity, and termination conditions are satisfied.
• Complexity: O((f + 1) · n²) messages
• f + 1 is a lower bound on the number of rounds
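The role of the f + 1 rounds can be seen in a small simulation of a min-based scheme: in every round each process broadcasts its current value if it has not broadcast it yet, and then adopts the minimum of what it holds and what it received. This is a sketch under assumptions; the `crashes` argument (which process halts in which round, and which recipients its last broadcast still reaches) is a hypothetical way of scripting crash failures.

```python
def crash_consensus(initial, f, crashes=None):
    """initial: process -> value.
    crashes: process -> (round, recipients reached by its broadcast in that
    round before it halts). Returns the values held by surviving processes."""
    crashes = crashes or {}
    x = dict(initial)
    last_sent = {p: None for p in initial}
    alive = set(initial)
    for rnd in range(1, f + 2):                  # rounds 1 .. f + 1
        inbox = {p: [] for p in initial}
        for p in sorted(alive):
            if x[p] == last_sent[p]:
                continue                         # value already broadcast
            targets = set(initial) - {p}
            if p in crashes and crashes[p][0] == rnd:
                targets &= set(crashes[p][1])    # partial broadcast, then halt
            for q in targets:
                inbox[q].append(x[p])
            last_sent[p] = x[p]
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            x[p] = min([x[p]] + inbox[p])
    return {p: x[p] for p in alive}

initial = {"A": 0, "B": 1, "C": 1}
crashes = {"A": (1, ["B"])}      # A crashes in round 1 after reaching only B
print(crash_consensus(initial, f=1, crashes=crashes))
```

After round 1 only B has learned A's value, so B and C disagree; round 2 is failure-free, B relays the value, and both survivors decide 0. Each round uses at most n(n - 1) messages, which gives the (f + 1) · n² bound.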
• The correct generals reach agreement in two rounds of messages:
1. In the first round, the commander sends a value to each of the lieutenants.
2. In the second round, each of the lieutenants sends the value it received to its peers.
When both rounds are finished, the correct lieutenants simply apply a majority function to determine the value.
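If the commander is faulty, it may send different values to different lieutenants, so there is no single order to agree on. With N ≥ 4, all the lieutenants are then correct and faithfully report the values they received, so each of them gathers the same set of values and the majority function gives the same result at every lieutenant (a default value when there is no majority).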
If one of the lieutenants is faulty, the correct lieutenants will receive N −2 replicas of the
correct value, plus one incorrect one: the majority function will determine the correct
value.
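The two rounds can be simulated for one commander and three lieutenants (N = 4, at most one faulty process). This is a sketch only: the `relay` callback models what a lieutenant reports in the second round, and `DEFAULT` is a hypothetical fallback used when no value has a strict majority.

```python
from collections import Counter

DEFAULT = "RETREAT"   # hypothetical default when there is no strict majority

def majority(values):
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else DEFAULT

def two_round_agreement(commander_orders, relay):
    """commander_orders: lieutenant -> value received from the commander (round 1).
    relay(sender, receiver, value): value that sender reports to receiver (round 2)."""
    lieutenants = list(commander_orders)
    decisions = {}
    for me in lieutenants:
        seen = [commander_orders[me]]
        seen += [relay(peer, me, commander_orders[peer])
                 for peer in lieutenants if peer != me]
        decisions[me] = majority(seen)
    return decisions

def honest(sender, receiver, value):
    return value

def l3_lies(sender, receiver, value):
    # Lieutenant L3 is faulty and always reports RETREAT.
    return "RETREAT" if sender == "L3" else value

loyal_commander = {"L1": "ATTACK", "L2": "ATTACK", "L3": "ATTACK"}
faulty_commander = {"L1": "ATTACK", "L2": "RETREAT", "L3": "ATTACK"}

case1 = two_round_agreement(loyal_commander, l3_lies)
case2 = two_round_agreement(faulty_commander, honest)
print(case1["L1"], case1["L2"])
print(case2)
```

In the first case the correct lieutenants L1 and L2 each see two copies of the commander's value against one wrong report. In the second case every lieutenant gathers the same three values, so the majority function returns the same result everywhere.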
Consensus with up to f fail-stop processes in a
system of n processes, n > f

unit 3 all about the data science process, covering the steps present in almost every data science project.

  • 1.
    Dr S Palaniappan, Associate Professor , Department of ADS Unit IV Recovery & Consensus Chapter 13 and 14 1
  • 2.
    Dr S Palaniappan, Associate Professor , Department of ADS Planned Hour Recovery & Consensus Relevant CO Nos Highest Cognitive level** 1 Checkpointing and rollback recovery: Introduction CO5 K2 1 Checkpoint-based recovery CO5 K2 1 Log-based rollback recovery CO5 K2 1 Coordinated checkpointing algorithm CO5 K2 1 Algorithm for asynchronous checkpointing and recovery. CO5 K2 1 Consensus and agreement algorithms: Problem definition CO5 K2 1 Overview of results CO5 K2 1 Agreement in a failure –free system CO5 K2 1 Agreement in synchronous systems with failures. CO5 K2
  • 3.
    Dr S Palaniappan, Associate Professor , Department of ADS • Distributed systems are subject to failures. • And the vast computing potential of these systems is often hampered by their susceptibility to failures • Fault tolerance • Offers the abstraction of a reliable system that tolerates faults • Fault-recovery • Once a fault occurs, we want to recover to a previous correct state Introduction
  • 4.
    Dr S Palaniappan, Associate Professor , Department of ADS Treat a distributed system application as a collection of processes that communicate over a network
  • 5.
    Dr S Palaniappan, Associate Professor , Department of ADS Rollback Recovery • Restore the system back to a consistent state after a failure • Achieve fault tolerance by periodically saving the state of a process during the failure- free execution
  • 6.
    Dr S Palaniappan, Associate Professor , Department of ADS Checkpointing • Checkpointing is a technique that provides fault tolerance for computing systems. • It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure • The saved states of a process
  • 7.
    Dr S Palaniappan, Associate Professor , Department of ADS • Why is rollback recovery of distributed systems complicated? – messages induce inter-process dependencies during failure-free operation • Rollback propagation – The dependencies may force some of the processes that did not fail to roll back – This phenomenon is called “domino effect”
  • 8.
    Dr S Palaniappan, Associate Professor , Department of ADS
  • 9.
    Dr S Palaniappan, Associate Professor , Department of ADS M 1 P 1 P 2 Rollback Scenario This phenomenon of cascaded rollback is called the domino effect Propagation may extend back to the initial state of the computation, losing all the work performed before the failure. The dependencies may force some of the processes that did not fail also to roll back
  • 10.
    Domino effect
    [Figure: processes Pi, Pj, and Pk exchanging messages m1 to m6, with the recovery line marked]
  • 11.
    Domino effect
    [Figure: processes Pi, Pj, and Pk exchanging messages m1 to m8, with the recovery line marked]
  • 12.
    • If each process takes its checkpoints independently, the system cannot avoid the domino effect.
      - This scheme is called independent or uncoordinated checkpointing.
    • Techniques that avoid the domino effect:
      - Coordinated checkpointing rollback recovery: processes coordinate their checkpoints to form a system-wide consistent state.
      - Communication-induced checkpointing rollback recovery: forces each process to take checkpoints based on information piggybacked on the application messages.
  • 13.
    • Log-based rollback recovery
      - Combines checkpointing with logging of non-deterministic events.
      - Relies on the piecewise deterministic (PWD) assumption.
      - In general, it enables a system to recover beyond the most recent set of consistent checkpoints.
      - Particularly attractive for applications that frequently interact with the outside world, which cannot roll back.
  • 14.
    Dr S Palaniappan, Associate Professor , Department of ADS Taxonomy
  • 15.
    Background and definitions
    • System model
      - A fixed number N of processes.
      - Processes cooperate to execute a distributed application and interact with the outside world by receiving input messages and sending output messages.
      - The outside world is modeled as a special Outside World Process (OWP).
  • 16.
    A local checkpoint
    • All processes save their local states at certain instants of time.
    • A local checkpoint is a snapshot of the state of the process at a given instant.
    • Assumptions
      - A process stores all local checkpoints on stable storage.
      - A process is able to roll back to any of its existing local checkpoints.
  • 17.
    Consistent states
    • A global state of a distributed system
      - A collection of the individual states of all participating processes and the states of the communication channels.
    • Consistent global state
      - A global state that may occur during a failure-free execution of a distributed computation.
      - If a process's state reflects a message receipt, then the state of the corresponding sender must reflect the sending of that message.
  • 18.
    Dr S Palaniappan, Associate Professor , Department of ADS
  • 19.
    • A global checkpoint
      - A set of local checkpoints, one from each process.
    • A consistent global checkpoint
      - A global checkpoint such that no message sent by a process after taking its local checkpoint is received by another process before that process takes its local checkpoint.
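The consistency condition above can be checked mechanically. The following is a minimal sketch, assuming each process numbers its own events with a local counter and that a checkpoint is identified by the counter value at which it was taken; the `Message` record and both function names are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    send_time: int    # sender's local event counter at the send
    receiver: str
    recv_time: int    # receiver's local event counter at the receive


def orphans(checkpoint, messages):
    """Messages whose receive is recorded but whose send is not.

    checkpoint maps each process to the local counter value of its checkpoint.
    """
    return [m for m in messages
            if m.send_time > checkpoint[m.sender]
            and m.recv_time <= checkpoint[m.receiver]]


def is_consistent(checkpoint, messages):
    """A global checkpoint is consistent when it contains no orphan message."""
    return not orphans(checkpoint, messages)


msgs = [Message("P1", 2, "P2", 3)]
print(is_consistent({"P1": 1, "P2": 4}, msgs))  # send is after P1's checkpoint
print(is_consistent({"P1": 3, "P2": 4}, msgs))  # send and receive both recorded
```

A message that is sent before the sender's checkpoint and received after the receiver's checkpoint is not an orphan, so it does not violate consistency.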
  • 20.
    [Figure legend: the dotted line is the recovery line; the parallel lines are checkpoints]
  • 21.
    Types of messages
    • A process failure and subsequent recovery may leave messages that were correctly received (and processed) before the failure in abnormal states.
    • A rollback of processes for recovery may have to roll back the send and receive operations of several messages.
    [Figure: processes P1 and P2 exchanging message M1]
  • 22.
    Dr S Palaniappan, Associate Professor , Department of ADS
  • 23.
    In-transit messages
    • Messages that have been sent but not yet received as of the recovery line.
    • They cause no problem for global state consistency.
    • They can be lost or delayed.
  • 24.
    [Figure: processes Pi, Pj, and Pk exchanging messages m1 to m5, with the recovery line marked]
  • 25.
    Lost messages
    • Messages whose send is recorded but whose receive is undone by the rollback. After recovery, they must be replayed.
    • Who handles them depends on the communication layer:
      - over unreliable communication, the application is responsible;
      - over reliable communication, the recovery algorithm is responsible.
  • 26.
    Delayed messages
    • Messages whose receive is not recorded because the receiving process was either down or the message arrived after the rollback of the receiving process.
  • 27.
    Orphan messages
    • Messages whose receive is recorded but whose send is not recorded.
    • They do not arise if processes roll back to a consistent global state.
    Duplicate messages
    • They arise from message logging and replaying during process recovery.
  • 28.
    Issues in failure recovery
    • We must not only restore the system to a consistent state, but also appropriately handle messages that are left in an abnormal state due to the failure and recovery.
    • The computation comprises three processes Pi, Pj, and Pk, connected through a communication network.
    • The processes communicate solely by exchanging messages over fault-free, FIFO communication channels.
  • 29.
    Issues in failure recovery
    [Figure: processes Pi, Pj, and Pk exchanging messages m1 to m6]
    • Checkpoints: {Ci,0, Ci,1}, {Cj,0, Cj,1, Cj,2}, and {Ck,0, Ck,1, Ck,2}.
    • The rollback of process Pi to checkpoint Ci,1 creates an orphan message m4.
    • Orphan message m5 is created by the rollback of process Pj to checkpoint Cj,2.
  • 30.
    Issues in failure recovery
    [Figure: processes Pi, Pj, and Pk with checkpoints Ci,0 to Ci,1, Cj,0 to Cj,2, and Ck,0 to Ck,2, exchanging messages m1 to m6]
    • Message m6 is a lost message: its send event is recorded in the restored state of Pj, but its receive event has been undone at process Pi.
  • 31.
    Issues in failure recovery
    [Figure: processes Pi, Pj, and Pk with checkpoints Ci,0 to Ci,1, Cj,0 to Cj,2, and Ck,0 to Ck,2, exchanging messages m1 to m8]
    • Messages m7 and m8 are delayed orphan messages. After resuming execution from their checkpoints, the processes will generate both of these messages again.
    • Solution: keep a log of all sent messages.
  • 32.
    Uncoordinated Checkpointing
    • Each process has autonomy in deciding when to take checkpoints.
    • Advantages
      - No synchronization overhead; each process can take checkpoints when it is most convenient or efficient.
      - Lower runtime overhead during normal execution.
  • 33.
    • Disadvantages
      - Domino effect during recovery.
      - Possible useless checkpoints, that is, checkpoints that never become part of a consistent global state.
      - Recovery from a failure is slow, because processes need to iterate to find a consistent set of checkpoints.
      - Each process maintains multiple checkpoints and must periodically invoke a garbage collection algorithm.
      - Not suitable for applications with outside world interaction (output commit).
  • 34.
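    Direct Dependency Tracking Technique
    • Ci,x is the xth checkpoint of process Pi, where i is the process ID and x is the checkpoint index.
    • The span between two consecutive checkpoints of a process is a checkpoint interval.
    [Figure: Pi with checkpoints Ci,0, Ci,1, ..., Ci,x-1, Ci,x and Pj with checkpoints Cj,0, Cj,1, ..., Cj,y-1, Cj,y; message m from Pi to Pj is labeled with (i, x)]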
  • 35.
    Coordinated Checkpointing
    • Coordinated checkpointing requires processes to orchestrate their checkpoints in order to form a consistent global state.
    • Advantages
      - No domino effect.
      - Each process is required to maintain only one checkpoint (the last).
      - No computation of the recovery line is required.
    • Disadvantages
      - Large latency in committing output.
    • Two approaches
      - Blocking
      - Non-blocking
  • 36.
    Two-phase coordinated checkpointing protocol
    • The computation is blocked during checkpointing.
  • 37.
    Non-blocking checkpoint coordination
    • Processes need not stop their execution while taking checkpoints.
    • Fundamental problem: prevent a process from receiving application messages that could make the checkpoint inconsistent.
    [Figure: initiator Pi takes checkpoint Ci,x and sends message m to Pj, which takes checkpoint Cj,x]
  • 38.
    • Idea: use the Chandy and Lamport snapshot algorithm, in which markers play the role of the checkpoint request messages.
    • The initiator takes a checkpoint and sends a marker (a checkpoint request) on all outgoing channels.
    • Each process takes a checkpoint upon receiving the first marker, and sends the marker on all outgoing channels before sending any application message.
    [Figure: initiator Pi takes checkpoint Ci,x and sends message m to Pj, which takes checkpoint Cj,x]
  • 39.
    Communication-induced checkpointing
    • Processes may be forced to take additional checkpoints, so process independence is constrained in order to guarantee the eventual progress of the recovery line.
    • Two types of checkpoints: autonomous (local) checkpoints, which a process takes independently, and forced checkpoints.
    • Communication-induced checkpointing piggybacks protocol-related information on each application message.
    • The receiver uses this information to determine whether it has to take a forced checkpoint to advance the global recovery line.
  • 40.
    • The forced checkpoint must be taken before the application may process the contents of the message.
    • In contrast with coordinated checkpointing, no special coordination messages are exchanged.
    • Two types of communication-induced checkpointing:
      - model-based checkpointing
      - index-based checkpointing
  • 41.
    Log-based Rollback Recovery
    • Log-based rollback recovery distinguishes between deterministic and non-deterministic events in a computation.
    • A non-deterministic event is the receipt of a message from another process or an event internal to the process (for example, the receipts of m0, m3, and m7).
    • A message send event is not a non-deterministic event.
  • 42.
    • Log-based recovery assumes that all non-deterministic events can be identified and that their corresponding determinants can be logged to stable storage.
    • How it works
      - Each process logs on stable storage the determinants of all non-deterministic events.
      - Additionally, checkpoints are taken to reduce the extent of rollback during recovery.
  • 43.
    The no-orphans consistency condition
    • Let e be a non-deterministic event that occurs at process p. We define the following:
      - Depend(e): the set of processes that are affected by event e.
      - Log(e): the set of processes that have logged a copy of e's determinant in volatile memory.
      - Stable(e): a predicate that is true if e's determinant is logged on stable storage.
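In the rollback-recovery literature these three definitions are combined into the always-no-orphans condition: for every event e, if Stable(e) does not hold, then Depend(e) must be a subset of Log(e). The sketch below checks that condition over a list of events; the dictionary layout and the function name are assumptions made for illustration only.

```python
def always_no_orphans(events):
    """Check: for every e, not Stable(e) implies Depend(e) is a subset of Log(e).

    Each event is a dict with keys (hypothetical layout):
      "depend": set of processes affected by the event
      "log":    set of processes holding its determinant in volatile memory
      "stable": True if the determinant is on stable storage
    """
    return all(e["stable"] or e["depend"] <= e["log"] for e in events)


safe = [
    {"depend": {"P0", "P1"}, "log": {"P0", "P1", "P2"}, "stable": False},
    {"depend": {"P1", "P2"}, "log": set(), "stable": True},
]
risky = [
    # P2 depends on the event, but only P1 holds the determinant, in volatile memory.
    {"depend": {"P1", "P2"}, "log": {"P1"}, "stable": False},
]
print(always_no_orphans(safe), always_no_orphans(risky))
```

In the second list, a crash of P1 would lose the only copy of the determinant while P2 still depends on the event, which is exactly how an orphan process arises.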
  • 44.
    Pessimistic logging
    • Assumes that a failure can occur after any non-deterministic event in the computation.
    • The determinant of each event is logged to stable storage before the message is delivered.
    • Pessimistic protocols implement the following property, often referred to as synchronous logging:
      - If an event has not been logged on stable storage, then no process can depend on it.
    • This is a stronger version of the always-no-orphans condition.
  • 45.
    • Suppose process P2 fails as shown, restarts from its checkpoint, and rolls forward using its determinant log to deliver again the same sequence of messages as in the pre-failure execution.
    • Once recovery is complete, P2 will be consistent with the state of P1 that includes the receipt of message m2 from P1.
  • 46.
    Optimistic Logging
    • Processes log determinants asynchronously to stable storage.
    • They optimistically assume that logging will be complete before a failure occurs.
    • Determinants are kept in a volatile log and are periodically flushed to stable storage.
    • Optimistic protocols do not implement the always-no-orphans condition.
  • 47.
    • To perform rollbacks correctly, optimistic logging protocols track causal dependencies during failure-free execution.
    • Pessimistic protocols need only keep the most recent checkpoint of each process, whereas optimistic protocols may need to keep multiple checkpoints for each process.
  • 48.
    Causal logging
    • Causal logging combines the advantages of both pessimistic and optimistic logging.
    • Like optimistic logging, it does not require synchronous access to stable storage except during output commit.
    • Like pessimistic logging, it allows each process to commit output independently and never creates orphans, thus isolating processes from the effects of failures at other processes.
    • It ensures that the always-no-orphans property holds.
    • Each process maintains information about all the events that have causally affected its state.
  • 49.
    Process P0 at state X will have logged the determinants of the non-deterministic events that causally precede its state according to Lamport's happened-before relation.
  • 50.
    Koo-Toueg coordinated checkpointing algorithm
    • Assumptions:
      - Processes communicate by exchanging messages through FIFO communication channels.
      - End-to-end protocols cope with message loss due to rollback recovery and communication failure.
      - Communication failures do not partition the network.
    • Two kinds of checkpoints:
      - Permanent: a local checkpoint that is part of a consistent global checkpoint.
      - Tentative: a temporary checkpoint that becomes a permanent checkpoint when the algorithm terminates successfully.
  • 51.
    Two phases of the algorithm
    First phase:
  • 52.
    Two phases of the algorithm
    Second phase:
    • Pi informs all the processes of the decision it reached at the end of the first phase.
    • A process, on receiving the message from Pi, acts accordingly.
    • The set of permanent checkpoints taken by this algorithm is consistent for two reasons:
      - Either all or none of the processes take a permanent checkpoint.
      - No process sends a message after taking a tentative checkpoint until it receives Pi's decision.
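The two phases can be sketched as a small simulation. This is only an illustration under stated assumptions: the first phase is taken to be the usual one (the initiator takes a tentative checkpoint and asks every other process to do the same, and each process reports whether it succeeded), message passing is replaced by a plain dictionary of answers, and the function name `koo_toueg_checkpoint` is hypothetical.

```python
def koo_toueg_checkpoint(initiator, willing):
    """Simulate one run of a two-phase checkpoint decision.

    willing maps each non-initiator process to True if it managed to take
    a tentative checkpoint in the first phase (assumed input).
    """
    # Phase 1: the initiator takes a tentative checkpoint and collects replies.
    replies = {initiator: True}
    replies.update({p: ok for p, ok in willing.items() if p != initiator})
    decision = "commit" if all(replies.values()) else "abort"

    # Phase 2: the initiator's decision is sent to everyone, who act on it.
    outcome = {}
    for pid, took_tentative in replies.items():
        if decision == "commit":
            outcome[pid] = "permanent"      # tentative becomes permanent
        elif took_tentative:
            outcome[pid] = "discarded"      # tentative checkpoint is dropped
        else:
            outcome[pid] = "none"           # nothing was taken
    return decision, outcome


print(koo_toueg_checkpoint("Pi", {"Pj": True, "Pk": True}))
print(koo_toueg_checkpoint("Pi", {"Pj": True, "Pk": False}))
```

Either every process ends with a permanent checkpoint or none does, which is the all-or-none property stated on the slide.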
  • 53.
    An optimization
    [Figure: a consistent checkpoint; X decides to initiate the checkpointing algorithm after receiving message m]
  • 54.
    The rollback recovery algorithm
    • The rollback recovery algorithm restores the system to a consistent state after a failure.
    • It has two phases:
      - The initiating process sends a message to all other processes asking whether they are willing to restart from their previous checkpoints. All processes must agree for the rollback to go ahead.
      - The initiating process sends the final decision to all processes, and each process acts accordingly after receiving it.
  • 55.
    Juang-Venkatesan algorithm for asynchronous checkpointing and recovery
    • Assumptions:
      - Communication channels are reliable, deliver messages in FIFO order, and have infinite buffers.
      - Message transmission delay is arbitrary but finite.
    • The underlying computation/application is event-driven:
      - A process P in state s receives message m, processes the message, moves to state s', and sends messages out.
      - The triplet (s, m, msgs_sent) therefore represents the state of P.
  • 56.
    Two types of log storage are maintained:
    • Volatile log: short access time, but lost if the processor crashes. Its contents are moved to the stable log periodically.
    • Stable log: longer access time, but it survives a crash.
    Asynchronous checkpointing:
    • After executing an event, the triplet is recorded without any synchronization with other processes.
    • A local checkpoint consists of a set of records, which are first stored in the volatile log and then moved to the stable log.
  • 57.
    Recovery algorithm
    • Notation:
      - $RCVD_{i \leftarrow j}(CkPt_i)$: number of messages received by $p_i$ from $p_j$, from the beginning of the computation to checkpoint $CkPt_i$.
      - $SENT_{i \rightarrow j}(CkPt_i)$: number of messages sent by $p_i$ to $p_j$, from the beginning of the computation to checkpoint $CkPt_i$.
    • Idea:
      - From the set of checkpoints, find a set of consistent checkpoints.
      - Do this based on the number of messages sent and received.
      - An orphan message exists if $RCVD_{i \leftarrow j}(CkPt_i) > SENT_{j \rightarrow i}(CkPt_j)$.
  • 58.
    Orphan message
    • Condition: $RCVD_{x \leftarrow y}(CkPt_x) > SENT_{y \rightarrow x}(CkPt_y)$ (in the figure, 2 > 1 after Y crashes).
    • Note that if X and Z roll back to checkpoints ex1 and ez1, they will be consistent with Y's state, ey1.
  • 59.
    Juang-Venkatesan algorithm (Y crashes)
    • If event ey2 is the latest checkpointed event at Y, then Y will restart from the state corresponding to ey2.
    • Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is also initiated at processors X and Z.
  • 60.
    Initially, X, Y, and Z set CkPtX ← ex3, CkPtY ← ey2, and CkPtZ ← ez2.
  • 61.
    • They then send the following messages during the first iteration:
      - Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z;
      - X sends ROLLBACK(X, 2) to Y and ROLLBACK(X, 0) to Z; and
      - Z sends ROLLBACK(Z, 0) to X and ROLLBACK(Z, 1) to Y.
  • 62.
    • Since $RCVD_{X \leftarrow Y}(CkPt_X) = 3 > 2$ (2 is the value received in the ROLLBACK(Y, 2) message from Y), X sets CkPtX to ex2, which satisfies $RCVD_{X \leftarrow Y}(e_{x2}) = 2 \le 2$.
  • 63.
    • Since $RCVD_{Z \leftarrow Y}(CkPt_Z) = 2 > 1$, Z sets CkPtZ to ez1, which satisfies $RCVD_{Z \leftarrow Y}(e_{z1}) = 1 \le 1$.
  • 64.
    • In the second iteration:
      - Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z;
      - Z sends ROLLBACK(Z, 1) to Y and ROLLBACK(Z, 0) to X;
      - X sends ROLLBACK(X, 0) to Z and ROLLBACK(X, 1) to Y.
    • The resulting set {ex2, ey2, ez1} is consistent, and no further rollback occurs.
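The iterations above can be reproduced with a short simulation. This is a sketch under stated assumptions, not the algorithm as published: it repeats rounds until nothing changes instead of running a fixed number of iterations, it rolls a process back to the latest event whose receive count does not exceed the announced send count, and the counts marked "assumed" are hypothetical values chosen only to be compatible with the numbers quoted in the example.

```python
def recover(history, start):
    """Iteratively roll back until no process has received more than was sent.

    history[p] is a list of events; event k holds cumulative "sent" and "rcvd"
    counts per peer. Index 0 is the initial state (all counts zero).
    start[p] is the index of the latest checkpointed event of p.
    """
    ckpt = dict(start)
    changed = True
    while changed:
        changed = False
        # Each process announces ROLLBACK(i, SENT i->j at its current checkpoint).
        announced = {(i, j): history[i][ckpt[i]]["sent"].get(j, 0)
                     for i in history for j in history if i != j}
        for (j, i), c in announced.items():
            # Process i rolls back while it has received more from j than j sent.
            while history[i][ckpt[i]]["rcvd"].get(j, 0) > c:
                ckpt[i] -= 1
                changed = True
    return ckpt


zero = {"sent": {}, "rcvd": {}}
history = {
    "X": [zero,
          {"sent": {"Y": 0}, "rcvd": {"Y": 1}},            # ex1 (assumed)
          {"sent": {"Y": 1}, "rcvd": {"Y": 2}},            # ex2
          {"sent": {"Y": 2}, "rcvd": {"Y": 3}}],           # ex3
    "Y": [zero,
          {"sent": {"X": 1}, "rcvd": {"Z": 1}},            # ey1 (assumed)
          {"sent": {"X": 2, "Z": 1},
           "rcvd": {"X": 1, "Z": 1}}],                     # ey2 (rcvd assumed)
    "Z": [zero,
          {"sent": {"Y": 1}, "rcvd": {"Y": 1}},            # ez1
          {"sent": {"Y": 1}, "rcvd": {"Y": 2}}],           # ez2
}
print(recover(history, {"X": 3, "Y": 2, "Z": 2}))
```

With these inputs the first round moves X to ex2 and Z to ez1, and the second round changes nothing, matching the recovery line {ex2, ey2, ez1} in the example.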
  • 65.
    Dr S Palaniappan, Associate Professor , Department of ADS ALGORITHM
  • 66.
    Dr S Palaniappan, Associate Professor , Department of ADS
  • 67.
    Consensus and Agreement
    • Agreement among the processes in a distributed system is a fundamental requirement for a wide range of applications.
    • Example: the commit decision in database systems.
    • Example: a distributed key-value store with multiple nodes, where different nodes may see different requests:
      - Sys 1: "Transfer 1000 from A to B"
      - Sys 2: "Transfer 3000 from A to B"
  • 68.
    Classification of Faults
    • Based on the component that fails:
      - Program / process
      - Processor / machine
      - Link
      - Storage
    • Based on the behaviour of the faulty component:
      - Crash: the component just halts.
      - Fail-stop: a crash with additional conditions.
      - Omission: a process or channel fails to do something that is expected of it.
      - Byzantine: the component behaves arbitrarily (imperfect information).
      - Timing: correct output, but provided outside a specified time interval.
  • 69.
    Some assumptions underlying our study of agreement algorithms
    • Failure models
      - Among the n processes in the system, at most f processes can be faulty. A faulty process can behave in any manner allowed by the failure model assumed.
      - The failure models considered are fail-stop, send omission and receive omission, and Byzantine failures.
  • 70.
    • Synchronous communication
      - Processes run in a lock-step manner: a process receives a message sent to it earlier, performs computation, and sends a message to another process.
      - One step of synchronous computation is called a round.
    • Asynchronous communication
      - Computation does not proceed in lock step.
      - A process can send or receive messages and perform computation at any time.
    • Network connectivity
      - The system has full logical connectivity, i.e., each process can communicate with any other by direct message passing.
  • 71.
    • Sender identification
      - A process that receives a message always knows the identity of the sender process.
    • Channel reliability
      - The channels are reliable, and only the processes may fail.
    • Authenticated vs. non-authenticated messages
      - With authentication via techniques such as digital signatures, it is easier to solve the agreement problem, because forgery and tampering can be detected.
      - With unauthenticated messages, a faulty process that relays a message can forge it or tamper with its contents.
    • Agreement variable
      - The agreement variable may be boolean or multivalued, and need not be an integer.
  • 72.
    Byzantine Generals Problem
    Four camps of the attacking army, each commanded by a general, are camped around the fort of Byzantium.
  • 73.
    Byzantine Generals Problem
    • They can succeed in attacking only if they attack simultaneously.
    • The generals should reach a consensus on the plan. It could be ATTACK.
  • 74.
    Byzantine Generals Problem
    • The generals should reach a consensus on the plan. Or it could be RETREAT.
  • 75.
    • The only way they can communicate is to send messengers among themselves.
    • An asynchronous system is modeled by messengers taking an unbounded time to travel between two camps.
    • A lost message is modeled by a messenger being captured by the enemy.
  • 76.
    Byzantine Generals Problem
    • But there might be traitors.
    • All loyal generals should reach a consensus.
  • 77.
    Byzantine Generals Problem
    • But traitors can act arbitrarily.
    • All loyal generals should reach a consensus.
  • 78.
    • The traitor will attempt to subvert the agreement-reaching mechanism by giving misleading information to the other generals.
  • 79.
    The Solution
    A commanding general (a designated process, called the source process) sends an order to his n-1 lieutenant generals such that the conditions IC1 and IC2 hold.
  • 80.
    The Solution
    • IC1: All loyal lieutenants obey the same order.
    • IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
  • 82.
    The Solution
    • IC1 is the Consistency/Agreement condition. Agreement: all non-faulty processes must agree on the same value.
    • IC2: If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.
  • 83.
    The Solution
    • Consistency/Agreement: all non-faulty processes must agree on the same value.
    • Validity: if the source process is non-faulty, then the value agreed upon by all the non-faulty processes must be the same as the initial value of the source.
    • Termination/Liveness: each non-faulty process must eventually decide on a value.
  • 84.
    The consensus problem
    • The consensus problem differs from the Byzantine agreement problem in that each process has an initial value and all the correct processes must agree on a single value.
    • Agreement: all non-faulty processes must agree on the same (single) value.
    • Validity: if all the non-faulty processes have the same initial value, then the value agreed upon by all the non-faulty processes must be that same value.
    • Termination: each non-faulty process must eventually decide on a value.
  • 85.
    The interactive consistency problem
    • The interactive consistency problem differs from the Byzantine agreement problem in that each process has an initial value, and all the correct processes must agree upon a set of values, with one value for each process.
    • Agreement: all non-faulty processes must agree on the same array of values A[v1 ... vn].
    • Validity: if process i is non-faulty and its initial value is vi, then all non-faulty processes agree on vi as the ith element of the array A. If process j is faulty, then the non-faulty processes can agree on any value for A[j].
    • Termination: each non-faulty process must eventually decide on the array A.
  • 86.
    Dr S Palaniappan, Associate Professor , Department of ADS 86
  • 87.
    Overview of results
    • Overview of results on agreement: f denotes the number of failure-prone processes, and n is the total number of processes.
  • 88.
    • Consensus is not solvable in asynchronous systems even if only one process can fail by crashing.
    • Work on asynchronous message-passing and shared-memory systems therefore deals with solving weaker variants of the consensus problem.
  • 89.
    Impossibility of achieving Byzantine agreement with n = 3 processes and f = 1 malicious process
    • With n = 3 processes, the Byzantine agreement problem cannot be solved if the number of Byzantine processes is f = 1.
  • 90.
    Agreement in a failure-free system
    • Consensus can be reached by collecting information, arriving at a "decision," and distributing this decision.
    • The decision can be reached using functions such as majority, max, and min.
    • In a synchronous system, this can be done simply in a constant number of rounds.
    • In an asynchronous system, consensus can similarly be reached in a constant number of message hops.
  • 91.
    Agreement in (message-passing) synchronous systems with failures
    • Up to f (< n) crash failures are possible.
    • In f + 1 rounds, at least one round has no failures.
    • From this, one can justify that the agreement, validity, and termination conditions are satisfied.
    • Complexity: O((f + 1) · n^2) messages.
    • f + 1 is a lower bound on the number of rounds.
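The f + 1 round argument can be illustrated with a small simulation. The sketch assumes one common formulation of the algorithm: in each round a process broadcasts its current value if it has not already broadcast it, and then adopts the minimum of its own value and everything it received. The function name and the `crashes` parameter (which says in which round a process crashes and which recipients its last broadcast still reaches) are hypothetical.

```python
def crash_consensus(values, f, crashes=None):
    """Simulate f + 1 synchronous rounds of min-based consensus.

    values:  {pid: initial integer value}
    crashes: {pid: (round, recipients reached in that round)} (assumed input)
    Returns the decisions of the processes that did not crash.
    """
    crashes = crashes or {}
    x = dict(values)
    last_sent = {p: None for p in values}
    alive = set(values)
    for rnd in range(1, f + 2):                      # rounds 1 .. f + 1
        inbox = {p: [] for p in values}
        for p in sorted(alive):
            if x[p] == last_sent[p]:
                continue                             # value already broadcast
            crash = crashes.get(p)
            if crash and crash[0] == rnd:
                targets = crash[1]                   # partial broadcast, then crash
            else:
                targets = set(values) - {p}
            for q in targets:
                inbox[q].append(x[p])
            last_sent[p] = x[p]
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            x[p] = min([x[p]] + inbox[p])
    return {p: x[p] for p in alive}


# Process 3 holds the minimum but crashes in round 1 after reaching only process 0.
print(crash_consensus({0: 5, 1: 3, 2: 8, 3: 1}, f=1, crashes={3: (1, {0})}))
```

After round 1 only process 0 knows the value 1; the second, failure-free round spreads it to the others, which is why f + 1 rounds suffice when at most f processes crash.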
  • 92.
    • With one faulty general out of N ≥ 4, the correct generals reach agreement in two rounds of messages:
      1. In the first round, the commander sends a value to each of the lieutenants.
      2. In the second round, each of the lieutenants sends the value it received to its peers.
    • When both rounds are finished, the correct lieutenants simply apply a majority function to determine the value.
    • If the commander is faulty, all the lieutenants faithfully report the values they received, so every correct lieutenant sees the same set of values and the majority function gives the same result at each of them (a default value if there is no majority).
    • If one of the lieutenants is faulty, the correct lieutenants will receive N − 2 replicas of the correct value, plus one incorrect one: the majority function will determine the correct value.
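The two-round exchange can be tried out with a small sketch for one commander and three lieutenants. It assumes a faulty lieutenant always relays one fixed wrong value and that RETREAT is the default when no value has a strict majority; the names `majority` and `two_round_agreement` are hypothetical.

```python
from collections import Counter


def majority(values, default="RETREAT"):
    """Return the value held by a strict majority, else the default."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def two_round_agreement(orders, faulty_lieutenant=None, lie="RETREAT"):
    """orders: {lieutenant: value received from the commander in round 1}.

    In round 2 every lieutenant relays its value to its peers; a faulty
    lieutenant relays `lie` instead. Returns decisions of correct lieutenants.
    """
    decisions = {}
    for i in orders:
        if i == faulty_lieutenant:
            continue
        seen = [orders[i]]                       # value from the commander
        for j in orders:
            if j != i:
                seen.append(lie if j == faulty_lieutenant else orders[j])
        decisions[i] = majority(seen)
    return decisions


# Loyal commander, lieutenant L3 is faulty.
print(two_round_agreement({"L1": "ATTACK", "L2": "ATTACK", "L3": "ATTACK"},
                          faulty_lieutenant="L3"))
# Faulty commander sends inconsistent orders, all lieutenants are loyal.
print(two_round_agreement({"L1": "ATTACK", "L2": "RETREAT", "L3": "ATTACK"}))
```

In both runs the correct lieutenants end up with the same decision, because each of them applies the same majority function to the same collection of reported values.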
  • 93.
    Consensus with up to f fail-stop processes in a system of n processes, n > f
  • 94.
    Dr S Palaniappan, Associate Professor , Department of ADS
  • 95.
    Dr S Palaniappan, Associate Professor , Department of ADS

Editor's Notes

  • #4 Many techniques have been developed to add reliability and high availability to distributed systems. These techniques include transactions, group communication, and rollback recovery, and they have different tradeoffs and focus. If no precautions are taken and the computer system supporting an application fails, the program must be restarted from the beginning.
  • #14 Nondeterminism means that the path of execution isn't fully determined by the specification of the computation, so the same input can produce different outcomes, while deterministic execution is guaranteed to be the same, given the same input. Related terms to "nondeterministic" are "probabilistic" and "stochastic".
  • #16 Some protocols assume that the communication subsystem delivers messages reliably, in first-in-first-out (FIFO) order, while other protocols assume that the communication subsystem can lose, duplicate, or reorder messages. The choice between these two assumptions usually affects the complexity of checkpointing and failure recovery.
  • #35 The direct dependency tracking technique is commonly used in uncoordinated checkpointing. When Pj receives m during interval Ij,y, it records the dependency from Ii,x to Ij,y, which is later saved onto stable storage when Pj takes checkpoint Cj,y.
  • #37 After a process takes a local checkpoint, to prevent orphan messages, it remains blocked until the entire checkpointing activity is complete.
    Step 1: The coordinator takes a checkpoint and broadcasts a request message to all processes, asking them to take a checkpoint.
    Step 2: When a process receives this message, it stops its execution, flushes all the communication channels, takes a tentative checkpoint, and sends an acknowledgment message back to the coordinator.
    Step 3: After the coordinator receives acknowledgments from all processes, it broadcasts a commit message that completes the two-phase checkpointing protocol.
    Step 4: After receiving the commit message, a process removes the old permanent checkpoint, atomically makes the tentative checkpoint permanent, and then resumes execution and the exchange of messages with other processes.
  • #38 A fundamental problem in coordinated checkpointing is to prevent a process from receiving application messages that could make the checkpoint inconsistent.
  • #40 Communication-induced checkpointing is another way to avoid the domino effect, while allowing processes to take some of their checkpoints independently. In communication-induced checkpointing, processes take two types of checkpoints, namely autonomous and forced checkpoints. The checkpoints that a process takes independently are called local checkpoints, while those that a process is forced to take are called forced checkpoints.
  • #41 Model-based checkpointing prevents patterns of communications and checkpoints that could result in inconsistent states among the existing checkpoints. A process detects the potential for inconsistent checkpoints and independently forces local checkpoints to prevent the formation of undesirable patterns. Index-based communication-induced checkpointing assigns monotonically increasing indexes to checkpoints, such that the checkpoints having the same index at different processes form a consistent state.
  • #42 The execution of process P0 is a sequence of four deterministic intervals. The first one starts with the creation of the process, while the remaining three start with the receipt of messages m0, m3, and m7, respectively. Send event of message m2 is uniquely determined by the initial state of P0 and by the receipt of message m0, and is therefore not a non-deterministic event.
  • #44 (subset) A ⊆ B means every element of A is also an element of B
  • #45 Pessimistic meaning - tending to see the worst aspect of things or believe that the worst will happen. if an event has not been logged on the stable storage, then no process can depend on it. In addition to logging determinants, processes also take periodic checkpoints to minimize the amount of work that has to be repeated during recovery. Disadvantage: performance penalty for synchronous logging Advantages: •immediate output commit•restart from most recent checkpoint•recovery limited to failed process(es)•simple garbage collection
  • #47 Optimistic means hopeful and confident about the future. Optimistic logging assumes that failures are rare and that a failure will not occur before the determinant is logged to stable storage.
  • #48 Consider the example shown in Figure 13.9. Suppose process P2 fails before the determinant for m5 is logged to the stable storage. Process P1 then becomes an orphan process and must roll back to undo the effects of receiving the orphan message m6. The rollback of P1 further forces P0 to roll back to undo the effects of receiving message m7.
  • #54 Consider the example shown in Figure 13.11. The set {x1, y1, z1} is a consistent set of checkpoints. Suppose process X decides to initiate the checkpointing algorithm after receiving message m. It takes a tentative checkpoint x2 and sends “take tentative checkpoint” messages to processes Y and Z, causing Y and Z to take checkpoints y2 and z2, respectively. Clearly, {x2, y2, z2} forms a consistent set of checkpoints. Note, however, that {x2, y2, z1} also forms a consistent set of checkpoints. In this example, there is no need for process Z to take checkpoint z2 because Z has not sent any message since its last checkpoint. However, process Y must take a checkpoint since it has sent messages since its last checkpoint.
  • #55 First phase: an initiating process Pi sends a message to all other processes to check whether they are all willing to restart from their previous checkpoints. A process may reply “no” to a restart request for any reason. If Pi learns that all processes are willing to restart from their previous checkpoints, Pi decides that all processes should roll back to their previous checkpoints. Otherwise, Pi aborts the rollback attempt and may attempt a recovery at a later time. Second phase: Pi propagates its decision to all the processes. On receiving Pi’s decision, a process acts accordingly.
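The two phases of the rollback decision can be sketched as a single function. This is a simplified illustration that ignores message passing and timeouts; the function name `two_phase_rollback` and the process names are hypothetical.

```python
def two_phase_rollback(willing: dict) -> dict:
    """Return the action each process takes, given its reply to the restart request.

    `willing` maps a process name to True ("yes") or False ("no").
    """
    # Phase 1: the initiator collects the replies to its restart request.
    decision = "rollback" if all(willing.values()) else "abort"
    # Phase 2: the initiator propagates its decision and every process acts on it.
    return {name: decision for name in willing}


all_yes = two_phase_rollback({"P1": True, "P2": True, "P3": True})
one_no = two_phase_rollback({"P1": True, "P2": False, "P3": True})
print(all_yes)
print(one_no)
```

A single "no" reply is enough to abort the attempt for everyone, which matches the rule that either all processes roll back or none do.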
  • #56 The underlying computation or application is assumed to be event-driven: a processor P waits until a message m is received, processes the message m, changes its state from s to s′, and sends zero or more messages to some of its neighbors. Then the processor remains idle until the receipt of the next message. The new state s′ and the contents of the messages sent to its neighbors depend on state s and the contents of message m.
  • #58 The recovery algorithm achieves this by making each processor keep track of both the number of messages it has sent to other processors and the number of messages it has received from other processors. Recovery may involve several iterations of rollbacks by processors. Whenever a processor rolls back, it is necessary for all other processors to find out if any message sent by the rolled-back processor has become an orphan message.
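The counting idea can be sketched for a single pair of processors. The comparison rule used here is an assumption, since the note does not state it: a receiver holds an orphan message if it has recorded more receipts from a peer than the peer, after rolling back, remembers sending. The function name and the per-event history list are hypothetical.

```python
def find_rollback_point(received_history: list, sent_by_peer_after_rollback: int) -> int:
    """Return the index of the latest local event that holds no orphan message.

    received_history[k] is the number of messages received from the peer
    up to and including local event k (a non-decreasing sequence).
    Returns -1 if every recorded event depends on an orphan message.
    """
    for k in range(len(received_history) - 1, -1, -1):
        if received_history[k] <= sent_by_peer_after_rollback:
            return k
    return -1


# The receiver logged 0, 1, 2, 3 receipts at its events 0..3.
history = [0, 1, 2, 3]
# The peer rolled back to a state in which it had sent only 1 message.
rollback_to = find_rollback_point(history, 1)
print(rollback_to)
```

If the receiver's own rollback discards messages it had sent, its neighbors repeat the same check, which is why recovery may take several iterations.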
  • #68 Consensus (noun): a process of decision-making that seeks widespread agreement among group members. Many forms of coordination require the processes to exchange information to negotiate with one another and eventually reach a common understanding or agreement.
  • #69 The “Byzantine generals problem” is a game-theory problem that describes how actors in a decentralized system arrive at a consensus without a trusted central party. Fail-stop, also known as fail-silent, is the condition in which a failed process does not communicate.
  • #70 In the fail-stop model, a process may crash in the middle of a step, which could be the execution of a local operation or processing of a message for a send or receive event. In particular, it may send a message to only a subset of the destination set before crashing. Synchronous/asynchronous communication: if a failure-prone process chooses to send a message to process Pi but fails, then Pi cannot detect the non-arrival of the message in an asynchronous system because this scenario is indistinguishable from the scenario in which the message takes a very long time in transit. We will see this argument again when we consider the impossibility of reaching agreement in asynchronous systems in any failure model. In a synchronous system, however, the scenario in which a message has not been sent can be recognized by the intended recipient at the end of the round. The intended recipient can deal with the non-arrival of the expected message by assuming the arrival of a message containing some default data, and then proceeding with the next round of the algorithm.
  • #73 Byzantine Generals Problem: consensus with reliable communication lines and Byzantine failures is illustrated by the Byzantine Generals Problem. In this problem, there are n army generals who head different divisions. Communication is reliable (radio or telephone), but m of the generals are traitors (faulty) and are trying to prevent the others from reaching agreement by feeding them incorrect information. The question is: can the loyal generals still reach agreement? Specifically, each general knows the size of his division. At the end of the algorithm, can each general know the troop strength of every other loyal division? Lamport demonstrated a solution that works for certain cases. His answer is that any solution that overcomes m traitors requires a minimum of 3m + 1 participants (2m + 1 loyal generals). This means that more than 2/3 of the generals must be loyal. Moreover, it was demonstrated that no protocol can overcome m faults with fewer than m + 1 rounds of message exchanges and O(mn²) messages. If n < 3m + 1, then the problem has no solution.
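The bounds quoted above can be checked with a few lines of arithmetic. The helper names `is_solvable`, `max_traitors`, and `min_rounds` are hypothetical and simply encode n ≥ 3m + 1 and the m + 1 round lower bound.

```python
def is_solvable(n: int, m: int) -> bool:
    """True when n generals can tolerate m traitors, i.e. n >= 3m + 1."""
    return n >= 3 * m + 1


def max_traitors(n: int) -> int:
    """Largest m that satisfies n >= 3m + 1 (0 when n < 4)."""
    return max(0, (n - 1) // 3)


def min_rounds(m: int) -> int:
    """Lower bound on the rounds of message exchange needed to overcome m faults."""
    return m + 1


for n in (3, 4, 7, 10):
    m = max_traitors(n)
    print(n, m, min_rounds(m))
```

For example, three generals cannot tolerate even one traitor, while four generals can tolerate exactly one and need at least two rounds.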
  • #80 https://www.youtube.com/watch?v=EFb8E4j49Ng
  • #85 Broadcast the initial value to all other processes.
  • #88 f + 1 rounds are required to reach agreement.
  • #94 The agreement condition is satisfied because at most f processes can fail, so among the f + 1 rounds there must be at least one round in which no process failed. In this round, say round r, all the processes that have not failed so far succeed in broadcasting their values, and all these processes take the minimum of the values broadcast and received in that round.
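The argument above can be tried out with a small simulation of synchronous rounds under crash failures. This is a sketch under stated assumptions: the function `consensus`, the shape of the `crashes` schedule, and the process names are hypothetical, and every live process rebroadcasts its value in every round, which is a simplification.

```python
from typing import Optional


def consensus(initial: dict, f: int, crashes: dict, rounds: Optional[int] = None) -> dict:
    """Simulate synchronous min-value agreement with up to f crash failures.

    initial: process name -> initial value.
    crashes: process name -> (round, recipients reached before crashing in that round).
    rounds:  number of rounds to run; defaults to f + 1.
    Returns the values held by the processes that did not crash.
    """
    assert len(crashes) <= f, "at most f processes may fail"
    x = dict(initial)
    alive = set(initial)
    for rnd in range(1, (f + 1 if rounds is None else rounds) + 1):
        inbox = {p: [] for p in alive}
        crashed_now = set()
        for sender in alive:
            recipients = alive - {sender}
            if sender in crashes and crashes[sender][0] == rnd:
                # A crashing process reaches only a subset of its destinations.
                recipients &= set(crashes[sender][1])
                crashed_now.add(sender)
            for r in recipients:
                inbox[r].append(x[sender])
        alive -= crashed_now
        for p in alive:
            # Each surviving process keeps the minimum value seen so far.
            x[p] = min([x[p]] + inbox[p])
    return {p: x[p] for p in alive}


values = {"P1": 0, "P2": 1, "P3": 1}
schedule = {"P1": (1, ["P2"])}          # P1 crashes in round 1 after reaching only P2

after_one_round = consensus(values, f=1, crashes=schedule, rounds=1)
after_f_plus_one = consensus(values, f=1, crashes=schedule)
print(after_one_round)
print(after_f_plus_one)
```

With a single round, P2 holds 0 while P3 still holds 1, so the survivors disagree. Round 2 is failure-free, P2's value reaches P3, and both decide 0.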