unit 3 all about the data science process, covering the steps present in almost every data science project.

Dr S Palaniappan , Associate Professor , Department of ADS
Unit IV
Recovery & Consensus
Chapter 13 and 14
1

Planned
Hour Recovery & Consensus Relevant CO
Nos
Highest
Cognitive
level**
1 Checkpointing and rollback recovery: Introduction CO5 K2
1 Checkpoint-based recovery CO5 K2
1 Log-based rollback recovery CO5 K2
1 Coordinated checkpointing algorithm CO5 K2
1 Algorithm for asynchronous checkpointing and recovery. CO5 K2
1 Consensus and agreement algorithms: Problem definition CO5 K2
1 Overview of results CO5 K2
1 Agreement in a failure –free system CO5 K2
1 Agreement in synchronous systems with failures. CO5 K2

• Distributed systems are subject to failures.
• And the vast computing potential of these systems is often hampered by their
susceptibility to failures
• Fault tolerance
• Offers the abstraction of a reliable system that tolerates faults
• Fault-recovery
• Once a fault occurs, we want to recover to a previous correct state
Introduction

Treat a distributed system application as a
collection of processes that communicate
over a network

Rollback Recovery
• Restore the system back to a consistent state after a failure
• Achieve fault tolerance by periodically saving the state of a process during the failure-
free execution

Checkpointing
• Checkpointing is a technique that provides fault tolerance for computing systems.
• It basically consists of saving a snapshot of the application's state, so that
applications can restart from that point in case of failure
• The saved states of a process

• Why is rollback recovery of distributed systems complicated? – messages induce
inter-process dependencies during failure-free operation
• Rollback propagation
– The dependencies may force some of the processes that did not fail to roll back
– This phenomenon is called “domino effect”

M 1
P 1
P 2
Rollback Scenario
This phenomenon of cascaded rollback is called the domino effect
Propagation may extend back to the initial state of the computation, losing all the work
performed before the failure.
The dependencies may force some of the processes that
did not fail also to roll back

Dominos effect
m1
P i
P j
P k
m2
m3
m4
m5
m6
Recovery Line

Dominos effect
m1
P i
P j
P k
m2
m3
m6
m8
m7
Recovery Line
m4
m5

• If each process takes its checkpoints independently, then the system can not avoid
the domino effect
– This scheme is called independent or uncoordinated checkpointing
• Techniques that avoid domino effect
• Coordinated checkpointing rollback recovery.
Processes coordinate their checkpoints to form a system-wide consistent
state
• Communication-induced checkpointing rollback recovery
Forces each process to take checkpoints based on information piggybacked
on the application

• Log-based rollback recovery
• Combines checkpointing with logging of non-deterministic events.
• Relies on piecewise deterministic (pwd) assumption
• In general it enables a system to recover beyond the most recent set of consistent
checkpoints.
• Particularly attractive for applications that frequently interact with the outside
world that cannot rollback

Taxonomy

Background and definitions
• System model
• Fixed number N of processes
• Processes cooperate to execute a distributed application and interact with the
outside world by receiving and sending input and output messages, respectively
Outside World Process (OWP)

A local checkpoint
• All processes save their local states at certain instants of time
• A local check point is a snapshot of the state of the process at a given instance
• Assumption
– A process stores all local checkpoints on the stable storage
– A process is able to roll back to any of its existing local checkpoints

Consistent states
• A global state of a distributed system
• A collection of the individual states of all participating processes and the states of the
communication channels
• Consistent global state
• A global state that may occur during a failure-free execution of distribution of distributed
computation
• If a process‟s state reflects a message receipt, then the state of the corresponding sender
must reflect the sending of the message

• A global checkpoint
• A set of local checkpoints, one from each process
• A consistent global checkpoint
• A global checkpoint such that no message is sent by a process after taking its
local point that is received by another process before taking its checkpoint

The dotted line - recovery line
The parallel lines - Checkpoints

Type of messages
• A process failure and subsequent recovery may leave messages that were perfectly
received (and processed) before the failure in abnormal states
• A rollback of processes for recovery may have to rollback the send and receive
operations of several messages
M 1
P 1
P 2

In-transit messages
• Messages sent, not yet received before recovery line
• No problem with global state consistency
• Can be lost or delayed`

P i
P j
P k
m1
Recovery Line
m5
m2 m4
m3

Lost messages
• Messages whose send is done but receive is undone due to rollback (After recovery) . It must
be replayed
• If implemented over
• unreliable communication, application is responsible
• reliable communication, recovery algorithm is responsible

Delayed messages
• Messages whose receive is not recorded because the receiving process was either
down or the message arrived after the rollback of the receiving process
Duplicated Msg

Orphan messages
• Messages with “receive‟ recorded but message “send‟ not recorded
• Do not arise if processes roll back to a consistent global state
Duplicate messages
• Arise due to message logging and replaying during process recovery

Issues in Failure recovery
• We must not only restore the system to a consistent state, but also appropriately
handle messages that are left in an abnormal state due to the failure and recovery.
• The computation comprises of three processes Pi, Pj , and Pk, connected through a
communication network.
• The processes communicate solely by exchanging messages over fault-free, FIFO
communication channels.

m1
P i
P j
P k
m2
m3 m4
m5
Ci,0
Cj,0
Ck,0
Ci,1
Cj,1
Ck,1 Ck,2
Cj,2
The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1 created an orphan message m4
Orphan message m5 is created due to the roll back of process 𝑃𝑗 to checkpoint 𝐶j,2
m6
Checkpoints :
{𝐶𝑖,0, 𝐶𝑖,1},
{𝐶𝑗,0, 𝐶𝑗,1, 𝐶𝑗,2}, and
{𝐶𝑘,0, 𝐶𝑘,1, 𝐶𝑘,2}

m1
P i
P j
P k
m2
m3 m4
m5
Ci,0
Cj,0
Ck,0
Ci,1
Cj,1
Ck,1 Ck,2
Cj,2
Message m6: a lost message since the send event for m6 is recorded in the restored state for 𝑃𝑗 ,
but the receive event has been undone at process 𝑃𝑖 .
m6

m1
P i
P j
P k
m2
m3 m4
m5
Ci,0
Cj,0
Ck,0
Ci,1
Cj,1
Ck,1 Ck,2
Cj,2
Messages m7 & m8: delayed orphan messages. After resuming execution
from their checkpoints, processes will generate both of these messages
m6
m7
m8
Solution : Log of
all the sent
messages

Uncoordinated Checkpointing
• Each process has autonomy in deciding when to take checkpoints positions
• Advantage
• No synchronization overhead and it allows processes to take checkpoints when it is
most convenient or efficient
• Lower runtime overhead during normal execution

• Disadvantages
• Domino effect during a recovery
• Possible useless checkpoints
• Recovery from a failure is slow - because processes need to iterate to find a
consistent set of checkpoints - useless
• Each process maintains multiple checkpoints and periodically invoke a garbage
collection algorithm.
• Garbage collection is needed
• Not suitable for applications with outside world interaction (output commit)

Direct Dependency Tracking Technique
i is the process ID
x is the checkpoint index
xth checkpoint of process Pi,
P j
P i
Cj,0
Ci,0
Cj,1
Ci,1
Cj,y-1
Ci,x-1
Cj,y
Ci,x
m
i,x
Checkpoint Interval

Coordinated Checkpointing
• Coordinated Checkpointing requires processes to orchestrate their checkpoints in order
to form a consistent global state.
• Advantages
• No domino effect
• Each process is required to maintain only one checkpoint (the last)
• No computation of recovery line is required
• Disadvantages
• Large latency in committing output
• Two approaches
• Blocking
• Non-blocking

2 Phase coordinated Checkpointing protocol
The computation is blocked during the Checkpointing

Non-blocking checkpoint coordination
• Need not stop their execution while taking checkpoints.
• Fundamental Problem - Prevent a process from receiving application messages that could
make the checkpoint inconsistent
P i
Initiator
Ci,x
Cj,x
Pj
m

• Idea - Chandy and lamport snapshot algorithm in which markers play the role of the
checkpoint request messages
• The initiator takes a checkpoint and sends a marker (a checkpoint request) on all
outgoing channels
P i
Initiator
Ci,x
Pj
m
Cj,x
Each process takes a checkpoint
upon receiving the first marker
And sends the marker on all
outgoing channels before sending
any application message

Communication-induced checkpointing
• Processes may be forced to take additional checkpoints and thus process
independence is constrained to guarantee the eventual progress of the recovery line.
• Autonomous and Forced checkpoints – Local checkpoints and forced checkpoints
• Communication-induced checkpointing piggybacks protocol-related information on
each application message –
• The receiver determine if it has to take a forced checkpoint to advance the global
recovery line.

• The forced checkpoint must be taken before the application may process the contents
of the message
• In contrast with coordinated checkpointing, no special coordination messages are
exchanged
• Two types of communication-induced checkpointing
– model-based checkpointing and
- index-based checkpointing.

Log-based Rollback Recovery
• A log-based rollback recovery makes use of deterministic and nondeterministic
events in a computation.
• A non-deterministic - receipt of a message from another process or an event internal
to the process.(m0, m3, m7)
• A message send event is not a non-deterministic event.

• This assumes that all non-deterministic events can be identified and their
corresponding determinants can be logged into the stable storage.
• How it works
• Each process logs on stable storage the determinants of all non-deterministic
events
• Additionally, checkpoints are taken to reduce the extent of rollback during
recovery

The no-orphans consistency condition
• Let e be a non-deterministic event that occurs at process p. We define the following
• Depend(e)–set of processes affected by event e
• Log(e)–set of processes with e logged on volatile memory
• Stable(e)–set of processes with e logged on stable storage

Pessimistic logging
• Assume that a failure can occur after any non-deterministic event in the computation.
• Determinant is logged to stable storage before message is delivered
• Pessimistic protocols implement the following property, often referred to as
synchronous logging.
• If an event has not been logged on the stable storage, then no process can
depend on it.
• It is a stronger version of Always-no-orphan

Suppose processes 𝑃2 fail as shown, restart from checkpoint , and roll forward using their
determinant logs to deliver again the same sequence of messages as in the pre-failure
execution
Once the recovery is complete, the processes p2 will be consistent with the state of 𝑃1 that
includes the receipt of message 𝑚2 from 𝑃1

Optimistic Logging
• Processes log determinants asynchronously to the stable storage
• Optimistically assume that logging will be complete before a failure occurs.
• Determinants are kept in a volatile log, and are periodically flushed to the stable
storage
• This do not implement the always-no-orphans condition.

• To perform rollbacks correctly, optimistic logging protocols track causal dependencies
during failure free execution
• Pessimistic protocols need only keep the most recent checkpoint of each process,
whereas optimistic protocols may need to keep multiple checkpoints for each process

Causal logging
• Causal logging combines the advantages of both pessimistic and optimistic logging
• Like optimistic logging, it does not require synchronous access to the stable storage
except during output commit
• Like pessimistic logging, it allows each process to commit output independently and
never creates orphans, thus isolating processes from the effects of failures at other
processes
• Make sure that the always-no-orphans property holds
• Each process maintains information about all the events that have causally affected
its state

Process P0 at state X will have logged the determinants of the non-deterministic events that causally
precede its state according to Lamport’s happened-before relation.

Koo–Toueg coordinated checkpointing
algorithm
• Assumptions made:
• Processes communicate by exchanging messages through communication channels
(FIFO)
• End-to-end protocols – cope with message loss due to rollback recovery and
communication failure
• Communication failures do not partition the network.
• Two kinds of checkpoints: permanent and tentative
• Local checkpoint, part of a consistent global checkpoint
• Temporary checkpoint, become permanent checkpoint when the algorithm
terminates successfully

Two phases of the algorithm
First Phase:

• Pi informs all the processes of the decision it reached at the end of the first phase.
• A process, on receiving the message from Pi, will act accordingly.
• A set of permanent checkpoints taken by this algorithm is consistent because of the
following two reason.
• Either all or none of the processes take permanent checkpoint
• No process sends message after taking permanent checkpoint
Two phases of the algorithm
Second Phase:

An optimization
Consistent Checkpoint
X decides to initiate the checkpointing
algorithm after receiving message m

The rollback recovery algorithm
• The rollback recovery algorithm restores the system state to a consistent state after a
failure
• 2 phases
• The initiating process send a message to all other processes and ask for the
preferences – restarting to the previous checkpoints. All need to agree about either
do or not.
• The initiating process send the final decision to all processes, all the processes act
accordingly after receiving the final decision.

Juang–Venkatesan algorithm for asynchronous
checkpointing and recovery
• Assumptions:
• Communication channels are reliable,
• Delivery messages in FIFO order, infinite buffers,
• Message transmission delay is arbitrary but finite
• Underlying computation/application is event-driven:
• process P is at state s, receives message m, processes the message, moves to state
s’ and send messages out.
• So the triplet (s, m, msgs_sent) represents the state of P

Two type of log storage are maintained:
• Volatile log: short time to access but lost if processor crash. Move to stable log
periodically.
• Stable log: longer time to access but remained if crashed
Asynchronous checkpointing:
• After executing an event, the triplet is recorded without any synchronization with
other processes.
• Local checkpoint consist of set of records, first are stored in volatile log, then
moved to stable log.

Recovery algorithm
–Notations:
• 𝑅𝐶𝑉𝐷𝑖←𝑗(𝐶𝑘𝑃𝑡𝑖): number of messages received by 𝑝𝑖 from 𝑝𝑗, from the beginning of
computation to checkpoint 𝐶𝑘𝑃𝑡𝑖
• 𝑆𝐸𝑁𝑇𝑖→𝑗(𝐶𝑘𝑃𝑡𝑖): number of messages sent by 𝑝𝑖 to 𝑝𝑗, from the beginning of
computation to checkpoint 𝐶𝑘𝑃𝑡𝑖
–Idea:
• From the set of checkpoints, find a set of consistent checkpoints
• Doing that based on the number of messages sent and received
• Orphan  if RCVDi←jCkPti > SENTj→iCkPtj

Orpahan Message
RCVDx←yCkPtx > SENTy→xCkPty
Note that if X and Z rolls back to checkpoint ex1 and ez1, then it will be consistent with Y ’s state,
ey1.
Y Crashes
2>1

If event ey2 is the latest checkpointed event at Y , then Y will restart from the state
corresponding to ey2
Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is also
initiated at processors X and Z
Y Crashes
Juang–Venkatesan algorithm

Initially X, Y , and Z set
CkPtX ← ex3,
CkPtY ← ey2 and
CkPtZ ← ez2,

• And the following messages during the first iteration:
• Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z;
• X sends ROLLBACK(X, 2) to Y and ROLLBACK(X, 0) to Z; and
• Z sends ROLLBACK(Z, 0) to X and ROLLBACK(Z, 1) to Y .

• RCVDX←Y (CkPtX) = 3 > 2 (2 is the value received in the ROLLBACK(Y , 2) message
from Y ), X will set CkPtX to ex2 satisfying RCVDX←Y (ex2) = 2 ≤ 2

• Since RCVDZ←Y (CkPtZ) = 2 > 1, Z will set CkPtZ to ez1 satisfying RCVDZ←Y (ez1) =
1 ≤ 1. At

In the second iteration,
Y sends ROLLBACK(Y, 2) to X and ROLLBACK(Y, 1) to Z;
Z sends ROLLBACK(Z, 1) to Y and ROLLBACK(Z, 0) to X;
X sends ROLLBACK(X, 0) to Z and ROLLBACK(X, 1) to Y
{ex2, ey2, ez1}, is consistent, and no further rollback occurs.

ALGORITHM

Consensus and Agreement
• Agreement among the processes in a
distributed system is a fundamental
requirement for a wide range of
applications.
• Example - commit decision in database
systems
• A distributed key-value store had multiple
nodes
• Sys 1# “Transfer 1000 From A to B”
• Sys 2 # “Transfer 3000 From A to B”

Classification of Faults
• Based on the components that fail
• Program / Process
• Processor / machine
• Link
• Storage
• Based on the behaviour of faulty components
• Crash – Just halts
• Fail-stop – crash with additional conditions
• Omission – process or channel fails to do something that it is expected
• Byzantine – behaves arbitrarily - imperfect information
• Timing – correct output, but provided outside a specified time interval

• Some assumptions underlying our study of agreement algorithms:
Failure models
• Among the n processes in the system, at most f processes can be faulty. A faulty
process can behave in any manner allowed by the failure model assumed.
• The various failure models – fail-stop, send omission and receive omission, and
Byzantine failures

• Synchronous communication
• Process run in lock step manner . Process receives a message sent to it earlier,
performs computation and sends a message to other process
• One step of synchronous computation is called round
• asynchronous communication
• Computation does not proceed in lock step
• Process can send or receive and perform computation at any time
• Network connectivity
• The system has full logical connectivity, i.e., each process can communicate with
any other by direct message passing

• Sender identification
• A process that receives a message always knows the identity of the sender process.
• Channel reliability
• The channels are reliable, and only the processes may fail
• Authenticated vs. non-authenticated messages
• Using authentication via techniques such as digital signatures, it is easier to solve the
agreement problem
• it can forge the message or it can also tamper with the contents
• Agreement variable
• The agreement variable may be boolean or multivalued, and need not be an integer

Byzantine Generals Problem
Four camps of the attacking army, each commanded by a general, are camped around the
fort of Byzantium

They can succeed in attacking only if they attack simultaneously.
Generals should reach a consensus on the plan
● It could be ATTACK

Generals should reach a consensus on the plan
● Or RETREAT

• The only way they can communicate is to send messengers among themselves.
• An asynchronous system is modeled by messengers taking an unbounded time to
travel between two camps.
• A lost message is modeled by a messenger being captured by the enemy

But there might be traitors
 All loyal generals should reach a consensus

But traitors can act Arbitrarily
 All loyal generals should reach a consensus

• The traitor will attempt to subvert the agreement-reaching mechanism, by giving
misleading information to the other generals

The Solution
A commanding general (designated process, called the source process) sends an
order to his n-1 lieutenant generals such that

The Solution
IC1  All loyal lieutenants obey the same order.
IC2  If the commanding general is loyal, then every loyal lieutenant obeys the
order he sends.

The Solution
Consistency/Agreement
IC2  If the commanding general is loyal, then every loyal lieutenant obeys the
order he sends.
Agreement: All non-faulty processes must agree on the same value

The Solution
Consistency/Agreement
Validity
Termination/ Liveness
Validity: If the source process is non-faulty, then the agreed upon value by all the
non-faulty processes must be the same as the initial value of the source.
Termination: Each non-faulty process must eventually decide on a value.

The consensus problem
• The consensus problem differs from the Byzantine agreement problem in that each
process has an initial value and all the correct processes must agree on a single value
.
• Agreement All non-faulty processes must agree on the same (single) value.
• Validity If all the non-faulty processes have the same initial value, then the agreed
upon value by all the non-faulty processes must be that same value.
• Termination Each non-faulty process must eventually decide on a value.

The interactive consistency problem
• The interactive consistency problem differs from the Byzantine agreement problem
in that each process has an initial value, and all the correct processes must agree
upon a set of values, with one value for each process.
• Agreement All non-faulty processes must agree on the same array of values
A[v1…vn].
• Validity If process i is non-faulty and its initial value is vi, then all non faulty
processes agree on vi as the ith element of the array A. If process j is faulty, then the
non-faulty processes can agree on any value for Aj.
• Termination Each non-faulty process must eventually decide on the array A.

Dr S Palaniappan , Associate Professor , Department of ADS 86

Overview of results
• Overview of results on agreement. f denotes number of failure-prone processes. n is
the total number of processes.

• Consensus is not solvable in asynchronous systems even if one process can fail by
crashing
• Asynchronous message passing and shared memory deal with trying to solve
consensus

Impossibility of achieving Byzantine agreement
with n = 3 processes and f = 1 malicious process.
• With n = 3 processes, the Byzantine agreement problem cannot be solved if the
number of Byzantine processes f = 1.
89

Agreement in a failure-free system
• The consensus can be reached by collecting information - arriving at a “decision,” and
distributing this decision.
• The decision can be reached by - majority, max, and min functions
• In a synchronous system, this can be done simply in a constant number of rounds
• In an asynchronous system, consensus can similarly be reached in a constant number
of message hops

Agreement in (message-passing) synchronous
systems with failures
• Up to f (< n) crash failures possible.
• In f + 1 rounds, at least one round has no failures.
• Now justify: agreement, validity, termination conditions are satisfied.
• Complexity: O(f + 1)n2 messages
• f + 1 is lower bound on number of rounds

• The correct generals reach agreement in two rounds of messages:
1. In the first round, the commander sends a value to each of the lieutenants.
2. In the second round, each of the lieutenants send the value it receives to its peers.
When both rounds are finished the correct lieutenants simply apply a majority function
to determine the value
If the commander is faulty, all the lieutenants faithfully report the values they received
but there will be no agreement.
If one of the lieutenants is faulty, the correct lieutenants will receive N −2 replicas of the
correct value, plus one incorrect one: the majority function will determine the correct
value.

Consensus with up to f fail-stop processes in a
system of n processes, n > f

unit 3 all about the data science process, covering the steps present in almost every data science project.

More Related Content

Similar to unit 3 all about the data science process, covering the steps present in almost every data science project.

Recently uploaded

unit 3 all about the data science process, covering the steps present in almost every data science project.

Editor's Notes