SlideShare a Scribd company logo
1 
Slides for Chapter 12: 
Coordination and Agreement 
From Coulouris, Dollimore and Kindberg 
Distributed Systems: 
Concepts and Design 
Edition 4, © Pearson Education 2005 
Failure Assumptions and Failure Detectors 
 reliable communication channels 
 process failures: crashes 
 failure detector: object/code in a process that detects 
failures of other processes 
 unreliable failure detector 
unsuspected or suspected (evidence of possible failures) 
each process sends ``alive'' message to everyone else 
not receiving ``alive'' message after timeout 
most practical systems 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
 reliable failure detector 
unsuspected or failure 
synchronous system 
few practical systems 
Distributed Mutual Exclusion 
provide critical region in a distributed 
environment 
message passing 
for example, locking files, lockd daemon in UNIX 
(NFS is stateless, no file-locking at the NFS level) 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Algorithms for mutual exclusion 
N processes 
processes don't fail 
message delivery is reliable 
critical region: enter(), resourceAccesses(), 
exit() 
Properties: 
[ME1] safety: only one process at a time 
[ME2] liveness: eventually enter or exit 
[ME3] happened-before ordering: ordering of enter() 
is the same as HB ordering 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Algorithms for mutual exclusion 
Performance evaluation: 
overhead and bandwidth consumption: # of 
messages sent 
client delay incurred by a process at entry and exit 
throughput measured by synchronization delay: 
delay between one's exit and next's entry 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
A central server algorithm 
server keeps track of a token---permission to 
enter critical region 
a process requests the server for the token 
the server grants the token if it has the token 
a process can enter if it gets the token, 
otherwise waits 
when done, a process sends release and exits 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005
2 
Server managing a mutual exclusion token for a 
set of processes 
Server 
Queue of 
requests 
3. Grant 
token 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
1. Request 
token 
2. Release 
token 
4 
2 
p 
4 
p 
p 3 
2 
p 
1 
A central server algorithm 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
 Properties: 
safety, why? 
liveness, why? 
HB ordering not guaranteed, why? [VC in processes vs. server] 
 Performance: 
enter overhead: two messages (request and grant) 
enter delay: time between request and grant 
exit overhead: one message (release) 
exit delay: none 
synchronization delay: between release and grant 
centralized server is the bottle neck 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
A ring-based algorithm 
logical ring, could be unrelated to the physical 
configuration 
pi sends messages to p(i+1) mod N 
when a process holds a token, it can enter, 
otherwise waits 
when a process releases a token (exit), it sends 
to its neighbor 
A ring of processes transferring a mutual exclusion 
token 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
p 
n 
p 
2 
p 
3 
p 
4 
Token 
p 
1 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
A ring-based algorithm 
properties: 
safety, why? 
liveness, why? 
HB ordering not guaranteed, why? 
Performance: 
bandwidth consumption: token keeps circulating 
enter overhead: 0 to N messages 
enter delay: delay for 0 to N messages 
exit overhead: one message 
exit delay: none 
synchronization delay: delay for 1 to N messages 
An algorithm using multicast and logical clocks 
 multicast a request message for the token 
 enter only if all the other processes reply 
 totally-ordered timestamps: T, pi  
 each process keeps a state: RELEASED, HELD, 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
WANTED 
 if all have state = RELEASED, all reply, a process can 
hold the token and enter 
 if a process has state = HELD, doesn't reply until it exits 
 if more than one process has state = WANTED, process 
with the lowest timestamp will get all N-1 replies first.
3 
Ricart and Agrawala’s algorithm 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
On initialization 
state := RELEASED; 
To enter the section 
state := WANTED; 
Multicast request to all processes; request processing deferred here 
T := request’s timestamp; 
Wait until (number of replies received = (N – 1)); 
state := HELD; 
On receipt of a request Ti, pi at pj (i  j) 
if (state = HELD or (state = WANTED and (T, pj)  (Ti, pi))) 
then 
queue request from pi without replying; 
else 
reply immediately to pi; 
end if 
To exit the critical section 
state := RELEASED; 
reply to any queued requests; 
Multicast synchronization 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
p 
3 
34 
Reply 
34 
41 
41 
41 
34 
p 
1 
p 
2 
Reply 
Reply 
An algorithm using multicast and logical locks 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Properties 
safety, why? 
liveness, why? 
HB ordering, why? 
An algorithm using multicast and logical locks 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
performance: 
bandwidth consumption: no token keeps circulating 
entry overhead: 2(N-1), why? [with multicast support: 
1 + (N -1) = N] 
entry delay: delay between request and getting all 
replies 
exit overhead: 0 to N-1 messages 
exit delay: none 
synchronization delay: delay for 1 message (one last 
reply from the previous holder) 
Maekawa’s Voting Algorithm – Main Idea 
We actually don’t need all N – 1 replies 
Consider processes A, B, X 
A needs replies from “A” and X 
B needs replies from “B” and X 
If X can only reply to (vote) *one process at a time* 
A and B cannot have a reply from X at the same time 
Mutex between A and B--hinges on X 
Processes in overlapping groups 
members in the overlap “control” mutex 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Mutual exclusion for all processes 
A group for every process 
A pair of groups for A and B overlaps 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
= mutex(A, B) 
Every possible pair of groups overlaps 
= mutex(all possible pairs of processes) 
= mutex(all processes)
4 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Groups and members 
A group for each process 
N processes 
N groups 
Each group has K members 
numbers of processes to request for permission 
Each process is in M groups (M  1) 
Allows overlapping = mutex 
If the groups are disjoint, M = 1, no overlapping, no mutex 
number of processes to grant permission 
Parameters and group membership 
 N = number of processes 
 K = voting group size 
 M = number of voting groups each process is in 
 Optimal (smallest K, why?) 
K = M ~= sqrt(N) 
Non-trivial to construct the groups 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
 Approximation 
K = 2 * sqrt(N) - 1 
Put process id’s in a sqrt(N) by sqrt(N) table 
Union the rows and columns where pi is 
N groups 
M = 2 * sqrt(N) - 1 
A B C 
D E F 
G H I 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Group membership 
A B C 
D E F 
G H I 
•Group for A: {A, B, C, D, G} 
•Group for B: {A, B, C, E, H} 
•… 
•Group for I: {G, H, I, C, F} 
•Every pair of groups overlap 
•N groups 
•Each group has K = 2 * sqrt(N) – 1 members 
• M = 2 * sqrt(N) – 1 [# of groups each process are in] 
Group membership (alternative?) 
A B C 
D E F 
G H I 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
A B C 
D E F 
G H I 
•Groups for A, B, C: {A, B, C, D, G} 
•Groups for D, E, F: {D, E, F, B, H} 
•Groups for G, H, I: {G, H, I, C, F} 
•Every pair of groups overlap 
•N groups [Sqrt(N) unique groups] 
•Each group has K = 2 * sqrt(N) – 1 members 
•M = 3 [sqrt(N)] for A, E, I, but M = 6 [2*sqrt(N)] for the rest 
•Undesirable, why? 
Sketch of the Voting Algorithm 
A group of processes Vi for each process pi 
Each pair of groups overlap 
X in groups: [A,X] and [B,X] 
= the groups for any pair of processes overlap 
To enter the critical region, pi 
Sends REQUEST’s to all processes in Vi 
Waits for REPLY’s (VOTE’s) from all processes in Vi 
To exit the critical region, pi 
Sends RELEASE’s to all processes in Vi 
Three types of messages, not two as in the 
multicast alg. 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Figure 12.6 
Maekawa’s voting algorithm 
On initialization 
state := RELEASED; 
voted := FALSE; 
For pi to enter the critical section 
state := WANTED; 
Multicast request to all processes in Vi; 
Wait until (number of replies received = K); 
state := HELD; 
On receipt of a request from pi at pj 
if (state = HELD or voted = TRUE) 
then 
queue request from pi without replying; 
else 
send reply to pi; 
voted := TRUE; 
end if 
For pi to exit the critical section 
state := RELEASED; 
Multicast release to all processes in Vi; 
On receipt of a release from pi at pj 
if (queue of requests is non-empty) 
then 
remove head of queue – from pk, say; 
send reply to pk; 
voted := TRUE; 
else 
voted := FALSE; 
end if 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005
5 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Deadlock? 
Processes: A, B, C 
Group A: A, B 
Group B: B, C 
Group C: C, A 
Deadlock 
A has A’s reply, waiting for B’s reply 
B has B’s reply, waiting for C’s reply 
C has C’s reply, waiting for A’s reply 
Timestamp the requests in HB ordering 
 holding according to the timestamp 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Properties 
Safety: No process can reply/vote more than 
once at any time. 
Liveness: timestamp (HB ordering) 
HB ordering: above 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Performance 
Entry overhead [assuming k = sqrt(N)] 
 Sqrt(N) requests + Sqrt(N) replies 
 2* Sqrt(N) 
 2(N - 1) [N  4] 
Exit overhead 
 Sqrt(N) releases 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Elections 
choosing a unique process for a particular role 
for example, server in dist. mutex 
each process can call only one election 
multiple concurrent elections can be called by 
different processes 
participant: engages in an election 
process with the largest id wins 
each process pi has variable electedi = ? (don't 
know) initially 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Elections 
Properties: 
[E1] electedi of a ``participant'' process must be Pmax 
(elected process---largest id) or ? 
[E2] liveness: all processes participate and 
eventually set electedi != ? (or crash) 
Performance: 
overhead (bandwidth consumption): # of messages 
turnaround time: # of messages to complete an 
election 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
A ring-based algorithm 
logical ring, could be unrelated to the physical 
configuration 
pi sends messages to p(i+1) mod N 
no failures 
elect the coordinator with the largest id 
initially, every process is a non-participant 
any process can call an election: 
marks itself as participant 
places its id in an election message 
sends the message to its neighbor
6 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Ring-based algorithm 
receiving an election message: 
if id  myid, forward the msg, mark participant 
if id  myid 
non-participant: replace id with myid: forward the msg, mark 
participant 
participant: stop forwarding (why? Later, multiple elections) 
if id = myid, coordinator found, mark non-participant, 
electedi := id, send elected message with myid 
receiving an elected message: 
id != myid, mark non-participant, electedi := id 
forward the msg 
if id = myid, stop forwarding 
A ring-based election in progress 
17 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
24 
15 
9 
4 
3 
28 
24 
1 
Note: The election was started by process 17. 
The highest process identifier encountered so far is 24. 
Participant processes are shown darkened 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Ring-based algorithm 
Properties: 
safety: only the process with the largest id can send 
an elected message 
liveness: every process in the ring eventually 
participates in the election; extra elections are 
stopped 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
Ring-based algorithm 
Performance: 
one election, best case, when? 
N election messages 
N elected messages 
turnaround: 2N messages 
one election, worst case, when? 
2N - 1 election messages 
N elected messages 
turnaround: 3N - 1 messages 
can't tolerate failures, not very practical 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
The bully algorithm 
processes can crash and can be detected by 
other processes 
timeout T = 2Ttransmitting + Tprocessing 
each process knows all the other processes and 
can communicate with them 
Messages: election, answer, coordinator 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
The bully algorithm 
start an election 
detects the coordinator has failed 
sends an election message to all processes with 
higher id's and waits for answers (except the failed 
coordinator/process) 
if no answers in time T, 
it is the coordinator 
sends coordinator message (with its id) to all processes with lower 
id's 
else 
waits for a coordinator message 
starts an election if timeout
7 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
The bully algorithm 
p1 p 
2 
p 
3 
C 
p 
4 
p 
1 
p 
2 
p 
3 
p 
4 
C 
coordinator 
Stage 4 
C 
election 
election 
Stage 2 
p 
1 
p 
2 
p 
3 
p 
4 
election 
answer 
answer 
election 
Stage 1 
timeout 
Stage 3 
Eventually..... 
p 
1 
p 
2 
p 
3 
p 
4 
election 
answer 
The election of coordinator p2, 
after the failure of p4 and then p3 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
The bully algorithm 
 receiving an election message 
sends an answer message back 
starts an election if it hasn't started one—send election 
messages to all higher-id processes (including the “failed” 
coordinator—the coordinator might be up by now) 
 receiving a coordinator message 
set electedi to the new coordinator 
 to be a coordinator, it has to start an election 
 when a crashed process is replaced 
the new process starts an election and 
can replace the current coordinator (hence ``bully'') 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
The bully algorithm 
properties: 
safety: 
a lower-id process always yields to a higher-id process 
However, during an election, if a failed process is replaced 
• the low-id processes might have two different coordinators: the 
newly elected coordinator and the new process, why? 
failure detection might be unreliable 
liveness: all processes participate and know the 
coordinator at the end 
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 
© Pearson Education 2005 
The bully algorithm 
Performance 
best case: when? 
overhead: N-2 coordinator messages 
turnaround delay: no election/answer messages 
worst case: when? 
overhead: 
1+ 2 + ...+ (N-2) + (N-2)= (N-1)(N-2)/2 + (N-2) election messages, 
1+...+ (N-2) answer messages, 
N-2 coordinator messages, 
total: (N-1)(N-2) + 2(N-2) = (N+1)(N-2) = O(N2) 
 turnaround delay: delay of election and answer 
messages

More Related Content

Viewers also liked

Chapter 3 slides (Distributed Systems)
Chapter 3 slides (Distributed Systems)Chapter 3 slides (Distributed Systems)
Chapter 3 slides (Distributed Systems)
soe sumijan
 
Chapter 2 system models
Chapter 2 system modelsChapter 2 system models
Chapter 2 system modelsAbDul ThaYyal
 
4.file service architecture
4.file service architecture4.file service architecture
4.file service architectureAbDul ThaYyal
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systemsAbDul ThaYyal
 
Chapter 3 networking and internetworking
Chapter 3 networking and internetworkingChapter 3 networking and internetworking
Chapter 3 networking and internetworkingAbDul ThaYyal
 
Chapter 4 a interprocess communication
Chapter 4 a interprocess communicationChapter 4 a interprocess communication
Chapter 4 a interprocess communicationAbDul ThaYyal
 
How to Write a Research Paper, Fast!
How to Write a Research Paper, Fast!How to Write a Research Paper, Fast!
How to Write a Research Paper, Fast!
Essay Academia
 
How to Write a Research Paper
How to Write a Research Paper How to Write a Research Paper
How to Write a Research Paper
Jamaica Olazo
 
4. concurrency control
4. concurrency control4. concurrency control
4. concurrency controlAbDul ThaYyal
 
Unit 1 architecture of distributed systems
Unit 1 architecture of distributed systemsUnit 1 architecture of distributed systems
Unit 1 architecture of distributed systemskaran2190
 
Writing A Research Paper In 10 Easy Steps
Writing A Research Paper In 10 Easy StepsWriting A Research Paper In 10 Easy Steps
Writing A Research Paper In 10 Easy Stepslauren
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed SystemsRupsee
 

Viewers also liked (18)

Chapter 9 names
Chapter 9 namesChapter 9 names
Chapter 9 names
 
Chapter 3 slides (Distributed Systems)
Chapter 3 slides (Distributed Systems)Chapter 3 slides (Distributed Systems)
Chapter 3 slides (Distributed Systems)
 
Chapter 6 os
Chapter 6 osChapter 6 os
Chapter 6 os
 
2. microkernel new
2. microkernel new2. microkernel new
2. microkernel new
 
Chapter 17 corba
Chapter 17 corbaChapter 17 corba
Chapter 17 corba
 
Chapter 1 slides
Chapter 1 slidesChapter 1 slides
Chapter 1 slides
 
3. challenges
3. challenges3. challenges
3. challenges
 
Chapter 2 system models
Chapter 2 system modelsChapter 2 system models
Chapter 2 system models
 
4.file service architecture
4.file service architecture4.file service architecture
4.file service architecture
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
 
Chapter 3 networking and internetworking
Chapter 3 networking and internetworkingChapter 3 networking and internetworking
Chapter 3 networking and internetworking
 
Chapter 4 a interprocess communication
Chapter 4 a interprocess communicationChapter 4 a interprocess communication
Chapter 4 a interprocess communication
 
How to Write a Research Paper, Fast!
How to Write a Research Paper, Fast!How to Write a Research Paper, Fast!
How to Write a Research Paper, Fast!
 
How to Write a Research Paper
How to Write a Research Paper How to Write a Research Paper
How to Write a Research Paper
 
4. concurrency control
4. concurrency control4. concurrency control
4. concurrency control
 
Unit 1 architecture of distributed systems
Unit 1 architecture of distributed systemsUnit 1 architecture of distributed systems
Unit 1 architecture of distributed systems
 
Writing A Research Paper In 10 Easy Steps
Writing A Research Paper In 10 Easy StepsWriting A Research Paper In 10 Easy Steps
Writing A Research Paper In 10 Easy Steps
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 

Similar to Chapter 11b

Chapter 11c coordination agreement
Chapter 11c coordination agreementChapter 11c coordination agreement
Chapter 11c coordination agreementAbDul ThaYyal
 
Chapter 11d coordination agreement
Chapter 11d coordination agreementChapter 11d coordination agreement
Chapter 11d coordination agreementAbDul ThaYyal
 
NLP applied to French legal decisions
NLP applied to French legal decisionsNLP applied to French legal decisions
NLP applied to French legal decisions
Michael BENESTY
 
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdfDC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
LegesseSamuel
 
Ch3 (1)
Ch3 (1)Ch3 (1)
L14.C3.FA18.ppt
L14.C3.FA18.pptL14.C3.FA18.ppt
L14.C3.FA18.ppt
FarhanKhan371680
 
ch3.ppt
ch3.pptch3.ppt
ch3.ppt
QadarAhmed1
 
Symbexecsearch
SymbexecsearchSymbexecsearch
Symbexecsearch
Abhik Roychoudhury
 
Evolution of Online Course Delivery
Evolution of Online Course DeliveryEvolution of Online Course Delivery
Evolution of Online Course Delivery
Colm Dunphy
 
Defuzzification
DefuzzificationDefuzzification
Defuzzification
Dr. C.V. Suresh Babu
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
Gaurav Khurana
 
CIB W78 2007 - Comparison of distance learning courses
CIB W78 2007 - Comparison of distance learning coursesCIB W78 2007 - Comparison of distance learning courses
CIB W78 2007 - Comparison of distance learning courses
Robert Klinc
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
Xavier Ochoa
 
01.pdf
01.pdf01.pdf
01.pdf
Rakesh Kumar
 
Verilog Lecture1
Verilog Lecture1Verilog Lecture1
Verilog Lecture1
Béo Tú
 
Stij5014 distributed systems
Stij5014 distributed systemsStij5014 distributed systems
Stij5014 distributed systemslara_ays
 

Similar to Chapter 11b (20)

Cordination&agreement
Cordination&agreementCordination&agreement
Cordination&agreement
 
Chapter 11c coordination agreement
Chapter 11c coordination agreementChapter 11c coordination agreement
Chapter 11c coordination agreement
 
Chapter 11d coordination agreement
Chapter 11d coordination agreementChapter 11d coordination agreement
Chapter 11d coordination agreement
 
Basic Paxos Implementation in Orc
Basic Paxos Implementation in OrcBasic Paxos Implementation in Orc
Basic Paxos Implementation in Orc
 
NLP applied to French legal decisions
NLP applied to French legal decisionsNLP applied to French legal decisions
NLP applied to French legal decisions
 
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdfDC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
DC Lecture 04 and 05 Mutual Excution and Election Algorithms.pdf
 
Ch3 (1)
Ch3 (1)Ch3 (1)
Ch3 (1)
 
Presentation MaSE 18-102012
Presentation MaSE 18-102012Presentation MaSE 18-102012
Presentation MaSE 18-102012
 
L14.C3.FA18.ppt
L14.C3.FA18.pptL14.C3.FA18.ppt
L14.C3.FA18.ppt
 
ch3.ppt
ch3.pptch3.ppt
ch3.ppt
 
Symbexecsearch
SymbexecsearchSymbexecsearch
Symbexecsearch
 
Evolution of Online Course Delivery
Evolution of Online Course DeliveryEvolution of Online Course Delivery
Evolution of Online Course Delivery
 
Defuzzification
DefuzzificationDefuzzification
Defuzzification
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
CIB W78 2007 - Comparison of distance learning courses
CIB W78 2007 - Comparison of distance learning coursesCIB W78 2007 - Comparison of distance learning courses
CIB W78 2007 - Comparison of distance learning courses
 
Systems Analysis and Design for BSA
Systems Analysis and Design for BSASystems Analysis and Design for BSA
Systems Analysis and Design for BSA
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
01.pdf
01.pdf01.pdf
01.pdf
 
Verilog Lecture1
Verilog Lecture1Verilog Lecture1
Verilog Lecture1
 
Stij5014 distributed systems
Stij5014 distributed systemsStij5014 distributed systems
Stij5014 distributed systems
 

More from AbDul ThaYyal

Chapter 15 distributed mm systems
Chapter 15 distributed mm systemsChapter 15 distributed mm systems
Chapter 15 distributed mm systemsAbDul ThaYyal
 
Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replicationAbDul ThaYyal
 
Chapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlChapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlAbDul ThaYyal
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirementsAbDul ThaYyal
 

More from AbDul ThaYyal (9)

Chapter 15 distributed mm systems
Chapter 15 distributed mm systemsChapter 15 distributed mm systems
Chapter 15 distributed mm systems
 
Chapter 14 replication
Chapter 14 replicationChapter 14 replication
Chapter 14 replication
 
Chapter 13
Chapter 13Chapter 13
Chapter 13
 
Chapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency controlChapter 12 transactions and concurrency control
Chapter 12 transactions and concurrency control
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Chapter 7 security
Chapter 7 securityChapter 7 security
Chapter 7 security
 
4. system models
4. system models4. system models
4. system models
 
3. distributed file system requirements
3. distributed file system requirements3. distributed file system requirements
3. distributed file system requirements
 

Chapter 11b

  • 1. 1 Slides for Chapter 12: Coordination and Agreement From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, © Pearson Education 2005 Failure Assumptions and Failure Detectors reliable communication channels process failures: crashes failure detector: object/code in a process that detects failures of other processes unreliable failure detector unsuspected or suspected (evidence of possible failures) each process sends ``alive'' message to everyone else not receiving ``alive'' message after timeout most practical systems Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 reliable failure detector unsuspected or failure synchronous system few practical systems Distributed Mutual Exclusion provide critical region in a distributed environment message passing for example, locking files, lockd daemon in UNIX (NFS is stateless, no file-locking at the NFS level) Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Algorithms for mutual exclusion N processes processes don't fail message delivery is reliable critical region: enter(), resourceAccesses(), exit() Properties: [ME1] safety: only one process at a time [ME2] liveness: eventually enter or exit [ME3] happened-before ordering: ordering of enter() is the same as HB ordering Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Algorithms for mutual exclusion Performance evaluation: overhead and bandwidth consumption: # of messages sent client delay incurred by a process at entry and exit throughput measured by synchronization delay: delay between one's exit and next's entry Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A central server algorithm server keeps track of a token---permission to enter critical region a process requests the server for the token the server grants the token if it has the token a process can enter if it gets the token, otherwise waits when done, a process sends release and exits Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005
  • 2. 2 Server managing a mutual exclusion token for a set of processes Server Queue of requests 3. Grant token Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 1. Request token 2. Release token 4 2 p 4 p p 3 2 p 1 A central server algorithm Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Properties: safety, why? liveness, why? HB ordering not guaranteed, why? [VC in processes vs. server] Performance: enter overhead: two messages (request and grant) enter delay: time between request and grant exit overhead: one message (release) exit delay: none synchronization delay: between release and grant centralized server is the bottle neck Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based algorithm logical ring, could be unrelated to the physical configuration pi sends messages to p(i+1) mod N when a process holds a token, it can enter, otherwise waits when a process releases a token (exit), it sends to its neighbor A ring of processes transferring a mutual exclusion token Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 p n p 2 p 3 p 4 Token p 1 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based algorithm properties: safety, why? liveness, why? HB ordering not guaranteed, why? Performance: bandwidth consumption: token keeps circulating enter overhead: 0 to N messages enter delay: delay for 0 to N messages exit overhead: one message exit delay: none synchronization delay: delay for 1 to N messages An algorithm using multicast and logical clocks multicast a request message for the token enter only if all the other processes reply totally-ordered timestamps: T, pi each process keeps a state: RELEASED, HELD, Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 WANTED if all have state = RELEASED, all reply, a process can hold the token and enter if a process has state = HELD, doesn't reply until it exits if more than one process has state = WANTED, process with the lowest timestamp will get all N-1 replies first.
  • 3. 3 Ricart and Agrawala’s algorithm Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 On initialization state := RELEASED; To enter the section state := WANTED; Multicast request to all processes; request processing deferred here T := request’s timestamp; Wait until (number of replies received = (N – 1)); state := HELD; On receipt of a request Ti, pi at pj (i j) if (state = HELD or (state = WANTED and (T, pj) (Ti, pi))) then queue request from pi without replying; else reply immediately to pi; end if To exit the critical section state := RELEASED; reply to any queued requests; Multicast synchronization Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 p 3 34 Reply 34 41 41 41 34 p 1 p 2 Reply Reply An algorithm using multicast and logical locks Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Properties safety, why? liveness, why? HB ordering, why? An algorithm using multicast and logical locks Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 performance: bandwidth consumption: no token keeps circulating entry overhead: 2(N-1), why? [with multicast support: 1 + (N -1) = N] entry delay: delay between request and getting all replies exit overhead: 0 to N-1 messages exit delay: none synchronization delay: delay for 1 message (one last reply from the previous holder) Maekawa’s Voting Algorithm – Main Idea We actually don’t need all N – 1 replies Consider processes A, B, X A needs replies from “A” and X B needs replies from “B” and X If X can only reply to (vote) *one process at a time* A and B cannot have a reply from X at the same time Mutex between A and B--hinges on X Processes in overlapping groups members in the overlap “control” mutex Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Mutual exclusion for all processes A group for every process A pair of groups for A and B overlaps Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 = mutex(A, B) Every possible pair of groups overlaps = mutex(all possible pairs of processes) = mutex(all processes)
  • 4. 4 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Groups and members A group for each process N processes N groups Each group has K members numbers of processes to request for permission Each process is in M groups (M 1) Allows overlapping = mutex If the groups are disjoint, M = 1, no overlapping, no mutex number of processes to grant permission Parameters and group membership N = number of processes K = voting group size M = number of voting groups each process is in Optimal (smallest K, why?) K = M ~= sqrt(N) Non-trivial to construct the groups Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Approximation K = 2 * sqrt(N) - 1 Put process id’s in a sqrt(N) by sqrt(N) table Union the rows and columns where pi is N groups M = 2 * sqrt(N) - 1 A B C D E F G H I Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Group membership A B C D E F G H I •Group for A: {A, B, C, D, G} •Group for B: {A, B, C, E, H} •… •Group for I: {G, H, I, C, F} •Every pair of groups overlap •N groups •Each group has K = 2 * sqrt(N) – 1 members • M = 2 * sqrt(N) – 1 [# of groups each process are in] Group membership (alternative?) A B C D E F G H I Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A B C D E F G H I •Groups for A, B, C: {A, B, C, D, G} •Groups for D, E, F: {D, E, F, B, H} •Groups for G, H, I: {G, H, I, C, F} •Every pair of groups overlap •N groups [Sqrt(N) unique groups] •Each group has K = 2 * sqrt(N) – 1 members •M = 3 [sqrt(N)] for A, E, I, but M = 6 [2*sqrt(N)] for the rest •Undesirable, why? Sketch of the Voting Algorithm A group of processes Vi for each process pi Each pair of groups overlap X in groups: [A,X] and [B,X] = the groups for any pair of processes overlap To enter the critical region, pi Sends REQUEST’s to all processes in Vi Waits for REPLY’s (VOTE’s) from all processes in Vi To exit the critical region, pi Sends RELEASE’s to all processes in Vi Three types of messages, not two as in the multicast alg. Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Figure 12.6 Maekawa’s voting algorithm On initialization state := RELEASED; voted := FALSE; For pi to enter the critical section state := WANTED; Multicast request to all processes in Vi; Wait until (number of replies received = K); state := HELD; On receipt of a request from pi at pj if (state = HELD or voted = TRUE) then queue request from pi without replying; else send reply to pi; voted := TRUE; end if For pi to exit the critical section state := RELEASED; Multicast release to all processes in Vi; On receipt of a release from pi at pj if (queue of requests is non-empty) then remove head of queue – from pk, say; send reply to pk; voted := TRUE; else voted := FALSE; end if Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005
  • 5. 5 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Deadlock? Processes: A, B, C Group A: A, B Group B: B, C Group C: C, A Deadlock A has A’s reply, waiting for B’s reply B has B’s reply, waiting for C’s reply C has C’s reply, waiting for A’s reply Timestamp the requests in HB ordering holding according to the timestamp Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Properties Safety: No process can reply/vote more than once at any time. Liveness: timestamp (HB ordering) HB ordering: above Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Performance Entry overhead [assuming k = sqrt(N)] Sqrt(N) requests + Sqrt(N) replies 2* Sqrt(N) 2(N - 1) [N 4] Exit overhead Sqrt(N) releases Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Elections choosing a unique process for a particular role for example, server in dist. mutex each process can call only one election multiple concurrent elections can be called by different processes participant: engages in an election process with the largest id wins each process pi has variable electedi = ? (don't know) initially Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Elections Properties: [E1] electedi of a ``participant'' process must be Pmax (elected process---largest id) or ? [E2] liveness: all processes participate and eventually set electedi != ? (or crash) Performance: overhead (bandwidth consumption): # of messages turnaround time: # of messages to complete an election Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 A ring-based algorithm logical ring, could be unrelated to the physical configuration pi sends messages to p(i+1) mod N no failures elect the coordinator with the largest id initially, every process is a non-participant any process can call an election: marks itself as participant places its id in an election message sends the message to its neighbor
  • 6. 6 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ring-based algorithm receiving an election message: if id myid, forward the msg, mark participant if id myid non-participant: replace id with myid: forward the msg, mark participant participant: stop forwarding (why? Later, multiple elections) if id = myid, coordinator found, mark non-participant, electedi := id, send elected message with myid receiving an elected message: id != myid, mark non-participant, electedi := id forward the msg if id = myid, stop forwarding A ring-based election in progress 17 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 24 15 9 4 3 28 24 1 Note: The election was started by process 17. The highest process identifier encountered so far is 24. Participant processes are shown darkened Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ring-based algorithm Properties: safety: only the process with the largest id can send an elected message liveness: every process in the ring eventually participates in the election; extra elections are stopped Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 Ring-based algorithm Performance: one election, best case, when? N election messages N elected messages turnaround: 2N messages one election, worst case, when? 2N - 1 election messages N elected messages turnaround: 3N - 1 messages can't tolerate failures, not very practical Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm processes can crash and can be detected by other processes timeout T = 2Ttransmitting + Tprocessing each process knows all the other processes and can communicate with them Messages: election, answer, coordinator Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm start an election detects the coordinator has failed sends an election message to all processes with higher id's and waits for answers (except the failed coordinator/process) if no answers in time T, it is the coordinator sends coordinator message (with its id) to all processes with lower id's else waits for a coordinator message starts an election if timeout
  • 7. 7 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm p1 p 2 p 3 C p 4 p 1 p 2 p 3 p 4 C coordinator Stage 4 C election election Stage 2 p 1 p 2 p 3 p 4 election answer answer election Stage 1 timeout Stage 3 Eventually..... p 1 p 2 p 3 p 4 election answer The election of coordinator p2, after the failure of p4 and then p3 Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm receiving an election message sends an answer message back starts an election if it hasn't started one—send election messages to all higher-id processes (including the “failed” coordinator—the coordinator might be up by now) receiving a coordinator message set electedi to the new coordinator to be a coordinator, it has to start an election when a crashed process is replaced the new process starts an election and can replace the current coordinator (hence ``bully'') Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm properties: safety: a lower-id process always yields to a higher-id process However, during an election, if a failed process is replaced • the low-id processes might have two different coordinators: the newly elected coordinator and the new process, why? failure detection might be unreliable liveness: all processes participate and know the coordinator at the end Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4 © Pearson Education 2005 The bully algorithm Performance best case: when? overhead: N-2 coordinator messages turnaround delay: no election/answer messages worst case: when? overhead: 1+ 2 + ...+ (N-2) + (N-2)= (N-1)(N-2)/2 + (N-2) election messages, 1+...+ (N-2) answer messages, N-2 coordinator messages, total: (N-1)(N-2) + 2(N-2) = (N+1)(N-2) = O(N2) turnaround delay: delay of election and answer messages