Clock synchronization: Clocks, events and process states, Synchronizing physical clocks,
Logical time and logical clocks, Lamport’s Logical Clock, Global states, Distributed mutual
exclusion algorithms: centralized, decentralized, distributed and token ring algorithms,
election algorithms, Multicast communication.
2. Distributed System
Mr. Sagar Pandya
Information Technology Department
sagar.pandya@medicaps.ac.in
Course Code: IT3EL04
Course Name: Distributed System
Hours Per Week (L T P): 3 0 0
Total Hrs.: 3
Total Credits: 3
3. Reference Books
Text Book:
1. G. Coulouris, J. Dollimore and T. Kindberg, Distributed Systems: Concepts
and Design, Pearson.
2. P. K. Sinha, Distributed Operating Systems: Concepts and Design, PHI
Learning.
3. Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, Chapman
and Hall/CRC.
Reference Books:
1. Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms,
Pearson.
2. Sunita Mahajan and Shah, Distributed Computing, Oxford Press.
3. Nancy Lynch, Distributed Algorithms, Morgan Kaufmann.
4. Unit-3
Clock synchronization:
Clocks, events and process states,
Synchronizing physical clocks,
Logical time and logical clocks,
Lamport’s Logical Clock,
Global states,
Distributed mutual exclusion algorithms: centralized, decentralized,
distributed and token ring algorithms,
election algorithms,
Multicast communication.
5. INTRODUCTION
A distributed system is a collection of computers connected via a
high-speed communication network.
In a distributed system, the hardware and software components
communicate and coordinate their actions by message passing.
Each node in a distributed system can share its resources with other
nodes.
So, there is a need for proper allocation of resources to preserve the
state of resources and to help coordination between the several processes.
To resolve such conflicts, synchronization is used.
Synchronization in distributed systems is achieved via clocks.
The physical clocks are used to adjust the time of nodes.
Each node in the system can share its local time with other nodes in
the system.
6. INTRODUCTION
In this chapter, we mainly concentrate on how processes can
synchronize.
For example, it is important that multiple processes do not
simultaneously access a shared resource, such as a printer, but instead
cooperate in granting each other temporary exclusive access.
Another example is that multiple processes may sometimes need to
agree on the ordering of events, such as whether message m1 from
process P was sent before or after message m2 from another process.
Every computer needs a timer mechanism (called a computer clock)
to keep track of the current time and also for various accounting
purposes, such as calculating the time spent by a process on CPU
utilization, disk I/O, and so on, so that the corresponding user can be
charged properly.
7. INTRODUCTION
In a distributed system, an application may have processes that
concurrently run on multiple nodes of the system.
For correct results, several such distributed applications require that
the clocks of the nodes are synchronized with each other.
For example, for a distributed on-line reservation system to be fair,
the only remaining seat booked almost simultaneously from two
different nodes should be offered to the client who booked first, even
if the time difference between the two bookings is very small.
It may not be possible to guarantee this if the clocks of the nodes of
the system are not synchronized.
In a distributed system, synchronized clocks also enable one to
measure the duration of distributed activities that start on one node
and terminate on another node.
8. INTRODUCTION
It is difficult to get the correct result in this case if the clocks of the
sender and receiver nodes are not synchronized. There are several
other applications of synchronized clocks in distributed systems.
Synchronization is coordination with respect to time, and refers to
the ordering of events and execution of instructions in time.
It is often important to know when events occurred and in what order
they occurred.
Clock synchronization deals with understanding the temporal
ordering of events produced by concurrent processes.
It is useful for synchronizing senders and receivers of messages,
controlling joint activity, and serializing concurrent access to
shared objects.
9. INTRODUCTION
The goal is that multiple unrelated processes running on different
machines should be in agreement with and be able to make consistent
decisions about the ordering of events in a system.
Another aspect of clock synchronization deals with synchronizing
time-of-day clocks among groups of machines.
In this case, we want to ensure that all machines can report the same
time, regardless of how imprecise their clocks may be or what the
network latencies are between the machines.
A computer clock usually consists of three components: a quartz
crystal that oscillates at a well-defined frequency, a counter register,
and a constant register.
The constant register is used to store a constant value that is decided
based on the frequency of oscillation of the quartz crystal.
10. INTRODUCTION
The counter register is used to keep track of the oscillations of the
quartz crystal.
That is, the value in the counter register is decremented by 1 for each
oscillation of the quartz crystal.
When the value of the counter register becomes zero, an interrupt is
generated and its value is reinitialized to the value in the constant
register.
Each interrupt is called a clock tick.
To make the computer clock function as an ordinary clock used by us
in our day-to-day life, the following things are done:
11. INTRODUCTION
The value in the constant register is chosen so that 60 clock ticks
occur in a second.
The computer clock is synchronized with real time (external clock).
For this, two more values are stored in the system-a fixed starting
date and time and the number of ticks.
For example, in UNIX, time begins at 0000 on January 1, 1970.
At the time of initial booting, the system asks the operator to enter
the current date and time.
The system converts the entered value to the number of ticks after
the fixed starting date and time.
At every clock tick, the interrupt service routine increments the value
of the number of ticks to keep the clock running.
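The tick bookkeeping described on these slides can be sketched in a few lines of Python; `SoftwareClock` and `datetime_to_ticks` are illustrative names of ours, not part of any real kernel API.

```python
import time

TICKS_PER_SECOND = 60  # constant register chosen so 60 clock ticks occur per second

def datetime_to_ticks(seconds_since_start: float) -> int:
    # Convert an operator-entered date/time, expressed as seconds after the
    # fixed starting point (00:00 on January 1, 1970), into a tick count.
    return int(seconds_since_start * TICKS_PER_SECOND)

class SoftwareClock:
    # Keeps the running number of ticks; tick() plays the role of the
    # interrupt service routine invoked on every clock interrupt.
    def __init__(self, ticks_at_boot: int):
        self.ticks = ticks_at_boot

    def tick(self) -> None:
        self.ticks += 1  # one increment per clock tick

    def current_time(self) -> float:
        # Seconds after the fixed starting date and time.
        return self.ticks / TICKS_PER_SECOND

# At boot, the entered date/time is converted to ticks after the epoch.
clock = SoftwareClock(datetime_to_ticks(time.time()))
clock.tick()
```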
12. INTRODUCTION
Explain Drifting of Clock
A clock always runs at a constant rate because its quartz crystal
oscillates at a well-defined frequency.
However, due to differences in the crystals, the rates at which two
clocks run are normally different from each other.
The difference in the oscillation period between two clocks might be
extremely small, but the difference accumulated over many
oscillations leads to an observable difference in the times of the two
clocks, no matter how accurately they were initialized to the same
value.
Therefore, with the passage of time, a computer clock drifts from the
real-time clock that was used for its initial setting.
13. INTRODUCTION
For clocks based on a quartz crystal, the drift rate is approximately
10^-6, giving a difference of 1 second every 1,000,000 seconds (about
11.6 days).
Hence a computer clock must be periodically resynchronized with the
real-time clock to keep it non-faulty.
Even non-faulty clocks do not always maintain perfect time.
A clock is considered non-faulty if there is a bound on the amount of
drift from real time for any given finite time interval.
As shown in the figure, after synchronization with a perfect clock, slow
and fast clocks drift in opposite directions from the perfect clock.
This is because for slow clocks dC/dt < 1 and for fast clocks dC/dt > 1.
15. INTRODUCTION
A distributed system requires the following types of clock
synchronization:
1. Synchronization of the computer clocks with real-time (or
external) clocks.
This type of synchronization is mainly required for real-time
applications.
That is, external clock synchronization allows the system to
exchange information about the timing of events with other systems
and users.
An external time source that is often used as a reference for
synchronizing computer clocks with real time is Coordinated
Universal Time (UTC).
16. INTRODUCTION
2. Mutual (or internal) synchronization of the clocks of different
nodes of the system.
This type of synchronization is mainly required for those
applications that require a consistent view of time across all nodes of
a distributed system, as well as for the measurement of the duration of
distributed activities that terminate on a node different from the one
on which they start.
17. Clock Synchronization
The time is set based on UTC (Coordinated Universal Time).
UTC is used as a reference time clock for the nodes in the system.
Clock synchronization can be achieved in two ways:
external and internal clock synchronization.
1. External clock synchronization is the one in which an external
reference clock is present. It is used as a reference and the nodes in
the system can set and adjust their time accordingly.
2. Internal clock synchronization is the one in which each node
shares its time with other nodes and all the nodes set and adjust their
times accordingly.
18. Clock Synchronization
There are two types of clock synchronization algorithms:
1. Centralized and
2. Distributed.
1. Centralized algorithms use a time server as a reference.
The single time server propagates its time to the nodes and all the
nodes adjust their time accordingly.
They depend on a single time server, so if that node fails, the whole
system will lose synchronization.
Examples of centralized algorithms are the Berkeley Algorithm, the
Passive Time Server, the Active Time Server, etc.
19. Clock Synchronization
2. Distributed algorithms have no centralized time server.
Instead, the nodes adjust their time by using their local time and the
average of the time differences with the other nodes.
Distributed algorithms overcome the issues of centralized algorithms,
such as scalability and single-point failure.
Examples of distributed algorithms are the Global Averaging
Algorithm, the Localized Averaging Algorithm, NTP (Network Time
Protocol), etc.
21. Clock Synchronization
Physical Clocks
Nearly all computers have a circuit for keeping track of time.
Despite the widespread use of the word "clock" to refer to these
devices, they are not actually clocks in the usual sense.
Timer is perhaps a better word.
A computer timer is usually a precisely machined quartz crystal.
When kept under tension, quartz crystals oscillate at a well-defined
frequency that depends on the kind of crystal, how it is cut, and the
amount of tension.
Associated with each crystal are two registers, a counter and a
holding register.
Each oscillation of the crystal decrements the counter by one.
22. Clock Synchronization
When the counter gets to zero, an interrupt is generated and the
counter is reloaded from the holding register.
In this way, it is possible to program a timer to generate an interrupt
60 times a second, or at any other desired frequency.
Each interrupt is called one clock tick.
When the system is booted, it usually asks the user to enter the date
and time, which is then converted to the number of ticks after some
known starting date and stored in memory.
Most computers have a special battery-backed up CMOS RAM so
that the date and time need not be entered on subsequent boots.
At every clock tick, the interrupt service procedure adds one to the
time stored in memory.
In this way, the (software) clock is kept up to date.
23. Clock Synchronization
As soon as multiple CPUs are introduced, each with its own clock,
the situation changes radically.
Although the frequency at which a crystal oscillator runs is usually
fairly stable, it is impossible to guarantee that the crystals in different
computers all run at exactly the same frequency.
In practice, when a system has n computers, all n crystals will run at
slightly different rates, causing the (software) clocks gradually to get
out of synch and give different values when read out. This difference
in time values is called clock skew.
UTC is the basis of all modern civil timekeeping.
It has essentially replaced the old standard, Greenwich Mean Time,
which is astronomical time.
24. Clock Synchronization
How Often Do We Need To Resynchronize Clocks?
Coordinating physical clocks among several systems is possible, but
it can never be exact.
In distributed systems, we must be willing to accept some drift away
from the "real" time on each clock.
A typical real-time clock within a computer has a relative error of
approximately 10^-5.
This means that if the clock is configured for 100 ticks/second, we
should expect 360,000 +/- 4 ticks per hour.
The maximum drift rate of a clock is the maximum rate at which it
can drift.
25. Clock Synchronization
Since different clocks can drift in different directions, the worst case
is that two clocks in a system will drift in opposite directions.
In this case the difference between these clocks can be twice the
relative error.
Using our example relative error of 10^-5, this suggests that clocks can
drift apart by up to 8 ticks/hour.
So, for example, if we want all clocks in this system to be within 4
ticks, we must synchronize them twice per hour.
A general formula expressing these ideas follows:
largest synchronization interval =
maximum acceptable difference / (2 × maximum drift rate)
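The arithmetic on these two slides can be checked with a few lines of Python (the variable names are ours):

```python
TICKS_PER_SECOND = 100
RELATIVE_ERROR = 1e-5  # typical relative error of a real-time clock

# 100 ticks/second gives 360,000 ticks per hour, so each clock drifts by
# about 360,000 * 10^-5 = 3.6 (i.e. +/- 4) ticks per hour.
ticks_per_hour = TICKS_PER_SECOND * 3600
drift_per_hour = ticks_per_hour * RELATIVE_ERROR

# Two clocks drifting in opposite directions can diverge at twice that
# rate: about 8 ticks/hour.
worst_case_divergence = 2 * drift_per_hour

def largest_sync_interval(max_acceptable_diff: float) -> float:
    # largest synchronization interval =
    #   maximum acceptable difference / (2 * maximum drift rate)
    return max_acceptable_diff / worst_case_divergence

# Keeping all clocks within 4 ticks requires resynchronizing roughly
# every 0.56 hours, i.e. about twice per hour.
interval_hours = largest_sync_interval(4)
```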
26. Clock Synchronization
Logical Clock in Distributed System
Logical Clocks refer to implementing a protocol on all machines
within your distributed system, so that the machines are able to
maintain consistent ordering of events within some virtual timespan.
A logical clock is a mechanism for capturing chronological and
causal relationships in a distributed system.
Distributed systems may have no physically synchronous global
clock, so a logical clock allows global ordering on events from
different processes in such systems.
27. Clock Synchronization
Example :
When we go out, we make a full plan of which place
we will go to first, second, and so on.
We do not go to the second place first and then to the first place.
We always follow the procedure or organization that was planned
beforehand.
In a similar way, we should do the operations on our PCs one by one
in an organized way.
Suppose we have more than 10 PCs in a distributed system and
every PC is doing its own work, but then how do we make them work
together?
The solution to this is the LOGICAL CLOCK.
28. Clock Synchronization
Method-1:
One approach to ordering events across processes is to try to
synchronize all the clocks.
This means that if one PC has the time 2:00 pm, then every PC should
have the same time, which is quite impossible.
Not every clock can be synchronized at one time, so we cannot follow
this method.
Method-2:
Another approach is to assign timestamps to events.
Taking the example into consideration, this means we assign the
first place the number 1, the second place 2, the third place 3, and so on.
Then we always know that the first place will always come first, and
so on.
29. Clock Synchronization
Similarly, if we give each PC its own individual number, then the work
is organized so that the 1st PC completes its process first, then the
second, and so on.
BUT, timestamps will only work as long as they obey causality.
What is causality?
Causality is fully based on the HAPPENED-BEFORE RELATIONSHIP.
Taking a single PC, if 2 events A and B occur one after the other,
then TS(A) < TS(B).
If A has a timestamp of 1, then B should have a timestamp of more than 1;
only then does the happened-before relationship hold.
Taking 2 PCs, with event A in P1 (PC 1) and event B in P2 (PC 2),
the condition will also be TS(A) < TS(B).
30. Clock Synchronization
Taking an example: suppose you send a message to someone at
2:00:00 pm, and the other person receives it at 2:00:02 pm.
Then it is obvious that TS(sender) < TS(receiver).
31. Clock Synchronization
Properties derived from the happened-before relationship –
Transitive relation –
If TS(A) < TS(B)
and TS(B) < TS(C), then
TS(A) < TS(C).
Causally ordered relation –
a -> b means that a occurs before b, and if there is any
change in a it will surely be reflected in b.
Concurrent events –
Not every process occurs one after another; some events
happen simultaneously, i.e., A || B.
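These properties can be illustrated with a small Python helper; `happens_before_closure` and `concurrent` are illustrative functions of ours, not algorithms from the slides.

```python
def happens_before_closure(pairs):
    # Transitive closure of a happens-before relation given as (a, b)
    # pairs: if a -> b and b -> c, then a -> c is added.
    hb = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(hb):
            for (c, d) in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb

def concurrent(hb, a, b):
    # a || b: neither a -> b nor b -> a holds.
    return (a, b) not in hb and (b, a) not in hb

hb = happens_before_closure({("a", "b"), ("b", "c")})
assert ("a", "c") in hb           # transitivity: a -> b and b -> c give a -> c
assert concurrent(hb, "a", "d")   # d is unrelated to a, so a || d
```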
32. Clock Synchronization
1.) Passive time server algorithms –
Each node periodically sends a message called ‘time=?’ to the time
server.
When the time server receives the message, it responds with
‘time=T’ message.
Assume that the client node has a clock time of T0 when it sends
‘time=?’ and a time of T1 when it receives the ‘time=T’ message.
T0 and T1 are measured using the same clock, thus the time needed for
propagation of the message from the time server to the client node
would be (T1 - T0)/2.
When the client node receives the reply from the time server, the
client node’s clock is readjusted to Tserver + (T1 - T0)/2.
33. Clock Synchronization
Two methods have been proposed to improve the estimated value.
Let the approximate time taken by the time server to handle the
interrupt and process the request message ‘time=?’ be equal to I.
Hence, a better estimate of the time taken for propagation of the
response message from the time server node to the client node is
(T1 - T0 - I)/2.
The clock is adjusted to the value Tserver + (T1 - T0 - I)/2.
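A minimal Python sketch of one passive time server exchange with this improved estimate; `request_time_fn` is a stand-in for the real ‘time=?’/‘time=T’ network exchange, and `time.monotonic()` plays the role of the client's clock.

```python
import time

def passive_server_sync(request_time_fn, handling_time=0.0):
    # handling_time is I, the approximate time the server spends handling
    # the interrupt and processing the 'time=?' request.
    T0 = time.monotonic()         # client clock when 'time=?' is sent
    T_server = request_time_fn()  # server replies with 'time=T'
    T1 = time.monotonic()         # client clock when the reply arrives
    # Better estimate of one-way propagation delay: (T1 - T0 - I) / 2
    propagation = (T1 - T0 - handling_time) / 2
    # The client clock is readjusted to Tserver + (T1 - T0 - I)/2.
    return T_server + propagation
```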
Cristian’s method:
This method assumes that a certain machine, the time server, is
synchronized to UTC; its time is called T.
Periodically, every clock is synchronized with the time server.
34. Clock Synchronization
Other machines send a message to the time server, which responds
with T in a response, as fast as possible.
The interval (T1 - T0) is measured many times.
Those measurements in which (T1 - T0) exceeds a specific threshold
value are considered unreliable and are discarded.
Only those values that fall in the range (T1 - T0 - 2Tmin) are
considered for calculating the correct time.
For all the remaining measurements, an average is calculated, which
is added to T.
Alternatively, the measurement for which the value of (T1 - T0) is
minimum is considered the most accurate, and half of its value is added to T.
36. Clock Synchronization - Cristian’s Algorithm
A.) Cristian’s Method
Cristian’s Algorithm is a clock synchronization algorithm used by
client processes to synchronize time with a time server.
This algorithm works well with low-latency networks, where the Round
Trip Time is short compared to the required accuracy, while
redundancy-prone distributed systems/applications do not go hand in
hand with this algorithm.
Here Round Trip Time refers to the time duration between the start of a
Request and the end of the corresponding Response.
This method assumes that a certain machine, the time server, is
synchronized to UTC; its time is called T.
Periodically, every clock is synchronized with the time server.
39. Clock Synchronization - Cristian’s Algorithm
Algorithm:
1) The process on the client machine sends a request for fetching the
clock time (time at the server) to the Clock Server at time T0.
2) The Clock Server listens to the request made by the client process
and returns the response in the form of its clock time.
3) The client process fetches the response from the Clock Server at
time T1 and calculates the synchronized client clock time using the
formula given below.
TCLIENT = TSERVER+ (T1 - T0)/2
where TCLIENT refers to the synchronized clock time,
TSERVER refers to the clock time returned by the server,
40. Clock Synchronization - Cristian’s Algorithm
T0 refers to the time at which request was sent by the client process,
T1 refers to the time at which response was received by the client
process
Working/Reliability of the above formula:
T1 - T0 refers to the combined time taken by the network and the
server to transfer the request to the server, process the request, and
return the response back to the client process, assuming that the
network latencies of the request and the response are approximately equal.
The time at the client side differs from the actual time by at most
(T1 - T0)/2 seconds. Using the above statement, we can conclude that
the error in synchronization can be at most (T1 - T0)/2 seconds.
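The steps above can be sketched in Python; `fetch_server_time` is a placeholder for the request to the Clock Server (not a real API), and `time.monotonic()` plays the role of the client's clock.

```python
import time

def cristian_sync(fetch_server_time):
    T0 = time.monotonic()           # request sent at T0
    T_server = fetch_server_time()  # server's clock time, TSERVER
    T1 = time.monotonic()           # response received at T1
    rtt = T1 - T0                   # round trip time
    # TCLIENT = TSERVER + (T1 - T0)/2; the synchronization error is at
    # most (T1 - T0)/2.
    return T_server + rtt / 2, rtt / 2

def cristian_sync_best(fetch_server_time, attempts=5):
    # Repeat the exchange; the measurement with the smallest round trip
    # time is considered the most accurate.
    return min((cristian_sync(fetch_server_time) for _ in range(attempts)),
               key=lambda pair: pair[1])
```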
41. Clock Synchronization - Active time server
Active time server algorithms
It is also called Berkeley Algorithm.
An algorithm for internal synchronization of a group of computers.
A master polls to collect clock values from the others (slaves).
The master uses round trip times to estimate the slaves’ clock values.
It obtains average from participating computers.
It sends the required adjustment to the slaves.
If the master fails, a new master can be elected to take over.
It synchronizes all clocks to average.
Time server periodically broadcasts its clock time as ‘time=t’.
Other nodes receive message and readjust their local clock
accordingly.
42. Clock Synchronization - Berkeley Algorithm
Each node assumes a message propagation time ta and readjusts its
clock time to t + ta.
There are some limitations, as follows [5]:
i. Due to a communication link failure, a message may be delayed and a
clock readjusted to an incorrect time.
ii. The network should support a broadcast facility.
B.) Berkeley Algorithm
This algorithm overcomes the limitations of faulty clocks and malicious
interference in the passive time server, and also overcomes the
limitations of the active time server algorithm.
The time server periodically sends a request message ‘time=?’ to all
nodes in the system.
43. Clock Synchronization - Berkeley Algorithm
Each node sends back its time value to the time server.
The time server has an idea of the message propagation time to each
node and readjusts the clock values in the reply messages based on it.
The time server takes an average of the other computers’ clock values,
including its own clock value, and readjusts its own clock
accordingly.
It avoids reading from unreliable clocks.
For readjustment, the time server sends each node the factor by which
it requires adjustment.
The readjustment value can be either +ve or -ve.
45. Clock Synchronization - Berkeley Algorithm
Here the time server (actually, a time daemon) is active, polling
every machine periodically to ask what time it is there.
Based on the answers, it computes an average time and tells all the
other machines to advance their clocks to the new time or slow their
clocks down until some specified reduction has been achieved.
In Figure (a), at 3:00, the time daemon tells the other machines its
time and asks for theirs.
In Figure (b), they respond with how far ahead or behind the time
daemon they are.
Armed with these numbers, the time daemon computes the average
and tells each machine how to adjust its clock (Figure (c)).
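One polling round of the Berkeley algorithm can be sketched as follows; the clock values (in minutes) are illustrative ones of ours, chosen to mirror a daemon at 3:00 polling machines at 2:50 and 3:25.

```python
def berkeley_round(master_time, slave_times):
    # The master (time daemon) polls each machine for its time (passed in
    # directly here), averages all clocks including its own, and returns
    # the adjustment each machine must apply (positive or negative).
    all_times = [master_time] + list(slave_times)
    average = sum(all_times) / len(all_times)
    # One adjustment per machine, master first.
    return [average - t for t in all_times]

# Daemon at 3:00 (180 min), slaves at 2:50 (170) and 3:25 (205).
adjustments = berkeley_round(180, [170, 205])
# average = 185; daemon advances by 5, first slave by 15,
# second slave slows down by 20 -> [5.0, 15.0, -20.0]
```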
46. Clock Synchronization
C.) Network Time Protocol (NTP)
Cristian’s and Berkeley’s algorithms synchronize clocks within an intranet.
NTP defines an architecture for a time service and a protocol to
distribute time information over the Internet.
It provides a service enabling clients across the Internet to be synchronized
accurately to UTC.
It provides a reliable service that can survive lengthy losses of connectivity.
It enables clients to resynchronize sufficiently frequently to offset clock
drift.
It provides protection against interference with the time service.
It uses a network of time servers to synchronize all processes on a
network.
48. Clock Synchronization
Centralized clock synchronization algorithms suffer from two
major drawbacks:
1. They are subject to single-point failure. If the time server node
fails, the clock synchronization operation cannot be performed. This
makes the system unreliable.
Ideally, a distributed system should be more reliable than its
individual nodes.
If one goes down, the rest should continue to function correctly.
2. From a scalability point of view it is generally not acceptable to
get all the time requests serviced by a single time server.
In a large system, such a solution puts a heavy burden on that one
process.
49. Clock Synchronization
2. Distributed Algorithms
In distributed algorithms, no centralized time server is present.
Instead, the nodes adjust their time by using their local time and the
average of the time differences with the other nodes.
Distributed algorithms overcome the problems of centralized algorithms,
such as scalability and single-point failure, by synchronizing internally
for better accuracy.
One of the two approaches can be used:
I. Global Averaging Distributed Algorithms
II. Localized Averaging Distributed Algorithms
50. Clock Synchronization
I. Global Averaging Distributed Algorithms
One class of decentralized clock synchronization algorithms works by
dividing time into fixed-length resynchronization intervals.
In this approach, the clock process at each node broadcasts its local clock
time in the form of a special "resync" message when its local time equals
To + iR for some integer i, where To is a fixed time in the past agreed upon
by all nodes and R is a system parameter that depends on such factors as
the total number of nodes in the system, the maximum allowable drift rate,
and so on.
That is, a resync message is broadcast from each node at the beginning of
every fixed-length resynchronization interval.
However, since the clocks of different nodes run at slightly different
rates, these broadcasts will not happen simultaneously from all nodes.
51. Clock Synchronization
After broadcasting the clock value, the clock process of a node waits
for time T, where T is a parameter to be determined by the algorithm.
During this waiting period, the clock process collects the resync
messages broadcast by other nodes.
For each resync message, the clock process records the time,
according to its own clock, when the message was received.
At the end of the waiting period, the clock process estimates the
skew of its clock with respect to each of the other nodes on the basis
of the times at which it received resync messages.
It then computes a fault-tolerant average of the estimated skews and
uses it to correct the local clock before the start of the next
resynchronization interval.
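The fault-tolerant average step can be sketched like this; dropping the single highest and lowest skew estimates is one simple trimming rule of ours (real algorithms derive the discard count from the assumed fault bound).

```python
def fault_tolerant_average(skews, discard=1):
    # Drop the 'discard' highest and lowest estimated skews (suspected
    # faulty clocks) and average the rest; fall back to a plain average
    # when there are too few estimates to trim.
    if len(skews) <= 2 * discard:
        return sum(skews) / len(skews)
    trimmed = sorted(skews)[discard:-discard]
    return sum(trimmed) / len(trimmed)

# Skews (seconds) estimated from received resync messages; -9.0 is an
# outlier from a suspected faulty clock and gets trimmed away.
correction = fault_tolerant_average([0.2, -0.1, 0.3, -9.0, 0.0])
```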
52. Clock Synchronization
II. Localized Averaging Distributed Algorithms
The global averaging algorithms do not scale well because they require the
network to support a broadcast facility and also because of the large amount
of message traffic generated.
Therefore, they are suitable for small networks, especially for those that have
fully connected topology (in which each node has a direct communication
link to every other node). The localized averaging algorithms attempt to
overcome these drawbacks of the global averaging algorithms.
In this approach, the nodes of a distributed system are logically arranged in
some kind of pattern, such as a ring or a grid.
Periodically, each node exchanges its clock time with its neighbors in the
ring, grid, or other structure and then sets its clock time to the average of its
own clock time and the clock times of its neighbors.
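One synchronization step on a logical ring can be sketched as follows (a sketch under the assumption that each node averages with its two ring neighbours):

```python
def localized_averaging_step(clocks):
    # Each node i sets its clock to the average of its own time and the
    # times of its two neighbours on the logical ring.
    n = len(clocks)
    return [(clocks[(i - 1) % n] + clocks[i] + clocks[(i + 1) % n]) / 3
            for i in range(n)]

# Four nodes arranged in a ring; repeating the step pulls the clocks
# together while preserving their average.
clocks = [10.0, 12.0, 11.0, 9.0]
clocks = localized_averaging_step(clocks)
```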
53. Clocks, events and process states
A distributed system consists of a collection P of N processes pi, i =
1,2,… N
Each process pi has a state si consisting of its variables (which it
transforms as it executes)
Processes communicate only by messages (via a network)
Actions of processes: Send, Receive, change own state
Event: the occurrence of a single action that a process carries out as it
executes e.g. Send, Receive, change state
Events at a single process pi can be placed in a total ordering,
denoted by the relation →i between the events, i.e.
e →i e’ if and only if event e occurs before event e’ at process pi.
A history of process pi is a series of events ordered by →i:
history(pi) = hi = <ei0, ei1, ei2, …>
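This model of processes, events, and histories can be sketched directly; the `Process` class below is an illustration of ours, not code from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Process:
    # A process p_i: its state s_i (variables it transforms as it
    # executes) and its history, the series of events ordered by ->_i
    # (events are appended in the order they occur).
    name: str
    state: dict = field(default_factory=dict)
    history: List[Tuple[str, str]] = field(default_factory=list)

    def _record(self, action: str, detail: str) -> None:
        self.history.append((action, detail))

    def send(self, msg: str) -> str:
        self._record("send", msg)
        return msg

    def receive(self, msg: str) -> None:
        self._record("receive", msg)

    def change_state(self, key: str, value) -> None:
        self.state[key] = value
        self._record("state", f"{key}={value}")

p = Process("p1")
p.change_state("x", 1)
p.receive(p.send("m1"))
# history(p1) = <('state', 'x=1'), ('send', 'm1'), ('receive', 'm1')>
```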
54. Lamport’s Logical Clock
As we already know, logical clocks, also sometimes called Lamport
timestamps, are counters.
But how do those counters work under the hood?
The answer may be surprisingly simple: the clocks are functions, and
it’s the function that does the work of “counting” for us!
55. Lamport’s Logical Clock
We can think of logical clocks as functions, which take in an event as
their input and return a timestamp, which acts as the “counter”.
Each node (which is often just a process) in a distributed system has
its own local clock, and each process needs to have its own logical clock.
An event can be something that occurs within a process.
56. Lamport’s Logical Clock
Lamport’s clock algorithm
In many cases, Lamport’s algorithm for determining the time of an
event can be very straightforward.
If an event occurs on a single process, then we intuitively can guess
that the timestamp of the event will be greater than the event before
it, and that the difference between the timestamps of the two events
on the same process will depend only on how much the clock
increments by (remember, the increment can be totally
arbitrary!).
57. Lamport’s Logical Clock
To synchronize logical clocks, Lamport defined a relation called
happens-before.
The expression a → b is read "a happens before b" and means that all
processes agree that first event a occurs, then afterward, event b
occurs.
The happens-before relation can be observed directly in two situations:
1. If a and b are events in the same process, and a occurs before b, then
a → b is true.
2. If a is the event of a message being sent by one process, and b is the
event of the message being received by another process, then a → b is
also true.
A message cannot be received before it is sent, or even at the same
time it is sent, since it takes a finite, nonzero amount of time to arrive.
58. Lamport’s Logical Clock
Happens-before is a transitive relation, so if a → b and b → c, then a → c.
If two events, x and y, happen in different processes that do not
exchange messages (not even indirectly via third parties), then x → y is
not true, but neither is y → x.
These events are said to be concurrent, which simply means that
nothing can be said (or need be said) about when the events happened
or which event happened first.
What we need is a way of measuring a notion of time such that for
every event, a, we can assign it a time value C(a) on which all
processes agree.
These time values must have the property that if a → b, then C(a) <
C(b).
59. Lamport’s Logical Clock
To rephrase the conditions we stated earlier, if a and b are two
events within the same process and a occurs before b, then C(a) <
C(b).
Similarly, if a is the sending of a message by one process and b is
the reception of that message by another process, then C (a) and
C(b) must be assigned in such a way that everyone agrees on the
values of C (a) and C(b) with C(a) < C(b).
In addition, the clock time, C, must always go forward (increasing),
never backward (decreasing).
Corrections to time can be made by adding a positive value, never
by subtracting one.
Now let us look at the algorithm Lamport proposed for assigning
times to events.
60. Lamport’s Logical Clock
The processes run on different machines, each with its own clock,
running at its own speed.
As can be seen from the figure, when the clock has ticked 6 times in
process P1, it has ticked 8 times in process P2 and 10 times in process
P3. Each clock runs at a constant rate, but the rates are different due to
differences in the crystals.
At time 6, process P1 sends message m1 to process P2.
How long this message takes to arrive depends on whose clock you
believe.
In any event, the clock in process P2 reads 16 when it arrives.
If the message carries the starting time, 6, in it, process P2 will
conclude that it took 10 ticks to make the journey.
62. Lamport’s Logical Clock
This value is certainly possible. According to this reasoning, message
m2 from P2 to P3 takes 16 ticks, again a plausible value.
Now consider message m3. It leaves process P3 at 60 and arrives at
P2 at 56.
Similarly, message m4 from P2 to P1 leaves at 64 and arrives at 54.
These values are clearly impossible. It is this situation that must be
prevented.
Lamport's solution follows directly from the happens-before relation.
Since m3 left at 60, it must arrive at 61 or later.
Therefore, each message carries the sending time according to the
sender's clock.
63. Lamport’s Logical Clock
When a message arrives and the receiver's clock shows a value prior
to the time the message was sent, the receiver fast forwards its clock
to be one more than the sending time.
In Fig. 6-9(b) we see that m3 now arrives at 61.
Similarly, m4 arrives at 70.
To prepare for our discussion on vector clocks, let us formulate this
procedure more precisely.
At this point, it is important to distinguish three different layers of
software as we already encountered in Chap. 1: the network, a
middleware layer, and an application layer, as shown in Fig. 6-10.
What follows is typically part of the middleware layer.
66. Lamport’s Logical Clock
Algorithm:
Happened-before relation (→): a → b means 'a' happened before 'b'.
Logical Clock: The criteria for the logical clocks are:
[C1]: Ci(a) < Ci(b) [Ci is the logical clock; if 'a' happened before
'b', then the time of 'a' will be less than that of 'b' in a particular process.]
[C2]: Ci(a) < Cj(b) [If 'a' is the sending of a message by process Pi and
'b' is its receipt by process Pj, the clock value of Ci(a) is less than Cj(b).]
Notation:
Process: Pi
Event: Eij, where i is the process number and j is the jth event in the ith
process.
tm: timestamp of message m.
67. Lamport’s Logical Clock
Ci: the logical clock associated with process Pi.
d: the increment (drift) amount; generally d is 1.
Implementation Rules [IR]:
[IR1]: If a → b ['a' happened before 'b' within the same process],
then Ci(b) = Ci(a) + d.
[IR2]: On receiving message m with timestamp tm, Cj = max(Cj, tm) + d
[the receiving event's clock value is the maximum of the receiver's
current value and the sender's timestamp, plus d].
Take the starting value as 1, since it is the 1st event and there is no
incoming value at the starting point:
e11 = 1
e21 = 1
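Rules [IR1] and [IR2] can be sketched as a small Python class (the class and method names are illustrative, not from the slides; d = 1):

```python
class LamportClock:
    """Scalar logical clock following [IR1]/[IR2] with d = 1."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # [IR1]: a local (or send) event advances the clock by d
        self.time += 1
        return self.time

    def send(self):
        # the send event's timestamp tm is piggybacked on the message
        return self.tick()

    def receive(self, tm):
        # [IR2]: Cj = max(Cj, tm) + d, so the receive event is ordered
        # after both the sender's event and the receiver's previous event
        self.time = max(self.time, tm) + 1
        return self.time
```

Replaying the worked example: after e16 = 6 on P1 and a message stamped e24 = 4 from P2, `receive(4)` yields max(6, 4) + 1 = 7, matching e17 = 7.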
69. Lamport’s Logical Clock
The value at each next point goes on increasing by d (d = 1) when
there is no incoming message, i.e., following [IR1].
e12 = e11 + d = 1 + 1 = 2
e13 = e12 + d = 2 + 1 = 3
e14 = e13 + d = 3 + 1 = 4
e15 = e14 + d = 4 + 1 = 5
e16 = e15 + d = 5 + 1 = 6
e22 = e21 + d = 1 + 1 = 2
e24 = e23 + d = 3 + 1 = 4
e26 = e25 + d = 6 + 1 = 7
70. Lamport’s Logical Clock
When there is an incoming message, follow [IR2], i.e., take the
maximum of Cj + d and tm + d.
e17 = max(7, 5) = 7, [e16 + d = 6 + 1 = 7, e24 + d = 4 + 1 = 5,
maximum among 7 and 5 is 7]
e23 = max(3, 3) = 3, [e22 + d = 2 + 1 = 3, e12 + d = 2 + 1 = 3,
maximum among 3 and 3 is 3]
e25 = max(5, 6) = 6, [e24 + 1 = 4 + 1 = 5, e15 + d = 5 + 1 = 6,
maximum among 5 and 6 is 6]
Limitation:
If a → b, then C(a) < C(b) is guaranteed by [IR1] and [IR2].
However, the converse does not hold: C(a) < C(b) does not imply a → b,
since two concurrent events may also receive ordered timestamps. Lamport
clocks therefore cannot tell whether one event actually caused another.
72. Vector Clock
Vector Clock is an algorithm that generates partial ordering of events
and detects causality violations in a distributed system.
These clocks expand on scalar time to facilitate a causally consistent
view of the distributed system; they detect whether one event
has caused another event in the distributed system.
It essentially captures all the causal relationships.
This algorithm labels every process with a vector (a list of
integers) with an integer for each local clock of every process within
the system.
So for N given processes, there will be a vector/array of size N.
How does the vector clock algorithm work :
Initially, all the clocks are set to zero.
73. Vector Clock
Every time an internal event occurs in a process, the value of the
process's logical clock in the vector is incremented by 1.
Also, every time a process sends a message, the value of the
process's logical clock in the vector is incremented by 1, and a copy
of the whole vector is sent along with the message.
Every time a process receives a message, the value of the process's
logical clock in the vector is incremented by 1, and moreover, each
element is updated by taking the maximum of the value in its own
vector clock and the value in the vector in the received message (for
every element).
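The three update rules can be sketched in Python (class and method names are illustrative):

```python
class VectorClock:
    """Vector clock for process index i out of n processes."""

    def __init__(self, i, n):
        self.i = i              # this process's position in the vector
        self.v = [0] * n        # initially, all the clocks are set to zero

    def internal(self):
        # internal event: increment only our own entry
        self.v[self.i] += 1
        return list(self.v)

    def send(self):
        # send event: increment our own entry, attach a copy of the vector
        self.v[self.i] += 1
        return list(self.v)

    def receive(self, msg):
        # receive event: element-wise maximum with the incoming vector,
        # then increment our own entry
        self.v = [max(a, b) for a, b in zip(self.v, msg)]
        self.v[self.i] += 1
        return list(self.v)
```

Comparing two such vectors element-wise is what reveals whether two events are causally related or concurrent.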
Example-1:
Consider a set of processes, each with a vector of size N: the set of
rules mentioned above is to be executed by the vector clock:
75. Vector Clock
The above example depicts the vector clocks mechanism in which the
vector clocks are updated after execution of internal events, the arrows
indicate how the values of vectors are sent in between the processes
(P1, P2, P3).
To sum up, Vector clocks algorithms are used in distributed systems to
provide a causally consistent ordering of events but the entire Vector
is sent to each process for every message sent, in order to keep the
vector clocks in sync.
76. Mutual Exclusion
Mutual exclusion is a concurrency control property which is
introduced to prevent race conditions.
It is the requirement that a process cannot enter its critical section
while another concurrent process is currently present or executing in its
critical section, i.e., only one process is allowed to execute the critical
section at any given instant of time.
Mutual exclusion in single computer system Vs. distributed
system:
In a single computer system, memory and other resources are shared
between different processes.
The status of shared resources and the status of users is easily available
in the shared memory, so with the help of shared variables (for example,
semaphores) the mutual exclusion problem can be easily solved.
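A minimal single-machine sketch of this idea in Python, using a binary semaphore as the shared variable (the variable names are illustrative):

```python
import threading

counter = 0                          # shared resource
sem = threading.Semaphore(1)         # binary semaphore guarding it

def worker():
    global counter
    for _ in range(10_000):
        with sem:                    # enter critical section
            counter += 1             # only one thread updates at a time
        # critical section released on leaving the with-block

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                       # 40000: no update is lost
```

This works only because all four threads see the same semaphore in shared memory, which is exactly what a distributed system lacks.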
77. Mutual Exclusion
In distributed systems, we have neither shared memory nor a
common physical clock, and therefore we cannot solve the mutual
exclusion problem using shared variables.
To eliminate the mutual exclusion problem in a distributed system, an
approach based on message passing is used.
A site in a distributed system does not have complete information about
the state of the system, due to the lack of shared memory and a common
physical clock.
Requirements of Mutual exclusion Algorithm:
No Deadlock:
Two or more sites should not endlessly wait for messages that will
never arrive.
78. Mutual Exclusion
No Starvation:
Every site that wants to execute the critical section should get an
opportunity to execute it in finite time. No site should wait
indefinitely to execute the critical section while other sites repeatedly
execute it.
Fairness:
Each site should get a fair chance to execute the critical section.
Requests to execute the critical section must be served in the order
they are made, i.e., in the order of their arrival in the system.
Fault Tolerance:
In case of a failure, the algorithm should be able to recognize it by
itself and continue functioning without any disruption.
79. Mutual Exclusion
To understand mutual exclusion, let's take an example.
The changing room is nothing but the critical section, boy A and girl B
are two different processes, while the sign outside the changing room
indicates the process synchronization mechanism being used.
80. Mutual Exclusion
Boy A decides upon some clothes to buy and heads to the changing
room to try them out. Now, while boy A is inside the changing room,
there is an ‘occupied’ sign on it – indicating that no one else can
come in. Girl B has to use the changing room too, so she has to wait
till boy A is done using the changing room.
81. Mutual Exclusion
Once boy A comes out of the changing room, the sign on it changes
from ‘occupied’ to ‘vacant’ – indicating that another person can use
it. Hence, girl B proceeds to use the changing room, while the sign
displays ‘occupied’ again.
82. Mutual Exclusion
Solution to distributed mutual exclusion:
As we know shared variables or a local kernel can not be used to
implement mutual exclusion in distributed systems. Message passing
is a way to implement mutual exclusion. Below are the three
approaches based on message passing to implement mutual
exclusion in distributed systems:
1. Token Based Algorithm:
A unique token is shared among all the sites.
If a site possesses the unique token, it is allowed to enter its critical
section.
This approach uses sequence numbers to order requests for the
critical section.
83. Mutual Exclusion
Each request for the critical section contains a sequence number. This
sequence number is used to distinguish old and current requests.
This approach ensures mutual exclusion as the token is unique.
Example: Suzuki-Kasami's Broadcast Algorithm
2. Non-token based approach:
A site communicates with other sites in order to determine which
site should execute the critical section next. This requires the exchange
of two or more successive rounds of messages among sites.
This approach uses timestamps instead of sequence numbers to order
requests for the critical section.
Whenever a site makes a request for the critical section, it gets a
timestamp. The timestamp is also used to resolve any conflict between
critical section requests.
84. Mutual Exclusion
All algorithms which follow the non-token based approach maintain a
logical clock. Logical clocks get updated according to Lamport's
scheme.
Example: Lamport's algorithm, Ricart–Agrawala algorithm
3. Quorum based approach:
Instead of requesting permission to execute the critical section from
all other sites, each site requests only a subset of sites, which is
called a quorum.
Any two quorums contain a common site.
This common site is responsible for ensuring mutual exclusion.
Example: Maekawa’s Algorithm
85. Mutual Exclusion - A Centralized Algorithm
A Centralized Algorithm:
The most straightforward way to achieve mutual exclusion in a
distributed system is to simulate how it is done in a one-processor
system.
One process is elected as the coordinator.
Whenever a process wants to access a shared resource, it sends a
request message to the coordinator stating which resource it wants to
access and asking for permission.
If no other process is currently accessing that resource, the
coordinator sends back a reply granting permission.
When the reply arrives, the requesting process can go ahead.
87. Mutual Exclusion - A Centralized Algorithm
Now suppose that another process, 2 in Fig. 6-14(b), asks for
permission to access the resource.
The coordinator knows that a different process is already at the
resource, so it cannot grant permission.
The exact method used to deny permission is system dependent.
In Fig. 6-14(b), the coordinator just refrains from replying, thus
blocking process 2, which is waiting for a reply. Alternatively, it
could send a reply saying "permission denied."
Either way, it queues the request from 2 for the time being and waits
for more messages.
When process 1 is finished with the resource, it sends a message to
the coordinator releasing its exclusive access, as shown in Fig.6-
14(c).
88. Mutual Exclusion - A Centralized Algorithm
The coordinator takes the first item off the queue of deferred
requests and sends that process a grant message.
If the process was still blocked (i.e., this is the first message to it), it
unblocks and accesses the resource.
If an explicit message has already been sent denying permission, the
process will have to poll for incoming traffic or block later.
Either way, when it sees the grant, it can go ahead as well.
It is easy to see that the algorithm guarantees mutual exclusion: the
coordinator lets only one process at a time access the resource.
It is also fair, since requests are granted in the order in which they
are received.
No process ever waits forever (no starvation).
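The coordinator's grant/queue/release behaviour can be sketched as follows (message passing elided; class and method names are illustrative):

```python
from collections import deque

class Coordinator:
    """Centralized mutual exclusion coordinator (grant / defer / release)."""

    def __init__(self):
        self.holder = None        # process currently using the resource
        self.queue = deque()      # deferred requests, in arrival order

    def request(self, pid):
        if self.holder is None:   # resource free: grant permission
            self.holder = pid
            return "GRANT"
        self.queue.append(pid)    # busy: queue the request, send no reply
        return None

    def release(self, pid):
        assert pid == self.holder
        # take the first deferred request off the queue, if any, and grant it
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```

The FIFO queue is what makes the scheme fair: deferred requests are granted strictly in the order in which they arrived.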
89. Mutual Exclusion - A Decentralized Algorithm
A Decentralized Algorithm:
Having a single coordinator is often a poor approach. Let us take a
look at a fully decentralized solution.
Each resource is assumed to be replicated n times. Every replica has
its own coordinator for controlling the access by concurrent
processes.
However, whenever a process wants to access the resource, it will
simply need to get a majority vote from the coordinators.
This scheme essentially makes the original centralized solution less
vulnerable to failures of a single coordinator.
The assumption is that when a coordinator crashes, it recovers
quickly but will have forgotten any vote it gave before it crashed.
90. Mutual Exclusion - A Distributed Algorithm
To many, having a probabilistically correct algorithm is just not
good enough.
So researchers have looked for deterministic distributed mutual
exclusion algorithms. Lamport's 1978 paper on clock
synchronization presented the first one.
Ricart and Agrawala (1981) made it more efficient.
The algorithm works as follows. When a process wants to access a
shared resource, it builds a message containing the name of the
resource, its process number, and the current (logical) time.
It then sends the message to all other processes, conceptually
including itself.
The sending of messages is assumed to be reliable; that is, no
message is lost.
91. Mutual Exclusion - A Distributed Algorithm
When a process receives a request message from another process,
the action it takes depends on its own state with respect to the
resource named in the message.
Three different cases have to be clearly distinguished:
1. If the receiver is not accessing the resource and does not want to
access it, it sends back an OK message to the sender.
2. If the receiver already has access to the resource, it simply does
not reply. Instead, it queues the request.
3. If the receiver wants to access the resource as well but has not yet
done so, it compares the timestamp of the incoming message with
the one contained in the message that it has sent everyone. The
lowest one wins. If the incoming message has a lower timestamp,
the receiver sends back an OK message.
92. Mutual Exclusion - A Distributed Algorithm
If its own message has a lower timestamp, the receiver queues the
incoming request and sends nothing.
After sending out requests asking permission, a process sits back and
waits until everyone else has given permission.
As soon as all the permissions are in, it may go ahead.
When it is finished, it sends OK messages to all processes on its
queue and deletes them all from the queue.
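The three receive-side cases can be sketched as one function (an illustrative helper, not from the text; ties on timestamps are broken by process number so that the ordering is total):

```python
def on_request(state, my_ts, my_pid, req_ts, req_pid):
    """Decide how to answer an incoming Ricart-Agrawala request.

    state is 'RELEASED' (not interested), 'HELD' (using the resource),
    or 'WANTED' (requested but not yet granted). Returns 'OK' for an
    immediate reply or 'DEFER' to queue the request until release.
    """
    if state == 'RELEASED':               # case 1: reply OK at once
        return 'OK'
    if state == 'HELD':                   # case 2: queue it, reply nothing
        return 'DEFER'
    # case 3: both want the resource -- lowest (timestamp, pid) pair wins
    return 'OK' if (req_ts, req_pid) < (my_ts, my_pid) else 'DEFER'
```

Because every process applies the same comparison, all of them agree on who wins a conflict.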
Let us try to understand why the algorithm works. If there is no
conflict, it clearly works.
However, suppose that two processes try to simultaneously access
the resource, as shown in Fig. 6-15(a).
94. Mutual Exclusion - A Distributed Algorithm
Process 0 sends everyone a request with timestamp 8, while at the same
time, process 2 sends everyone a request with timestamp 12.
Process 1 is not interested in the resource, so it sends OK to both senders.
Processes 0 and 2 both see the conflict and compare timestamps.
Process 2 sees that it has lost, so it grants permission to 0 by sending OK.
Process 0 now queues the request from 2 for later processing and accesses
the resource, as shown in Fig. 6-15(b).
When it is finished, it removes the request from 2 from its queue and
sends an OK message to process 2, allowing the latter to go ahead, as
shown in Fig. 6-15(c).
The algorithm works because in the case of a conflict, the lowest
timestamp wins and everyone agrees on the ordering of the timestamps.
95. Mutual Exclusion - A Token Ring Algorithm
A Token Ring Algorithm:
A completely different approach to deterministically achieving
mutual exclusion in a distributed system is illustrated in Fig. 6-16.
Here we have a bus network, as shown in Fig. 6-16(a), (e.g.,
Ethernet), with no inherent ordering of the processes.
In software, a logical ring is constructed in which each process is
assigned a position in the ring, as shown in Fig. 6-16(b).
The ring positions may be allocated in numerical order of network
addresses or some other means.
It does not matter what the ordering is. All that matters is that each
process knows who is next in line after itself.
97. Mutual Exclusion - A Token Ring Algorithm
When the ring is initialized, process 0 is given a token. The token
circulates around the ring.
It is passed from process k to process k +1 (modulo the ring size) in
point-to-point messages.
When a process acquires the token from its neighbor, it checks to see
if it needs to access the shared resource.
If so, the process goes ahead, does all the work it needs to, and
releases the resource.
After it has finished, it passes the token along the ring.
It is not permitted to immediately enter the resource again using the
same token.
If a process is handed the token by its neighbor and is not interested
in the resource, it just passes the token along.
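One circulation step of the token can be simulated like this (an illustrative function; the real point-to-point message passing is elided):

```python
def token_step(n, token_at, wants):
    """One token pass in a logical ring of n processes.

    token_at is the current holder's index; wants is the set of
    process indices that currently need the critical section.
    Returns (process that entered the CS or None, next token holder).
    """
    entered = token_at if token_at in wants else None
    # whether used or not, the token moves on to the successor
    return entered, (token_at + 1) % n
```

Iterating this step shows why starvation is impossible: the token reaches every position within n passes.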
98. Mutual Exclusion - A Token Ring Algorithm
As a consequence, when no processes need the resource, the token just
circulates at high speed around the ring.
The correctness of this algorithm is easy to see. Only one process has the
token at any instant, so only one process can actually get to the resource.
Since the token circulates among the processes in a well-defined order,
starvation cannot occur.
Once a process decides it wants to have access to the resource, at worst it
will have to wait for every other process to use the resource.
As usual, this algorithm has problems too. If the token is ever lost, it
must be regenerated.
In fact, detecting that it is lost is difficult, since the amount of time
between successive appearances of the token on the network is
unbounded.
99. Mutual Exclusion - A Token Ring Algorithm
The fact that the token has not been spotted for an hour does not mean
that it has been lost; somebody may still be using it.
The algorithm also runs into trouble if a process crashes, but recovery is
easier than in the other cases.
If we require a process receiving the token to acknowledge receipt, a
dead process will be detected when its neighbor tries to give it the token
and fails.
At that point the dead process can be removed from the group, and the
token holder can throw the token over the head of the dead process to the
next member down the line, or the one after that, if necessary.
Of course, doing so requires that everyone maintain the current ring
configuration.
100. A Comparison of the Four Algorithms
A brief comparison of the four mutual exclusion algorithms we have
looked at is instructive.
In Fig. 6-17 we have listed the algorithms and three key properties:
101. A Comparison of the Four Algorithms
The number of messages required for a process to access and release
a shared resource, the delay before access can occur (assuming
messages are passed sequentially over a network), and some
problems associated with each algorithm.
The centralized algorithm is simplest and also most efficient.
It requires only three messages to enter and leave a critical region: a
request, a grant to enter, and a release to exit.
In the decentralized case, we see that these messages need to be
carried out for each of the m coordinators, but now it is possible that
several attempts need to be made (for which we introduce the
variable k).
102. A Comparison of the Four Algorithms
The distributed algorithm requires n − 1 request messages, one to
each of the other processes, and an additional n − 1 grant messages,
for a total of 2(n − 1). (We assume that only point-to-point
communication channels are used.) With the token ring algorithm,
the number is variable.
If every process constantly wants to enter a critical region, then each
token pass will result in one entry and exit, for an average of one
message per critical region entered.
At the other extreme, the token may sometimes circulate for hours
without anyone being interested in it.
In this case, the number of messages per entry into a critical region
is unbounded.
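The message counts discussed above can be written down as small helpers (following the text: m coordinators and k attempts in the decentralized case, n processes in the distributed case):

```python
def msgs_centralized():
    return 3                       # request, grant, release

def msgs_decentralized(m, k):
    return 3 * m * k               # the same three, per coordinator, per attempt

def msgs_distributed(n):
    return 2 * (n - 1)             # n-1 requests plus n-1 grants

# token ring: anywhere from 1 message per entry (constant demand)
# to an unbounded number (token circulating with no takers)
```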
103. ELECTION ALGORITHMS
Election algorithms choose a process from a group of processes to
act as a coordinator.
If the coordinator process crashes due to some reason, then a new
coordinator is elected on another processor.
The election algorithm basically determines where a new copy of the
coordinator should be restarted.
The election algorithm assumes that every active process in the system
has a unique priority number. The process with the highest priority will
be chosen as the new coordinator.
Hence, when a coordinator fails, this algorithm elects the active
process which has the highest priority number.
Then this number is sent to every active process in the distributed
system.
104. Election Algorithms - The Bully Algorithm
We have two election algorithms for two different configurations of
distributed system.
1.) The Bully Algorithm
2.) The Ring Algorithm
1.) The Bully Algorithm-
As a first example, consider the bully algorithm devised by Garcia-
Molina (1982).
When any process notices that the coordinator is no longer
responding to requests, it initiates an election.
A process, P, holds an election as follows:
1. P sends an ELECTION message to all processes with higher
numbers.
105. Election Algorithms - The Bully Algorithm
2. If no one responds, P wins the election and becomes coordinator.
3. If one of the higher-ups answers, it takes over. P's job is done.
At any moment, a process can get an ELECTION message from one
of its lower-numbered colleagues.
When such a message arrives, the receiver sends an OK message
back to the sender to indicate that it is alive and will take over.
The receiver then holds an election, unless it is already holding one.
Eventually, all processes give up but one, and that one is the new
coordinator.
It announces its victory by sending all processes a message telling
them that starting immediately it is the new coordinator.
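The outcome of a bully election can be sketched as follows (an illustrative simulation; the ELECTION/OK/COORDINATOR message exchange is elided):

```python
def bully_election(initiator, alive):
    """Return the id of the new coordinator.

    initiator: process that noticed the old coordinator is dead.
    alive:     set of ids of currently running processes.
    """
    higher = {p for p in alive if p > initiator}
    if not higher:
        return initiator          # nobody higher answers: initiator wins
    return max(alive)             # otherwise the highest-numbered process wins
```

Whoever starts the election, the biggest process in town always ends up winning.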
106. Election Algorithms - The Bully Algorithm
If a process that was previously down comes back up, it holds an
election.
If it happens to be the highest-numbered process currently running,
it will win the election and take over the coordinator's job.
Thus the biggest guy in town always wins, hence the name "bully
algorithm."
In Fig. 6-20 we see an example of how the bully algorithm works.
The group consists of eight processes, numbered from 0 to 7.
Previously process 7 was the coordinator, but it has just crashed.
Process 4 is the first one to notice this, so it sends ELECTION
messages to all the processes higher than it, namely 5, 6, and 7, as
shown in Fig. 6-20(a).
108. Election Algorithms - The Bully Algorithm
Processes 5 and 6 both respond with OK, as shown in Fig. 6-20(b).
Upon getting the first of these responses, 4 knows that its job is over.
It knows that one of these bigwigs will take over and become
coordinator.
It just sits back and waits to see who the winner will be (although at
this point it can make a pretty good guess).
In Fig. 6-20(c), both 5 and 6 hold elections, each one only sending
messages to those processes higher than itself.
In Fig. 6-20(d) process 6 tells 5 that it will take over.
At this point 6 knows that 7 is dead and that it (6) is the winner.
If there is state information to be collected from disk or elsewhere to
pick up where the old coordinator left off, 6 must now do what is
needed.
109. Election Algorithms - The Bully Algorithm
When it is ready to take over, 6 announces this by sending a
COORDINATOR message to all running processes.
When 4 gets this message, it can now continue with the operation it
was trying to do when it discovered that 7 was dead, but using 6 as
the coordinator this time.
In this way the failure of 7 is handled and the work can continue.
If process 7 is ever restarted, it will just send all the others a
COORDINATOR message and bully them into submission.
110. Election Algorithms - A Ring Algorithm
2.) A Ring Algorithm-
Another election algorithm is based on the use of a ring.
Unlike some ring algorithms, this one does not use a token.
We assume that the processes are physically or logically ordered, so
that each process knows who its successor is.
When any process notices that the coordinator is not functioning, it
builds an ELECTION message containing its own process number
and sends the message to its successor.
If the successor is down, the sender skips over the successor and
goes to the next member along the ring, or the one after that, until a
running process is located.
111. Election Algorithms - A Ring Algorithm
At each step along the way, the sender adds its own process number
to the list in the message, effectively making itself a candidate to be
elected as coordinator.
Eventually, the message gets back to the process that started it all.
That process recognizes this event when it receives an incoming
message containing its own process number.
At that point, the message type is changed to COORDINATOR and
circulated once again, this time to inform everyone else who the
coordinator is (the list member with the highest number) and who
the members of the new ring are.
When this message has circulated once, it is removed and everyone
goes back to work.
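This circulation can be sketched as a small simulation (illustrative; dead processes are simply skipped, as described above):

```python
def ring_election(start, alive, n):
    """Simulate one ring election on ring positions 0..n-1.

    start: live process that noticed the coordinator failure.
    alive: set of ids of currently running processes.
    Returns the new coordinator: the highest number on the list
    when the ELECTION message returns to the initiator.
    """
    candidates = [start]               # initiator puts itself on the list
    p = (start + 1) % n
    while p != start:                  # forward the message along the ring
        if p in alive:                 # dead successors are skipped over
            candidates.append(p)
        p = (p + 1) % n
    return max(candidates)             # announced in the COORDINATOR message
```

Running it from two different starting points yields the same winner, which is why duplicate elections are harmless.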
114. Election Algorithms - A Ring Algorithm
In Fig. 6-21 we see what happens if two processes, 2 and 5, discover
simultaneously that the previous coordinator, process 7, has crashed.
Each of these builds an ELECTION message and each of them starts
circulating its message, independent of the other one.
Eventually, both messages will go all the way around, and both 2
and 5 will convert them into COORDINATOR messages, with
exactly the same members and in the same order.
When both have gone around again, both will be removed.
It does no harm to have extra messages circulating; at worst it
consumes a little bandwidth, but this is not considered wasteful.
115. Multicast Communication
Communication between two processes in a distributed system is
required to exchange various data, such as code or a file, between
the processes.
When one source process tries to communicate with multiple
processes at once, it is called Group Communication.
A group is a collection of interconnected processes with an abstraction.
This abstraction is to hide the message passing so that the
communication looks like a normal procedure call.
Group communication also helps processes from different hosts
to work together and perform operations in a synchronized manner,
thereby increasing the overall performance of the system.
Types of Group Communication in a Distributed System :
117. Multicast Communication
1. Broadcast Communication: When the host process tries to
communicate with every process in a distributed system at the same time.
Broadcast communication comes in handy when a common stream of
information is to be delivered to each and every process in the most
efficient manner possible.
Since it does not require any processing whatsoever, communication
is very fast in comparison to other modes of communication.
However, it does not support a large number of processes and cannot
treat a specific process individually.
Broadcast is the term used to describe communication where a piece
of information is sent from one point to all other points.
In this case there is just one sender, but the information is sent to all
connected receivers.
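Broadcast delivery can be illustrated with a minimal in-process sketch; the `Network` class and process names here are illustrative assumptions, not an API from any particular library.

```python
# Minimal in-process sketch of broadcast: one sender, and the message
# reaches every other registered process with no per-receiver addressing.

class Network:
    def __init__(self):
        self.inboxes = {}            # process id -> list of received messages

    def register(self, pid):
        self.inboxes[pid] = []

    def broadcast(self, sender, msg):
        # Deliver to all registered processes except the sender itself.
        for pid, inbox in self.inboxes.items():
            if pid != sender:
                inbox.append((sender, msg))

net = Network()
for pid in ("P1", "P2", "P3", "P4"):
    net.register(pid)
net.broadcast("P1", "hello")
print(net.inboxes["P3"])   # [('P1', 'hello')]
```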
119. Multicast Communication
2. Multicast communication:
the host process communicates with a designated group of
processes in the distributed system at the same time.
This technique is mainly used to reduce the workload on the host
system and the redundant traffic that broadcasting to every
process would generate.
Multicasting can significantly decrease the time taken for message
handling.
Multicast is the term used to describe communication where a piece
of information is sent from one or more points to a set of other points.
In this case there may be one or more senders, and the information
is distributed to a set of receivers (there may be no receivers, or any
other number of receivers).
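A minimal in-process sketch of group-based delivery; the class, group, and process names are illustrative assumptions.

```python
# Minimal in-process sketch of multicast: messages addressed to a group
# reach only the current members of that group (possibly none).

class MulticastNetwork:
    def __init__(self):
        self.inboxes = {}            # process id -> received messages
        self.groups = {}             # group name -> set of member ids

    def register(self, pid):
        self.inboxes[pid] = []

    def join(self, pid, group):
        self.groups.setdefault(group, set()).add(pid)

    def multicast(self, sender, group, msg):
        # Only group members receive the message; non-members are skipped.
        for pid in self.groups.get(group, ()):
            self.inboxes[pid].append((sender, msg))

net = MulticastNetwork()
for pid in ("P1", "P2", "P3", "P4"):
    net.register(pid)
net.join("P2", "video")
net.join("P3", "video")
net.multicast("P1", "video", "frame-1")
print(net.inboxes["P2"])   # [('P1', 'frame-1')]
print(net.inboxes["P4"])   # []
```

Compared with the broadcast sketch, the sender names a group rather than reaching every process, which is exactly what cuts the redundant traffic.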
120. Multicast Communication
One example of an application which may use multicast is a video
server sending out networked TV channels.
Simultaneous delivery of high-quality video to each of a large
number of delivery platforms will exhaust the capability of even a
high-bandwidth network with a powerful video server.
This poses a major scalability issue for applications that require
sustained high bandwidth.
One way to significantly ease scaling to larger groups of clients is to
employ multicast networking.
Multicasting is the networking technique of delivering the same
packet simultaneously to a group of clients.
IP multicast provides dynamic many-to-many connectivity between a
set of senders (at least 1) and a group of receivers.
122. Multicast Communication
3. Unicast communication:
the host process communicates with a single process in the
distributed system at a time.
The same information may still be passed to multiple processes,
one at a time.
This works best for two communicating processes, since only one
specific process has to be addressed.
However, it incurs overhead, as the sender has to locate the exact
process before exchanging information.
Unicast is the term used to describe communication where a piece of
information is sent from one point to another point.
In this case there is just one sender, and one receiver.
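Unicast is what ordinary sockets provide. A minimal sketch using UDP over the loopback interface (the payload is arbitrary):

```python
import socket

# Minimal unicast example: exactly one sender and one receiver,
# communicating over UDP on the loopback interface.

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))          # let the OS pick a free port
recv.settimeout(5)
addr = recv.getsockname()

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"point-to-point", addr) # one specific destination address

data, src = recv.recvfrom(1024)
print(data)                          # b'point-to-point'
send.close()
recv.close()
```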
124. Multicast Communication
IP Multicast
IP Multicast enables one-to-many communication at the network
layer.
A source sends a packet destined for a group of other hosts and the
intermediate routers take care of replicating the packet as necessary.
The intermediate routers are also responsible for determining which
hosts are members of the group.
IP Multicast uses UDP for communication and is therefore unreliable.
In order to deliver a single message to several destinations, the
routers that connect the members of the group organize into a tree.
There are several algorithms for determining the edges of the tree.
A shared tree is formed by selecting a center node.
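The center-based shared tree can be sketched as follows: each group member is joined to the chosen center along a shortest path, and the tree is the union of those paths. The topology and the choice of center below are illustrative assumptions.

```python
from collections import deque

# Sketch of a center-based shared multicast tree: pick a center router,
# then connect every group member to it along a shortest path.

def bfs_parents(graph, source):
    """Shortest-path tree (as a parent map) from source, via BFS."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def shared_tree(graph, center, members):
    """Union of shortest paths from every member up to the center."""
    parent = bfs_parents(graph, center)
    edges = set()
    for m in members:
        node = m
        while parent[node] is not None:   # walk up toward the center
            edges.add(frozenset((node, parent[node])))
            node = parent[node]
    return edges

# Illustrative router topology (adjacency lists) with center D.
graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
tree = shared_tree(graph, center="D", members=["A", "E"])
print(sorted(sorted(e) for e in tree))
# -> [['A', 'B'], ['B', 'D'], ['D', 'E']]
```

A single copy of each packet then travels along the tree edges, with routers replicating it only where the tree branches.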
125. Summary
Strongly related to communication between processes is the issue of
how processes in distributed systems synchronize.
Synchronization is all about doing the right thing at the right time.
A problem in distributed systems, and computer networks in general, is
that there is no notion of a globally shared clock.
In other words, processes on different machines have their own idea of
what time it is.
There are various ways to synchronize clocks in a distributed system,
but all methods are essentially based on exchanging clock values while
taking into account the time it takes to send and receive messages.
Variations in communication delays, and the way those variations are
dealt with, largely determine the accuracy of clock synchronization
algorithms.
126. Summary
An important class of synchronization algorithms is that of distributed
mutual exclusion.
These algorithms ensure that in a distributed collection of processes, at
most one process at a time has access to a shared resource.
Synchronization between processes often requires that one process acts
as a coordinator.
In those cases where the coordinator is not fixed, it is necessary that
processes in a distributed computation decide on who is going to be that
coordinator. Such a decision is taken by means of election algorithms.
Election algorithms are primarily used in cases where the coordinator
can crash.
However, they can also be applied for the selection of super peers in
peer-to-peer systems.
127. Unit – 3
Assignment Questions (Marks: 20)
Q.1
129. Thank You
Great God, Medi-Caps, All the attendees
Mr. Sagar Pandya
sagar.pandya@medicaps.ac.in
www.sagarpandya.tk
LinkedIn: /in/seapandya
Twitter: @seapandya
Facebook: /seapandya