
Coordination and Agreement


  1. DISTRIBUTED SYSTEMS
     B.Tech IV/IV, II Semester
     COMPUTER SCIENCE & ENGINEERING
     2/6/2017 @Copyright Material
  2. UNIT VII (course map)
     • Foundation: Characterization of Distributed Systems, System Models, Inter Process Communication
     • Middleware: Distributed Objects, Remote Invocation
     • Support: Operating System, Distributed File System
     • Systems Algorithms & Shared Data: Coordination & Agreement, Transactions & Replications
  3. What we learnt till now
     1. Characterization of a Distributed System: the need for it, its evolution, examples and details on resource sharing, and its challenges
     2. How to build a Distributed System Model: introduction to system models, types of system models, functionalities of each
     3. How to achieve inter-process communication: sockets for establishing communication, fundamentals of the protocols required, XDR for handling heterogeneity, multicast communication
     4. Various programming models: the object model and distributed object model; RMI, CORBA and XML; the role of events and notification; a case study of RMI
  4. What we learnt till now (contd.)
     5. Operating systems: types of operating system, the various layers of an OS, protection, processes and threads, communication and invocation, OS architecture
  5. What we learnt in Unit VI
     1. Introduction to file systems: characteristics and requirements of a file system
     2. Architecture of a file service
     3. Implementations of a distributed file system: Sun NFS (protocol, architecture, operations with an example, goals)
     4. Introduction to peer-to-peer systems: characteristics and routing mechanisms
     5. Napster and its legacy
     6. The need for peer-to-peer middleware: functional and non-functional requirements
     7. Routing overlays and their tasks
  6. Topics covered in this unit
     1. Introduction: failure assumptions, failure detectors
     2. Distributed mutual exclusion: requirements; algorithms (central server, ring based, multicast and logical clocks, voting); fault tolerance of these algorithms; considerations
     3. Elections: ring-based election, bully algorithm, performance
     4. Coordination and agreement in group communication: basic multicast, reliable multicast, ordered multicast
  7. Learning objectives
     Upon completion of this unit, students will be able to:
     • Understand the goals of the coordination and agreement problems in distributed systems
     • Apply the various algorithmic techniques that address coordination and agreement
     • Understand the theoretical and practical limits to solving these problems
     • Describe algorithms for distributed mutual exclusion
     • Describe election algorithms and algorithms for multicast communication
     • Appreciate the impact of choosing a synchronous or an asynchronous system model on the algorithms we construct
  8. 1. Introduction
     This unit answers the following questions:
     1. How do we consider and deal with failures when designing algorithms? (failure assumptions and failure detectors)
     2. How do processes coordinate their actions or agree on one or more values? (mutual exclusion, shared memory)
     3. Is it generally important that the processes within a distributed system reach some sort of agreement? (yes, it is very important!)
     4. How can distributed processes agree on particular values? (consensus algorithms)
  9. 1. Introduction
     How do we consider and deal with failures when designing algorithms?
     FAILURE ASSUMPTIONS: for simplicity, we assume that each pair of processes is connected by reliable channels, e.g. achieved by retransmitting missing or corrupted messages, even in the case of communication channel failure.
     Failure assumptions, in brief:
     1. All communication uses reliable channels
     2. Processes fail only by crashing, unless stated otherwise
     To detect a process failure under these assumptions, there are two types of failure detectors:
     1. Unreliable failure detector
     2. Reliable failure detector
  10. 1. Introduction
     Reliable failure detector
     • Always accurate in detecting a process's failure. It answers processes' queries with one of two responses:
       • Unsuspected: the detector has recently received evidence suggesting that the process has not failed
       • Failed: the detector has determined that the process has crashed
     • Failure detectors may sometimes give different responses to different processes, since communication conditions vary from process to process
     • A reliable failure detector can only be implemented in a synchronous system
  11. 1. Introduction
     Unreliable failure detector
     • Answers with one of two values: Unsuspected or Suspected
     • Both are only hints: evidence of possible failure, not certainty
     Example (most practical systems):
     1. Each process periodically sends an "alive / I'm here" message to everyone
     2. If no "alive" message arrives before a timeout, the sender is Suspected
       • The suspected process may be functioning correctly, but the network might have partitioned
     Implementation: a simple algorithm
     • Assume that all messages are delivered within some bound, say D seconds
     • Every process P sends a "P is still alive" message to all failure-detector processes periodically, once every T seconds
     • If a failure-detector process does not receive a message from a process Q within T + D seconds of the previous one, it marks Q as Suspected
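The timeout-based detector above can be sketched in a few lines. This is a minimal single-process sketch, not a distributed implementation; the names `FailureDetector`, `on_alive`, and `status` are illustrative, while `T` and `D` follow the slide.

```python
import time

T = 1.0   # heartbeat interval (seconds), per the slide
D = 0.5   # assumed maximum delivery delay (seconds)

class FailureDetector:
    def __init__(self):
        self.last_heard = {}   # process id -> time of last "alive" message

    def on_alive(self, pid, now=None):
        """Record a "P is still alive" heartbeat from process pid."""
        self.last_heard[pid] = now if now is not None else time.time()

    def status(self, pid, now=None):
        """Return 'Unsuspected' or 'Suspected' for process pid."""
        now = now if now is not None else time.time()
        last = self.last_heard.get(pid)
        if last is None or now - last > T + D:
            return "Suspected"    # no heartbeat within T + D seconds
        return "Unsuspected"

fd = FailureDetector()
fd.on_alive("Q", now=100.0)
print(fd.status("Q", now=100.5))   # heard 0.5s ago -> Unsuspected
print(fd.status("Q", now=102.0))   # 2.0s > T + D  -> Suspected
```

Note that the detector is unreliable precisely because a slow network can make a live process exceed the T + D bound.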
  12. 2. Distributed Mutual Exclusion
     WHY MUTUAL EXCLUSION?
     Consider an example where two processes try to remove two adjacent nodes (neither the head nor the tail) from a shared singly linked list.
     • Let nodes i and i+1 be removed: process A removes node i and process B removes node i+1.
       Process A: newNode = i.next;      (i-1).next = newNode
       Process B: newNode = (i+1).next;  i.next = newNode
     • The linked list after both removal operations should contain neither node, but if both processes read their successor pointers before either writes, A re-links node i+1 into the list even though B has removed it, so the final list is not what it should have been.
     This situation is called a "race condition".
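The interleaving above can be simulated directly. This sketch replays the two processes' reads and writes by hand (no real threads); the `Node`, `build`, and `to_list` helpers are illustrative scaffolding, not part of the original slide.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def build(values):
    head = Node(values[0])
    cur = head
    for v in values[1:]:
        cur.next = Node(v)
        cur = cur.next
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out

# list: 0 -> 1 -> 2 -> 3, with node 1 playing the role of i
head = build([0, 1, 2, 3])
prev, i, i1 = head, head.next, head.next.next

# Interleaving: both processes read their successor pointers first...
a_new = i.next      # process A reads i.next        (node 2)
b_new = i1.next     # process B reads (i+1).next    (node 3)
# ...then both write.
i.next = b_new      # process B unlinks node 2
prev.next = a_new   # process A unlinks node 1, but re-links node 2!

print(to_list(head))   # [0, 2, 3]: node 2 survives although B removed it
```

The intended result was [0, 3]; B's removal is lost, which is exactly the race the slide describes.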
  13. 2. Distributed Mutual Exclusion
     Mutual exclusion
     • Distributed processes often need to coordinate their activities.
     • If a collection of processes share a resource or collection of resources, mutual exclusion is often required to prevent interference and ensure consistency when accessing the resources.
     • Consider an example where:
       • Processes A and B both wish to add a value to a shared variable a (A adds 10, B adds 20).
       • To do so, each stores a temporary result: the current value of a plus the value to be added.
       Time (secs) | Process A  | Process B   | Remarks
       1           | t = a + 10 |             | A stores a temporary value in t
       2           |            | t' = a + 20 | B stores a temporary value in t'
       3           |            | a = t'      | a = 25
       4           | a = t      |             | a = 15
     • The intended increment for a was 30 (A adding 10 and B adding 20), but B's increment is overwritten; mutual exclusion is needed to ensure the correct operation is performed by both A and B.
     • A similar lost-update problem arises in source code control systems when two changes are committed concurrently.
  14. 2. Distributed Mutual Exclusion
     Mutual exclusion
     • On a local system, mutual exclusion is usually a service offered by the operating system's kernel.
     • For a distributed system, we require a solution that operates only via message passing.
     • Assumptions:
       1. The system is asynchronous
       2. Processes do not fail
       3. Message delivery is reliable: all messages are eventually delivered exactly once
  15. 2. Distributed Mutual Exclusion
     A quick recap of the rules of mutual exclusion:
     • Safety: at most one process may execute in the critical section (CS) at a time
     • Liveness: requests to enter and exit the CS eventually succeed. This property is essential for processes to avoid deadlock and starvation
     • Ordering: if one request to enter the CS happened-before another, then entry to the CS is granted in that order
     The Liveness property assures freedom from both deadlock and starvation; freedom from starvation is referred to as a "fairness" property.
     Another fairness property is the order in which processes are granted access to the critical section. Given that we cannot always ascertain which event of a set occurred first, we appeal instead to the "happened-before" logical ordering of events.
     We define the ordering fairness property as: if e1 and e2 are requests to enter the critical section and e1 → e2, then the requests should be granted in that order.
     Note: our assumption of a request-enter-exit pattern means that a process will not request a second access until after the first is granted.
  16. 2. Distributed Mutual Exclusion
     Assumptions (restated):
     1. The system is asynchronous
     2. Processes do not fail
     3. Message delivery is reliable: all messages are eventually delivered exactly once
     The application-level protocol for executing a critical section is:
       enter()            // enter critical section; block if necessary
       resourceAccesses() // access shared resources in the critical section
       exit()             // leave critical section; other processes may now enter
     To evaluate the performance of the algorithms we measure:
     1. Bandwidth consumed: the number of messages sent in each entry and exit operation
     2. Client delay: the delay incurred by a process at each entry and exit operation
     3. Throughput, via the synchronization delay: the delay between one process exiting the critical section and the next process entering it
  17. 2. Distributed Mutual Exclusion
     Generic algorithms for mutual exclusion
     We will look at the following algorithms, each of which provides mutual exclusion for a shared resource:
     1. The central-server algorithm
     2. The ring-based algorithm
     3. Ricart and Agrawala's algorithm, based on multicast and logical clocks
     4. Maekawa's voting algorithm
     We will compare these algorithms with respect to:
     1. Their ability to satisfy the three desired properties
     2. Their performance characteristics
     3. How fault tolerant they are
  18. 2. Distributed Mutual Exclusion
     1. CENTRAL SERVER ALGORITHM
     • The simplest way to ensure mutual exclusion is through the use of a centralized server, analogous to the operating system acting as an arbiter.
     • There is a conceptual token: a process must be in possession of the token in order to execute the critical section, and the centralized server maintains ownership of the token.
     • To request the token, a process sends a request to the server:
       1. If the server currently has the token, it immediately responds with a message granting the token to the requesting process
       2. When the process completes the critical section, it sends a message back to the server relinquishing the token
       3. If the server does not have the token, some other process is currently in the critical section
       4. In that case, the server queues the incoming request and responds only when the token is returned by the process directly ahead of the requesting process in the queue (which may be the process currently using the token)
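The server side of this algorithm is essentially a FIFO queue guarding a token. The sketch below models the message exchange as direct method calls on a single machine; the class and method names are illustrative, not from the slides.

```python
from collections import deque

class CentralServer:
    def __init__(self):
        self.token_free = True
        self.queue = deque()

    def request(self, pid):
        """A process asks for the token. Returns True if granted now."""
        if self.token_free:
            self.token_free = False   # grant the token immediately
            return True
        self.queue.append(pid)        # otherwise queue the request
        return False

    def release(self):
        """The holder returns the token; pass it to the next waiter, if any."""
        if self.queue:
            return self.queue.popleft()  # next queued process gets the token
        self.token_free = True
        return None

server = CentralServer()
print(server.request("P1"))  # True: token granted at once
print(server.request("P2"))  # False: P2 queued behind P1
print(server.release())      # "P2": token passes to the head of the queue
```

The FIFO queue explains why the algorithm fails Ordering: requests are served in arrival order at the server, not in happened-before order.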
  19. 2. Distributed Mutual Exclusion
     1. CENTRAL SERVER ALGORITHM
     • The server is a performance bottleneck, and if the server crashes, the system stops.
     • The algorithm satisfies Safety and Liveness, but not Ordering.
     [Figure: four processes and the central server. P2 acquires the token; P4 and P2 then request it; the token is released back to the server and granted to the next process in the queue.]
  20. 2. Distributed Mutual Exclusion
     2. RING ALGORITHM
     • A simple way to arrange for mutual exclusion without the need for a master process is to arrange the processes in a logical ring.
     • The ring may of course bear little resemblance to the physical network or even to the direct links between processes.
     • The token passes around the ring continuously: process pi forwards it to p(i+1) mod N.
     • When a process receives the token from its neighbour:
       • If it does not require access to the critical section, it immediately forwards the token to the next neighbour in the ring
       • If it requires access to the critical section, the process retains the token, performs the critical section, and then, to relinquish access, forwards the token on to the next neighbour in the ring
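Token circulation in the ring can be sketched as a simple loop. This simulation assumes N = 5 processes and a fixed set of processes wanting the critical section; all names are illustrative.

```python
N = 5
wants_cs = {2, 4}          # processes currently wanting the critical section
entry_order = []

token_at = 0
for _ in range(2 * N):      # circulate the token around the ring twice
    if token_at in wants_cs:
        entry_order.append(token_at)   # retain token, run critical section
        wants_cs.discard(token_at)     # then relinquish it
    token_at = (token_at + 1) % N      # forward token to the next neighbour

print(entry_order)   # [2, 4]: entries happen one at a time, in ring order
```

Because only one token exists, Safety holds trivially; but entry order follows ring position, not request time, which is why Ordering fails.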
  21. 2. Distributed Mutual Exclusion
     2. RING ALGORITHM
     [Figure: ten processes P1..P10 arranged in a logical ring; the token circulates, retained by processes that want the critical section and released onward by the rest.]
  22. 2. Distributed Mutual Exclusion
     2. RING ALGORITHM
     • Once again it is straightforward to determine that this algorithm satisfies the Safety and Liveness properties.
     • However, once again it fails to satisfy the Ordering property. Suppose we have two processes P1 and P2, and consider the following events:
       1. Process P1 wishes to enter the critical section but must wait for the token to reach it.
       2. Process P1 sends a message m to process P2.
       3. The token is currently between P1 and P2 within the ring, but the message m reaches P2 before the token does.
       4. Process P2, after receiving message m, wishes to enter the critical section.
       5. The token reaches P2, which uses it to enter the critical section before P1, even though P1's request happened-before P2's.
  23. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     • Ricart and Agrawala developed an algorithm for mutual exclusion based upon multicast and logical clocks.
     • The idea is that a process which requires access to the critical section first multicasts this request to all processes within the group.
     • It may then actually enter the critical section only once each of the other processes has granted its approval.
     • The other processes do not grant their approval indiscriminately: each bases its approval on whether or not it considers its own request to have been made first.
  24. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     • Each process maintains its own Lamport clock.
     • Recall that Lamport clocks provide a partial ordering of events, which can be made a total ordering by also considering the identifier of the process observing the event.
     • Requests to enter the critical section are multicast to the group of processes and have the form {T, pi}, where T is the Lamport timestamp of the request and pi is the process identifier.
     • This provides a total ordering of request messages: {T1, pi} < {T2, pj} if
       • T1 < T2, or
       • T1 = T2 and pi < pj
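The total order on {T, pid} pairs is just lexicographic comparison: first by Lamport timestamp, then by process identifier to break ties. Python tuples compare exactly this way, so sorting a list of (T, pid) pairs demonstrates the ordering (the example values are illustrative):

```python
requests = [(3, 2), (1, 5), (3, 1), (2, 4)]   # (T, pid) request pairs
requests.sort()                                # lexicographic: T first, then pid

print(requests)   # [(1, 5), (2, 4), (3, 1), (3, 2)]: the T=3 tie is broken by pid
```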
  25. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     Requesting entry
     • Each process retains a variable indicating its state, which can be:
       1. "Released": not in, and not requiring entry to, the critical section
       2. "Wanted": requiring entry to the critical section
       3. "Held": has acquired entry to the critical section and has not yet relinquished that access
     • When a process requires entry to the critical section, it updates its state to "Wanted" and multicasts a request to enter to all other processes; it stores its own request message {Ti, pi}.
     • Only when it has received a "permission granted" message from all other processes does it change its state to "Held" and use the critical section.
  26. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     Responding to requests
     Upon receiving a request {Tj, pj}, a process:
     • currently in the "Released" state immediately responds with a permission-granted message;
     • currently in the "Held" state:
       • queues the request and continues to use the critical section;
       • once finished using the critical section, responds to all queued requests with permission-granted messages;
       • changes its state back to "Released";
     • currently in the "Wanted" state compares the incoming request {Tj, pj} with its own stored request {Ti, pi} which it multicast:
       • if {Ti, pi} < {Tj, pj}, the incoming request is queued, as if the current process were already in the "Held" state
       • if {Ti, pi} > {Tj, pj}, the incoming request is answered with a permission-granted message, as if the current process were in the "Released" state
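The receive-side rules above reduce to one decision: defer the request if we hold the critical section, or if we want it and our own (T, pid) pair is smaller. A minimal sketch, with replies modeled as returned decisions rather than real network sends (class and method names are illustrative):

```python
from collections import deque

class RicartAgrawalaProcess:
    def __init__(self, pid):
        self.pid = pid
        self.state = "Released"      # Released | Wanted | Held
        self.my_request = None       # our own (T, pid) while Wanted/Held
        self.deferred = deque()      # queued requests, answered on exit

    def on_request(self, t, sender):
        """Handle an incoming request (t, sender); return 'grant' or 'defer'."""
        if self.state == "Held" or (
            self.state == "Wanted" and self.my_request < (t, sender)
        ):
            self.deferred.append((t, sender))   # our own claim comes first
            return "defer"
        return "grant"                          # Released, or their claim wins

p = RicartAgrawalaProcess(pid=2)
p.state, p.my_request = "Wanted", (5, 2)
print(p.on_request(4, 1))   # grant: (4, 1) < (5, 2), their request is earlier
print(p.on_request(5, 3))   # defer: (5, 3) > (5, 2), the tie broken by pid
```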
  27. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     Safety: if two or more processes request entry concurrently, whichever request bears the lowest (totally ordered) timestamp will be the first process to enter the critical section. All others will not receive a permission-granted message from (at least) that process until it has exited the critical section.
     Liveness: since the request message timestamps form a total ordering, and all requests are either responded to immediately or queued and eventually responded to, all requests to enter the critical section are eventually granted.
     Ordering: since Lamport clocks assure us that e1 → e2 implies L(e1) < L(e2):
     • for any two requests r1 and r2, if r1 → r2, then the timestamp of r1 will be less than the timestamp of r2;
     • hence the process that multicast r1 will not respond to r2 until after it has used the critical section.
     Therefore this algorithm satisfies all three desired properties.
  28. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     Ricart & Agrawala's algorithm
  29. 2. Distributed Mutual Exclusion
     3. MULTICAST & LOGICAL CLOCKS
     Ricart & Agrawala's algorithm
     [Figure: three processes, all initially Released. P1 multicasts the request <11:45:00AM, P1> and P2 multicasts <11:45:40AM, P2>. P1's earlier timestamp wins: P1 is granted entry and becomes Held, while P2's request is buffered with its timestamp 11:45:40AM. When P1 releases and grants, P2 becomes Held in turn.]
  30. 2. Distributed Mutual Exclusion
     4. MAEKAWA'S VOTING ALGORITHM
     • Maekawa's voting algorithm improves upon the multicast/logical-clock algorithm with the observation that not all peers of a process need grant it access.
     • A process requires permission only from a subset of all the peers, provided that the subsets associated with any pair of processes overlap.
     • The main idea is that processes vote for which of a group of processes vying for the critical section may be given access.
     • The processes within the intersection of two competing processes' subsets ensure that the Safety property is observed.
  31. 2. Distributed Mutual Exclusion
     4. MAEKAWA'S VOTING ALGORITHM
     Each process pi is associated with a voting set Vi of processes, chosen such that:
     1. pi ∈ Vi: a process is in its own voting set
     2. Vi ∩ Vj ≠ {}: there is at least one process in the overlap between any two voting sets
     3. |Vi| = |Vj|: all voting sets are the same size
     4. Each process pi is contained within M voting sets
     • The main idea, in contrast to the previous algorithm, is that each process may grant access to only one process at a time.
     • A process which has already granted access to another process cannot do the same for a subsequent request: in this sense it has already voted. Subsequent requests are queued.
     • Once a process has used the critical section, it sends a release message to its voting set.
     • Once a process in the voting set has received a release message, it may once again vote, and does so immediately for the head of its queue of requests, if there is one.
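One common way (not the only one, and not stated on the slide) to construct voting sets satisfying these conditions is to place the N processes in a sqrt(N) x sqrt(N) grid and let each process's voting set be its row plus its column: any row-plus-column intersects any other, and all sets have the same size. A sketch, assuming N is a perfect square:

```python
import math

def grid_voting_sets(n):
    """Voting sets for n processes arranged in a sqrt(n) x sqrt(n) grid."""
    k = int(math.isqrt(n))
    assert k * k == n, "this sketch assumes N is a perfect square"
    sets = {}
    for p in range(n):
        r, c = divmod(p, k)
        row = {r * k + j for j in range(k)}    # all processes in p's row
        col = {i * k + c for i in range(k)}    # all processes in p's column
        sets[p] = row | col
    return sets

V = grid_voting_sets(9)    # 3x3 grid of processes 0..8
print(sorted(V[0]))        # [0, 1, 2, 3, 6]: row 0 plus column 0
print(sorted(V[4] & V[8])) # [5, 7]: any two sets overlap
```

This gives |Vi| on the order of 2*sqrt(N), far fewer messages per entry than the N - 1 of Ricart and Agrawala.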
  32. 2. Distributed Mutual Exclusion
     4. MAEKAWA'S VOTING ALGORITHM
     As before, each process maintains a state variable which can be one of the following:
     1. "Released": does not have access to the critical section and does not require it
     2. "Wanted": does not have access to the critical section but does require it
     3. "Held": currently has access to the critical section
     • In addition, each process maintains a boolean variable indicating whether or not the process has "voted".
     • Of course, voting is not a one-time action: this variable really indicates whether some process within the voting set has access to the critical section and has yet to release it.
     • To begin with, these variables are set to "Released" and False respectively.
  33. 2. Distributed Mutual Exclusion
     4. MAEKAWA'S VOTING ALGORITHM
     Requesting permission
     To request permission to access the critical section, a process pi:
     1. Updates its state variable to "Wanted"
     2. Multicasts a request to all processes in the associated voting set Vi
     3. When it has received a "permission granted" response from all processes in Vi, updates its state to "Held" and uses the critical section
     4. Once finished using the critical section, updates its state again to "Released" and multicasts a "release" message to all members of its voting set Vi
  34. 2. Distributed Mutual Exclusion
     4. MAEKAWA'S VOTING ALGORITHM
     Granting permission
     When a process pj receives a request message from a process pi:
     • If its state variable is "Held" or its voted variable is True:
       1. queue the request from pi without replying
     • Else:
       1. send a "permission granted" message to pi
       2. set the voted variable to True
     When a process pj receives a "release" message:
     • If there are no queued requests:
       1. set the voted variable to False
     • Else:
       1. remove the head of the queue, pq
       2. send a "permission granted" message to pq
       3. leave the voted variable as True
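The granting rules above, seen from one voter pj, can be sketched as follows. Replies are modeled as returned values instead of network sends; the names are illustrative.

```python
from collections import deque

class MaekawaVoter:
    def __init__(self):
        self.state = "Released"
        self.voted = False
        self.queue = deque()

    def on_request(self, pid):
        """Return the pid we grant to, or None if the request is queued."""
        if self.state == "Held" or self.voted:
            self.queue.append(pid)       # vote already cast: defer
            return None
        self.voted = True                # cast our single vote
        return pid

    def on_release(self):
        """Return the pid granted next, or None if the vote is reclaimed."""
        if self.queue:
            return self.queue.popleft()  # vote passes on; voted stays True
        self.voted = False
        return None

v = MaekawaVoter()
print(v.on_request("P1"))  # P1: vote granted
print(v.on_request("P2"))  # None: P2 queued, vote already cast
print(v.on_release())      # P2: vote passes to the head of the queue
print(v.on_release())      # None: queue empty, vote reclaimed
```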
  35. 2. Distributed Mutual Exclusion
     4. MAEKAWA'S VOTING ALGORITHM
     Safety:
     • Safety is achieved by ensuring that the intersection between any two voting sets is non-empty, and that a process can vote (grant permission) only once between successive "release" messages.
     • For any two processes to have concurrent access to the critical section, the processes in the non-empty intersection of their voting sets would have had to vote for both, which the single-vote rule forbids.
     Liveness: as described, the protocol does not respect the Liveness property (it can deadlock). It can, however, be adapted to use Lamport clocks, similarly to the previous algorithm.
     Ordering: the Lamport-clocks extension likewise allows the algorithm to satisfy the Ordering property.
  36. 2. Distributed Mutual Exclusion
     Fault tolerance of the algorithms
     • None of the algorithms described above tolerates loss of messages.
     • The token-based algorithms lose the token if such a message is lost, meaning no further accesses will be possible.
     • With Ricart and Agrawala's method, the requesting process will wait indefinitely for (N - 1) "permission granted" messages that will never come because one or more of them have been lost.
     • Maekawa's algorithm cannot tolerate message loss without it affecting the system, but parts of the system may be able to proceed unhindered.
  37. 2. Distributed Mutual Exclusion
     Fault tolerance of the algorithms
     What happens when a process crashes?
     • Central server: if the crashed process is not the central server, does not hold the token and has not requested the token, then everything else may proceed unhindered. (An unrealistic set of conditions!)
     • Ring-based algorithm: up to N - 1 further critical section accesses may complete before the failure blocks the token.
     • Ricart and Agrawala: additional critical section accesses may complete if the failed process had already responded to them, but no subsequent requests will be granted.
     • Maekawa's voting algorithm: can tolerate some process crashes, provided the crashed process is not within the voting set of a process requesting critical section access.
  38. 2. Distributed Mutual Exclusion
     Considerations for recovery
     • Central server: care must be taken to decide whether the server or the failed process held the token at the time of the failure. If the server itself fails, a new one must be elected, and any queued requests must be re-made.
     • Ring-based algorithm: the ring can generally be easily fixed to circumvent the failed process, but the failed process may have held the token or blocked its progress.
     • Ricart and Agrawala: each requesting process should record which processes have granted permission rather than simply how many; the failed process can then simply be removed from the list of those required.
     • Maekawa's voting algorithm: trickier, since the failed process may have been in the intersection between two voting sets; even if not, it must be determined whether the failed process had already cast its vote.
  39. 3. Election
     Coordination algorithms
     Coordination algorithms are fundamental in distributed systems:
     1. To dynamically re-assign the role of master (assuming a master-slave hierarchy), done by:
        a. choosing a primary server after a crash
        b. coordinating resource access
     2. For resource sharing: concurrent updates of
        • entries in a database (data locking)
        • files
        • a shared bulletin board
     3. To agree on actions: whether to
        • commit/abort a database transaction
        • agree on readings from a group of sensors
  40. 3. Election
     Coordination algorithms
     What are the challenges?
     1. Centralized solutions are not always appropriate because of:
        • the communications bottleneck
        • a single point of failure
     2. Varying network topologies:
        • ring, tree, arbitrary
        • connectivity issues
     3. Failures must be tolerated:
        • link failures
        • process crashes
     4. Impossibility results:
        • in the presence of failures, most often in the asynchronous model
        • e.g. the impossibility of "coordinated attack"
  41. 3. Election
     1. ELECTION ALGORITHM ON A RING
     The problem:
     • N processes, which may or may not have unique identifiers (UIDs)
     • must choose a unique master coordinator amongst themselves
     • one or more processes can call an election simultaneously
     • sometimes an election is called after a failure has occurred
     Safety: every process has a variable elected, which contains the UID of the leader or is as yet undefined.
     Liveness (and safety): all processes participate and eventually discover the identity of the leader (elected cannot remain undefined).
  42. 3. Election
     1. ELECTION ALGORITHM ON A RING
     Assumptions:
     • each process has a UID, and UIDs are linearly ordered
     • processes form a unidirectional logical ring, i.e. each process has channels to two other processes: from one it receives messages, to the other it sends messages
     Goal: to elect a leader; the process with the highest UID becomes the leader.
     Every process:
     • sends two kinds of messages: elect(UID) and elected(UID)
     • can be in two states: non-participant or participant
     The algorithm has two phases:
     1. Determine the leader
     2. Announce the winner
     Initially, each process is a non-participant.
  43. 3. Election
     1. ELECTION ALGORITHM ON A RING
     Determine the leader: some process with UID id0 initiates the election by
     • becoming a participant
     • sending the message elect(id0) to its neighbour
     When a non-participant receives a message elect(id):
     • it forwards elect(idmax), where idmax is the maximum of its own and the received UID
     • it becomes a participant
     When a participant receives a message elect(id):
     • it forwards the message if id is greater than its own UID
     • it ignores the message if id is less than its own UID
  44. 3. Election
     1. ELECTION ALGORITHM ON A RING
     Announce the leader: when a participant receives a message elect(id) where id is its own UID:
     • it becomes the leader
     • it becomes a non-participant
     • it sends the message elected(id) to its neighbour
     When a participant receives a message elected(id):
     • it records id as the leader's UID
     • it becomes a non-participant
     • it forwards the message elected(id) to its neighbour
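The "determine the leader" phase above (a Chang-Roberts style ring election) can be simulated in a few lines: an elect(id) message circulates the unidirectional ring, each hop keeping the larger UID, until some process receives its own UID back. The announce phase (elected messages) is omitted here for brevity; the function name is illustrative.

```python
def ring_election(uids, initiator=0):
    """Simulate one election started by uids[initiator]; return the leader UID."""
    n = len(uids)
    pos = initiator
    msg = uids[initiator]              # elect(id0) sent to the neighbour
    while True:
        pos = (pos + 1) % n            # the message travels to the next process
        if msg == uids[pos]:
            return msg                 # own UID came back: this process leads
        msg = max(msg, uids[pos])      # forward elect(max(own, received))

print(ring_election([3, 7, 1, 9, 4]))  # 9: the highest UID wins
```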
  45. 3. Election
     1. ELECTION ALGORITHM ON A RING
     [Figure: worked example of the ring-based election, showing elect and elected messages circulating the ring.]
  46. 3. Election
     1. ELECTION ALGORITHM ON A RING
     Properties:
     1. Safety is ensured.
     2. Liveness is:
        • clear if only one election is running
        • preserved even if several elections run at the same time, since participants do not forward smaller IDs
     3. Bandwidth consumption: at most 3N - 1 messages (if a single process starts the election)
     4. Turnaround: at most 3N - 1 message times, since these messages are sent sequentially
     The algorithm cannot tolerate failures, hence it is not very practical.
  47. 3. Election
     2. BULLY ALGORITHM
     Assumptions:
     • Each process knows which processes have higher identifiers, and it can communicate with all such processes (compare with the ring-based election)
     • Processes can crash, and crashes are detected by timeouts
     • The system is synchronous, with timeout T = 2 * Ttransmitting (max transmission delay) + Tprocessing (max processing delay)
     Three types of messages:
     • Election: announces an election
     • Answer: sent in response to an Election message
     • Coordinator: announces the identity of the elected process
  48. 3. Election
     2. BULLY ALGORITHM
     1. A process starts an election when it detects that the coordinator has failed, or when it begins to replace a coordinator that has a lower identifier:
        • It sends an Election message to all processes with higher IDs (except the failed coordinator/process) and waits for Answers
        • If no Answers arrive within time T:
          • it considers itself the coordinator
          • it sends a Coordinator message (with its ID) to all processes with lower IDs
        • Else it waits for a Coordinator message, and starts a new election if a timeout T' expires
        • To become coordinator, a process has to start an election
        • A higher-ID process can replace the current coordinator (hence "bully"); the process with the highest ID directly sends a Coordinator message to all processes with lower identifiers
     2. On receiving an Election message, a process:
        • sends an Answer message back
        • starts an election if it hasn't already started one, sending Election messages to all higher-ID processes (including the "failed" coordinator, which might be up again by now)
     3. On receiving a Coordinator message, a process sets elected_i to the new coordinator.
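One round of the bully algorithm can be sketched as a synchronous simulation: alive higher-ID processes answer each Election message and start their own elections, and the highest alive ID ends up sending the Coordinator messages. This is a counting model of the message flow, not a faithful asynchronous implementation; the function name and scenario are illustrative.

```python
def bully_election(ids, alive, starter):
    """Return (coordinator, messages_sent) for one simulated election round."""
    messages = 0
    wave = {starter}            # processes currently running an election
    answered = set()
    while wave:
        next_wave = set()
        for p in wave:
            higher = [q for q in ids if q > p and q in alive]
            messages += len(higher)            # Election messages sent by p
            for q in higher:
                messages += 1                  # each alive q sends an Answer
                if q not in answered:
                    answered.add(q)
                    next_wave.add(q)           # q starts its own election
        wave = next_wave
    coordinator = max(alive)                   # highest alive ID bullies through
    messages += sum(1 for q in ids
                    if q < coordinator and q in alive)   # Coordinator messages
    return coordinator, messages

ids = [1, 2, 3, 4, 5]
print(bully_election(ids, alive={1, 2, 3, 4}, starter=1))  # 5 has crashed; 4 wins
```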
  49. 3. Election
     2. BULLY ALGORITHM
     [Figure: worked example of the bully algorithm, showing Election, Answer and Coordinator messages after the coordinator crashes.]
  50. 3. Election
     2. BULLY ALGORITHM: PROPERTIES
     Safety:
     • a lower-ID process always yields to a higher-ID process
     • however, safety is not guaranteed if processes that have crashed are replaced by processes with the same identifier, since message delivery order might not be guaranteed and failure detection might be unreliable
     Liveness:
     • all processes participate and know the coordinator at the end
  51. 3. Election
     2. BULLY ALGORITHM: PERFORMANCE
     Best case: the process with the 2nd-highest identifier notices the coordinator's failure
     • Overhead: (N - 2) Coordinator messages
     • Turnaround delay: no Election/Answer messages are needed
     Worst case: the process with the lowest identifier is the first to detect the coordinator's failure
     • Election messages: 1 + 2 + ... + (N - 2) + (N - 2) = (N - 1)(N - 2)/2 + (N - 2)
     • Answer messages: 1 + ... + (N - 2)
     • Coordinator messages: (N - 2)
     • Total overhead: (N - 1)(N - 2) + 2(N - 2) = (N + 1)(N - 2) = O(N^2)
     • Turnaround delay: the delay of the Election and Answer messages
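The worst-case arithmetic above can be checked directly: summing the Election, Answer and Coordinator message counts reproduces the closed form (N + 1)(N - 2).

```python
def worst_case_messages(n):
    election = sum(range(1, n - 1)) + (n - 2)   # 1 + ... + (N-2), plus (N-2) more
    answer = sum(range(1, n - 1))               # 1 + ... + (N-2)
    coordinator = n - 2
    return election + answer + coordinator

for n in (4, 10, 100):
    assert worst_case_messages(n) == (n + 1) * (n - 2)   # closed form holds

print(worst_case_messages(10))   # 88 = 11 * 8
```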
  52. 52. 2/6/2017 52@Copyright Material Coordination Algorithms COMPARISON 3. Election
53. 53. 2/6/2017 53@Copyright Material WHY MULTICAST? • Previously we encountered group multicast • IP multicast and Xcast both deliver "maybe" semantics • That is, some of the recipients of a multicast message may receive it and some may not • Here we look at ways to ensure that all members of a group receive a message • And also that multiple such messages are received in the correct order • This is a form of global consensus 4. Multicast Communication
  54. 54. 2/6/2017 54@Copyright Material Assumptions: 1. We will assume a known group of individual processes 2. Communication between processes is • message based • one-to-one • reliable 3. Processes may fail, but only by crashing • That is, we suffer from process omission errors but not process arbitrary errors 4. Our goal is to implement a multicast(g, m) operation • Where m is a message and g is the group of processes which should receive the message m 4. Multicast Communication
  55. 55. 2/6/2017 55@Copyright Material Deliver & Receive • We will use the operation deliver (m) • This delivers the multicast message m to the application layer of the calling process • This is to distinguish it from the receive operation • In order to implement some failure semantics not all multicast messages received at process p are delivered to the application Layer 4. Multicast Communication
56. 56. 2/6/2017 56@Copyright Material Reliable Multicast Reliable multicast, with respect to a multicast operation multicast(g, m), has three properties: 1. Integrity: A correct process p ∈ g delivers a message m at most once, and m was multicast by some correct process 2. Validity: If a correct process multicasts message m then some correct process in g will eventually deliver m 3. Agreement: If a correct process delivers m then all other correct processes in group g will deliver m Validity and Agreement together give the property that if a correct process multicasts a message, it will eventually be delivered at all correct processes 4. Multicast Communication
57. 57. 2/6/2017 57@Copyright Material Basic Multicast Suppose we have a reliable one-to-one send(p, m) operation We can implement a Basic Multicast b-multicast(g, m), with a corresponding B-deliver operation, as: 1. b-multicast(g, m) = for each process p in g: send(p, m) 2. On receive(m): B-deliver(m) • This works because we can be sure that all processes will eventually receive the multicast message, since send(p, m) is reliable • It does however depend upon the multicasting process not crashing • Therefore b-multicast does not have the Agreement property 4. Multicast Communication
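The two rules above can be sketched in a few lines. The in-memory queues below stand in for the reliable one-to-one transport and are purely illustrative:

```python
# Minimal sketch of B-multicast over a reliable one-to-one send.
from collections import defaultdict

inboxes = defaultdict(list)           # per-process message queues (the "network")
delivered = defaultdict(list)         # what each process has B-delivered

def send(p, m):                       # reliable one-to-one send(p, m)
    inboxes[p].append(m)

def b_multicast(g, m):                # rule 1: send m to every member of g
    for p in g:
        send(p, m)

def b_deliver(p):                     # rule 2: on receive(m), B-deliver(m)
    while inboxes[p]:
        delivered[p].append(inboxes[p].pop(0))

g = {"p1", "p2", "p3"}
b_multicast(g, "hello")
for p in g:
    b_deliver(p)
print(all(delivered[p] == ["hello"] for p in g))  # → True
```

If the sender crashed after sending to only some members, the remaining queues would simply never receive "hello", which is exactly why Agreement fails for b-multicast.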
  58. 58. 2/6/2017 58@Copyright Material Reliable Multicast • We will now implement reliable multicast on top of basic Multicast • This is a good example of protocol layering • R-multicast(g, m) and R-deliver (m) which are analogous to their B- multicast(g, m) and B-deliver (m) counterparts but have additionally the Agreement property 4. Multicast Communication
  59. 59. 2/6/2017 59@Copyright Material Reliable Multicast • Note that we insist that the sending process is in the receiving group, hence: • Validity  is satisfied since the sending process p will deliver to itself • Integrity is guaranteed because of the integrity of the underlying B-multicast operation in addition to the rule that m is only added to Received at most once • Agreement  follows from the fact that every correct process that B-delivers(m) then performs a B-multicast(g, m) before it R-delivers(m). • However it is somewhat inefficient since each message is sent to each process |g| times. 4. Multicast Communication
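The layering just described, in which every process re-multicasts a message on first B-delivery before R-delivering it, can be sketched as follows. The in-memory queues standing in for the network are illustrative:

```python
# Sketch of R-multicast layered on B-multicast: on the first B-delivery
# of m, a process re-multicasts m to the group before R-delivering it,
# which is what yields the Agreement property.
from collections import defaultdict

inboxes = defaultdict(list)            # (group, msg) queues per process
received = defaultdict(set)            # Received set per process
r_delivered = defaultdict(list)

def b_multicast(g, m):
    for p in g:
        inboxes[p].append((g, m))

def r_multicast(g, m):                 # the sender is itself a member of g
    b_multicast(g, m)

def process_inbox(p):
    while inboxes[p]:
        g, m = inboxes[p].pop(0)
        if m not in received[p]:       # B-deliver m for the first time
            received[p].add(m)
            b_multicast(g, m)          # re-multicast before R-delivering
            r_delivered[p].append(m)   # R-deliver(m); duplicates are dropped

g = ("p1", "p2", "p3")
r_multicast(g, "update")
for _ in range(2):                     # drain original plus re-multicast traffic
    for p in g:
        process_inbox(p)
print([r_delivered[p] for p in g])     # → [['update'], ['update'], ['update']]
```

The |g| re-multicasts per message are visible here: every member forwards "update" to everyone, and the `received` set is what keeps Integrity (at-most-once delivery) intact.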
60. 60. 2/6/2017 60@Copyright Material Reliable Multicast over IP • So far our multicasts (and indeed most of our algorithms) have been described in a vacuum, devoid of other communication • In a real system, of course, there is other communication going on • So a reasonable method of implementing reliable multicast is to piggy-back acknowledgements on the back of other messages • Additionally, the concept of a "negative acknowledgement" is used • We assume that groups are closed (not something assumed for the previous algorithm) • When a process p performs an R-multicast(g, m) it includes in the message: • a sequence number S_p^g • acknowledgements of the form <q, R_q^g> • An acknowledgement <q, R_q^g> included in a message from process p indicates the latest message multicast by process q that p has delivered • So each process p maintains a sequence number R_q^g for every other process q in the group g, indicating the latest message received from q • Having performed the multicast of a message with its S_p^g value and any acknowledgements attached, process p then increments its own stored value of S_p^g • In other words, S_p^g counts the messages that p has sent to group g 4. Multicast Communication
61. 61. 2/6/2017 61@Copyright Material • The sequence number S_p^g attached to each multicast message allows the recipients to learn about messages which they have missed • A process q can R-deliver(m) only if the message's sequence number S = R_q^g + 1 • Immediately following R-deliver(m), the value R_q^g is incremented • If an arriving message has a number S ≤ R_q^g then process q knows that it has already performed R-deliver on that message and can safely discard it • If S > R_q^g + 1 then the receiving process q knows that it has missed some message from p destined for the group g • In this case the receiving process q puts the message in a hold-back queue and sends a negative acknowledgement to the sending process p requesting the missing message(s), as shown in the figure on the next slide 4. Multicast Communication
  62. 62. 2/6/2017 62@Copyright Material 4. Multicast Communication
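The receiver-side rule can be sketched as follows. This is a minimal illustration of the three cases (deliver when S = R_q^g + 1, discard when S ≤ R_q^g, otherwise hold back and NACK); the class and field names are hypothetical:

```python
# Receiver-side sketch of sequence-numbered delivery with a hold-back
# queue and negative acknowledgements.

class Receiver:
    def __init__(self):
        self.R = {}                # highest seq delivered, per sender (R_q^g)
        self.holdback = {}         # (sender, seq) -> held-back message
        self.delivered = []
        self.nacks = []            # missing (sender, seq) we would request

    def on_message(self, sender, seq, m):
        r = self.R.get(sender, 0)
        if seq <= r:
            return                 # S <= R: duplicate, safely discarded
        if seq == r + 1:
            self._deliver(sender, seq, m)      # S = R + 1: deliver now
        else:                                  # S > R + 1: a gap exists
            self.holdback[(sender, seq)] = m
            self.nacks.extend((sender, s) for s in range(r + 1, seq))

    def _deliver(self, sender, seq, m):
        self.delivered.append(m)
        self.R[sender] = seq
        nxt = self.holdback.pop((sender, seq + 1), None)
        if nxt is not None:        # a held-back message is now deliverable
            self._deliver(sender, seq + 1, nxt)

q = Receiver()
q.on_message("p", 1, "a")
q.on_message("p", 3, "c")          # gap: held back, NACK issued for seq 2
q.on_message("p", 2, "b")          # fills the gap; "c" follows from hold-back
print(q.delivered)                 # → ['a', 'b', 'c']
```

Once the retransmitted message 2 arrives, both it and the held-back message 3 are delivered in order, which is the behaviour the figure depicts.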
63. 63. 2/6/2017 63@Copyright Material PROPERTIES • The hold-back queue is not strictly necessary, but it simplifies things, since a single number can then represent all the messages that have been delivered • We assume that IP multicast can detect message corruption (for which it uses checksums) • Integrity is therefore satisfied, since we can detect duplicates and delete them without delivery • The Validity property holds, again because the sending process is in the group and so at least it will deliver the message • Agreement holds only if processes multicast messages indefinitely and if sent messages are retained (for re-sending) • until all group members have acknowledged receipt • Therefore, as it stands, Agreement does not formally hold, though in practice the simple protocol can be modified to give acceptable guarantees of Agreement 4. Multicast Communication
64. 64. 2/6/2017 64@Copyright Material UNIFORM AGREEMENT • Our Agreement property specifies that if any correct process delivers a message m then all correct processes deliver the message m • It says nothing about what happens at a failed process • We can strengthen the condition to Uniform Agreement • Uniform Agreement states that if a process, whether it then fails or not, delivers a message m, then all correct processes also deliver m • A moment's reflection shows how useful this is: if a process could take some action that put it in an inconsistent state and then fail, recovery would be difficult • For example, applying an update that not all other processes receive 4. Multicast Communication
65. 65. 2/6/2017 65@Copyright Material ORDERED MULTICAST There are several different ordering schemes for multicast The three main distinctions are: 1. FIFO: If a correct process performs multicast(g, m) and then multicast(g, m') then every correct process which delivers m' will deliver m before m' 2. Causal: If multicast(g, m) → multicast(g, m') then every process which delivers m' delivers m before m' 3. Total: If a correct process delivers m before it delivers m' then every correct process which delivers m' delivers m before m' Note that: • FIFO ordering and causal ordering are only partial orders • Not all messages are sent by the same sending process • Some multicasts are concurrent, and cannot be ordered by happened-before • Total order demands consistency, but not a particular order 4. Multicast Communication
66. 66. 2/6/2017 66@Copyright Material ORDERED MULTICAST Implementing FIFO Ordering • Our previous algorithm for reliable multicast over IP already provides FIFO ordering, thanks to its per-sender sequence numbers • More generally, sequence numbers are used to ensure FIFO ordering Implementing Causal Ordering • To implement causal ordering on top of Basic Multicast (b-multicast) • Each process maintains a vector clock • To send a causally ordered multicast, a process first uses b-multicast • When a process pi performs a b-deliver(m) that was multicast by a process pj, it places m in the hold-back queue until: • it has delivered any earlier message sent by pj, and • it has delivered any message that had been delivered at pj before pj multicast m • Both of these conditions can be determined by examining the vector timestamps 4. Multicast Communication
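The two hold-back conditions translate directly into a vector-timestamp test, sketched below. The in-memory transport and process numbering are illustrative; a message from p_i with timestamp ts is deliverable at p_j once ts[i] = V_j[i] + 1 (no earlier message from p_i is missing) and ts[k] ≤ V_j[k] for all k ≠ i (everything p_i had seen has also been delivered at p_j):

```python
# Sketch of causally ordered multicast on top of B-multicast using
# vector clocks and a hold-back queue.

N = 3
clocks = [[0] * N for _ in range(N)]   # vector clock V_j at each process j
holdback = [[] for _ in range(N)]
delivered = [[] for _ in range(N)]

def co_multicast(i, m):
    clocks[i][i] += 1                  # timestamp the message
    ts = list(clocks[i])
    for j in range(N):                 # b-multicast (i, ts, m) to the group
        if j != i:
            holdback[j].append((i, ts, m))
    delivered[i].append(m)             # the sender delivers to itself

def try_deliver(j):
    progress = True
    while progress:                    # keep sweeping until nothing changes
        progress = False
        for entry in list(holdback[j]):
            i, ts, m = entry
            ok = ts[i] == clocks[j][i] + 1 and all(
                ts[k] <= clocks[j][k] for k in range(N) if k != i)
            if ok:                     # all causal predecessors delivered
                holdback[j].remove(entry)
                clocks[j][i] += 1
                delivered[j].append(m)
                progress = True

co_multicast(0, "m1")                  # p0 multicasts m1
try_deliver(1)                         # p1 delivers m1 ...
co_multicast(1, "m2")                  # ... then multicasts m2, so m1 → m2
try_deliver(2)                         # at p2, m2 must wait for m1
print(delivered[2])                    # → ['m1', 'm2']
```

Even if m2 physically arrived at p2 first, the check ts[k] ≤ V_2[k] would fail for k = 0 until m1 has been delivered, so causal order is preserved.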
67. 67. 2/6/2017 67@Copyright Material ORDERED MULTICAST Implementing Total Ordering There are two techniques for implementing Total Ordering: 1. Using a sequencer process 2. Using b-multicast to elicit proposed sequence numbers from all receivers 1. Using a sequencer • Using a sequencer process is straightforward • To total-order multicast a message m, a process p first sends the message to the sequencer • The sequencer can determine message sequence numbers based purely on the order in which they arrive at the sequencer • Though it could also use process sequence numbers or Lamport timestamps should we wish to, for example, provide FIFO-Total or Causal-Total ordering • Once determined, the sequencer can either b-multicast the message itself • Or, to reduce the load on the sequencer, it may just respond to process p with the sequence number, and p then performs the b-multicast itself 4. Multicast Communication
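A minimal sketch of the sequencer approach (with the sequencer doing the b-multicast itself) might look like this; the class names and the direct method calls standing in for messages are illustrative:

```python
# Sketch of sequencer-based total ordering: the sequencer assigns
# consecutive numbers in arrival order and multicasts <m, seq>;
# members deliver strictly in sequence order via a hold-back queue.
import itertools

class Sequencer:
    def __init__(self, group):
        self.group = group
        self.counter = itertools.count(1)

    def to_multicast(self, m):
        seq = next(self.counter)       # order = arrival order at the sequencer
        for p in self.group:           # b-multicast <m, seq> to the group
            p.on_sequenced(seq, m)

class Member:
    def __init__(self):
        self.next_seq = 1
        self.holdback = {}
        self.delivered = []

    def on_sequenced(self, seq, m):
        self.holdback[seq] = m
        while self.next_seq in self.holdback:   # TO-deliver in sequence order
            self.delivered.append(self.holdback.pop(self.next_seq))
            self.next_seq += 1

members = [Member() for _ in range(3)]
seq = Sequencer(members)
seq.to_multicast("a")
seq.to_multicast("b")
print([m.delivered for m in members])  # → [['a', 'b'], ['a', 'b'], ['a', 'b']]
```

The obvious design trade-off is that the sequencer is a single point of failure and a throughput bottleneck, which motivates the collective-agreement alternative.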
68. 68. 2/6/2017 68@Copyright Material ORDERED MULTICAST Implementing Total Ordering There are two techniques for implementing Total Ordering: 1. Using a sequencer process 2. Using b-multicast to elicit proposed sequence numbers from all receivers 2. Using Collective Agreement • To total-order multicast a message, the process p first performs a b-multicast to the group • Each process then responds with a proposal for the agreed sequence number • And puts the message in its hold-back queue, with the suggested sequence number provisionally in place • Once the process p receives all such responses, it selects the largest proposed sequence number and replies to each process (or uses b-multicast) with the agreed-upon value • Each receiving process then uses this agreed sequence number to deliver (that is, TO-deliver) the message at the correct point 4. Multicast Communication
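The propose-then-commit exchange can be sketched as below. This is a simplified illustration of the collective-agreement idea (as in the ISIS algorithm): message loss, concurrent senders, and the reordering of the hold-back queue when an agreed number differs from the proposal are all omitted, and the names are hypothetical:

```python
# Sketch of total ordering by collective agreement: each member proposes
# a sequence number, the sender picks the maximum and announces it.

class Member:
    def __init__(self):
        self.proposed = 0              # largest number this member has proposed
        self.agreed = 0                # largest agreed number seen so far
        self.delivered = []

    def propose(self, m):
        # Propose one more than anything proposed or agreed so far;
        # m would sit in the hold-back queue with this provisional number.
        self.proposed = max(self.proposed, self.agreed) + 1
        return self.proposed

    def commit(self, m, seq):
        self.agreed = max(self.agreed, seq)
        self.delivered.append((seq, m))   # TO-deliver at position `seq`

def to_multicast(group, m):
    proposals = [p.propose(m) for p in group]     # b-multicast, gather replies
    agreed = max(proposals)                       # sender picks the largest
    for p in group:                               # announce the agreed value
        p.commit(m, agreed)

group = [Member() for _ in range(3)]
to_multicast(group, "x")
to_multicast(group, "y")
print(group[0].delivered)              # → [(1, 'x'), (2, 'y')]
```

Taking the maximum of all proposals is what makes the order consistent: every member ends up placing the message at the same (agreed) position, even though no single process dictated the order in advance.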
  69. 69. Summary • Understanding of the goals of the problems of coordination and agreement in distributed systems • Learn the various algorithmic techniques for addressing the concerns of Coordination & Agreement • Understanding of the theoretical and practical limits to solving them • Introduce algorithms for distributed mutual exclusion • Introduced election algorithms, for processes coordination • Illustrated the need for & various ways of achieving Multicasting 2/6/2017 69@Copyright Material
70. 70. QUIZ 1. A reliable failure detector doesn't reply with which of these pairs of values? a. Suspected & Failed b. Suspected & Unsuspected c. Unsuspected & Failed d. Failed & Alive 2. Which of these is not a property required to be satisfied for Mutual Exclusion? a. Safety b. Starvation c. Fairness d. Liveness 3. The generic algorithm for Distributed Mutual Exclusion which satisfies all the properties of Mutual Exclusion is a. Central Server b. Ring c. Ricart & Agrawala d. Bully 4. In mutual exclusion using multicast and logical clocks, when a process indicates its state as "Not in or requiring entry to the critical section", this state is referred to as a. Held b. Request c. Wanted d. Release 2/6/2017 70@Copyright Material
71. 71. QUIZ 5. In the Ricart & Agrawala Algorithm, a "Permission granted" packet is sent when a. All other processes are in the "released" state b. All other processes are in the "wanted" state c. All other processes are in the "held" state d. At least one process is in the "released" state 6. Which algorithm has the least fault tolerance among the below? a. Central Server b. Ring c. Ricart & Agrawala d. Maekawa 7. The election algorithm which allows process crashes is a. Ring b. Bully c. Both d. Maekawa 8. What are the 3 properties to be satisfied for Multicast Communication? a. Integrity, Validity, Agreement b. Safety, Fairness, Liveness c. Integrity, Safety, Fairness d. Safety, Validity, Agreement 2/6/2017 71@Copyright Material
72. 72. QUIZ 9. Which type of multicasting guarantees this: if a correct process performs multicast(g, m) and then multicast(g, m'), then every correct process which delivers m' will deliver m before m'? a. Causal b. FIFO c. Total d. Ordered 10. Which type of multicasting maintains a vector clock? a. Causal b. FIFO c. Total d. Ordered 11. A multicast communication which uses negative acknowledgements is _______ a. Ordered b. FIFO c. Reliable d. Basic 12. Total ordering can be achieved through the following ways a. Using a Sequencer b. Using Collective Agreement c. Both d. None 2/6/2017 72@Copyright Material
  73. 73. KEY 1. A 2. B 3. C 4. D 5. A 6. D 7. B 8. A 9. B 10. A 11. C 12. C 2/6/2017 73@Copyright Material
74. 74. Glossary Failure Detector: the object/code in a process that detects failures of other processes; it is not necessarily accurate Unreliable Failure Detector: may produce one of two values, Unsuspected or Suspected, when given the identity of a process Reliable Failure Detector: one that is always accurate in detecting a process's failure. It answers processes' queries with either a response of Unsuspected (which, as before, can only be a hint) or Failed Unsuspected: signifies that the detector has recently received evidence suggesting that the process has not failed Suspected: signifies that the detector has received some evidence suggesting that the process may have failed Mutual Exclusion: refers to the requirement of ensuring that no two concurrent processes are in their critical section at the same time; it is a basic requirement in concurrency control, to prevent race conditions 2/6/2017 74@Copyright Material
  75. 75. Glossary Race Condition: is the behavior of an electronic or software system where the output is dependent on the sequence or timing of other uncontrollable events Critical Section: is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution Starvation: is the indefinite postponement of the request to enter the critical section from a given process Safety: At most one process may execute in the critical section at a time Liveness: Requests to enter and exit the critical section(CS) eventually succeed . This property is essential for processes to avoid deadlock and starvation Ordering: If one request to enter the CS happened-before another, then entry to the CS is granted in that order 2/6/2017 75@Copyright Material
76. 76. Glossary Asynchronous Processes: processes that do not depend on each other's outcome, and can therefore occur on different threads simultaneously Synchronous Processes: a process will wait for one to complete before the next begins Released state: a process is not in, or requiring entry to, the critical section Wanted state: a process is requiring entry to the critical section Permission granted: a process accesses the critical section when it gets this message from all other processes in the system Held state: a process has acquired entry to the critical section and has not yet relinquished that access Integrity: a correct process p ∈ g delivers a message m at most once, and m was multicast by some correct process Validity: if a correct process multicasts message m then some correct process in g will eventually deliver m 2/6/2017 76@Copyright Material
77. 77. Glossary Agreement: if a correct process delivers m then all other correct processes in group g will deliver m Negative acknowledgement: a response indicating that we believe a message has been missed/dropped Uniform Agreement: states that if a process, whether it then fails or not, delivers a message m, then all correct processes also deliver m FIFO Ordering: if a correct process performs multicast(g, m) and then multicast(g, m') then every correct process which delivers m' will deliver m before m' Causal Ordering: if multicast(g, m) → multicast(g, m') then every process which delivers m' delivers m before m' Total Ordering: if a correct process delivers m before it delivers m' then every correct process which delivers m' delivers m before m' Sequencer: the process sequencer(g) maintains a group-specific sequence number s_g, which it uses to assign increasing and consecutive sequence numbers to the messages that it B-delivers 2/6/2017 77@Copyright Material
