1/66 Distributed Leader Election Algorithms in Synchronous Networks
Mitsou Valia
National Technical University of Athens
School of Applied Mathematics and Physics
2/66 Distributed Computing
Distributed computing is decentralised and parallel computing, using two or more computers communicating over a network to accomplish a common task.
The collaborating processes are often identical.
One of the central problems is…
3/66 Leader Election
Given a network of processes, exactly one process should output the decision that it is the leader.
It is usually required that all non-leader processes are informed of the leader’s election.
4/66 Networks
• The timing model:
  – Synchronous
  – Asynchronous
  – Partially synchronous
• The failure model:
  – Completely reliable
  – Partly faulty
    • Stopping failure
    • Byzantine failure
6/66 The Synchronous Network Model (formal)
• Alphabet M of messages (null indicates the absence of a message)
• At every node i ∈ V there is a process, which consists of:
  – states_i (a not necessarily finite set of states)
  – start_i (the initial state)
  – msgs_i (a message generation function)
  – trans_i (a state transition function)
• Each edge (i, j) has an associated link that can hold at most a single message in M.
7/66 Complexity measures
• Time complexity: the number of rounds until all outputs are produced or all the processes halt.
• Communication complexity: the number of non-null messages that are sent during the execution.
9/66 Setting
• The network graph is a directed ring (unidirectional or bidirectional) consisting of n nodes (n may be unknown to the processes).
• Processes run the same deterministic algorithm.
• The only piece of information supplied to the processes is a unique integer identifier (UID).
• UIDs may be used
  – In comparisons only (comparison-based algorithms)
  – In comparisons and other calculations (non-comparison-based algorithms).
10/66 Related Work and Important Results
Algorithm     Time complexity   Message complexity              Restrictions
LCR (’79)     O(n)              O(n^2)                          -
HS (’80)      O(n)              O(n log n)                      Bidirectional
ML (’06)      O(n)              O(n log n) (better constant)    -
TimeSlice     O(n · u_min)      O(n)                            Non-comparison-based
Lower bound   Ω(n) (trivial)    Ω(n log n), FL (’87)
11/66 The LCR Algorithm
• Comparison-based algorithm
• The size of the ring is unknown to the processes
• Unidirectional ring
• It elects the process with the maximum UID
12/66 The LCR Algorithm
Description
Each process sends its UID around the ring.
When a process receives a UID, it compares it to its own.
  – If the incoming UID is greater, then it passes this UID to the next process.
  – If the incoming UID is smaller, then it discards it.
  – If it is equal, then the process declares itself the leader.
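The LCR rules above can be sketched as a small round-by-round simulation. This is an illustrative sketch, not the slides' formal definition: the function name `lcr`, the list representation of the ring (process i sends to process (i + 1) mod n), and the use of `None` for the null message are assumptions made for the example. The announcement of the leader to the other processes is left out.

```python
def lcr(uids):
    """Simulate LCR on a unidirectional ring where process i sends to (i + 1) % n.

    Returns (leader_index, rounds, messages). UIDs are assumed to be unique.
    """
    if not uids:
        raise ValueError("the ring must contain at least one process")
    n = len(uids)
    outbox = list(uids)          # round 1: every process sends its own UID
    rounds = messages = 0
    while True:
        rounds += 1
        inbox = [None] * n       # None plays the role of the null message
        for i, uid in enumerate(outbox):
            if uid is not None:
                inbox[(i + 1) % n] = uid
                messages += 1
        outbox = [None] * n
        for i, uid in enumerate(inbox):
            if uid is None:
                continue
            if uid == uids[i]:
                return i, rounds, messages   # own UID came back: leader
            if uid > uids[i]:
                outbox[i] = uid              # forward larger UIDs
            # smaller UIDs are discarded


print(lcr([3, 1, 4, 2]))
print(lcr([4, 3, 2, 1]))
```

Arranging the UIDs in decreasing order along the direction of travel, as in the second call, makes every UID travel as far as possible before being discarded, which is the quadratic worst case for the message count.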
14/66 LCR Complexity Analysis
• Time Complexity: O(n)
• Message Complexity: O(n^2) (worst case), O(n log n) (average case)
15/66 The HS Algorithm
• Comparison-based algorithm
• The size of the ring is unknown to the processes
• Bidirectional ring
• It elects the process with the maximum UID
16/66 The HS Algorithm
Description
Each process operates in phases 0, 1, 2, ...
In each phase k, process i sends tokens with its UID in both directions to travel distance 2^k and return back to it.
If both tokens return, then process i continues to phase k+1.
18/66 The HS Algorithm (continued)
When a process receives an outgoing UID, it compares it with its own.
  – If the received UID is smaller, then it discards it.
  – If the received UID is greater, then
    • it passes it to the next process, if it is not the end of its path,
    • else it returns it back to the previous one (to travel back to the originating process).
  – If it is equal, then the process declares itself the leader.
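The phase structure and the token rules of HS can be sketched as a simulation. This is a sketch under stated assumptions: the function name `hs`, the tuple layout `(dest, direction, uid, kind, hops)` and the list `in_flight` are hypothetical names chosen for the example, UIDs are assumed unique, and the simulation allows several tokens on one link in the same round, which simplifies the formal model.

```python
def hs(uids):
    """Simulate HS on a bidirectional ring; returns (leader_index, messages)."""
    n = len(uids)
    phase = [0] * n
    returned = [0] * n   # inbound tokens that came home in the current phase
    # A token is (dest, direction, uid, kind, hops); direction is +1 or -1.
    in_flight = [((i + d) % n, d, uids[i], "out", 1)   # phase 0: distance 2^0
                 for i in range(n) for d in (+1, -1)]
    messages = 0
    while in_flight:
        messages += len(in_flight)
        nxt = []
        for dest, d, v, kind, h in in_flight:
            u = uids[dest]
            if kind == "out":
                if v == u:
                    return dest, messages        # own outgoing UID arrived: leader
                if v > u:
                    if h > 1:                    # not the end of the path: pass on
                        nxt.append(((dest + d) % n, d, v, "out", h - 1))
                    else:                        # end of the path: turn around
                        nxt.append(((dest - d) % n, -d, v, "in", 1))
                # v < u: the token is discarded
            elif v != u:                         # inbound token for someone else
                nxt.append(((dest + d) % n, d, v, "in", 1))
            else:                                # inbound token came home
                returned[dest] += 1
                if returned[dest] == 2:          # both came back: next phase
                    returned[dest] = 0
                    phase[dest] += 1
                    for e in (+1, -1):
                        nxt.append(((dest + e) % n, e, u, "out", 2 ** phase[dest]))
        in_flight = nxt
    return None, messages


leader, messages = hs([3, 1, 4, 2])
print(leader, messages)
```

In each phase only processes whose tokens both returned stay in the competition, so the number of competitors shrinks geometrically while the travel distance doubles.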
25/66 Complexity Analysis
• Time Complexity: O(n)
• Message Complexity: O(n log n)
26/66 Distributed Algorithms in a General Synchronous Network
27/66 Leader Election in a General Network - The FloodMax Algorithm
• The diameter (diam) of the graph is known.
• Causes both leader and non-leaders to identify themselves.
• It elects the process with the maximum UID.
28/66 FloodMax Algorithm
• Every process keeps the maximum UID it has seen so far (initially its own).
• At each round, each process sends this maximum value to every outgoing neighbor.
• After diam rounds, if the maximum value is the process’s UID then it elects itself the leader, otherwise it is a non-leader.
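The three FloodMax steps translate almost directly into code. The sketch below is illustrative only: the function name `floodmax` and the adjacency-list argument `out_neighbors` (where `out_neighbors[i]` lists the processes that i sends to) are assumptions of the example, and an undirected edge is modelled as two directed edges.

```python
def floodmax(uids, out_neighbors, diam):
    """Simulate FloodMax for diam rounds; returns (statuses, messages)."""
    n = len(uids)
    max_uid = list(uids)                      # initially each process knows its own UID
    messages = 0
    for _ in range(diam):
        incoming = [[] for _ in range(n)]
        for i in range(n):
            for j in out_neighbors[i]:        # send current maximum on every outgoing edge
                incoming[j].append(max_uid[i])
                messages += 1
        max_uid = [max([max_uid[i]] + incoming[i]) for i in range(n)]
    statuses = ["leader" if max_uid[i] == uids[i] else "non-leader"
                for i in range(n)]
    return statuses, messages


# Path 0 - 1 - 2 - 3, diameter 3, six directed edges.
path = [[1], [0, 2], [1, 3], [2]]
print(floodmax([5, 9, 2, 7], path, 3))
```

On this path every round sends one message per directed edge, so the message count is diam times the number of directed edges.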
29/66 Complexity Analysis
• Time Complexity: diam rounds
• Communication Complexity: diam · |E| (|E| messages in every round).
30/66 Minimum Spanning Tree
Spanning tree of a graph G(V,E): a tree that consists entirely of edges in E and contains every vertex of G.
The problem: Given an undirected graph G(V,E), find a minimum weight (undirected) spanning tree for the network.
Distributed output: Each process should determine which of its incident edges belong to the tree.
• Processes know n
• Processes have UIDs
31/66 Minimum Spanning Tree (continued)
General Strategy for MST:
• Start with the trivial spanning forest.
• For every connected component C select a minimum weight outgoing edge e.
• Combine C with the component at the other end of e, including e.
• Stop when the forest has a single component.
32/66 Minimum Spanning Tree (continued)
Several well-known sequential MST algorithms are special cases of this general strategy:
• Prim (add a minimum-weight outgoing edge from the current component, attaching a new single node)
• Kruskal (add a minimum-weight edge that joins two separate components)
33/66 Minimum Spanning Tree (continued)
A distributed version could be:
Each component determines a minimum-weight outgoing edge, and all these edges are added to the forest, causing combinations of components all at once.
The above strategy is incorrect in general!
Example: When several edges have the same weight, a cycle could be created.
[Figure: example with three edges, each of weight 1.]
Lemma: If all edges of G have distinct weights, then there is exactly one MST.
34/66 Minimum Spanning Tree (continued)
The SynchGHS algorithm
(Based on an asynchronous algorithm developed by Gallager, Humblet and Spira in 1983.)
The strategy mentioned before is used.
Assumption: Edge weights are all distinct.
35/66 Minimum Spanning Tree (continued)
The Algorithm:
• The algorithm builds components in levels.
• For each level k, the level k components are subtrees of the MST that constitute a spanning forest.
• Each level k component has at least 2^k nodes.
• Every component at every level has a distinguished leader node.
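The level-by-level merging can be illustrated with a centralized sketch. It is not the distributed SynchGHS protocol: there are no messages or component leaders, only the rule that every component picks its minimum-weight outgoing edge and all picked edges are added at once. The function name `merge_by_levels`, the `(weight, u, v)` edge format and the union-find helper are assumptions of the example; distinct weights and a connected graph are assumed, as on the slides.

```python
def merge_by_levels(n, edges):
    """Centralized sketch of level-wise component merging.

    edges: list of (weight, u, v) with distinct weights on nodes 0..n-1.
    Returns (mst_edges, levels).
    """
    parent = list(range(n))

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, levels = set(), 0
    while len(mst) < n - 1:
        best = {}                      # component root -> its lightest outgoing edge
        for edge in edges:
            _, u, v = edge
            cu, cv = find(u), find(v)
            if cu == cv:
                continue               # internal edge, not outgoing
            for c in (cu, cv):
                if c not in best or edge < best[c]:
                    best[c] = edge
        if not best:
            raise ValueError("graph is not connected")
        for w, u, v in best.values():  # add all selected edges at once
            cu, cv = find(u), find(v)
            if cu != cv:               # skips an edge chosen by both endpoints
                parent[cu] = cv
                mst.add((w, u, v))
        levels += 1
    return mst, levels


edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
print(merge_by_levels(4, edges))
```

Because every component merges with at least one other component per level, the number of components at least halves each time, which matches the bound of at least 2^k nodes per level k component.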
36/66 to 39/66 Minimum Spanning Tree (continued)
[Figures: four slides showing an example weighted graph; node and edge labels were lost in extraction.]
40/66 Minimum Spanning Tree (continued)
Complexity Analysis
• Time Complexity: O(n log n)
  [log n levels] × [O(n) time for every level for synchronization].
• Communication Complexity: O((n + |E|) log n)
  [log n levels] × [O(n) messages along tree edges + O(|E|) messages for finding the local minimum-weight outgoing edges].
  It can be reduced to O(n log n + |E|).
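41/66 Minimum Spanning Tree (continued)
Non-unique edge weights:
Edge identifier: a triple (weight_{i,j}, u, u’), where u < u’ are the UIDs of i and j.
Thus, a total ordering is defined among the edge identifiers (triples are compared lexicographically).
Example: for three nodes with UIDs 1, 2, 3 and all edge weights equal to 1, the edge identifiers are (1,1,2), (1,1,3), (1,2,3).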
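The tie-breaking rule can be checked with a few lines of code. The helper name `edge_id` is hypothetical; the sketch relies on the fact that Python compares tuples lexicographically, which gives the total order on edge identifiers.

```python
def edge_id(weight, uid_i, uid_j):
    """Edge identifier (weight, u, u') with u < u' the UIDs of the endpoints."""
    return (weight, min(uid_i, uid_j), max(uid_i, uid_j))


# Triangle on UIDs 1, 2, 3 where every edge has weight 1.
ids = sorted(edge_id(1, a, b) for a, b in [(3, 2), (1, 3), (2, 1)])
print(ids)
```

Equal weights no longer tie, because the UID pair of an edge is unique.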
42/66 Minimum Spanning Tree (continued)
Leader Election:
• The leaves of the MST begin a convergecast along the paths of the tree.
• Internal nodes wait to receive messages from all but one neighbor. Then they send a message to the remaining neighbor.
• If a node receives messages from every neighbor without having sent a message itself, then it becomes the leader.
• If two neighboring nodes receive messages from each other in the same round, then the one with the greater UID becomes the leader.
Complexity: n-1 additional time and messages.
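The convergecast rules can be sketched as a round-based simulation on a tree. The function name `tree_leader` and the adjacency-list argument `adj` are assumptions of the example, and the input is assumed to be a tree (on a graph with a cycle the loop below would not terminate).

```python
def tree_leader(uids, adj):
    """Convergecast leader election on a tree; returns (leader, rounds, messages)."""
    n = len(uids)
    received = [set() for _ in range(n)]   # neighbors heard from so far
    sent_to = [None] * n                   # the one neighbor this node sent to, if any
    rounds = messages = 0
    while True:
        rounds += 1
        sends = {}
        for i in range(n):
            waiting = set(adj[i]) - received[i]
            # Heard from all but one neighbor and has not sent yet: send now.
            if sent_to[i] is None and len(waiting) == 1:
                sends[i] = waiting.pop()
        for i, j in sends.items():
            sent_to[i] = j
            received[j].add(i)
            messages += 1
        for i in range(n):
            if received[i] == set(adj[i]):
                if sent_to[i] is None:
                    return i, rounds, messages          # heard from everyone first
                j = sent_to[i]                          # messages crossed on edge (i, j)
                winner = i if uids[i] > uids[j] else j
                return winner, rounds, messages


print(tree_leader([5, 9, 2], [[1], [0, 2], [1]]))
print(tree_leader([5, 9, 2, 7], [[1], [0, 2], [1, 3], [2]]))
```

The first call ends at the middle node of a three-node path; in the second, the two middle nodes send to each other in the same round and the greater UID wins.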
44/66 General
Lemma: If the network is symmetric (e.g. a ring) and anonymous (the processes have no UIDs), then it is impossible to elect a leader by a deterministic algorithm. [Angluin (1980)]
Probabilistic algorithms are used to break symmetry.
45/66 Itai and Rodeh Algorithm
Assumption: Processes know n.
The Algorithm
• The algorithm proceeds in phases, each of them containing n rounds.
• In every phase, a ≤ n processes are active (initially all of them). During each phase some processes may become inactive.
• At the beginning of every phase, every active process decides, with probability a^{-1}, to become a candidate.
  To do that, it picks a random number r, 0 < r < 1, and if r < a^{-1}, then it becomes a candidate and initiates a pebble that travels around the ring.
46/66 Itai and Rodeh Algorithm
• To compute the number of candidates (c), each process counts the pebbles it has seen.
  Number of pebbles counted = Number of candidates.
• At the end of the phase, every process has calculated c.
• If c = 1 then the sole candidate becomes the leader. If c > 1 then a new phase begins with the new active processes (the candidates of the previous phase). If c = 0 the phase was useless.
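The phase loop can be sketched at the level of candidate counts. This is a simplified sketch, not a message-level simulation: the function name `itai_rodeh` and the `rng` parameter are assumptions of the example, each pebble is charged n messages for one trip around the ring, and the draw uses `rng.random()`, which lies in [0, 1) rather than strictly inside (0, 1).

```python
import random


def itai_rodeh(n, rng):
    """Phase-level sketch of Itai and Rodeh; returns (leader, phases, messages)."""
    active = list(range(n))
    phases = messages = 0
    while True:
        phases += 1
        a = len(active)
        # Each active process becomes a candidate with probability 1/a.
        candidates = [p for p in active if rng.random() < 1 / a]
        messages += len(candidates) * n   # one pebble per candidate, n hops each
        c = len(candidates)
        if c == 1:
            return candidates[0], phases, messages
        if c > 1:
            active = candidates           # only the candidates stay active
        # c == 0: useless phase, the active set is unchanged


leader, phases, messages = itai_rodeh(10, random.Random(1))
print(leader, phases, messages, phases * 10)  # last value: rounds used
```

Passing a seeded `random.Random` keeps a run reproducible; the last printed value is the running time in rounds, since each phase lasts n rounds.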
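47/66 Itai and Rodeh Algorithm
[Figure: ring of 10 active processes with random draws 0.04, 0.35, 0.46, 0.08, 0.37, 0.83, 0.64, 0.22, 0.53, 0.93; labels: a^{-1} = 1/10, c = 2.]
48/66 Itai and Rodeh Algorithm
[Figure: the two remaining active processes draw 0.74 and 0.88; labels: "Useless phase", a^{-1} = 1/2, c = 2.]
49/66 Itai and Rodeh Algorithm
[Figure: the two active processes draw 0.32 and 0.69; labels: a^{-1} = 1/2, c = 1.]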
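51/66 Itai and Rodeh Algorithm - Complexity Analysis
p(a,c): the probability that c out of a active processes become candidates. Then
\[ p(a,c) = \binom{a}{c} a^{-c} \left(1 - \frac{1}{a}\right)^{a-c} \]
Proof:
Let X_i be a random variable with X_i = 1 if i becomes a candidate, else 0 (Bernoulli trial):
\[ X_i = \begin{cases} 1, & \text{with probability } a^{-1} \\ 0, & \text{with probability } 1 - a^{-1} \end{cases} \]
Then X = \sum_{i=1}^{a} X_i is the number of processes that become candidates, and X follows a binomial distribution.
Thus
\[ P[X = c] = \binom{a}{c} a^{-c} \left(1 - a^{-1}\right)^{a-c} = p(a,c) \]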
52/66 Itai and Rodeh Algorithm - Complexity Analysis
Average Case
• Time Complexity: 2.441716 · n
• Message Complexity: 2.441716 · n
The number of pebbles initiated per phase is X (the number of active processes that become candidates).
E[X] = E[Σ_a X_i] = Σ_a E[X_i] = a · a^{-1} = 1
Thus, the expected message complexity per phase is n.
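The binomial formula and the expectation E[X] = 1 can be checked numerically. The function name `p` mirrors the slide's p(a,c); everything else in the sketch is illustrative.

```python
from math import comb


def p(a, c):
    """Probability that exactly c of a active processes become candidates."""
    return comb(a, c) * (1 / a) ** c * (1 - 1 / a) ** (a - c)


a = 10
total = sum(p(a, c) for c in range(a + 1))          # should be 1
expected = sum(c * p(a, c) for c in range(a + 1))   # should be a * (1/a) = 1
print(total, expected, p(a, 1))
```

The value p(a, 1) is the chance that a phase with a active processes ends the election, and p(a, 0) is the chance of a useless phase.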
54/66 The Model
Full- or Perfect-Information Model [BL 90]:
• There is an adversary that controls t players.
• The adversary has unlimited computational power.
• Communication between players is by broadcast.
• Reliable delivery of messages.
• The identity of the sender is protected.
The adversary has complete knowledge of the state of the protocol at any given moment.
55/66 The Network
We assume an asynchronous network with synchronization points:
• Computation proceeds in rounds.
• In each round processes send messages.
• During a round we can’t force processes to act simultaneously.
• Messages of round i precede those of round i+1.
Within a round, all cheaters have the opportunity to wait until they receive messages from all honest players and then send their own.
56/66 Example
A Leader Election Protocol of n processes (Baton Passing [Saks 89]):
• In every round the baton is passed at random to a process that hasn’t yet received it.
• The last process left with the baton becomes the leader.
If there are cheaters, when they take the baton they give it to an honest process, in order to increase the probability that a cheater is elected.
Baton Passing lasts n-2 rounds.
\[ P(i) = \frac{n-1}{n} \cdot \frac{n-2}{n-1} \cdots \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{n} \]
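The telescoping product for P(i) can be verified with exact fractions. The sketch reads each factor (k-1)/k as the probability that process i is not the one chosen while k processes are still eligible; this reading is an interpretation, since the slide gives only the product. The function name `last_holder_probability` is hypothetical.

```python
from fractions import Fraction


def last_holder_probability(n):
    """Exact value of (n-1)/n * (n-2)/(n-1) * ... * 1/2."""
    prob = Fraction(1)
    for k in range(n, 1, -1):
        prob *= Fraction(k - 1, k)   # i is skipped while k processes remain eligible
    return prob


print(last_holder_probability(10))
```

Every numerator cancels the previous denominator, so with honest players only, each process is elected with the same probability 1/n.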
57/66 Failure probability
Let P(n,t) be a leader election protocol between n processes, t of which are corrupted.
fail_P(n,t): the probability that one of the cheaters is elected.
Proposition: For any n, any t ≥ 1 and any leader election protocol P:
1. fail_P(n,t) is non-decreasing in t.
2. fail_P(n,t) ≥ t/n
3. fail_P(n, ⌈n/2⌉) = 1
58/66 Resilience
Resilience: how many cheaters can be tolerated while the protocol still guarantees that an honest player is elected with positive probability.
Definition: P is resilient for t = b(n) iff ∃ ε > 0 such that for all suitably large n, fail_P(n, b(n)) ≤ 1 − ε.
However, if t ≥ 1 then fail_P(n,t) > 1/4 (P is the Lightest Bin protocol, which achieves optimal resilience).
[Figure: plot with axis labels t and fail(n,t); marked values n/2, b(n), 1 − ε and 1.]
60/66 Zero Edge Protocols
Zero Edge Protocols: Protocols where cheaters cannot increase their probability of election by cheating.
• These protocols exist only for t = 1.
• For t > 1 the adversary can find two players that can collude.
Example: Baton Passing (t = 1)
Counterexample: Itai - Rodeh
61/66 Zero Edge Protocols
A picks a row.
B picks a column.
C picks a level.
D, E are mute.
The selected square determines the leader.
A cannot increase the probability of his election. Same for B and C.
62/66 Related Work and Important Results
Probabilistic arguments have established that:
1. There exists a leader election protocol AN1 with bounded cheaters’ edge for all t ≤ n. [Alon, Naor, ’93]
2. For any β < 1/2, there exists a protocol that is resilient for t = βn. [AN93, BN00] (In terms of resilience, this is optimal.)
Disadvantages
• Non-constructive. Exhaustive search may be attempted, but could take time [bound illegible in the source; it involves 2, 2 and O(n)].
• O(n) running time (very slow)
63/66 Related Work and Important Results - Reduction via Committees
Lemma: From a leader election protocol P(n,t) executed in r(n) rounds and constructed in s(n) time, we can obtain a leader election protocol cmt|P(log^d n, (t/n + c/log n) log^d n) that lasts r(log^d n) + 1 rounds and is constructible in s(log^d n) + poly(n) time. [Russell - Zuckerman ’01]
64/66 Related Work and Important Results
General scheme to overcome this drawback:
1. Players pick a small committee.
2. Committee members pick a leader among them using a suitably “good” protocol, discovered via exhaustive search (so it doesn’t have to be efficiently constructible).
After a long line of work, (log* n + O(1))-round protocols with optimal resilience were achieved. [Russell, Zuckerman ’01], [Feige ’99]
None of them has bounded cheaters’ edge.
65/66 Related Work and Important Results
Antonakopoulos (2006) presented three leader election protocols with bounded cheaters’ edge that are poly(n)-time constructible:
Protocol   Condition                            Rounds
P*         t ≤ Θ(n / log n)                     5
P#         t ≤ Θ(n / √(log n · log log n))      5 log n
P+         -                                    polylog n