Presentation for the preliminary examination on the topic of Fault-Tolerant Capacity of Distributed Systems

  • A distributed system is a collection of individual computing devices that communicate with each other and perform computational tasks as a single entity.
  • Being composed of a large number of individual nodes, distributed systems are usually failure prone. Nodes may stop working because the operating system crashes or, in an ad-hoc network setting, the battery dies. Nodes may also inject incorrect information into the system because of buggy code or corrupted memory state, or may be compromised by a malicious adversary. It is very important that a distributed system can still function correctly when some of its nodes become faulty. In other words, it must be fault tolerant.
  • The only way to tolerate faults is to increase redundancy. Redundancy may take the form of information, as in coding. It may also take the form of processing components; an example would be replicas that perform exactly the same task. Intuitively, the more fault tolerant we want to make the system, the more redundancy is needed, and more redundancy requires more system resources. So our goal is to study how much network resource is needed to achieve a certain level of fault tolerance, or, given a certain amount of network resources, how fault tolerant the system can be made.
  • The main topic of this proposal is the capacity of Byzantine agreement. We consider Byzantine agreement in three network models. The first is what we call the Ethernet model, in which the network capacity is shared by the nodes. The Byzantine agreement problem in this model is closely related to the multiparty equality computation problem, which I will also discuss briefly. The second model is the point-to-point model, in which each link has its own capacity constraint, independent of all other links. These first two models are for wired networks. We will also look at Byzantine agreement in the wireless setting.
  • Let me first get you familiar with the Byzantine agreement problem; I will focus on the broadcast version in this talk. The goal of the broadcast problem is for the source to broadcast a message to all the other nodes in the presence of up to f failures. The requirement is that all fault-free peers must agree, by which we mean they must decide on the same message. In the case that the source is in fact good, it is a natural requirement that the decided message be the same as the one sent by the good source.
  • Failure is modeled as an adversary controlling any subset of no more than f nodes. The adversary is omniscient: it has complete knowledge of the system, the algorithm being used, and the message the source is trying to send. A node controlled by the adversary can behave in any arbitrary way, so we are doing worst-case analysis here. The adversary is also assumed to be computationally unbounded. Since no secret is hidden from the adversary, it can break any encryption being used, so cryptographic techniques won't help, and there is no need to use encryption.
  • In the distributed algorithms literature, the capacity of the underlying communication network has not been considered. But in reality, every network has its own rate region that specifies how fast the nodes can communicate with each other. We are interested in how the rate region affects the performance of Byzantine agreement algorithms, and what a good algorithm should look like.
  • To measure the performance of an algorithm, we first define its throughput. Similar to throughput in the communications literature, the throughput of agreement is defined as the long-term average number of bits agreed on per unit time. The capacity of agreement is then naturally defined as the supremum of achievable throughputs given a rate region.
  • We have defined the capacity of Byzantine agreement. Now let's look at how it is affected by different rate regions.
  • In the Ethernet model, every two nodes communicate over a point-to-point private link. There is no constraint on the capacity of individual links; the only constraint is that the total traffic must be no more than C. In this model, the capacity of agreement is simply C divided by the average communication complexity of agreeing on 1 bit.
  • The complexity per agreed bit is defined as the number of bits that must be communicated to agree on L bits, divided by L. This is a measure of complexity that has been studied for decades.
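The two definitions above can be sketched numerically; the shared rate and per-bit cost below are made-up illustration numbers, not results from the talk:

```python
def per_bit_complexity(bits_communicated, L):
    """Bits of communication spent per agreed bit for an L-bit message."""
    return bits_communicated / L

def capacity_of_agreement(C, alpha):
    """Ethernet-model capacity: shared rate C divided by per-bit cost alpha."""
    return C / alpha

# e.g. an algorithm with per-bit complexity n^2 on a 100 Mbps shared medium:
n, C = 10, 100e6
print(capacity_of_agreement(C, n ** 2))   # 1,000,000 agreed bits per second
```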
  • In the literature, the best per-bit complexity for reaching agreement in all possible cases, which we call deterministic agreement, is at least n^2. The complexity can be reduced to order n if the problem is relaxed a little by allowing a small probability of error, which we call probabilistic agreement. The main contribution of our work is that we design the first deterministic agreement algorithm with per-bit complexity linear in n, for long messages.
  • This is the structure of our algorithm. The message being broadcast is first divided into generations, and the algorithm tries to reach agreement on the generations one after another. For each generation of data, a network code is used so that any misbehavior by the faulty nodes will be detected by at least one good node. If, by the end of a generation, no node failure is detected, the algorithm proceeds to the next generation using the same code. On the other hand, if a failure is detected, an extra communication phase is performed to help the good nodes learn the location of the faulty nodes. After this, the code is adapted to the locations of the faulty nodes for future generations. We will see that the extra communication phase cannot occur too many times, so its overhead is comparatively small.
  • This is an example. In every generation, the algorithm tries to agree on 3 packets of data. In step 1, the source encodes the 3 data packets with a (6,3) code, so that the data can be reconstructed from any 3 coded packets. The source sends 2 coded packets to each peer.
  • In the second step, every peer forwards its first coded packet to the other peers. So by the end of step 2, every peer has received a (4,3) code that can detect any single failure. It is easy to see that if a peer is bad, it will be detected.
  • What if the source is bad? When the source is bad, the best it can do is try to make the peers decide on different messages by sending packets that do not constitute a valid codeword. However, as we can see, when the peers are all good, they share 3 coded packets identically, so it is impossible for them to decide on different data packets. In this case, all good peers detect a failure.
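As a concrete illustration of the (6,3) code in the example, here is a minimal sketch using Reed-Solomon-style polynomial interpolation over the prime field GF(257). The actual code in the algorithm may differ; the systematic construction below is just one choice that gives "any 3 of 6 packets reconstruct the data, and any 4 detect one bad packet":

```python
P = 257  # prime field large enough for byte-valued packets

def interpolate(points, x):
    """Lagrange interpolation over GF(P): value at x of the unique
    lowest-degree polynomial through the given (xi, yi) points."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 by Fermat
    return total

def encode(data):
    """(6,3) systematic code: fit a degree-2 polynomial through
    (1,d0),(2,d1),(3,d2) and evaluate it at x = 1..6."""
    pts = list(zip((1, 2, 3), data))
    return [interpolate(pts, x) for x in range(1, 7)]

def decode(any3):
    """Reconstruct the 3 data packets from any 3 coded packets (x, y)."""
    return [interpolate(any3, x) for x in (1, 2, 3)]

def consistent(pts4):
    """The (4,3) check each peer runs: 4 points lie on one degree-2
    polynomial iff the 4th matches the interpolation of the other 3."""
    return interpolate(pts4[:3], pts4[3][0]) == pts4[3][1]

coded = encode([10, 20, 30])
print(decode([(2, coded[1]), (4, coded[3]), (6, coded[5])]))  # [10, 20, 30]
print(consistent([(1, coded[0]), (2, coded[1]), (3, coded[2]),
                  (6, (coded[5] + 1) % P)]))                  # False: detected
```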
  • After a failure is detected, the peers cannot tell which node is really bad. In this example, nodes 1 and 3 cannot tell the difference between S being bad and 2 being bad.
  • So we need the extra communication phase to narrow down the faulty node. Here we have every node broadcast every packet it has sent or received, using a traditional Byzantine agreement algorithm with n^2 per-bit complexity. We can show that, given that a failure was detected, there must be a conflict between the packets broadcast by the bad node and those broadcast by some good node, and that there is never a conflict between a pair of good nodes. In this case we can narrow the bad node down to either S or 2. Node 2 then knows S is bad and will no longer trust packets from S, so we remove the pair of links between S and 2.
  • Now we start the second generation. The source uses the same (6,3) code, except that the packets for node 2 are skipped. This time, instead of z, S sends z' to node 3.
  • In the second step, nodes 1 and 3 first forward their first packets to node 2. Now look at node 2: remember that we are trying to agree on 3 packets of data, but node 2 has only 2 packets. So we have node 1 send its second packet to node 2.
  • Now node 2 has 3 packets, and it first solves them for the "data". Then it sends y' to nodes 1 and 3. As in generation 1, the peers detect the failure again.
  • The extra communication phase is then performed once more: every node broadcasts every packet it has sent or received, and the conflicting broadcasts narrow down the faulty node further, so the code can be adapted again for future generations.
  • In terms of complexity, the major cost comes from the code. To be able to detect a failure, every peer receives at most n coded packets in order to agree on n-f packets of data, so the cost of detection is n(n-1)/(n-f) times L. By choosing the size of each generation appropriately, we can make the cost of all other operations, including the extra broadcast, O(n^4 L^0.5). The per-bit complexity then becomes n(n-1)/(n-f) + O(n^4/L^0.5). The first term is linear in n, and the second term diminishes as L becomes large. Even when no node is faulty, it still takes n-1 transmissions just to send the message, so the per-bit complexity is at least n-1. Given that f must be less than n/3, our complexity is within a factor of 1.5 of optimal.
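The arithmetic behind the factor-of-1.5 claim can be checked directly; n = 10, f = 3 below are arbitrary values satisfying f < n/3:

```python
def detection_cost_per_bit(n, f):
    """Leading term of the per-bit complexity: n(n-1)/(n-f)."""
    return n * (n - 1) / (n - f)

n, f = 10, 3                      # any f < n/3 works
cost = detection_cost_per_bit(n, f)
print(cost / (n - 1))             # ratio to the trivial n-1 lower bound:
                                  # equals n/(n-f) < 1/(1 - 1/3) = 1.5
```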
  • Remember that the capacity of agreement is the supremum of achievable throughputs, which corresponds to the infimum of the total complexity. There is no existing work on tight lower bounds for the complexity of Byzantine agreement, so we develop our own lower bound.
  • The multiparty equality (MEQ) problem is defined as follows. There are n nodes, each given a value from 1 to M, and they want to know whether their values are all equal, by communicating with each other. If the values are not all the same, then at least one of the nodes must detect that; if the values are all equal, then no node detects anything.
  • Any protocol that solves the MEQ problem can be represented as a directed multigraph. Each link represents the symbol being sent in each step; the number next to a link denotes which step the link represents, and f_i is the function by which the sender of each step computes what symbol to send. We can show that any protocol that solves the MEQ problem can be transformed into one whose graph is acyclic, in which every node only transmits to nodes with larger indices.
  • We also prove that it is sufficient for each node to compute the messages it sends using only its input value x. Then we can pack the symbols sent in all steps into one, and the graph looks like this.
  • The complexity of a protocol is then defined as the sum of the number of bits communicated between every pair of nodes, and the complexity of an MEQ problem is the minimum over all protocols that solve it.
  • An upper bound for the MEQ problem is obtained trivially by construction: just have nodes 1 to n-1 send their values to node n and let node n compare them. So the upper bound is (n-1) log M.
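The trivial construction can be written out directly; the function below is only an illustration of the bound, with node n acting as the comparator:

```python
import math

def trivial_meq(values, M):
    """Nodes 1..n-1 each send their value (ceil(log2 M) bits) to node n,
    which compares. Returns (all_equal, total bits communicated)."""
    *senders, receiver = values
    bits = len(senders) * math.ceil(math.log2(M))   # (n-1) * log M
    return all(v == receiver for v in senders), bits

print(trivial_meq([4, 4, 4], M=6))   # (True, 6): n = 3, ceil(log2 6) = 3
print(trivial_meq([4, 4, 5], M=6))   # (False, 6): node n detects a mismatch
```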
  • We can show that this upper bound is not tight, with the counterexample of MEQ(3,6).
  • The communication model for the point-to-point rate region is the same as the one used in much of the networking literature. Again, the nodes communicate over point-to-point private links, but this time each link has its own fixed capacity.
  • For this rate region, we first state a couple of necessary conditions for a throughput R to be achievable. If any f peers are removed, the min-cut from the source to any remaining peer must be at least R. This is necessary because the adversary can have f peers simply stop communicating with the rest of the network; the remaining min-cut must then be at least R just to deliver the information to the good peers.
  • If f nodes including the source are removed, the incoming rate of any remaining peer must be at least R. This is necessary because a bad source can stop communicating with node 1 while behaving correctly toward the other peers. The other nodes cannot tell whether node 1 is bad or S is bad, so they must still agree on something at rate R. Then the only way for node 1 to also agree with nodes 2 and 3 is to receive from them at a rate of at least R.
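The second necessary condition can be checked mechanically on a capacity map; the name `nc2_holds` and the 4-node topology below are made-up for illustration:

```python
from itertools import combinations

def nc2_holds(cap, source, R, f):
    """After removing the source plus any f-1 peers, every remaining
    peer's total incoming capacity must be at least R.
    cap[(u, v)] is the capacity of the directed link u -> v."""
    nodes = {u for u, v in cap} | {v for u, v in cap}
    peers = nodes - {source}
    for removed in combinations(peers, f - 1):       # f nodes incl. source
        remaining = peers - set(removed)
        for p in remaining:
            incoming = sum(c for (u, v), c in cap.items()
                           if v == p and u in remaining)
            if incoming < R:
                return False
    return True

# made-up example: source S plus peers 1,2,3, all links of capacity 5
cap = {(u, v): 5 for u in "S123" for v in "123" if u != v}
print(nc2_holds(cap, "S", R=10, f=1))   # True: each peer hears 5+5 from peers
print(nc2_holds(cap, "S", R=11, f=1))   # False
```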
  • Now let's compare our algorithm with applying traditional ones directly to the point-to-point model. The traditional algorithms have the source send the whole message on each of its outgoing links, so their throughput is upper bounded by the minimum capacity among the outgoing links of S. In our algorithm, the throughput is upper bounded by the minimum over the sums of any 2 outgoing links. In this example, we can do arbitrarily better than the traditional algorithms.
  • This example shows why NC1 and NC2 are not tight in general: the source can partition the network. This network satisfies NC1 and NC2, and it can detect any faulty peer. However, since the source itself is a cut of the network, it can have the two sides of the network agree on different messages without being detected.
  • This bound is proven using the fooling-set argument we used for the MEQ problem.
  • Should a node transmit to more nodes at longer range with a lower rate, or to fewer nodes at a higher rate?
  • When packets are routed over multiple hops, a relay node can tamper with the packets it forwards. An intuitive way to mitigate this is to use source coding. But that alone won't work: no matter what code is used, the relay can tamper with as many packets as needed to create a different valid codeword.
  • Another idea is to exploit overhearing in wireless networks. A node that can overhear transmissions from both the source and the relay can compare the packets received from each and detect an attack if they mismatch. But this may still fail when the node fails to overhear both copies of some packets. So if the relay tampers with just a few packets, it is quite likely that the destination will accept a bad packet.
  • Our idea is very simple: combine source coding and overhearing. If the relay tampers with a small fraction of the packets, it will be caught by the destination during decoding. On the other hand, if it tampers with many packets, it will be caught by the neighboring nodes with high probability. We show that we can make the coding rate approach 1 as the probability of not detecting an attack goes to 0, by choosing the code appropriately.
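A back-of-envelope sketch of the two regimes, with assumed parameters: suppose an (m, m(1-e)) MDS code lets the destination detect up to e·m tampered packets at decoding, while a neighbor that overhears each packet independently with probability p catches k tampered packets with probability 1 - (1-p)^k:

```python
def overhear_detect_prob(k, p):
    """Probability a neighbor overhears (and can compare) at least one
    of k tampered packets, overhearing each independently w.p. p."""
    return 1 - (1 - p) ** k

m, e, p = 1000, 0.05, 0.3     # made-up: 1000 packets, rate-0.95 code
code_limit = e * m            # destination catches any k <= 50 at decode
# beyond that limit, overhearing takes over:
print(overhear_detect_prob(51, p))   # already > 0.999999
```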
