Capacity of Byzantine Agreement with Finite Link Capacity

Speaker Notes

  • Distributed systems are becoming more and more popular: think of cloud computing at Google, Microsoft, and now China Mobile. A distributed system consists of many components that can perform computation independently, but to act as one integrated system, these components need to be coordinated in some way so that they can provide services as a single system.
  • For the purpose of coordinating distributed devices, researchers have studied and designed a number of distributed primitives that realize basic functionalities needed by most distributed systems; they are the building blocks of more sophisticated distributed systems. Examples of these primitives are clock synchronization, which synchronizes the clocks on different machines; mutual exclusion, which makes sure that a shared resource such as a block of memory is accessed by only one user at a time and that every user can access it eventually; and agreement, which makes sure that different nodes receive consistent data or instructions, so that a consistent state is maintained across the system. There has been a lot of work on all of these primitives in the distributed algorithms literature.
  • Since distributed systems are built from these primitives, their performance is highly affected by the efficiency of these building blocks. As researchers in networking and communication, it is natural to ask: “How would the performance of these primitives be affected by the underlying communication network?” Unfortunately, as far as we are aware, very little has been done on this in the algorithms literature.
  • You can imagine the sender as a client who tries to store a file onto a data center that keeps multiple copies of the file on different machines. We want to make sure that all copies of the file are identical.
  • Distributed systems contain many computers, and computers crash from time to time, or their code may have bugs. Also, since the system is distributed, it is not so difficult for a malicious attacker to take control of some of the machines and try to crack the system.
  • For this example, we have the thicker links of capacity 1 and the thinner links of capacity ε. The upper bound is 1 + ε.
  • Now let’s see how the classic solution for broadcast works
  • Since the classic solution requires the whole message to be sent on every link, its throughput is automatically upper bounded by the capacity of the slowest link. So for the same example, it achieves a throughput of no more than ε, while the upper bound we obtained before is at least 1. This is very bad, since it can be arbitrarily worse than the upper bound. Now we ask ourselves: is ε the best we can get in this case? Can we get any closer to the upper bound? The answer is yes: we can do much better than ε, and we can get arbitrarily close to the upper bound.
  • To approach the upper bound, we first observe that the classic solution is in fact doing error correction with a repetition code. And as we all know, for the same number of errors, error detection codes have a higher coding rate than error correction codes, since they only detect errors but may not be able to correct them.
  • If we look at the outgoing links of the sender and the incoming links of the receivers, the content transmitted on these links is actually an error detection code with a parity check.
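A minimal sketch of this parity-based detection, assuming one-bit symbols and XOR as the parity operation (the function names are illustrative, not from the paper):

```python
# Minimal sketch of the parity-based error detection, assuming one-bit
# symbols and XOR parity; names are illustrative, not from the paper.

def encode(a, b):
    """Sender splits a two-bit value into three symbols: a, b, and parity a ^ b."""
    return a, b, a ^ b

def parity_ok(a, b, p):
    """Each node checks the parity after collecting all three symbols."""
    return (a ^ b) == p

# Fault-free case: every node ends up with [a, b, a+b]; the check passes
# and the receivers agree on (a, b).
a, b, p = encode(1, 0)
assert parity_ok(a, b, p)

# If a faulty node forwards a flipped symbol, the check fails at a good
# node: the error is detected, though not corrected.
assert not parity_ok(a ^ 1, b, p)
```

Detection alone is cheaper than correction: the three links together carry three symbols for two bits of data, instead of a full copy of the message on every link.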
  • We break the whole file into many small pieces and try to agree on the pieces one by one, in rounds. If X misbehaves with Y in a given round, the misbehavior will be detected, and we have a mechanism to identify a pair of nodes, one of which must be faulty. The pair of links between these two nodes will then not be used in the next round.
  • We have many fast rounds over the infinite time horizon, and only a constant number of expensive rounds after failure detection.

Transcript

  • 1. Capacity of Agreement with Finite Link Capacity. Guanfeng Liang @ Infocom 2011. Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. Joint work with Prof. Nitin Vaidya
  • 2. MOTIVATION 2
  • 3. Motivation. Distributed systems are emerging: cloud computing (e.g. Windows Azure), distributed file systems, data centers, multiplayer online games. Large number of distributed components. Distributed components need to be coordinated
  • 4. Motivation. Distributed primitives: clock synchronization, mutual exclusion, agreement, etc. Large body of literature in Distributed Algorithms
  • 5. Motivation. A networking guy asks: “How would constraints of the network affect the performance of these primitives?” An algorithms guy replies: “……” Network-aware distributed algorithm design
  • 6. BYZANTINE AGREEMENT IN P2P NETWORKS
  • 7. Byzantine Agreement (BA): Broadcast. A sender wants to send a message to n−1 receivers. Fault-free receivers must agree; if the sender is fault-free, they agree on its message. Any ≤ f nodes may fail
  • 8. Why agreement? Distributed systems are failure-prone. Non-malicious: crashed nodes, buggy code. Malicious: an attacker tries to crack the system. For a system robust against faults, it is important to maintain consistent state
  • 9. Impact of the Network How does capacity (rate region) of the network affect agreement performance? How to quantify the impact? 9
  • 10. Rate Region Defines the way “links” may share channel Interference posed to each other determines whether a set of transmissions can succeed together 10
  • 11. “Ethernet” Rate Region (diagram: S transmitting to 1 and 2): Rate S→1 + Rate S→2 ≤ C
  • 12. Point-to-Point Network Rate Region: each directed link is independent of the other links; Rate i→j ≤ Capacity i→j
  • 13. Capacity of Agreement. b(t) = number of bits agreed on in [0, t]. Throughput = lim (t→∞) b(t)/t. Capacity of agreement: the supremum of achievable throughput for a given rate region
  • 14. Upper Bound of Capacity in P2P Networks. NC1: C ≤ min-cut(S, X | f receivers removed). (diagram: S and peers 1, 2, 3)
  • 15. Upper Bound of Capacity in P2P Networks. NC2: C ≤ In(X | f nodes removed). (diagram: S and peers 1, 2, 3)
  • 16. Upper Bound of Capacity in P2P Networks. NC1: C ≤ min-cut(S, X | f receivers removed); NC2: C ≤ In(X | f nodes removed). For the example, Upper bound = 1 + ε. (diagram: thick links of capacity 1, thin link of capacity ε)
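The NC2 bound for the ε example can be checked numerically. A sketch, assuming receiver X has incoming links of capacity 1, 1 and ε as in the example (the function name is ours):

```python
# Sketch of the NC2 bound: capacity is at most X's total incoming capacity
# after the worst-case removal of f neighbouring nodes. The incoming
# capacities (1, 1, eps) are assumed from the slide's example.
from itertools import combinations

def nc2_bound(in_caps, f):
    """Minimum remaining incoming capacity of X over all removals of f neighbours."""
    n = len(in_caps)
    return min(sum(c for i, c in enumerate(in_caps) if i not in removed)
               for removed in map(set, combinations(range(n), f)))

eps = 0.1
# The adversary's best move is to remove a capacity-1 neighbour,
# leaving 1 + eps.
print(nc2_bound([1, 1, eps], 1))  # → 1.1
```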
  • 17–22. Classic Solution for Broadcast (diagram sequence): S broadcasts value v by sending v to each of the peers 1, 2, 3; the peers then relay what they received to each other. A faulty peer (node 1) may relay an arbitrary value “?”, so a good receiver collects a vector such as [v, v, ?]. A majority vote results in the correct value v at each good receiver.
  • 23–27. Classic Solution for Broadcast, faulty source (diagram sequence): S sends different values v, w, x to peers 1, 2, 3; the peers relay what they received, so every good receiver collects the same vector [v, w, x]. The vote result is therefore identical at all good receivers.
  • 28. Classic Solution in P2P Networks. The whole message is sent on every link, so Throughput ≤ slowest link. In the example, Throughput ≤ ε, but Upper bound = 1 + ε. (diagram: thin link of capacity ε)
  • 29. Improving Broadcast Throughput Observation: classic solution is in fact an “error correction code” “Error detection codes” are more efficient 29
  • 30–32. Error Detection Code (diagram sequence): to broadcast a two-bit value (a, b), S sends the symbols a, b and the parity a+b, one on each outgoing link; the peers relay their symbols, so each node collects [a, b, a+b]. The parity check passes at all nodes, and they agree on (a, b).
  • 33. Error Detection Code (diagram): if peer 1 misbehaves, some node collects [?, b, a+b] and its parity check fails, so the fault is detected.
  • 34. Error Detection Code (diagram): only detection is not what we want. If S sends a bad codeword (a, b, z), the check fails at a good node.
  • 35. Modification Agree on small pieces of data in each “round” If X misbehaves with Y in a given round, avoid using XY link in the next round (for next piece of data) Repeat 35
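The “avoid using XY link” step can be sketched as follows; the suspect-pair bookkeeping shown here is an illustrative simplification of the slide's mechanism:

```python
# Sketch of the adaptation step: after a round in which a pair (X, Y) is
# accused (one of the two must be faulty), both directed links between X
# and Y are excluded from the next round. Data structures are illustrative.

def next_round_links(links, suspect_pairs):
    """Return the directed links usable in the next round."""
    banned = {(x, y) for x, y in suspect_pairs} | {(y, x) for x, y in suspect_pairs}
    return [l for l in links if l not in banned]

links = [("S", 1), (1, 2), (2, 1), (1, 3), (3, 1)]
# A round detected misbehaviour between nodes 1 and 2:
print(next_round_links(links, [(1, 2)]))  # → [('S', 1), (1, 3), (3, 1)]
```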
  • 36. Algorithm Structure Fast round (as in the example) 36
  • 37. Algorithm Structure. Fast round (as in the example): the parity-based round from slide 32. (diagram)
  • 38–41. Algorithm Structure (build-up): Fast round (as in the example). Fast round… Fast round in which failure is detected. Expensive round to learn new info about failure. Fast round. Fast round… Expensive round to learn new info about failure. After a small number of expensive rounds, failures are completely identified. Only fast rounds hereon
  • 42. Algorithm “Analysis” Many fast rounds Few expensive rounds When averaged over time, the cost of expensive rounds is negligible Average usage of link capacity depends only on the fast round, which is very efficient Achieves capacity for 4-node networks, and symmetric networks 42
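The averaging argument on this slide can be illustrated with made-up round costs (all numbers below are hypothetical, not from the paper):

```python
# Back-of-the-envelope amortization: many cheap fast rounds dominate a
# bounded number of expensive diagnosis rounds. Costs are invented for
# illustration; expensive rounds are assumed to carry no new data.

def avg_throughput(bits_per_fast_round, fast_cost, expensive_cost,
                   n_fast, n_expensive):
    total_bits = bits_per_fast_round * n_fast
    total_time = fast_cost * n_fast + expensive_cost * n_expensive
    return total_bits / total_time

# With the number of expensive rounds bounded, throughput approaches the
# fast-round rate (2 bits per time unit here) as n_fast grows.
print(avg_throughput(2, 1, 100, 10**6, 10))  # ≈ 1.998
```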
  • 43. OPEN PROBLEMS 43
  • 44. Open Problems Capacity of agreement for general rate regions 44
  • 45. Open Problems Capacity of agreement for general rate regions Even the multicast problem with Byzantine nodes is unsolved - For multicast, sources fault-free 45
  • 46. Rich Problem Space. A wireless channel allows overhearing: transmit to 2 at high rate, or at low rate? Low rate also allows reception at 1. (diagram: S with receivers 1, 2, 3)
  • 47. Rich Problem Space. Similar questions are relevant for any multi-party computation: multi-party computing under communication constraints. (diagram: Distributed Computation ∩ Communication)
  • 48. MIND TEASER 48
  • 49. How many bits needed? N nodes, each with a k-bit input. Check if all inputs are identical; at least 1 node “detects” if they are not. Intuitive guess: (N−1)k bits. Is it the best we can do? (diagram: nodes 1, 2, 3)
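A sketch of the intuitive (N−1)k-bit scheme from the teaser, with illustrative names: one node sends its k-bit input to every other node, and each node compares locally.

```python
# Naive equality check costing (N-1)*k bits: node 1 sends its k-bit input
# to the other N-1 nodes; each node compares locally. Names are ours.

def naive_equality_check(inputs):
    """inputs: one k-bit value per node. Returns which nodes see a match."""
    reference = inputs[0]          # node 1's input, sent to all others
    return [x == reference for x in inputs]

# All inputs identical: nobody detects a mismatch.
print(naive_equality_check([0b1011, 0b1011, 0b1011]))  # → [True, True, True]
# Node 3 holds a different value and detects it.
print(naive_equality_check([0b1011, 0b1011, 0b1010]))  # → [True, True, False]
```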
  • 50. THANK YOU! 50
  • 51. Improving Broadcast Throughput. Observation: the classic solution is in fact “error correction”. “Error detection” suffices: disseminate some data; check if consistent or not; if consistent, decide; if inconsistent, diagnose and adapt; repeat for new data