
PBFT

Published in: Education

  1. Practical Byzantine Fault Tolerance <ul><li>by Miguel Castro and Barbara Liskov </li></ul><ul><li>Matthias Eberli, Anna Yudina </li></ul>
  2. Outline <ul><li>Introduction </li></ul><ul><li>System model </li></ul><ul><li>Algorithm </li></ul><ul><li>Implementation </li></ul><ul><li>Evaluation </li></ul>
  3. Introduction <ul><li>Replication </li></ul><ul><ul><li>BFT </li></ul></ul><ul><ul><li>Asynchronous </li></ul></ul><ul><ul><li>Pretty fast </li></ul></ul><ul><ul><li>Practical! </li></ul></ul>
  4. System model <ul><li>Asynchronous </li></ul><ul><li>Possible faults: </li></ul><ul><ul><li>Fail to deliver messages </li></ul></ul><ul><ul><li>Delayed messages </li></ul></ul><ul><ul><li>Deliver out of order </li></ul></ul><ul><ul><li>Byzantine faults </li></ul></ul><ul><li>Independent node failures </li></ul>
  5. Cryptography <ul><li>To prevent: </li></ul><ul><ul><li>Spoofing </li></ul></ul><ul><ul><li>Replays </li></ul></ul><ul><li>Public-key signatures: <m>σ_i (message m signed by node i) </li></ul><ul><li>Message authentication codes (MACs) </li></ul><ul><li>Digest: D(m) </li></ul><ul><li>Assumption: the adversary is computationally bounded (polynomial time) and cannot break these primitives </li></ul>
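The digest and authentication-code primitives listed on this slide can be illustrated with Python's standard library. This is only a sketch: SHA-256 and HMAC are stand-ins chosen here, the slides do not name concrete algorithms, and the helper names `digest`, `make_mac` and `verify_mac` are hypothetical.

```python
import hashlib
import hmac


def digest(message: bytes) -> bytes:
    """D(m): a collision-resistant digest of the message (SHA-256 is an assumption)."""
    return hashlib.sha256(message).digest()


def make_mac(key: bytes, message: bytes) -> bytes:
    """Authentication code over `message` under a key shared by sender and receiver."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time check that `tag` authenticates `message`; rejects spoofed messages."""
    return hmac.compare_digest(make_mac(key, message), tag)
```

Replay protection is not shown here; it would additionally need a timestamp or sequence number inside the authenticated message.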
  6. Algorithm properties <ul><li>Safety </li></ul><ul><ul><li>Linearizability </li></ul></ul><ul><li>Liveness </li></ul><ul><ul><li>Requires a synchrony assumption (otherwise the algorithm would contradict the FLP impossibility result) </li></ul></ul><ul><ul><li>delay(t) = O(t) </li></ul></ul>
  7. Algorithm <ul><li>State-machine replication </li></ul><ul><li>Total order for the execution </li></ul><ul><li>Views: p = v mod |R| </li></ul>
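The view rule p = v mod |R| means the primary role rotates through the replicas as views advance. A minimal sketch, assuming replicas are numbered 0 to |R| - 1 (the function name is hypothetical):

```python
def primary_for_view(view: int, num_replicas: int) -> int:
    """Return the id of the primary in `view`, i.e. p = v mod |R|."""
    if num_replicas <= 0:
        raise ValueError("num_replicas must be positive")
    return view % num_replicas


# With |R| = 4 (tolerating f = 1), consecutive views rotate the primary.
rotation = [primary_for_view(v, 4) for v in range(6)]
```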
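  8. Algorithm: Client <ul><li>Request: <REQUEST, o, t, c>σ_c (operation o, timestamp t, client c) </li></ul><ul><li>Reply: <REPLY, v, t, c, i, r>σ_i (view v, timestamp t, client c, replica i, result r) </li></ul>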
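In the PBFT paper the client accepts a result once f + 1 different replicas have sent replies with the same timestamp and result, since at least one of them must be non-faulty. A sketch of that check, assuming replies arrive as plain `(v, t, c, i, r)` tuples with already verified signatures and hashable results:

```python
from collections import defaultdict


def accepted_result(replies, timestamp, f):
    """Return the result backed by at least f + 1 distinct replicas, else None.

    replies: iterable of (view, t, client, replica_id, result) tuples.
    """
    voters = defaultdict(set)
    for _view, t, _client, replica_id, result in replies:
        if t == timestamp:  # ignore replies to other requests
            voters[result].add(replica_id)  # a set, so duplicates count once
    for result, ids in voters.items():
        if len(ids) >= f + 1:
            return result
    return None
```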
  9. Algorithm: 3-phase protocol <ul><li>Atomic multicast to the replicas </li></ul>
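  10. 3-phase protocol <ul><li>Handling non-determinism: </li></ul><ul><ul><li>run the 3-phase protocol on (request + r 0 + r 1 + ...), where the r i are values proposed by replicas </li></ul></ul><ul><ul><li>every replica then picks the final value with the same deterministic computation </li></ul></ul>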
  11. Pre-prepare phase <ul><li>Backup accepts a pre-prepare if: </li></ul><ul><ul><li>the signatures in the request and the pre-prepare are correct, and d is the digest for m </li></ul></ul><ul><ul><li>the backup is in view v </li></ul></ul><ul><ul><li>it hasn’t accepted a pre-prepare for view v and sequence number n containing a different digest </li></ul></ul><ul><ul><li>the sequence number in the pre-prepare ∈ [h, H] (the low and high watermarks) </li></ul></ul>
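The four acceptance conditions can be read as a chain of guards. The sketch below assumes signature and digest verification happen elsewhere and are passed in as the flag `signatures_ok`; the class and field names are hypothetical, and the watermark interval is treated as inclusive, as written on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PrePrepare:
    view: int
    seq: int
    digest: str
    signatures_ok: bool  # assumption: result of signature and digest checks done elsewhere


@dataclass
class Backup:
    view: int
    low: int   # watermark h
    high: int  # watermark H
    accepted: dict = field(default_factory=dict)  # (view, seq) -> accepted digest

    def accept(self, msg: PrePrepare) -> bool:
        if not msg.signatures_ok:
            return False  # bad signature or digest mismatch
        if msg.view != self.view:
            return False  # backup is not in the message's view
        if not (self.low <= msg.seq <= self.high):
            return False  # sequence number outside [h, H]
        previous = self.accepted.get((msg.view, msg.seq))
        if previous is not None and previous != msg.digest:
            return False  # conflicting pre-prepare for the same (v, n)
        self.accepted[(msg.view, msg.seq)] = msg.digest
        return True
```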
  12. Prepare phase <ul><li>prepared(m, v, n, i) holds iff replica i has logged: </li></ul><ul><ul><li>the request m </li></ul></ul><ul><ul><li>a pre-prepare for m in view v with sequence number n </li></ul></ul><ul><ul><li>2f prepares from different backups that match the pre-prepare </li></ul></ul>
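The prepared predicate is a pure function of a replica's log. A sketch, assuming the log stores requests by digest and messages as tuples (the `ReplicaLog` layout is an illustration, not the paper's data structure):

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaLog:
    requests: set = field(default_factory=set)      # digests of logged requests
    pre_prepares: set = field(default_factory=set)  # (view, seq, digest)
    prepares: set = field(default_factory=set)      # (view, seq, digest, sender)


def prepared(log: ReplicaLog, digest: str, view: int, seq: int, f: int) -> bool:
    """True iff the log holds the request, a matching pre-prepare and 2f matching prepares."""
    if digest not in log.requests:
        return False
    if (view, seq, digest) not in log.pre_prepares:
        return False
    # Count distinct senders only, so a repeated prepare cannot inflate the quorum.
    senders = {s for (v, n, d, s) in log.prepares if (v, n, d) == (view, seq, digest)}
    return len(senders) >= 2 * f
```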
  13. Total order <ul><li>Claim: prepared(m, v, n, i) ⇒ ¬prepared(m’, v, n, j) for any non-faulty replica j and any m’ ≠ m </li></ul><ul><li>prepared(m, v, n, i) AND |R| = 3f + 1 ⇒ at least (f + 1) non-faulty replicas sent a pre-prepare or prepare for (m, v, n) </li></ul><ul><li>prepared(m’, v, n, j) would then require at least one of those (f + 1) replicas to have sent two conflicting pre-prepares or prepares </li></ul><ul><li>those replicas are non-faulty ⇒ contradiction </li></ul>
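The counting behind this argument can be written out explicitly. A sketch, assuming |R| = 3f + 1 and writing Q_1 and Q_2 (names introduced here) for the sets of replicas whose pre-prepare or prepare messages back prepared(m, v, n, i) and prepared(m’, v, n, j):

```latex
\begin{align*}
|Q_1| = |Q_2| &= 2f + 1, \qquad |R| = 3f + 1 \\
|Q_1 \cap Q_2| &\ge |Q_1| + |Q_2| - |R| = (2f + 1) + (2f + 1) - (3f + 1) = f + 1
\end{align*}
```

At most f replicas are faulty, so at least one replica in the intersection is non-faulty, and it would have had to send conflicting messages for the same (v, n).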
  14. Commit phase <ul><li>committed(m, v, n) </li></ul><ul><ul><li>prepared(m, v, n, i) holds for all i in some set of (f + 1) non-faulty replicas </li></ul></ul><ul><li>committed-local(m, v, n, i) </li></ul><ul><ul><li>prepared(m, v, n, i) </li></ul></ul><ul><ul><li>i has accepted (2f + 1) matching commits from different replicas </li></ul></ul>
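The local commit condition combines the prepared predicate with a commit quorum. A minimal sketch, assuming the replica already knows whether the request is prepared and stores accepted commits as tuples (the function signature is hypothetical):

```python
def committed_local(is_prepared, commits, view, seq, digest, f):
    """True iff the request is prepared here and 2f + 1 distinct replicas sent matching commits.

    commits: iterable of (view, seq, digest, sender) tuples accepted by this replica,
    possibly including its own commit.
    """
    senders = {s for (v, n, d, s) in commits if (v, n, d) == (view, seq, digest)}
    return is_prepared and len(senders) >= 2 * f + 1
```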
  15. Agreement <ul><li>committed-local(m, v, n, i) for some non-faulty replica i ⇒ committed(m, v, n) </li></ul><ul><li>So non-faulty replicas agree on the sequence numbers of requests that commit locally </li></ul>
  16. Logging <ul><li>messages are kept until the request has been executed by at least (f + 1) non-faulty replicas </li></ul><ul><li>proof that the state is correct: checkpoints </li></ul><ul><ul><li>multicast <CHECKPOINT, n, d, i>σ_i </li></ul></ul><ul><ul><li>collect these messages until (2f + 1) matching ones have arrived (stable checkpoint) </li></ul></ul><ul><ul><li>discard all earlier messages ( < n ) </li></ul></ul><ul><ul><li>watermarks: h = n , H = h + k </li></ul></ul>
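The checkpoint steps above (collect 2f + 1 matching messages, discard older log entries, advance the watermarks) can be sketched as a small tracker. The class name, the dictionary-based log and the initial watermark of 0 are assumptions made for illustration.

```python
from collections import defaultdict


class CheckpointTracker:
    def __init__(self, f: int, k: int):
        self.f = f
        self.k = k
        self.low = 0    # watermark h
        self.high = k   # watermark H = h + k
        self.votes = defaultdict(set)  # (seq, state_digest) -> replica ids

    def on_checkpoint(self, seq, state_digest, replica_id, log) -> bool:
        """Record a CHECKPOINT message; return True if it made checkpoint `seq` stable.

        log: dict mapping sequence number -> logged messages; pruned in place.
        """
        if seq <= self.low:
            return False  # at or below the current stable checkpoint
        self.votes[(seq, state_digest)].add(replica_id)
        if len(self.votes[(seq, state_digest)]) < 2 * self.f + 1:
            return False  # proof not complete yet
        self.low, self.high = seq, seq + self.k  # h = n, H = h + k
        for n in [n for n in log if n < seq]:    # discard earlier messages (< n)
            del log[n]
        for key in [key for key in self.votes if key[0] <= seq]:
            del self.votes[key]                  # old checkpoint votes are no longer needed
        return True
```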
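  17. If the primary fails <ul><li>A backup’s view-change message carries n, C and P; the new primary’s new-view message carries V and O </li></ul><ul><li>n : sequence number of the last stable checkpoint, C : proof of that checkpoint </li></ul><ul><li>P : {P m : pre-prepare plus 2f matching prepares} for each request m that prepared at the sender with sequence number #(m) > n </li></ul><ul><li>V : the (2f + 1) view-changes collected by the new primary </li></ul><ul><li>O : pre-prepares, without the piggybacked requests, for sequence numbers after the last stable checkpoint </li></ul>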
  18. Correctness: Safety <ul><ul><li>within a view: total order of local commits on non-faulty replicas </li></ul></ul><ul><ul><li>across view changes: m commits locally on i only if committed(m, v, n) </li></ul></ul><ul><ul><li>∃ a set R 1 of (f + 1) non-faulty replicas with prepared(m, v, n, i) for every i in R 1 </li></ul></ul><ul><ul><li>non-faulty replicas won’t accept a pre-prepare for v’ > v without a new-view for v’ </li></ul></ul><ul><ul><li>every correct new-view for v’ contains view-changes from a set R 2 of (2f + 1) replicas </li></ul></ul><ul><ul><li>R 1 and R 2 intersect in at least one non-faulty replica k </li></ul></ul><ul><ul><li>k’s view-change ensures that the fact that m prepared in v is carried over to the new view </li></ul></ul>
  19. Correctness: Liveness <ul><ul><li>multicast a view-change for v + 1 , wait for (2f + 1) view-changes , then start timer T </li></ul></ul><ul><ul><ul><li>if the timer expires before the new view starts, start a view-change for v + 2 and wait 2T </li></ul></ul></ul><ul><ul><li>on receiving (f + 1) view-changes for views greater than v , send a view-change for the smallest of those views, even if the timer has not expired </li></ul></ul><ul><ul><li>a faulty replica can cause a view change only if it is the primary, and the primary cannot be faulty for more than f consecutive views </li></ul></ul>
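Two of these liveness rules are easy to express in code: the timeout doubles with each failed view change, and a replica joins a view change early once f + 1 other replicas ask for a higher view. The sketch below follows the PBFT paper in joining the smallest such view; the function names and the dictionary input are hypothetical.

```python
def view_change_timeouts(base: float, attempts: int) -> list:
    """Timeouts T, 2T, 4T, ... used for successive view-change attempts."""
    return [base * 2 ** i for i in range(attempts)]


def view_to_join(current_view: int, views_by_sender: dict, f: int):
    """Return the view to send a view-change for, or None if no quorum exists.

    views_by_sender: replica id -> view requested in that replica's view-change message.
    """
    higher = [v for v in views_by_sender.values() if v > current_view]
    if len(higher) >= f + 1:
        return min(higher)  # smallest view in the set
    return None
```

Doubling the timeout gives a slow but correct primary enough time to finish the view change once message delays stop growing.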
  20. Optimization <ul><li>Don’t send large replies </li></ul><ul><ul><li>One replica sends the full reply </li></ul></ul><ul><ul><li>The others send just digests </li></ul></ul><ul><li>Reduce message delays </li></ul><ul><ul><li>Replicas execute tentatively and reply before commit </li></ul></ul><ul><ul><li>Client waits for (2f + 1) matching tentative replies; otherwise it retransmits the request and proceeds as usual </li></ul></ul><ul><li>Execute read-only operations immediately </li></ul>
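With digest replies, the client checks the single full result against the digests sent by the other replicas. A sketch under assumptions: SHA-256 stands in for the digest function, replies are already authenticated, and `needed` is the reply quorum the caller wants (for example 2f + 1 for tentative replies).

```python
import hashlib


def result_from_replies(full_reply, digest_replies, needed):
    """Return the full result if enough replicas vouch for it, else None.

    full_reply: (replica_id, result_bytes) from the designated replica.
    digest_replies: dict replica_id -> digest bytes from the other replicas.
    """
    sender, result = full_reply
    expected = hashlib.sha256(result).digest()
    # The sender of the full reply counts once, even if it also sent a digest.
    matching = {sender} | {rid for rid, d in digest_replies.items() if d == expected}
    return result if len(matching) >= needed else None
```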
  21. Implementation <ul><li>Replication library </li></ul><ul><ul><li>Client: invoke </li></ul></ul><ul><ul><li>Server: execute, make_checkpoint, delete_checkpoint, get_checkpoint, set_checkpoint, get_digest </li></ul></ul><ul><li>UDP for point-to-point messages, UDP over IP multicast for multicast </li></ul><ul><li>BFS: a Byzantine-fault-tolerant file system built on the library </li></ul>
  22. Implementation
  23. Performance
