- 1. Fault Tolerance
  - Distributed Systems
  - Paul Krzyzanowski [email_address] [email_address]
  - Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
- 2. Faults
  - Deviation from expected behavior
  - Variety of factors
    - hardware
    - software
    - operator
    - network
  - Three categories
    - transient faults
    - intermittent faults
    - permanent faults
- 3. Faults
  - Three categories
    - transient faults
    - intermittent faults
    - permanent faults
  - Any fault may be
    - fail-silent (fail-stop)
    - Byzantine
  - Synchronous system vs. asynchronous system
    - E.g., IP packet versus serial port transmission
- 4. Achieving fault tolerance
  - Redundancy
    - information redundancy
      - Hamming codes, parity memory, ECC memory
    - time redundancy
      - Timeout and retransmit
    - physical redundancy
      - TMR, RAID disks, backup servers
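Information redundancy can be illustrated with the simplest scheme on the slide, a single parity bit. The following is a minimal sketch, assuming even parity over a short list of bits; the function names `add_even_parity` and `parity_ok` are hypothetical and chosen only for this example. A single parity bit detects any one-bit error but cannot locate or correct it, which is what Hamming codes and ECC memory add.

```python
def add_even_parity(bits):
    """Append one parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """Return True if the word still contains an even number of 1s."""
    return sum(word) % 2 == 0


# Store a 4-bit value together with its parity bit.
word = add_even_parity([1, 0, 1, 1])

# Simulate a transient fault: flip one stored bit.
corrupted = word.copy()
corrupted[2] ^= 1

print(parity_ok(word))       # the intact word passes the check
print(parity_ok(corrupted))  # the single-bit error is detected
```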
- 5. How much fault tolerance?
  - 100% fault tolerance cannot be achieved.
    - The closer we wish to get to 100%, the more expensive the system will be.
  - A system is k-fault tolerant if it can withstand k faults.
    - Need k+1 components with silent faults: k can fail and one will still be working
    - Need 2k+1 components with Byzantine faults: k can generate false replies, and the remaining k+1 will provide a majority vote
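The two sizing rules above reduce to simple arithmetic. As a small sketch (the helper name `replicas_needed` is an assumption made for this example, not something from the slides):

```python
def replicas_needed(k, byzantine=False):
    """Components required to tolerate k faults by replication.

    Fail-silent faults: k + 1 (one survivor is enough).
    Byzantine faults:   2k + 1 (k + 1 correct replies outvote k false ones).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return 2 * k + 1 if byzantine else k + 1


for k in (1, 2, 3):
    print(k, replicas_needed(k), replicas_needed(k, byzantine=True))
```

Note that this covers majority voting on replies only; reaching agreement among the participants themselves has a stricter requirement, covered under the Byzantine Generals problem.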
- 6. Active replication
  - Technique for fault tolerance through physical redundancy
  - No redundancy:
  - Triple Modular Redundancy (TMR):
    - Threefold component replication to detect and correct a single component failure
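The voting step in TMR can be sketched as a majority function over three replica outputs. This is an illustrative software model only, assuming the outputs are directly comparable values; the name `tmr_vote` is hypothetical.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value produced by at least two of the three replicas.

    A single faulty replica is outvoted. If all three disagree, more than
    one replica has failed and no majority exists.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica disagrees")
    return value


# One replica returns a wrong result; the voter masks it.
print(tmr_vote(7, 7, 9))
```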
- 7. Primary backup
  - One server does all the work
  - When it fails, backup takes over
    - Backup may ping primary with "are you alive" messages
  - Simpler design: no need for multicast
  - Works poorly with Byzantine faults
  - Recovery may be time-consuming and/or complex
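The "are you alive" check can be sketched as a timeout on the primary's last reply. The class below is a hypothetical model, not a real failover implementation: timestamps are passed in explicitly instead of being read from a clock, and the `Backup` class and its method names are assumptions for this example. It also assumes a fail-silent primary; a Byzantine primary could keep answering pings while returning wrong results, which is why this scheme works poorly with Byzantine faults.

```python
class Backup:
    """Backup server that promotes itself when the primary goes silent."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence tolerated
        self.last_reply = 0.0       # time of the last "I am alive" reply
        self.is_primary = False

    def on_alive_reply(self, now):
        """Record that the primary answered a ping at time `now`."""
        self.last_reply = now

    def check(self, now):
        """Take over if the primary has been silent longer than the timeout."""
        if not self.is_primary and now - self.last_reply > self.timeout:
            self.is_primary = True
        return self.is_primary


backup = Backup(timeout=3.0)
backup.on_alive_reply(now=1.0)
print(backup.check(now=3.5))  # primary answered recently, no takeover
print(backup.check(now=4.5))  # silence exceeded the timeout, backup takes over
```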
- 8. Agreement in faulty systems
  - Two army problem
    - good processors, faulty communication lines
    - coordinated attack
    - multiple acknowledgement problem
- 9. Agreement in faulty systems
  - Byzantine Generals problem
    - reliable communication lines, faulty processors
    - n generals head different divisions
    - m generals are traitors and are trying to prevent others from reaching agreement
      - 4 generals agree to attack
      - 4 generals agree to retreat
      - 1 traitor tells the 1st group that he'll attack and tells the 2nd group that he'll retreat
    - can the loyal generals reach agreement?
- 10. Agreement in faulty systems
  - Byzantine Generals problem
    - Solutions require:
      - 3m+1 participants for m traitors (2m+1 loyal generals)
      - m+1 rounds of message exchanges
      - O(m^2) messages
    - Costly solution!
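For m = 1, the 3m+1 requirement means four generals and two rounds. The sketch below simulates that case under stated assumptions: one commander plus lieutenants, oral (unsigned) messages, a traitorous lieutenant that always relays the opposite of what it received, and "retreat" as the tie-breaking default. All names (`om1`, `majority`, `FLIP`) are hypothetical, and this covers only the single-traitor case, not the general recursive algorithm.

```python
from collections import Counter

FLIP = {"attack": "retreat", "retreat": "attack"}


def majority(votes, default="retreat"):
    """Most common vote, or `default` on a tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def om1(orders, traitor=None):
    """Two-round exchange tolerating one traitor.

    orders:  dict mapping each lieutenant to the order the commander sent it
             (round 1). A traitorous commander may send different orders.
    traitor: name of a traitorous lieutenant, or None.
    Returns the decision of every loyal lieutenant.
    """
    decisions = {}
    for me in orders:
        if me == traitor:
            continue
        votes = [orders[me]]  # what the commander told me
        for other, heard in orders.items():
            if other == me:
                continue
            # Round 2: each lieutenant relays what it heard; the traitor lies.
            votes.append(FLIP[heard] if other == traitor else heard)
        decisions[me] = majority(votes)
    return decisions


# n = 4, traitorous lieutenant: loyal lieutenants still obey the commander.
print(om1({"L1": "attack", "L2": "attack", "L3": "attack"}, traitor="L3"))

# n = 4, traitorous commander: all loyal lieutenants reach the same decision.
print(om1({"L1": "attack", "L2": "retreat", "L3": "attack"}))

# n = 3 (fewer than 3m+1): L1 cannot tell who is lying and falls back
# to the default, disobeying a loyal commander.
print(om1({"L1": "attack", "L2": "attack"}, traitor="L2"))
```

With only three generals, the loyal lieutenant sees one "attack" and one "retreat" and has no way to break the tie correctly, which is the intuition behind requiring 2m+1 loyal generals.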
- 11. The end.
