Fault tolerance in FPGA-based systems
3D-DRESD FT

  1. Fault tolerance in FPGA-based systems
  2. Outline
     - Techniques:
       - Triple module redundancy
         - Throughput logic
         - State-machine logic
         - I/O logic
         - BRAM
       - Error detection and error correction
       - Partial reconfiguration
     - Real approaches:
       - SEU mitigation through dynamic partial reconfiguration
       - Run-time fault reconfiguration
     - Conclusions
  4. Triple module redundancy
  5. Triple module redundancy (voter)
     - The voter can be implemented:
       - with Look-Up Tables (LUTs)
       - with 3-state buffers (BUFTs)
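Whichever resource implements it, the voter computes a 2-out-of-3 majority. The following is a minimal behavioral model in Python (not HDL) showing the word-level vote and the truth-table contents a 3-input LUT would hold. The BUFT-based variant is not modeled, and the name `LUT_INIT` is illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three equally wide words."""
    return (a & b) | (b & c) | (a & c)


# The same function stored as the 8-entry truth table of a 3-input LUT.
# Bit i of LUT_INIT is the voter output when a = bit 0, b = bit 1 and
# c = bit 2 of the index i.
LUT_INIT = sum(
    majority(i & 1, (i >> 1) & 1, (i >> 2) & 1) << i for i in range(8)
)

if __name__ == "__main__":
    good = 0b1011_0110
    upset = good ^ 0b0000_1000  # one copy hit by a single-bit upset
    print(majority(good, upset, good) == good)  # True
    print(hex(LUT_INIT))  # 0xe8
```

Because the vote is bitwise, a single upset copy is masked in every bit position at once.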
  7. Throughput logic
     - The system includes 3 copies of:
       - the module itself
       - the input signals
       - the output signals
     - No voter is needed (the outputs stay triplicated)
     - No single point of failure is introduced
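A small Python sketch of why throughput logic needs no voter of its own: each domain has its own input copy and its own module copy, so an upset stays confined to one domain and the triplicated outputs can be voted further downstream. The stage `add_three` is a hypothetical combinational function chosen only for illustration.

```python
from typing import Callable, List


def run_domains(module: Callable[[int], int], inputs: List[int]) -> List[int]:
    """Evaluate one copy of the module per redundant domain.

    Each domain has its own input copy and its own module copy, so nothing
    is shared between domains and no voter sits inside this logic.
    """
    return [module(x) for x in inputs]


def add_three(x: int) -> int:
    """Hypothetical throughput stage: combinational, no feedback path."""
    return (x + 3) & 0xFF


if __name__ == "__main__":
    copies = [0x40, 0x40, 0x40]  # triplicated input signal
    copies[1] ^= 0x01            # upset confined to domain 1
    outputs = run_domains(add_three, copies)
    print(outputs)  # [67, 68, 67]: only domain 1 is wrong
```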
  9. State-machine logic
     - State machines depend strictly on their current state
       - the voter has to be implemented internally
     - A voter has to be inserted for:
       - each state register
       - each feedback path
     - This keeps every state-machine copy in the correct state: a copy whose state register is upset resynchronizes to the majority state on the next clock cycle
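The feedback-path voting can be sketched in Python as one clock step of a triplicated state machine. Each domain votes the three state registers and computes its next state from the voted value. The 2-bit counter `counter` is a hypothetical next-state function used only to show the resynchronization.

```python
from typing import Callable, List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (b & c) | (a & c)


def tmr_step(states: List[int], inp: int,
             next_state: Callable[[int, int], int]) -> List[int]:
    """One clock cycle of a triplicated state machine.

    Every domain has its own voter in the feedback path: it votes the three
    state registers and computes its next state from the voted value.
    """
    return [next_state(majority(*states), inp) for _ in range(3)]


def counter(state: int, enable: int) -> int:
    """Hypothetical next-state function: 2-bit counter with enable."""
    return (state + enable) % 4


if __name__ == "__main__":
    states = [1, 1, 3]  # domain 2's state register was upset (1 -> 3)
    print(tmr_step(states, 1, counter))     # [2, 2, 2]
    print([counter(s, 1) for s in states])  # [2, 2, 0] without feedback voting
```

Without the voter in the feedback path, the upset copy keeps counting from the wrong state and never rejoins the other two.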
  11. I/O logic (Input)
      - Input pins have to be replicated to avoid single points of failure
      - If the number of required input pins exceeds the number of input pins available on the device:
        - only a subset of the input pins can be replicated
        - the system can be split across more than one FPGA
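When pins run short, the size of the replicable subset follows from simple pin-budget arithmetic. A Python sketch, assuming that each replicated input needs one pin per copy, every other input needs one pin, and nothing else (outputs, clocks, banking rules) competes for the pins; `replicable_inputs` is a hypothetical helper.

```python
def replicable_inputs(n_inputs: int, n_pins: int, copies: int = 3) -> int:
    """Largest number k of inputs that can be replicated `copies` times.

    Assumes each replicated input needs `copies` pins and every other input
    needs one pin, so the pin count is n_inputs + k * (copies - 1).
    """
    if copies < 2:
        raise ValueError("copies must be at least 2")
    if n_inputs > n_pins:
        raise ValueError("not enough pins even without replication")
    return min(n_inputs, (n_pins - n_inputs) // (copies - 1))


if __name__ == "__main__":
    print(replicable_inputs(100, 250))  # 75: 75 * 3 + 25 = 250 pins
    print(replicable_inputs(100, 400))  # 100: every input can be triplicated
```

If the result is smaller than the number of critical inputs, splitting the system across more than one FPGA is the remaining option.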
  12. I/O logic (Output)
      - To avoid a single point of failure on the output pins, the following circuit has to be implemented
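The circuit drawing itself is not part of this transcript. The Python model below sketches one common arrangement, which is an assumption and not necessarily the circuit on the slide. The three output pins are tied together on the board, and each domain drives its pin through a 3-state buffer whose enable comes from a minority voter. A domain that disagrees with the other two releases the net, so only agreeing domains drive it.

```python
from typing import List, Optional


def minority_disable(own: int, other1: int, other2: int) -> bool:
    """True when this domain disagrees with both other domains, i.e. it is
    the minority and must put its output buffer in high impedance."""
    return own != other1 and own != other2


def shared_pin(outputs: List[int]) -> Optional[int]:
    """Value on the board net that ties the three output pins together.

    Each domain drives its pin through a 3-state buffer whose enable is
    controlled by its minority voter. Returns None if the net would be
    undriven or driven to conflicting values.
    """
    a, b, c = outputs
    driven = [
        own
        for own, x, y in ((a, b, c), (b, a, c), (c, a, b))
        if not minority_disable(own, x, y)
    ]
    if not driven or len(set(driven)) > 1:
        return None
    return driven[0]


if __name__ == "__main__":
    print(shared_pin([1, 1, 1]))  # 1: all three buffers drive
    print(shared_pin([1, 0, 1]))  # 1: domain 1 is disabled, the others drive
```

For single-bit pins, a faulty domain is always the minority, so the net never sees contention while at most one domain is wrong.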
  14. BRAM
      - BRAMs are large blocks of static memory (4 Kbit each) that are true dual-port and fully synchronous
      - Techniques:
        - Simple redundancy
          - replication of BRAMs
        - Redundancy and refresh
          - replication of BRAMs
          - refresh through a voter
        - Data encoding
          - error-correcting code (ECC)
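The "redundancy and refresh" technique can be sketched in Python as three memory copies with voted reads and a refresh pass that writes the voted word back into every copy. In hardware, the refresh would plausibly run through the second port of each dual-port BRAM under an address counter. That is an assumption about the implementation that the slide does not spell out. `TmrMemory` is a hypothetical name.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (b & c) | (a & c)


class TmrMemory:
    """Three copies of a memory array with voted reads and a refresh pass."""

    def __init__(self, depth: int) -> None:
        self.copies: List[List[int]] = [[0] * depth for _ in range(3)]

    def write(self, addr: int, value: int) -> None:
        for mem in self.copies:
            mem[addr] = value

    def read(self, addr: int) -> int:
        # Voted read: a single corrupted copy is masked.
        return majority(*(mem[addr] for mem in self.copies))

    def refresh(self) -> None:
        # Write the voted word back into every copy so that upsets do not
        # accumulate until two copies of the same word are wrong.
        for addr in range(len(self.copies[0])):
            voted = self.read(addr)
            for mem in self.copies:
                mem[addr] = voted


if __name__ == "__main__":
    ram = TmrMemory(depth=16)
    ram.write(3, 0x5A)
    ram.copies[1][3] ^= 0x10  # SEU in copy 1
    print(hex(ram.read(3)))   # 0x5a
    ram.refresh()
    print(ram.copies[1][3] == 0x5A)  # True
```

Simple redundancy alone masks the first upset. The refresh matters because, without it, a later upset in a second copy of the same word would defeat the vote.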
  16. Error detection and error correction
      - Correcting an error is more efficient, in both performance and cost, than retransmitting the data
      - Check (parity) bits are added to the data bits (64 + 8 or 32 + 7)
      - No memory replication is needed
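The 32 + 7 layout matches a SEC-DED (single-error-correcting, double-error-detecting) extended Hamming code: 6 Hamming check bits plus 1 overall parity bit. The 64 + 8 layout is the same construction with 7 check bits. The Python sketch below assumes the textbook bit layout, with check bits at power-of-two positions and overall parity in bit 0. Hardware ECC blocks may order the bits differently.

```python
from typing import Tuple

N = 38                                  # highest Hamming position (32 data + 6 check bits)
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32]  # Hamming check bits sit at powers of two
DATA_POSITIONS = [p for p in range(1, N + 1) if (p & (p - 1)) != 0]  # 32 positions


def encode(data: int) -> int:
    """Encode 32 data bits into a 39-bit SEC-DED codeword.

    Bit 0 is the overall parity bit; bits 1..38 are the Hamming positions.
    """
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    for p in CHECK_POSITIONS:
        # Check bit p makes the number of ones among positions with bit p set even.
        parity = 0
        for pos in range(1, N + 1):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        word |= parity << p
    word |= bin(word).count("1") & 1    # overall parity: total number of ones is even
    return word


def decode(word: int) -> Tuple[int, str]:
    """Return (data, status) with status 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, N + 1):
        if (word >> pos) & 1:
            syndrome ^= pos             # XOR of the positions of all set bits
    parity_even = bin(word).count("1") % 2 == 0
    if syndrome == 0 and parity_even:
        status = "ok"
    elif not parity_even and syndrome <= N:
        word ^= 1 << syndrome           # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        status = "uncorrectable"        # e.g. a double error: detected, not corrected
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


if __name__ == "__main__":
    codeword = encode(0xDEADBEEF)
    print(decode(codeword ^ (1 << 17)))   # (3735928559, 'corrected')
    print(decode(codeword ^ 0b110)[1])    # uncorrectable
```

The syndrome gives the position of a single flipped bit, while the overall parity bit distinguishes single errors (odd parity) from double errors (even parity, nonzero syndrome).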
  18. Partial reconfiguration
      - Access to the configuration memory:
        - Readback: post-configuration read operation
        - Partial reconfiguration: post-configuration write operation
      - Techniques:
        - SEU scrubbing
          - partial reconfiguration
        - SEU detection
          - readback
            - bit-for-bit comparison
            - CRC comparison
        - SEU correction
          - readback
          - partial reconfiguration
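Detection by readback with CRC comparison, followed by correction through partial reconfiguration, can be sketched in Python by modeling the configuration memory as a list of frames. The frame count and size are hypothetical. A real readback flow must also mask bits that legitimately change at run time, such as LUTs used as distributed RAM or flip-flop contents, before comparing. That masking is not modeled here.

```python
import zlib
from typing import List


def readback_and_correct(config: List[bytearray], golden: List[bytes],
                         golden_crc: List[int]) -> List[int]:
    """SEU detection by readback + CRC comparison, correction by partial reconfiguration.

    Each configuration frame is read back and its CRC compared with the CRC
    of the golden frame; only mismatching frames are rewritten.
    Returns the indices of the repaired frames.
    """
    repaired = []
    for idx, frame in enumerate(config):
        if zlib.crc32(frame) != golden_crc[idx]:
            config[idx][:] = golden[idx]  # partial reconfiguration of one frame
            repaired.append(idx)
    return repaired


def scrub(config: List[bytearray], golden: List[bytes]) -> None:
    """Blind scrubbing: periodically rewrite every frame, no readback needed."""
    for idx, frame in enumerate(golden):
        config[idx][:] = frame


if __name__ == "__main__":
    golden = [bytes([i]) * 16 for i in range(4)]   # hypothetical 4-frame bitstream
    golden_crc = [zlib.crc32(f) for f in golden]   # computed once, stored with the design
    config = [bytearray(f) for f in golden]        # configuration memory after loading
    config[2][5] ^= 0x04                           # SEU in frame 2
    print(readback_and_correct(config, golden, golden_crc))  # [2]
```

CRC comparison only needs one stored checksum per frame during detection, whereas bit-for-bit comparison needs the full golden image at hand. Both approaches still need the golden frame to rewrite a corrupted one.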
  20. Dynamic partial reconfiguration
      - Dynamic partial reconfiguration can be used to trigger the reconfiguration of only the affected portion of the architecture:
        - while the rest of the system keeps working
        - without performing a complete reconfiguration
      - Ideally, only the smallest portion of the FPGA that contains the fault is reconfigured (this requires a good partitioning phase)
      - A solution-space exploration has to be performed to find a suitable partitioning
  21. Dynamic partial reconfiguration (DWC)
      - Fault detection and characterization
        - Identification of a mismatch
      - Fault localization
        - Identification of the portion of the device where the fault is located
      - Several solutions are possible by applying DWC (duplication with comparison)
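A Python sketch of DWC with one comparator per partition. A mismatch detects the fault, and the comparator that fires localizes it to a partition, which can then be reconfigured on its own. The partition functions and the stuck-at fault are hypothetical examples.

```python
from typing import Callable, Dict, List

Block = Callable[[int], int]


def dwc_mismatches(primary: Dict[str, Block], duplicate: Dict[str, Block],
                   value: int) -> List[str]:
    """Duplication with comparison, one comparator per partition.

    Returns the partitions whose two copies disagree on this input: the
    mismatch detects the fault, the comparator that fired localizes it.
    """
    return [name for name, block in primary.items()
            if block(value) != duplicate[name](value)]


def scale(x: int) -> int:
    return (x * 3) & 0xFF


def offset(x: int) -> int:
    return (x + 7) & 0xFF


def scale_stuck_at_one(x: int) -> int:
    # Hypothetical permanent fault: output bit 0 stuck at 1.
    return scale(x) | 0x01


if __name__ == "__main__":
    primary = {"scale": scale, "offset": offset}
    duplicate = {"scale": scale_stuck_at_one, "offset": offset}
    print(dwc_mismatches(primary, duplicate, 4))  # ['scale']: 12 vs 13
    print(dwc_mismatches(primary, duplicate, 5))  # []: 15 already has bit 0 set
```

The second call shows two limits of DWC. A fault is only detected when the input activates it. A mismatch also identifies the partition but not which of the two copies is faulty.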
  22. Dynamic partial reconfiguration (ro-index)
      - ro-index: the ratio between the occupied area and its minimal placement constraint, both measured in slices
        - Occupied area in slices: S_o
        - Placement constraint in slices: S_c
        - ro-index = S_o / S_c
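A worked example of the ratio, using hypothetical numbers: a module occupying 96 slices inside a minimal placement constraint of 128 slices. Since the constraint contains the module, S_o never exceeds S_c, and a value closer to 1 means fewer unused slices inside the constrained region.

```python
def ro_index(occupied_slices: int, constraint_slices: int) -> float:
    """ro-index = S_o / S_c, both areas measured in slices."""
    if constraint_slices <= 0:
        raise ValueError("placement constraint must contain at least one slice")
    if not 0 <= occupied_slices <= constraint_slices:
        raise ValueError("occupied area must fit inside the placement constraint")
    return occupied_slices / constraint_slices


if __name__ == "__main__":
    print(ro_index(96, 128))  # 0.75
```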
  24. Run-time fault reconfiguration
      - Recovery from permanent logic and interconnect faults
        - fine-grained physical design partitioning
      - Faults are localized to small partitioned blocks that have fixed interfaces to the surrounding portion of the device
        - affected blocks are reconfigured with previously generated, functionally equivalent block instances that do not use the faulty resources
  25. Run-time fault reconfiguration
      - Assumptions
        - Detection of a fault
        - Localization of a fault
        - Diagnosis of a fault (helpful, but not necessary)
      - Action
        - An alternate configuration of the design that does not use the faulty resources can be loaded
      - Advantages
        - extremely low area overhead
        - very low timing overhead
        - run-time management of faults
        - high flexibility
      - Disadvantages
        - very complex design phase (and run-time management)
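Once a fault has been detected and localized, the run-time action reduces to picking a pre-generated block instance that avoids the faulty resources. A Python sketch, assuming each variant records the set of resources it uses. The bitstream names and resource IDs such as `CLB_X1Y2` are hypothetical, and actually loading the chosen partial bitstream is outside the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Variant:
    """A pre-generated, functionally equivalent instance of one block."""
    bitstream: str             # hypothetical partial-bitstream name
    resources: FrozenSet[str]  # hypothetical IDs of the resources it uses


def select_variant(variants: List[Variant], faulty: Set[str]) -> Optional[Variant]:
    """Return the first variant that avoids every known faulty resource."""
    for variant in variants:
        if not variant.resources & faulty:
            return variant
    return None  # no pre-generated variant avoids the fault


if __name__ == "__main__":
    fir_variants = [
        Variant("fir_v0.bit", frozenset({"CLB_X1Y1", "CLB_X1Y2"})),
        Variant("fir_v1.bit", frozenset({"CLB_X1Y2", "CLB_X2Y1"})),
        Variant("fir_v2.bit", frozenset({"CLB_X1Y1", "CLB_X2Y2"})),
    ]
    choice = select_variant(fir_variants, {"CLB_X1Y2"})
    print(choice.bitstream if choice else "no variant")  # fir_v2.bit
```

The heavy work happens at design time, when enough variants must be generated to cover the faults to be tolerated. This is consistent with the very complex design phase listed among the disadvantages.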
  27. Conclusions
      - Reliable systems can be effectively implemented on FPGA devices
      - The techniques presented above can be combined to improve the overall reliability of the whole design
        - TMR combined with SEU correction through partial reconfiguration is a powerful and effective SEU mitigation strategy
      - 3-state buffers (BUFTs) can be used to implement fault-tolerance methodologies without using LUTs, keeping the area overhead low
  28. The end
      - Thank you for your attention
      - Do you have any questions?