3D-DRESD FT

  1. Fault tolerance in FPGA-based systems
  2. Outline <ul><li>Techniques: </li></ul><ul><ul><li>Triple module redundancy </li></ul></ul><ul><ul><ul><li>Throughput logic </li></ul></ul></ul><ul><ul><ul><li>State-machine logic </li></ul></ul></ul><ul><ul><ul><li>I/O logic </li></ul></ul></ul><ul><ul><ul><li>BRAM </li></ul></ul></ul><ul><ul><li>Error detection and error correction </li></ul></ul><ul><ul><li>Partial reconfiguration </li></ul></ul><ul><li>Real approaches </li></ul><ul><ul><li>SEU mitigation through dynamic partial reconfiguration </li></ul></ul><ul><ul><li>Run-time fault reconfiguration </li></ul></ul><ul><li>Conclusions </li></ul>
  4. Triple module redundancy
  5. Triple module redundancy (voter) <ul><li>The voter can be implemented </li></ul><ul><ul><li>with Look-Up Tables (LUTs) </li></ul></ul><ul><ul><li>with 3-state buffers (BUFTs) </li></ul></ul>
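A TMR voter outputs, for every bit, the value held by at least two of the three copies: (a AND b) OR (b AND c) OR (a AND c). Because each output bit depends on only three inputs, it fits in a single LUT. The Python sketch below models the LUT-based voter bit-parallel over integer words; the function name `majority_vote` is illustrative, and the BUFT-based variant is not modeled.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant words.

    Each result bit is 1 when at least two of the three input bits are 1,
    i.e. the 3-input function a*b + b*c + a*c that one LUT can implement.
    """
    return (a & b) | (b & c) | (a & c)


if __name__ == "__main__":
    correct = 0b1011_0110
    upset = correct ^ 0b0000_1000  # single-bit upset in one copy
    print(bin(majority_vote(correct, upset, correct)))  # 0b10110110
```

A single upset in any one copy is masked, because the other two copies still agree on every bit.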
  7. Throughput logic <ul><li>The system includes three copies of: </li></ul><ul><ul><li>the module itself </li></ul></ul><ul><ul><li>the input signals </li></ul></ul><ul><ul><li>the output signals </li></ul></ul><ul><li>No voter is needed inside the module (throughput logic has no feedback paths) </li></ul><ul><li>No single point of failure </li></ul>
  9. State-machine logic <ul><li>State machines depend on their current state </li></ul><ul><ul><li>The voter has to be implemented internally </li></ul></ul><ul><li>A voter has to be inserted in the system for: </li></ul><ul><ul><li>each state register </li></ul></ul><ul><ul><li>each feedback path </li></ul></ul><ul><li>This approach keeps every copy of each state machine in the correct state: a copy hit by an upset is resynchronized on the next clock cycle </li></ul>
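The effect of voting the feedback path can be sketched with a triplicated 4-bit counter. This is a minimal behavioral model under stated assumptions: the class `TmrCounter` and its methods are hypothetical, each copy has its own voter, and each copy computes its next state from the voted state rather than from its own register.

```python
from typing import List


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (b & c) | (a & c)


class TmrCounter:
    """Three copies of a 4-bit counter with a voter on each feedback path."""

    def __init__(self) -> None:
        self.states: List[int] = [0, 0, 0]  # one state register per copy

    @staticmethod
    def next_state(state: int) -> int:
        return (state + 1) & 0xF  # 4-bit wrap-around counter

    def clock(self) -> None:
        s0, s1, s2 = self.states
        # One voter per copy: each copy computes its next state from the voted state,
        # so a copy whose register was upset is realigned on this clock edge.
        voted = [majority_vote(s0, s1, s2) for _ in range(3)]
        self.states = [self.next_state(v) for v in voted]

    def upset(self, copy: int, bit: int) -> None:
        """Inject a single-event upset into one copy's state register."""
        self.states[copy] ^= 1 << bit


if __name__ == "__main__":
    fsm = TmrCounter()
    fsm.clock()
    fsm.clock()               # states == [2, 2, 2]
    fsm.upset(copy=1, bit=3)  # states == [2, 10, 2]
    fsm.clock()
    print(fsm.states)         # [3, 3, 3]
```

Without the feedback voters, copy 1 would keep counting from 10 and stay out of step with the other two copies, so a second upset in another copy could then defeat the output vote.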
  11. I/O logic (Input) <ul><li>Input pins have to be replicated to avoid single points of failure </li></ul><ul><li>If the number of required input pins exceeds the number of pins available on the reconfigurable device: </li></ul><ul><ul><li>only a subset of the input pins can be replicated </li></ul></ul><ul><ul><li>the system can be split across more than one FPGA </li></ul></ul>
  12. I/O logic (Output) <ul><li>To avoid a single point of failure on the output pins, a dedicated output voting circuit has to be implemented (circuit diagram not reproduced) </li></ul>
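The output circuit itself is not available here. The sketch below assumes one commonly used scheme: each of the three copies drives its own 3-state output buffer, the three buffers are tied together at the pad, and a copy's buffer is disabled when that copy disagrees with the other two (a "minority" voter). The function names are hypothetical and a single-bit output is modeled.

```python
from typing import List, Optional, Sequence


def buffer_enables(outs: Sequence[int]) -> List[bool]:
    """Enable the 3-state output buffer of each copy unless that copy is the minority."""
    enables = []
    for i, value in enumerate(outs):
        others = [outs[j] for j in range(3) if j != i]
        # Copy i is in the minority when the other two copies agree on a different value.
        in_minority = others[0] == others[1] != value
        enables.append(not in_minority)
    return enables


def resolve_pad(outs: Sequence[int]) -> Optional[int]:
    """Logic value seen on the shared output pad (None = floating or contended net)."""
    driven = {value for value, enabled in zip(outs, buffer_enables(outs)) if enabled}
    return driven.pop() if len(driven) == 1 else None


if __name__ == "__main__":
    print(buffer_enables([1, 0, 1]))  # [True, False, True]
    print(resolve_pad([1, 0, 1]))     # 1
```

Under this assumption no single voter or buffer decides the pad value on its own: a faulty copy only disconnects itself, while the two agreeing copies keep driving the pad.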
  14. BRAM <ul><li>BRAMs are large blocks of static memory (4 Kbit each) that are true dual-port and fully synchronous </li></ul><ul><li>Techniques: </li></ul><ul><ul><li>Simple redundancy </li></ul></ul><ul><ul><ul><li>Replication of BRAMs </li></ul></ul></ul><ul><ul><li>Redundancy and refresh </li></ul></ul><ul><ul><ul><li>Replication of BRAMs </li></ul></ul></ul><ul><ul><ul><li>Refresh with voter </li></ul></ul></ul><ul><ul><li>Data encoding </li></ul></ul><ul><ul><ul><li>Error Correction Code (ECC) </li></ul></ul></ul>
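"Redundancy and refresh" can be sketched as three replicated memory arrays whose reads are voted and whose contents are periodically rewritten with the voted value. This is a behavioral model only; the class `TmrMemory` is hypothetical, and in hardware the refresh would typically run in the background, for example through the second port of the dual-port BRAM.

```python
from typing import List


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority."""
    return (a & b) | (b & c) | (a & c)


class TmrMemory:
    """Three replicated memory arrays with voted reads and a refresh (scrub) pass."""

    def __init__(self, depth: int) -> None:
        self.copies: List[List[int]] = [[0] * depth for _ in range(3)]

    def write(self, addr: int, value: int) -> None:
        for copy in self.copies:
            copy[addr] = value

    def read(self, addr: int) -> int:
        return majority_vote(*(copy[addr] for copy in self.copies))

    def refresh(self) -> int:
        """Rewrite every word with its voted value; return how many copy words were repaired."""
        repaired = 0
        for addr in range(len(self.copies[0])):
            voted = self.read(addr)
            for copy in self.copies:
                if copy[addr] != voted:
                    copy[addr] = voted
                    repaired += 1
        return repaired


if __name__ == "__main__":
    mem = TmrMemory(depth=16)
    mem.write(5, 0xAB)
    mem.copies[2][5] ^= 0x01  # inject an upset into one copy
    print(hex(mem.read(5)))   # 0xab, masked by the vote
    print(mem.refresh())      # 1
```

Simple redundancy alone masks the first upset, but the corrupted word stays wrong; the refresh removes it before a second upset at the same address in another copy can outvote the correct value.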
  16. Error detection and error correction <ul><li>Correcting an error is more performance- and cost-effective than retransmitting the data </li></ul><ul><li>Check (parity) bits are added to the data bits (64 + 8 or 32 + 7) </li></ul><ul><li>No memory replication is required </li></ul>
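The 32 + 7 and 64 + 8 sizes are consistent with a single-error-correcting, double-error-detecting (SECDED) extended Hamming code: 6 Hamming check bits plus one overall parity bit protect 32 data bits, and 7 + 1 protect 64. The slide does not name the code, so the sketch below is a textbook extended Hamming code used for illustration; its bit ordering is not that of any particular hardware ECC block, and `encode`/`decode` are hypothetical names.

```python
from typing import List, Tuple


def num_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming single-error-correction bound)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def encode(data: int, k: int = 32) -> List[int]:
    """Extended Hamming (SECDED) codeword as a bit list; index 0 is the overall parity bit."""
    n = k + num_check_bits(k)
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # data bits go to positions that are not powers of two
            code[pos] = (data >> bit) & 1
            bit += 1
    p = 1
    while p <= n:
        # Check bit p gives even parity over every position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
        p <<= 1
    code[0] = sum(code[1:]) % 2  # overall parity: adds double-error detection
    return code


def decode(code: List[int]) -> Tuple[int, str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # work on a copy
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all 1 bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1  # single error (syndrome 0 means the overall parity bit)
        status = "corrected"
    else:
        status = "uncorrectable"  # e.g. two bit flips: syndrome != 0, overall parity even
    data, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= code[pos] << bit
            bit += 1
    return data, status


if __name__ == "__main__":
    cw = encode(0xDEADBEEF)
    print(len(cw))        # 39: 32 data bits + 7 check bits
    cw[10] ^= 1           # single upset
    print(decode(cw))     # (3735928559, 'corrected')
    cw[20] ^= 1           # second upset in the same word
    print(decode(cw)[1])  # uncorrectable
```

With `k=64` the same construction yields a 72-bit word, matching the 64 + 8 figure on the slide.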
  18. Partial reconfiguration <ul><li>Access to the configuration memory: </li></ul><ul><ul><li>Readback </li></ul></ul><ul><ul><ul><li>Post-configuration read operation </li></ul></ul></ul><ul><ul><li>Partial reconfiguration </li></ul></ul><ul><ul><ul><li>Post-configuration write operation </li></ul></ul></ul><ul><li>Techniques: </li></ul><ul><ul><li>SEU scrubbing </li></ul></ul><ul><ul><ul><li>Partial reconfiguration </li></ul></ul></ul><ul><ul><li>SEU detection </li></ul></ul><ul><ul><ul><li>Readback </li></ul></ul></ul><ul><ul><ul><ul><li>Bit-by-bit comparison </li></ul></ul></ul></ul><ul><ul><ul><ul><li>CRC comparison </li></ul></ul></ul></ul><ul><ul><li>SEU correction </li></ul></ul><ul><ul><ul><li>Readback </li></ul></ul></ul><ul><ul><ul><li>Partial reconfiguration </li></ul></ul></ul>
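SEU detection by readback with CRC comparison, followed by correction through partial reconfiguration, can be sketched frame by frame. The `read_frame` and `write_frame` callbacks are hypothetical stand-ins for the device's readback and partial-reconfiguration paths, not a vendor API; `zlib.crc32` from the Python standard library is used as the CRC. On real devices, frames holding run-time state (for example LUT RAM contents) generally have to be masked before comparison; this sketch ignores that.

```python
import zlib
from typing import Callable, List


def scrub(
    golden_frames: List[bytes],
    read_frame: Callable[[int], bytes],
    write_frame: Callable[[int, bytes], None],
) -> List[int]:
    """Readback-based SEU detection and correction; returns indices of rewritten frames."""
    golden_crcs = [zlib.crc32(f) for f in golden_frames]  # would be precomputed once
    repaired = []
    for idx, expected_crc in enumerate(golden_crcs):
        if zlib.crc32(read_frame(idx)) != expected_crc:  # SEU detection: CRC comparison
            write_frame(idx, golden_frames[idx])         # SEU correction: partial reconfiguration
            repaired.append(idx)
    return repaired


if __name__ == "__main__":
    golden = [bytes([i] * 16) for i in range(8)]
    device = [bytearray(f) for f in golden]  # simulated configuration memory
    device[3][7] ^= 0x20                     # simulated configuration upset

    def read_frame(i: int) -> bytes:
        return bytes(device[i])

    def write_frame(i: int, data: bytes) -> None:
        device[i][:] = data

    print(scrub(golden, read_frame, write_frame))  # [3]
    print(scrub(golden, read_frame, write_frame))  # []
```

Blind SEU scrubbing corresponds to skipping the readback and periodically calling `write_frame` for every frame; readback-based correction only rewrites frames that actually differ.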
  20. Dynamic partial reconfiguration <ul><li>Dynamic partial reconfiguration can be used to reconfigure only the affected portion of the architecture </li></ul><ul><ul><li>while the rest of the system keeps working </li></ul></ul><ul><ul><li>without performing a complete reconfiguration </li></ul></ul><ul><li>Ideally, only the smallest portion of the FPGA containing the fault is reconfigured (this requires a good partitioning phase) </li></ul><ul><li>A solution space exploration has to be performed </li></ul>
  21. Dynamic partial reconfiguration (DWC) <ul><li>Fault detection and characterization </li></ul><ul><ul><li>Identification of a mismatch </li></ul></ul><ul><li>Fault localization </li></ul><ul><ul><li>Identification of the portion of the device where the fault is located </li></ul></ul><ul><li>Several solutions are possible when applying DWC (duplication with comparison) </li></ul>
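Detection and localization with DWC can be sketched by duplicating a design partitioned into blocks and comparing the two copies after each block. The block structure and the `run_dwc` function are hypothetical; note that DWC only reports that the copies disagree, not which copy is wrong, so both copies of the flagged block are candidates for reconfiguration.

```python
from typing import Callable, List, Optional

Block = Callable[[int], int]


def run_dwc(copy_a: List[Block], copy_b: List[Block], x: int) -> Optional[int]:
    """Feed x through two copies of a block pipeline, comparing after each block.

    Returns the index of the first block whose outputs mismatch, or None.
    """
    a = b = x
    for idx, (block_a, block_b) in enumerate(zip(copy_a, copy_b)):
        a, b = block_a(a), block_b(b)
        if a != b:  # comparator at this block boundary
            return idx
    return None


if __name__ == "__main__":
    stages = [lambda v: v + 3, lambda v: v * 2, lambda v: v ^ 0x5]
    faulty = list(stages)
    faulty[1] = lambda v: (v * 2) | 0x100  # simulated stuck-at-1 on bit 8, copy B, block 1
    print(run_dwc(stages, faulty, 7))      # 1
    print(run_dwc(stages, faulty, 200))    # None: this input does not activate the fault
```

The second call shows a limitation of comparison-based detection: a fault is only seen when the current inputs make the two copies produce different values.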
  22. Dynamic partial reconfiguration (ro-index) <ul><li>ro-index: the ratio between the area occupied by a module and the area of its minimal placement constraint, both measured in slices </li></ul><ul><ul><li>Occupied area in slices: S<sub>o</sub> </li></ul></ul><ul><ul><li>Placement constraint in slices: S<sub>c</sub> </li></ul></ul><ul><ul><li>ro-index = S<sub>o</sub> / S<sub>c</sub> </li></ul></ul>
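As a worked example of the formula, a module occupying 180 slices inside a 200-slice placement constraint has an ro-index of 180 / 200 = 0.9. The sketch below computes the index and, as an illustration only, picks the candidate region with the highest ro-index, i.e. the least unused area to reconfigure. Describing candidate regions as hypothetical (width, height) rectangles in slices is an assumption; real placement constraints must also provide the required resource types and routing.

```python
from typing import Iterable, Optional, Tuple


def ro_index(occupied_slices: int, constraint_slices: int) -> float:
    """ro-index = S_o / S_c; at most 1 when the module fits inside its constraint."""
    if constraint_slices <= 0:
        raise ValueError("placement constraint must contain at least one slice")
    if occupied_slices > constraint_slices:
        raise ValueError("module does not fit in the placement constraint")
    return occupied_slices / constraint_slices


def tightest_region(
    occupied: int, regions: Iterable[Tuple[int, int]]
) -> Optional[Tuple[int, int]]:
    """Among candidate (width, height) regions in slices, return the one with the highest ro-index."""
    fitting = [r for r in regions if r[0] * r[1] >= occupied]
    return max(fitting, key=lambda r: ro_index(occupied, r[0] * r[1]), default=None)


if __name__ == "__main__":
    print(ro_index(180, 200))                                    # 0.9
    print(tightest_region(180, [(10, 20), (8, 24), (14, 14)]))  # (8, 24)
```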
  24. Run-time fault reconfiguration <ul><li>Recovery from permanent logic and interconnect faults </li></ul><ul><ul><li>based on fine-grained physical design partitioning </li></ul></ul><ul><li>Faults are localized to small partitioned blocks that have fixed interfaces to the surrounding portion of the device </li></ul><ul><ul><li>affected blocks are reconfigured with previously generated, functionally equivalent block instances that do not use the faulty resources </li></ul></ul>
  25. Run-time fault reconfiguration <ul><li>Assumptions </li></ul><ul><ul><li>Detection of a fault </li></ul></ul><ul><ul><li>Localization of a fault </li></ul></ul><ul><ul><li>Diagnosis of a fault (helpful, but not required) </li></ul></ul><ul><li>Action </li></ul><ul><ul><li>An alternate configuration of the design that does not use the faulty resources is loaded </li></ul></ul><ul><li>Advantages </li></ul><ul><ul><li>extremely low area overhead </li></ul></ul><ul><ul><li>very low timing overhead </li></ul></ul><ul><ul><li>run-time management of faults </li></ul></ul><ul><ul><li>high flexibility </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>very complex design phase (and run-time management) </li></ul></ul>
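The run-time side of this approach, choosing a previously generated block instance that avoids the faulty resources, can be sketched as a lookup over a library of variants. Everything here is hypothetical: the `Variant` and `FaultManager` names, the resource identifiers such as `"CLB_R1C1"`, and the assumption that the first variant of each block is loaded initially. Generating the variants happens offline, which is where the complex design phase comes from.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Variant:
    """A previously generated, functionally equivalent implementation of one block."""
    name: str
    resources: FrozenSet[str]  # hypothetical resource identifiers, e.g. "CLB_R1C1"


def select_variant(variants: List[Variant], faulty: Set[str]) -> Optional[Variant]:
    """Return the first variant that uses none of the faulty resources."""
    for variant in variants:
        if not (variant.resources & faulty):
            return variant
    return None


class FaultManager:
    def __init__(self, library: Dict[str, List[Variant]]) -> None:
        self.library = library
        self.faulty: Set[str] = set()
        # Assumption: the first variant of every block is the one loaded at start-up.
        self.active = {block: variants[0] for block, variants in library.items()}

    def report_fault(self, block: str, resource: str) -> Optional[Variant]:
        """Record a localized fault and pick the configuration the block should use."""
        self.faulty.add(resource)
        if resource not in self.active[block].resources:
            return self.active[block]  # current configuration is unaffected
        replacement = select_variant(self.library[block], self.faulty)
        if replacement is not None:
            self.active[block] = replacement  # would trigger a partial reconfiguration
        return replacement  # None: no stored variant avoids all known faults


if __name__ == "__main__":
    lib = {"blk0": [
        Variant("v0", frozenset({"CLB_R1C1", "CLB_R1C2"})),
        Variant("v1", frozenset({"CLB_R1C2", "CLB_R1C3"})),
        Variant("v2", frozenset({"CLB_R1C1", "CLB_R1C3"})),
    ]}
    mgr = FaultManager(lib)
    print(mgr.report_fault("blk0", "CLB_R1C1").name)  # v1
    print(mgr.report_fault("blk0", "CLB_R1C2"))       # None
```

Only the affected block is swapped, which is what keeps the area and timing overhead low; the number of faults that can be tolerated is bounded by the variants generated in advance.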
  27. Conclusions <ul><li>Reliable systems can be effectively implemented on FPGA devices </li></ul><ul><li>The techniques presented above can be combined to improve the overall reliability of the design </li></ul><ul><ul><li>TMR combined with SEU correction through partial reconfiguration is a powerful and effective SEU mitigation strategy </li></ul></ul><ul><li>3-state buffers can be used to implement fault-tolerance techniques without consuming LUTs, keeping the area overhead low </li></ul>
  28. The end <ul><li>Thank you for your attention </li></ul><ul><li>Do you have any questions? </li></ul>
