3D-DRESD FT
Published in: Technology
Transcript

  • 1. Fault tolerance in FPGA-based systems
  • 2. Outline
    • Techniques:
      • Triple module redundancy
        • Throughput logic
        • State-machine logic
        • I/O logic
        • BRAM
      • Error detection and error correction
      • Partial reconfiguration
    • Real approaches
      • SEU mitigation through dynamic partial reconfiguration
      • Run-time fault reconfiguration
    • Conclusions
  • 4. Triple module redundancy
  • 5. Triple module redundancy (voter)
    • The voter can be implemented
      • with Look-Up Tables (LUTs)
      • with 3-state buffers (BUFT)
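
As a rough software model of the voter described above, the following sketch computes a bitwise 2-of-3 majority. The function name `majority_vote` and the sample values are illustrative assumptions; on the device the same Boolean function would be mapped to LUTs or 3-state buffers.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two of the inputs."""
    return (a & b) | (a & c) | (b & c)


# One replica suffers a single bit flip; the voted result still matches the good value.
good = 0b1011
upset = good ^ 0b0100  # hypothetical upset in one of the three replicas
voted = majority_vote(good, upset, good)
```

The voter masks a fault in any one replica, but not simultaneous faults in the same bit of two replicas.
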
  • 7. Throughput logic
    • The system will include 3 copies of:
      • the module itself
      • the input signals
      • the output signals
    • No voter is needed
    • No single-point-of-failure
  • 9. State-machine logic
    • State-machines strictly depend on their state
      • The voter has to be implemented internally
    • A voter has to be inserted in the system for:
      • each state register
      • each feedback path
    • This approach keeps each state machine in the correct state at all times
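
The effect of voting on the state registers and feedback paths can be sketched in software. This is a simplified model under stated assumptions: the state machine is a hypothetical 2-bit wrapping counter, and a single voter function stands in for the voters that a real design would also replicate.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def next_state(state: int) -> int:
    """Hypothetical state machine: a 2-bit counter that wraps around."""
    return (state + 1) % 4


def tmr_step(states):
    """Vote on the three state registers, then feed the voted state back to every copy."""
    voted = majority_vote(states[0], states[1], states[2])
    return [next_state(voted) for _ in range(3)]


states = [0, 0, 0]
states = tmr_step(states)  # all three copies move to state 1
states[1] ^= 0b10          # an upset flips one bit in one state register
states = tmr_step(states)  # the voted feedback brings all copies back in step
```

Because the voted value is fed back, the upset copy is resynchronized on the next clock step instead of drifting away from the other two.
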
  • 11. I/O logic (Input)
    • Input pins have to be replicated in order to avoid single-points-of-failure
    • If the number of required input pins exceeds the number of input pins available on the reconfigurable device:
      • Only a subset of the input pins can be replicated
      • The system can be split across more than one FPGA
  • 12. I/O logic (Output)
    • In order to avoid a single-point-of-failure on output pins it is necessary to implement the following circuit
  • 14. BRAM
    • BRAMs are large blocks of static memory (4K bits each) that are true dual-port and fully synchronous
    • Techniques:
      • Simple redundancy
        • Replication of BRAMs
      • Redundancy and refresh
        • Replication of BRAMs
        • Refresh with voter
      • Data encoding
        • Error Correction Code (ECC)
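
The "redundancy and refresh" option can be illustrated with a small model: three memory copies are periodically rewritten with the voted value of each word, so an upset in one copy does not accumulate. The list-based memories and the `refresh` helper are assumptions made for illustration, not the BRAM primitive interface.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority."""
    return (a & b) | (a & c) | (b & c)


def refresh(copies):
    """Rewrite every word of all three memory copies with the voted value."""
    for addr in range(len(copies[0])):
        voted = majority_vote(copies[0][addr], copies[1][addr], copies[2][addr])
        for mem in copies:
            mem[addr] = voted


# Three replicated memories holding the same (hypothetical) contents.
copies = [[0x12, 0x34, 0x56] for _ in range(3)]
copies[2][1] ^= 0x08  # an upset flips one bit in one copy
refresh(copies)       # the corrupted word is overwritten with the voted value
```

Without the refresh pass, a second upset at the same address in another copy could eventually outvote the correct data.
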
  • 16. Error detection and error correction
    • It is more performance- and cost-effective to correct an error than to retransmit the data
    • Parity bits are added to the data bits (64 data + 8 parity, or 32 data + 7 parity)
    • No memory replication
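
The 64+8 and 32+7 figures match what a Hamming code extended with one overall parity bit (single error correction, double error detection) would need. That the slide refers to this particular code is an assumption; the sketch below only reproduces the arithmetic.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming code plus one overall parity bit (SECDED).

    The Hamming part needs the smallest r with 2**r >= data_bits + r + 1;
    the extra overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


check_bits_64 = secded_check_bits(64)
check_bits_32 = secded_check_bits(32)
```

For 64 data bits this gives 8 check bits and for 32 data bits it gives 7, in line with the word sizes quoted on the slide.
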
  • 18. Partial reconfiguration
    • Access to the configuration memory:
      • Readback
        • Post-configuration read operation
      • Partial reconfiguration
        • Post-configuration write operation
    • Techniques:
      • SEU scrubbing
        • Partial reconfiguration
      • SEU detection
        • Readback
          • Bit-for-bit comparison
          • CRC comparison
      • SEU correction
        • Readback
        • Partial reconfiguration
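
The detection-by-CRC and correction-by-rewrite flow can be sketched as follows. Configuration frames are modelled as byte strings and readback as a plain list lookup. The frame size, the golden copy, and the function names are illustrative assumptions rather than a vendor API. The CRC comes from Python's standard `zlib.crc32`.

```python
import zlib


def detect_upsets(frames, golden_crcs):
    """Readback step: return the indices of frames whose CRC differs from the golden CRC."""
    return [i for i, frame in enumerate(frames) if zlib.crc32(frame) != golden_crcs[i]]


def correct_upsets(frames, golden_frames, upset_indices):
    """Partial reconfiguration step: rewrite only the frames found to be corrupted."""
    for i in upset_indices:
        frames[i] = golden_frames[i]


# Hypothetical golden configuration: four 8-byte frames and their reference CRCs.
golden_frames = [bytes([i] * 8) for i in range(4)]
golden_crcs = [zlib.crc32(frame) for frame in golden_frames]

# Current device contents, with a single bit flipped in frame 2.
frames = list(golden_frames)
corrupted = bytearray(frames[2])
corrupted[3] ^= 0x01
frames[2] = bytes(corrupted)

upsets = detect_upsets(frames, golden_crcs)
correct_upsets(frames, golden_frames, upsets)
```

Scrubbing, by contrast, would rewrite every frame periodically without the detection step.
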
  • 20. Dynamic partial reconfiguration
    • Dynamic partial reconfiguration can be useful to trigger the reconfiguration of the affected portion of the architecture
      • while the rest of the system is still working
      • without the need for a complete reconfiguration
    • It can be very useful to reconfigure the smallest portion of the FPGA where the fault is located (a good partitioning phase is needed)
    • Solution space exploration has to be performed
  • 21. Dynamic partial reconfiguration (DWC)
    • Fault detection and characterization
      • Identification of a mismatch
    • Fault localization
      • Identification of the portion of the device where the fault is located
    • Several solutions are possible when applying Duplication With Comparison (DWC)
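
A minimal sketch of mismatch-based localization, assuming the design is partitioned into named regions and each region is duplicated. The region names and output values are hypothetical.

```python
def locate_faults(outputs_a, outputs_b):
    """Return the regions whose two copies disagree (mismatch detection per region)."""
    return [name for name in outputs_a if outputs_a[name] != outputs_b[name]]


# Outputs of the two copies of each (hypothetical) region in the same clock cycle.
outputs_a = {"alu": 0x2A, "fsm": 0x3, "fifo": 0x10}
outputs_b = {"alu": 0x2A, "fsm": 0x1, "fifo": 0x10}

faulty_regions = locate_faults(outputs_a, outputs_b)
```

A comparison alone shows that the two copies differ, not which one is wrong, so in this model both copies of a flagged region would be candidates for reconfiguration.
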
  • 22. Dynamic partial reconfiguration (ro-index)
    • ro-index: the ratio between the occupied area and its minimal placement constraint, both computed in slices
      • Occupied area in slices: S_o
      • Placement constraint in slices: S_c
      • ro-index = S_o / S_c
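
The definition above translates directly into code. The slice counts in the example are made-up values for illustration.

```python
def ro_index(occupied_slices: int, constraint_slices: int) -> float:
    """Ratio between the occupied area and its placement constraint, both in slices."""
    if constraint_slices <= 0:
        raise ValueError("placement constraint must be a positive number of slices")
    return occupied_slices / constraint_slices


# Hypothetical module: 150 occupied slices inside a 200-slice placement constraint.
example = ro_index(150, 200)
```

A value close to 1 means the placement constraint is almost fully used by the module.
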
  • 24. Run-time fault reconfiguration
    • Recovery from permanent logic and interconnect faults
      • fine-grained physical design partitioning
    • Faults are localized to small partitioned blocks that have fixed interfaces to the surrounding portion of the device
      • Affected blocks are reconfigured with previously generated, functionally equivalent block instances that do not use the faulty resources
  • 25. Run-time fault reconfiguration
    • Assumptions
      • Detection of a fault
      • Localization of a fault
      • Diagnosis of a fault (just helpful, not necessary)
    • Action
      • An alternate configuration of the design can be loaded that does not utilize the faulty resources
    • Advantages
      • extremely low area overhead
      • very low timing overhead
      • run-time management of faults
      • high flexibility
    • Disadvantages
      • very complex design phase (and run-time management)
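
The "Action" step can be sketched as a lookup over pre-generated alternatives. This assumes each alternative block instance is described by the set of resources it uses. The instance and resource names are hypothetical.

```python
def select_configuration(alternatives, faulty_resources):
    """Return the first pre-generated instance that avoids every faulty resource, or None."""
    for name, used_resources in alternatives.items():
        if used_resources.isdisjoint(faulty_resources):
            return name
    return None


# Functionally equivalent instances of one block and the resources each one occupies.
alternatives = {
    "block_v0": {"slice_0", "slice_1"},
    "block_v1": {"slice_1", "slice_2"},
    "block_v2": {"slice_2", "slice_3"},
}

chosen = select_configuration(alternatives, {"slice_1"})
```

The run-time cost is a lookup plus one partial reconfiguration. The complexity noted under "Disadvantages" lies in generating and managing the alternatives at design time.
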
  • 27. Conclusions
    • Reliable systems can be effectively implemented on FPGA devices
    • The techniques presented can be combined to improve the overall reliability of the whole design
      • TMR combined with SEU correction through partial reconfiguration is a powerful and effective SEU mitigation strategy
    • 3-state buffers can be used to implement fault-tolerance methodologies without wasting LUTs, keeping the area overhead low
  • 28. The end
    • Thank you for your attention
    • Do you have any questions?
