Published in: Technology, Design

  1. 1. SEU Mitigation for SRAM-Based FPGAs through Dynamic Partial Reconfiguration - 3D-DRESD Second Edition -
  2. 2. Motivations <ul><li>Designing reliable systems implemented on FPGAs, able to cope with the effects of faults caused by radiation </li></ul><ul><ul><li>Applying well-known, well-studied detection and recovery techniques to novel scenarios </li></ul></ul><ul><ul><li>Exploiting dynamic partial reconfiguration to reconfigure only the affected portion of the architecture </li></ul></ul><ul><ul><ul><li>… while the rest of the system is still working </li></ul></ul></ul><ul><ul><ul><li>… without needing to reprogram the entire system </li></ul></ul></ul>
  3. 3. Outline <ul><li>Goals </li></ul><ul><li>Starting point </li></ul><ul><ul><li>Fault tolerance and reliability </li></ul></ul><ul><ul><li>Reconfigurable architecture </li></ul></ul><ul><ul><li>Related work </li></ul></ul><ul><li>The proposed approach </li></ul><ul><ul><li>Requirements </li></ul></ul><ul><ul><li>Solution space exploration </li></ul></ul><ul><li>Project roadmap </li></ul><ul><ul><li>Completed steps </li></ul></ul><ul><ul><li>Work in progress </li></ul></ul><ul><li>Other works </li></ul><ul><li>Conclusions and Future Work </li></ul>
  4. 4. Goals <ul><li>Design space exploration w.r.t. reliability </li></ul><ul><ul><li>Apply traditional, sound techniques in a different context, exploiting the peculiarities of the platform </li></ul></ul><ul><ul><li>Evaluate the alternative designs, comparing costs, performance and fault detection properties </li></ul></ul><ul><ul><li>Support the designer in selecting the most convenient solution </li></ul></ul>
  5. 5. Fault Model and Reliability <ul><li>Adopted fault model </li></ul><ul><ul><li>Faults caused by radiation and α-particles </li></ul></ul><ul><ul><li>Single Event Transient (SET), Single Event Upset (SEU) </li></ul></ul><ul><li>Bit-flip </li></ul><ul><ul><li>Temporary – data and control registers </li></ul></ul><ul><ul><li>Permanent – configuration memory </li></ul></ul>
  6. 6. Reconfigurable Scenario <ul><li>FPGAs: </li></ul><ul><ul><li>Xilinx family </li></ul></ul><ul><ul><ul><li>(Virtex, VirtexII, VirtexIIPro, Virtex4, ...) </li></ul></ul></ul><ul><li>Reconfiguration </li></ul><ul><ul><li>Modular design flow </li></ul></ul><ul><ul><ul><li>E.g., Early Access Partial Reconfiguration (EAPR) </li></ul></ul></ul>
  7. 7. Related Work <ul><li>TMR at different levels of abstraction: replication of the entire circuit or of each register </li></ul><ul><li>Periodic bitstream scrubbing </li></ul><ul><li>Bitstream readback </li></ul><ul><ul><li>Area overhead, recovery latency, and power consumption </li></ul></ul>
  8. 8. Proposed Approach <ul><li>Fault detection and masking </li></ul><ul><ul><li>Duplication with comparison (DWC) </li></ul></ul><ul><ul><li>Triple Modular Redundancy (TMR) </li></ul></ul><ul><ul><li>Redundant Codes </li></ul></ul><ul><ul><li>presented in the 70s and 80s </li></ul></ul><ul><li>Recovery </li></ul><ul><ul><li>Partial dynamic reconfiguration </li></ul></ul>
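The detection and masking techniques named on this slide can be illustrated with a minimal behavioral sketch. It models replica outputs as plain integers; the function names are hypothetical and are not taken from the presented framework, which targets VHDL designs.

```python
from typing import Optional


def dwc_mismatch(a: int, b: int) -> bool:
    """Duplication with comparison: True when the two replicas disagree.

    DWC detects a fault but cannot tell which replica is wrong.
    """
    return a != b


def tmr_vote(a: int, b: int, c: int) -> int:
    """Triple modular redundancy: bitwise majority vote over three replicas."""
    return (a & b) | (a & c) | (b & c)


def tmr_faulty_replica(a: int, b: int, c: int) -> Optional[int]:
    """Return the index (0, 1, 2) of the single replica that disagrees
    with the voted value, or None if no single replica can be blamed."""
    voted = tmr_vote(a, b, c)
    mismatches = [i for i, value in enumerate((a, b, c)) if value != voted]
    return mismatches[0] if len(mismatches) == 1 else None
```

With TMR the voted output masks a single faulty replica, and the index of the disagreeing replica indicates which area would be a candidate for partial reconfiguration.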
  9. 9. Requirements <ul><li>Fault detection and characterization </li></ul><ul><ul><li>Identification of a mismatch </li></ul></ul><ul><ul><li>Detect if transient or permanent </li></ul></ul><ul><li>Fault localization </li></ul><ul><ul><li>Identification of the portion of the device where the fault occurred </li></ul></ul><ul><li>Partial reconfiguration </li></ul><ul><ul><li>Reconfiguration of the smallest portion of the FPGA if fault effect is characterized as permanent </li></ul></ul>
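One possible way to meet the characterization requirement is to repeat the computation after a mismatch: if the replicas agree again, the fault is treated as transient; if the mismatch persists, it is treated as permanent and the affected area is reconfigured. The slides do not state how characterization is done, so the retry policy, the `retries` parameter, and the function name below are assumptions made for illustration.

```python
from typing import Callable, Sequence


def classify_fault(run_replicas: Callable[[], Sequence[int]],
                   retries: int = 2) -> str:
    """Return 'none', 'transient', or 'permanent'.

    run_replicas executes all replicas once and returns their outputs.
    Assumption: a mismatch that survives every retry is permanent
    (e.g. a flipped configuration-memory bit).
    """
    outputs = run_replicas()
    if len(set(outputs)) == 1:
        return "none"
    for _ in range(retries):
        outputs = run_replicas()
        if len(set(outputs)) == 1:
            # The replicas agree again, so the earlier mismatch did not persist.
            return "transient"
    return "permanent"
```

Only the `"permanent"` outcome would trigger partial reconfiguration of the smallest area containing the faulty replica.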
  10. 10. Design Space Exploration <ul><li>Several solutions applying DWC </li></ul><ul><li>Several solutions applying TMR </li></ul>
  11. 11. Design Space Exploration <ul><li>Discarding disadvantageous solutions </li></ul><ul><ul><li>For instance, eliminating error-control modules that are not required (e.g., voters) </li></ul></ul>
  12. 12. Design Space Exploration <ul><li>The issues presented lead to the definition of a framework for design space exploration </li></ul><ul><li>It aims at </li></ul><ul><ul><li>Estimating the costs and benefits of the possible solutions </li></ul></ul><ul><ul><li>Exploring the solution space on the basis of several metrics </li></ul></ul><ul><ul><ul><li>E.g., size of the subsystems, size of the data widths </li></ul></ul></ul><ul><ul><li>Identifying the most promising solutions </li></ul></ul>
  13. 13. <ul><li>Project roadmap: </li></ul><ul><li>Completed steps </li></ul>
  14. 14. Case Studies <ul><li>Noekeon algorithm: </li></ul><ul><ul><li>Block cipher (128-bit key, 128-bit block) </li></ul></ul><ul><li>FIR filter: </li></ul><ul><ul><li>Simple and regular architecture </li></ul></ul>
  15. 15. A first attempt <ul><li>A few solutions have been implemented </li></ul><ul><ul><li>DWC (or TMR) has been adopted </li></ul></ul><ul><ul><li>Each solution proposes a different grouping of system modules and a different placement on reconfigurable areas </li></ul></ul>
  16. 16. Exhaustive exploration of solution space <ul><li>Considering TMR, all the possible solutions have been generated (not implemented!) </li></ul><ul><li>An all-to-all comparison has been performed to choose the most promising ones and to discard the least interesting </li></ul><ul><ul><li>Area occupation has been taken into account as the metric </li></ul></ul><ul><ul><li>Solution area has been estimated by adding the area occupations of the single modules </li></ul></ul>
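The exhaustive generation and area estimate described on this slide can be sketched as follows, assuming a "solution" is a grouping of modules into reconfigurable areas with each group triplicated and given one voter. The module names, area figures, and the per-group `voter_area` cost are hypothetical; only the idea of summing single-module areas comes from the slide.

```python
from typing import Dict, Iterator, List

Grouping = List[List[str]]


def partitions(items: List[str]) -> Iterator[Grouping]:
    """Yield every way of splitting items into non-empty, unordered groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + smaller


def tmr_area(grouping: Grouping, module_area: Dict[str, int],
             voter_area: int) -> int:
    """Estimate area by adding single-module areas: every module is
    triplicated, and each group pays for one voter (an assumption)."""
    replicated = 3 * sum(module_area[m] for group in grouping for m in group)
    return replicated + voter_area * len(grouping)


# Hypothetical module areas in arbitrary units (e.g. slices).
areas = {"key_sched": 10, "round": 20, "ctrl": 5}
ranked = sorted(partitions(list(areas)),
                key=lambda g: tmr_area(g, areas, voter_area=2))
```

The number of groupings grows as the Bell numbers (5 for three modules, 15 for four), which is why generating every solution is feasible only for small systems.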
  17. 17. <ul><li>Project roadmap: </li></ul><ul><li>Work in progress </li></ul>
  18. 18. Exhaustive exploration of solution space <ul><li>Designing an algorithm that </li></ul><ul><ul><li>Enables a “smart” exploration of the solution space </li></ul></ul><ul><ul><li>Enables the search for the most promising solutions on the basis of an objective function that considers cost/benefit metrics </li></ul></ul><ul><ul><li>Explores the design space considering more than one technique (e.g., TMR, DWC, redundant codes) </li></ul></ul>
  19. 19. Implementing the framework <ul><li>A first draft </li></ul>[Figure: block diagram of the framework. Labels: RoadRunner Lib (TRC, ...), Project Lib, Top Module VHDL, Transf. XML, Mod. VHDL, VHDL Parser, VHDL Re-builder, Rec Arch VHDL, Graph Manipulator, Rec Lib (TRC, ...), Component Syntheses, Constraint File Builder, Constr File, Transf. Rules (Rec, TMR, ...)]
  20. 20. Other works <ul><li>Another related work deals with the design of a fault injector for FPGAs </li></ul><ul><li>Motivations: </li></ul><ul><ul><li>Reliability assessment is an important task when designing reliable embedded systems </li></ul></ul><ul><ul><li>It is usually performed by means of fault injection experiments </li></ul></ul><ul><li>Requirements: </li></ul><ul><ul><li>Stop the execution, preserving the system state </li></ul></ul><ul><ul><li>Inject a fault by downloading a partial bitstream </li></ul></ul><ul><ul><ul><li>It should allow corruption of both data registers and configuration memory </li></ul></ul></ul><ul><ul><li>Restart the execution </li></ul></ul><ul><ul><li>IMPORTANT ISSUES: observability and controllability of fault injection </li></ul></ul>
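The bit-flip at the core of such an injector can be modeled in software as below. This is only a sketch: a `bytearray` stands in for a register file or a configuration frame, and the function names are hypothetical. Generating and downloading a real partial bitstream is not modeled.

```python
import random


def flip_bit(memory: bytearray, bit_index: int) -> None:
    """Invert one bit of the modeled memory in place (an SEU-style bit-flip)."""
    byte_index, offset = divmod(bit_index, 8)
    memory[byte_index] ^= 1 << offset


def inject_random_fault(memory: bytearray, rng: random.Random) -> int:
    """Flip one uniformly chosen bit and return its index.

    Passing a seeded rng keeps the experiment reproducible (controllability);
    returning the index records where the fault went (observability).
    """
    bit_index = rng.randrange(len(memory) * 8)
    flip_bit(memory, bit_index)
    return bit_index
```

Flipping the same bit index a second time restores the original contents, which gives a simple way to undo an injected fault between experiments.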
  21. 21. Conclusions and Future Work <ul><li>We proposed guidelines for evaluating alternative SEU mitigation techniques </li></ul><ul><li>We applied DWC and TMR to detect faults and partial dynamic reconfiguration to recover </li></ul><ul><li>We exhaustively explored the solution space considering a single technique </li></ul><ul><li>Next steps: </li></ul><ul><ul><li>Automatic system partitioning in reliable areas </li></ul></ul><ul><ul><li>Gathering alternative concurrent error detection techniques </li></ul></ul><ul><ul><li>Designing an EAPR-based flow </li></ul></ul>
  22. 22. Questions?