Usage Rights

CC Attribution License


    3D-DRESD R4R Presentation Transcript

    • SEU Mitigation for SRAM-Based FPGAs through Dynamic Partial Reconfiguration - 3D-DRESD Second Edition -
    • Motivations
      • Designing reliable systems implemented on FPGAs, able to cope with the effects of faults caused by radiation
        • Applying already known and well-studied detection and recovery techniques to novel scenarios
        • Exploiting dynamic partial reconfiguration to trigger the reconfiguration of the affected portion of the architecture
          • … while the rest of the system is still working
          • … without the need to entirely reprogram the system
    • Outline
      • Goals
      • Starting point
        • Fault tolerance and reliability
        • Reconfigurable architecture
        • Related work
      • The proposed approach
        • Requirements
        • Solution space exploration
      • Project roadmap
        • Completed steps
        • Work in progress
      • Other works
      • Conclusions and Future Work
    • Goals
      • Design space exploration w.r.t. reliability
        • Apply traditional, sound techniques in a different context, exploiting the peculiarities of the platform
        • Evaluate the alternative designs, comparing costs, performance and fault detection properties
        • Support the designer in selecting the most convenient solution
    • Fault Model & Reliability
      • Adopted fault model
        • Caused by radiation and α-particles
        • Single Event Transient (SET), Single Event Upset (SEU)
      • Bit-flip
        • Temporary – data and control registers
        • Permanent – configuration memory
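
The bit-flip model on this slide can be illustrated with a minimal sketch. The helper name `flip_bit` and the register and configuration values are assumptions chosen for illustration; they do not come from the slides.

```python
def flip_bit(word: int, bit: int) -> int:
    """Return `word` with the bit at position `bit` inverted (a single bit-flip)."""
    return word ^ (1 << bit)

# Temporary effect: a data register is corrupted, but the design itself
# overwrites the register during normal operation.
register = 0b1010_0101
register = flip_bit(register, 3)      # upset hits bit 3
corrupted = register
register = 0b1010_0101                # next regular write restores the value

# Permanent effect: the design never writes its own configuration memory,
# so the wrong bit stays until the configuration is rewritten.
config_golden = 0b0110
config_bits = flip_bit(config_golden, 1)
```

In this toy model the only difference between the two cases is whether something in normal operation rewrites the corrupted word.
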
    • Reconfigurable Scenario
      • FPGAs:
        • Xilinx family
          • (Virtex, VirtexII, VirtexIIPro, Virtex4, ...)
      • Reconfiguration
        • Modular design flow
          • E.g., Early Access Partial Reconfiguration (EAPR)
    • Related Work
      • TMR at different levels of abstraction: replication of the entire circuit or of each register
      • Periodic bitstream scrubbing
      • Bitstream readback
        • Area overhead, latency in recovering and power consumption
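
As a rough illustration of the difference between scrubbing and readback, the sketch below models the configuration memory as a list of integer frames and assumes a golden copy is available. The function names and the frame model are hypothetical; a real device would be accessed through its configuration interface, which is not modeled here.

```python
from typing import List

def scrub(config_memory: List[int], golden: List[int]) -> None:
    """Blind scrubbing: rewrite every frame from the golden copy without checking it."""
    config_memory[:] = golden

def readback_and_repair(config_memory: List[int], golden: List[int]) -> List[int]:
    """Readback: compare each frame with the golden copy, rewrite only the
    frames that differ, and return their indices."""
    repaired = []
    for index, (current, expected) in enumerate(zip(config_memory, golden)):
        if current != expected:
            config_memory[index] = expected
            repaired.append(index)
    return repaired
```

Readback also reports where the upset occurred, whereas scrubbing simply restores the whole memory at every period.
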
    • Proposed Approach
      • Fault detection and masking
        • Duplication with comparison (DWC)
        • Triple Modular Redundancy (TMR)
        • Redundant Codes
        • Techniques presented in the 70s and 80s
      • Recovery
        • Partial dynamic reconfiguration
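
A minimal software sketch of the two replication schemes named above, treating module outputs as integers. The function names are assumptions; in hardware the comparator and the voter would be small combinational circuits.

```python
from typing import Optional

def dwc_mismatch(out_a: int, out_b: int) -> bool:
    """Duplication with comparison: detects a fault but cannot tell which copy is wrong."""
    return out_a != out_b

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three replica outputs: masks a single faulty replica."""
    return (a & b) | (a & c) | (b & c)

def tmr_faulty_replica(a: int, b: int, c: int) -> Optional[int]:
    """Return the index of the first replica that disagrees with the voted value, or None."""
    voted = tmr_vote(a, b, c)
    for index, value in enumerate((a, b, c)):
        if value != voted:
            return index
    return None
```

DWC only signals that something went wrong, while TMR both masks the error and, under a single-fault assumption, points at the replica to recover.
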
    • Requirements
      • Fault detection and characterization
        • Identification of a mismatch
        • Determining whether the fault is transient or permanent
      • Fault localization
        • Identification of the portion of the device where the fault occurred
      • Partial reconfiguration
        • Reconfiguration of the smallest possible portion of the FPGA if the fault effect is characterized as permanent
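
The three requirements can be chained as in the sketch below. The slides do not say how a transient fault is told apart from a permanent one; re-executing the duplicated module and checking whether the mismatch persists is an assumption made here, and `characterize`, `recover`, and the `reconfigure` callback are hypothetical names.

```python
from typing import Callable

def characterize(run_a: Callable[[int], int], run_b: Callable[[int], int],
                 inputs: int, retries: int = 1) -> str:
    """Return 'none', 'transient' or 'permanent' for a duplicated module."""
    if run_a(inputs) == run_b(inputs):
        return "none"                      # no mismatch detected
    for _ in range(retries):
        if run_a(inputs) == run_b(inputs):
            return "transient"             # mismatch disappeared on re-execution
    return "permanent"                     # mismatch persists

def recover(fault_kind: str, region: str,
            reconfigure: Callable[[str], None]) -> bool:
    """Reconfigure only the affected region, and only for permanent faults."""
    if fault_kind == "permanent":
        reconfigure(region)
        return True
    return False
```

Here `region` stands for the outcome of fault localization: the smaller the region that can be named, the less of the device has to be reconfigured.
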
    • Design Space Exploration
      • Several solutions applying DWC
      • Several solutions applying TMR
    • Design Space Exploration
      • Discarding of disadvantageous solutions
        • For instance, elimination of unneeded error-checking modules (E.g.: voters)
    • Design Space Exploration
      • Presented issues lead to the definition of a framework for the design space exploration
      • It aims at
        • Estimating the costs and benefits deriving from the possible different solutions
        • Exploring the solution space on the basis of several metrics
          • E.g.: size of the subsystems, size of the data widths
        • Identifying the most promising solutions
      • Project roadmap:
      • Completed steps
    • Case Studies
      • Noekeon algorithm:
        • Block cipher (128-bit key, 128-bit block)
      • FIR filter:
        • Simple and regular architecture
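
To make the "simple and regular" structure of the FIR case study concrete, here is a behavioral sketch of a direct-form FIR filter. The function name and the coefficients in the usage note are illustrative assumptions, not the design used in the project.

```python
from typing import List

def fir_filter(samples: List[int], coefficients: List[int]) -> List[int]:
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                acc += h * samples[n - k]   # one multiply-accumulate per tap
        output.append(acc)
    return output
```

For example, `fir_filter([1, 2, 3, 4], [1, 1])` sums each sample with its predecessor. Every tap performs the same multiply-accumulate step, so the taps are natural candidates for the module groupings discussed in the following slides.
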
    • A first attempt
      • A few solutions have been implemented
        • DWC (or TMR) has been adopted
        • Each solution proposes a different grouping of system modules and a different placement on reconfigurable areas
    • Exhaustive exploration of solution space
      • Considering TMR, all the possible solutions have been generated (not implemented!)
      • An all-to-all comparison has been performed to choose the most promising solutions and to discard the least interesting ones
        • Area occupation has been taken into account as the metric
        • Solution area has been estimated by summing the area occupation of the individual modules
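
A sketch of what such an exhaustive generation and all-to-all comparison could look like. It assumes that a solution is a grouping of modules, that each group is triplicated and followed by one voter, and that a second metric (the largest group, as a proxy for how much must be reconfigured) is compared alongside total area. All three assumptions, and every name and number used, are illustrative and not taken from the slides.

```python
from typing import Dict, Iterator, List, Tuple

Grouping = List[List[str]]

def partitions(items: List[str]) -> Iterator[Grouping]:
    """Yield every way of grouping `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or into a new group of its own.
        yield [[first]] + smaller

def estimate(grouping: Grouping, module_area: Dict[str, int],
             voter_area: int) -> Tuple[int, int]:
    """Estimate (total area, largest group area) by summing module areas."""
    group_areas = [3 * sum(module_area[m] for m in g) + voter_area for g in grouping]
    return sum(group_areas), max(group_areas)

def non_dominated(solutions: List[Tuple[Grouping, Tuple[int, int]]]):
    """All-to-all comparison: drop solutions that another one beats or ties on every metric."""
    kept = []
    for i, (_, cost_i) in enumerate(solutions):
        dominated = any(
            j != i and cost_j != cost_i
            and all(a <= b for a, b in zip(cost_j, cost_i))
            for j, (_, cost_j) in enumerate(solutions)
        )
        if not dominated:
            kept.append(solutions[i])
    return kept
```

The number of groupings grows very quickly with the number of modules, which is one reason to generate and estimate solutions rather than implement each of them.
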
      • Project roadmap:
      • Work in progress
    • Exhaustive exploration of solution space
      • Designing an algorithm that
        • Enables a “smart” exploration of the solution space
        • Enables the search for the most promising solutions on the basis of an objective function that considers cost/benefit metrics
        • Explores the design space considering more than one technique (E.g.: TMR, DWC, redundant codes)
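
The slides describe this algorithm as work in progress and give no details, so the following is only one hypothetical shape it could take: a weighted cost/benefit objective and a greedy search that assigns one technique per module. The area multipliers, benefit scores, weights, and function names are all invented for illustration.

```python
from typing import Dict, Tuple

# Hypothetical per-technique figures: (area multiplier, benefit score).
TECHNIQUES: Dict[str, Tuple[float, float]] = {
    "DWC": (2.0, 1.0),   # detects a mismatch
    "TMR": (3.0, 2.0),   # detects and masks it
}

def objective(assignment: Dict[str, str], module_area: Dict[str, float],
              weight_cost: float = 1.0, weight_benefit: float = 50.0) -> float:
    """Lower is better: weighted area cost minus weighted benefit."""
    cost = sum(TECHNIQUES[t][0] * module_area[m] for m, t in assignment.items())
    benefit = sum(TECHNIQUES[t][1] for t in assignment.values())
    return weight_cost * cost - weight_benefit * benefit

def greedy_explore(module_area: Dict[str, float]) -> Tuple[Dict[str, str], float]:
    """Start with DWC everywhere; change one module at a time while the objective improves."""
    assignment = {m: "DWC" for m in module_area}
    best = objective(assignment, module_area)
    improved = True
    while improved:
        improved = False
        for module in module_area:
            for technique in TECHNIQUES:
                trial = {**assignment, module: technique}
                score = objective(trial, module_area)
                if score < best:
                    assignment, best, improved = trial, score, True
    return assignment, best
```

With these invented numbers, `greedy_explore({"fir": 20, "ctrl": 80})` upgrades only the small module to TMR, because tripling the large one costs more area than the extra benefit is weighted to be worth.
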
    • Implementing the framework
      • A first draft
      • [Block diagram of the framework; labels: RoadRunner Lib (TRC, ...), Project Lib, Top Module VHDL, Transf. XML, Mod. VHDL, VHDL Parser, VHDL Re-builder, Mod. VHDL, Rec Arch VHDL, Graph Manipulator, Rec Lib (TRC, ...), Component Syntheses, Constraint File Builder, Constr File, Transf. Rules (Rec, TMR, ...)]
    • Other works
      • Another related work deals with the design of a fault injector for FPGA
      • Motivations:
        • Reliability assessment is an important task when designing reliable embedded systems
        • It is usually performed by means of fault injection experiments
      • Requirements:
        • Stop the execution preserving system state
        • Inject a fault by downloading a partial bitstream
          • It should allow corruption of both data registers and configuration memory
        • Restart the execution
        • IMPORTANT ISSUES: observability and controllability of fault injection
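
The stop / inject / restart sequence can be modeled in software as below. The class and function names are assumptions, and the bit-flip on a Python list merely stands in for downloading a partial bitstream to the device.

```python
class SystemUnderTest:
    """Toy stand-in for the FPGA design: a few data registers and configuration words."""
    def __init__(self) -> None:
        self.running = True
        self.data_registers = [0x00, 0xFF]
        self.config_memory = [0b1010, 0b0101]

def inject_fault(system: SystemUnderTest, target: str, index: int, bit: int) -> int:
    """Stop the system, flip one bit in the chosen memory, restart; return the corrupted word."""
    memory = {"data": system.data_registers, "config": system.config_memory}[target]
    system.running = False          # 1. stop execution, state is preserved
    memory[index] ^= 1 << bit       # 2. inject the fault (controllability: exact location)
    system.running = True           # 3. restart execution
    return memory[index]            # observability: report what was corrupted
```

Choosing `target`, `index`, and `bit` explicitly corresponds to controllability, and returning the corrupted word is a small nod to observability.
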
    • Conclusions and Future Work
      • We proposed guidelines for evaluating various alternatives for SEU mitigation techniques
      • We applied DWC and TMR to detect faults and partial dynamic reconfiguration to recover
      • We explored exhaustively the solution space considering a single technique
      • Next steps:
        • Automatic system partitioning in reliable areas
        • Gathering alternative concurrent error detection techniques
        • Designing an EAPR-based flow
    • Questions?