
SBCCI08


  1. A comparative analysis of fault injection methods via enhanced on-chip debug infrastructures
     J. M. Martins Ferreira [ jmf@fe.up.pt ], FEUP / DEEC, Rua Dr. Roberto Frias, 4200-465 Porto, PORTUGAL
     André Fidalgo, Gustavo R. Alves, Manuel Gericota [ anf/gca/mgg @isep.ipp.pt ], ISEP / DEE, Rua Ant. Bernardino Almeida, 431, 4200-072 Porto, PORTUGAL
     SBCCI’08: Gramado, Brazil, 1-4 September 2008
     These slides are available at http://www.slideshare.net/josemmf
  2. Outline of the presentation <ul><li>Introduction and motivation </li></ul><ul><li>Setup, workbench, workflow </li></ul><ul><li>Experimental results </li></ul><ul><ul><li>Basic, extended and OCD-FI </li></ul></ul><ul><ul><li>OCD-FI extensions (EDAC, RTREG) </li></ul></ul><ul><li>Comparison and discussion </li></ul><ul><li>Conclusion </li></ul>
  3. Scope, focus, setup <ul><li>Scope : use of on-chip debug (OCD) resources for fault injection and fault tolerance validation </li></ul><ul><li>Focus : comparative analysis of experimental results for various OCD configurations and debugging scenarios </li></ul><ul><li>Setup : a) 32-bit Freescale MPC565, iSystem IC3000 (iTracePro), winIDEA 2005; b) OCD enhancements in VHDL </li></ul>
  4. Motivation <ul><li>OCD offers controllability and observability features (R/W access to registers and memory) that may be used to inject faults and observe their effects </li></ul><ul><li>Its usefulness for fault tolerance validation may be limited by bandwidth, coverage, and the repeatability / representativeness of results </li></ul><ul><li>These limitations can be mitigated by enhancing the OCD </li></ul>
  5. Our approach <ul><li>Configurations: basic (MDI:MDO = 2:8), extended (MDI:MDO = 8:8), OCD-FI (with a fault injection module) </li></ul><ul><li>Fault injection scenarios: off-line or real-time, predefined or on-the-fly </li></ul><ul><li>OCD-FI is able to cope with error detection / correction and real-time requirements </li></ul><ul><li>Results are compared using a common set of workload applications and FI campaigns </li></ul>
  6. NEXUS FI for the MPC565
     Trace data: program trace data output by the OCD
     Campaign data: scripts that describe the FI experiments

     | NEXUS Debug Features | Class | Usability for FI |
     |---|---|---|
     | Run-Control | 1 | External Triggering |
     | Breakpoints | 1 | Internal Triggering |
     | Watchpoints | 1 | Real Time Triggering |
     | Static Register and Memory Access | 1 | Static Fault Insertion |
     | Program Trace | 2 | Fault Effects Classification |
     | Dynamic Register and Memory Access | 3 | Real Time Fault Insertion |
     | Data Trace | 3 | Improved Fault Effects Classification |
  7. OCD infrastructure developed to support this work <ul><li>NEXUS class 2 compliant with real-time memory access </li></ul><ul><li>Adjustable data bus </li></ul><ul><li>OCD configurations </li></ul><ul><ul><li>Basic (2:8) </li></ul></ul><ul><ul><li>Extended (8:8) </li></ul></ul><ul><ul><li>OCD-FI: comprises a fault injection module </li></ul></ul>
  8. Fault injection: Workload applications <ul><li>Workload applications: </li></ul><ul><ul><li>Matrix adder (MAdder) </li></ul></ul><ul><ul><li>Vector sorter (VSorter) </li></ul></ul><ul><ul><li>LUT control algorithm (XControl) </li></ul></ul><ul><li>Each application was implemented in two versions: normal and fault tolerant </li></ul><ul><li>Fault tolerance is achieved by duplicating data in memory and repeating each operation </li></ul>
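
The duplicate-and-repeat scheme described for the fault tolerant versions can be sketched as follows. This is a minimal illustration, assuming a matrix adder that keeps a second copy of its operands and flags an error when the two runs disagree; the function names are hypothetical and the authors' actual implementation is not shown on the slides.

```python
import copy


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fault_tolerant_add(a, b, a_copy, b_copy):
    """Run the addition on the primary data and repeat it on the duplicate.

    Returns (result, error_detected). A mismatch between the two runs
    raises the detection flag instead of silently returning bad data.
    """
    first = add_matrices(a, b)
    second = add_matrices(a_copy, b_copy)
    return first, first != second


# Usage: corrupt one cell of the primary copy and check that it is detected.
a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
a_dup, b_dup = copy.deepcopy(a), copy.deepcopy(b)
a[0][0] ^= 1 << 3  # emulate a bit-flip in the primary data
result, detected = fault_tolerant_add(a, b, a_dup, b_dup)
```

Detection only works while the duplicate stays intact, so a fault that hits both copies the same way would still go undetected.
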
  9. Fault injection campaigns <ul><li>Each campaign is a script that defines 10 FI experiments performed during system operation </li></ul><ul><li>100 campaigns were executed for each scenario using the three workload applications (MAdder, VSorter, XControl) </li></ul><ul><li>FI campaigns mostly target memory locations and flip a bit to emulate single-event upset (SEU) effects </li></ul>
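
A bit-flip and a ten-experiment campaign can be sketched as below. The campaign layout (a list of address/bit pairs drawn at random) and the names `flip_bit` and `make_campaign` are assumptions for illustration; the slides do not show the actual script format.

```python
import random


def flip_bit(word, bit, width=32):
    """Return `word` with one bit inverted, as a single-event upset would."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def make_campaign(addresses, n_experiments=10, width=32, seed=0):
    """Build a hypothetical campaign: one target address and bit per experiment."""
    rng = random.Random(seed)  # seeded so a campaign can be repeated exactly
    return [
        {"address": rng.choice(addresses), "bit": rng.randrange(width)}
        for _ in range(n_experiments)
    ]
```

Seeding the generator keeps a campaign repeatable, which matters when the same campaign has to be replayed across several OCD configurations.
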
  10. Predetermination to improve performance of FI campaigns <ul><li>Predetermination of the contents of the target memory cell at the FI instant may be done through a “gold run” or by ensuring: </li></ul><ul><ul><li>Complete knowledge of the program flow </li></ul></ul><ul><ul><li>Full observability of external inputs </li></ul></ul><ul><ul><li>Precise control of the FI instant and location </li></ul></ul><ul><li>Otherwise the target memory cell must be read “immediately” before the FI instant </li></ul>
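
The saving that predetermination brings can be seen in a toy model that counts debug-port accesses. `DebugPort` and `inject` are hypothetical names, and the model assumes one bus transaction per read or write; it is not the NEXUS protocol itself.

```python
class DebugPort:
    """Toy stand-in for OCD memory access that counts bus transactions."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.accesses = 0

    def read(self, address):
        self.accesses += 1
        return self.memory[address]

    def write(self, address, value):
        self.accesses += 1
        self.memory[address] = value


def inject(port, address, bit, predetermined=None):
    """Flip one bit at `address`.

    With a predetermined value the faulty word is computed beforehand and
    only a write is issued; otherwise the cell has to be read first.
    """
    current = port.read(address) if predetermined is None else predetermined
    port.write(address, current ^ (1 << bit))
```

With predetermination the injection costs one access instead of two, which also shortens the window in which the CPU can modify the cell between the read and the write.
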
  11. Experimental scenarios
      B: Basic; E: Extended; OCD-FI: OCD for Fault Injection; OF: Off-line; RT: Real-time; +: predetermination not required

      | Configuration & Scenario | Bandwidth | Predetermination of the faulty value | Fault injection method | Set-Up delay (Clk cycles) | Insertion delay (Clk cycles) |
      |---|---|---|---|---|---|
      | BOF | MDI=2 MDO=8 | YES | Offline | 22 | 35 |
      | BOF+ | MDI=2 MDO=8 | NO | Offline | 22 | 44 |
      | EOF | MDI=8 MDO=8 | YES | Offline | 6 | 9 |
      | EOF+ | MDI=8 MDO=8 | NO | Offline | 6 | 18 |
      | BRT | MDI=2 MDO=8 | YES | Real Time | 22 | 35 |
      | BRT+ | MDI=2 MDO=8 | NO | Real Time | 22 | 44 |
      | ERT | MDI=8 MDO=8 | YES | Real Time | 6 | 9 |
      | ERT+ | MDI=8 MDO=8 | NO | Real Time | 6 | 18 |
      | OCD-FI | MDI=2 MDO=8 | YES | Real Time | 57 | 2 |
      | OCD-FI+ | MDI=2 MDO=8 | NO | Real Time | 57 | 4 |
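  12. Experimental results (%): B, E, OCD-FI
      U ERR: Undetected errors (incorrect final result that goes undetected)
      D ERR: Detected errors (error detection signal activated)
      N ERR: No errors (application ended correctly)

      MAdder

      | Configuration & Scenario | non-FT U ERR | non-FT N ERR | SW-FT D ERR | SW-FT U ERR | SW-FT N ERR |
      |---|---|---|---|---|---|
      | OFF | 19 | 81 | 28 | 13.9 | 58.1 |
      | BRT | 19.4 | 80.6 | 28.3 | 13.8 | 57.9 |
      | ERT | 19.2 | 80.8 | 28.1 | 13.9 | 58 |
      | OCD-FI | 19 | 81 | 28 | 13.9 | 58.1 |
      | BRT+ | 19.5 | 80.5 | 28.4 | 13.8 | 57.8 |
      | ERT+ | 19.3 | 80.7 | 28.2 | 13.8 | 58 |
      | OCD-FI+ | 19.1 | 80.9 | 28.1 | 13.9 | 58 |

      VSorter

      | Configuration & Scenario | non-FT U ERR | non-FT N ERR | SW-FT D ERR | SW-FT U ERR | SW-FT N ERR |
      |---|---|---|---|---|---|
      | OFF | 98 | 2 | 97 | 2 | 1 |
      | BRT | 98.1 | 1.9 | 96.8 | 2 | 1.2 |
      | ERT | 98 | 2 | 96.9 | 2 | 1.1 |
      | OCD-FI | 98 | 2 | 97 | 2 | 1 |
      | BRT+ | 98.2 | 1.8 | 96.7 | 1.9 | 1.4 |
      | ERT+ | 98.1 | 1.9 | 96.8 | 1.9 | 1.3 |
      | OCD-FI+ | 98 | 2 | 96.9 | 1.9 | 1.2 |

      XControl

      | Configuration & Scenario | non-FT U ERR | non-FT N ERR | SW-FT D ERR | SW-FT U ERR | SW-FT N ERR |
      |---|---|---|---|---|---|
      | OFF, BRT, ERT, OCD-FI | Not possible | Not possible | Not possible | Not possible | Not possible |
      | BRT+ | 29.3 | 70.7 | 29.1 | 1.5 | 69.4 |
      | ERT+ | 29.6 | 70.4 | 28.9 | 1.2 | 69.9 |
      | OCD-FI+ | 29.8 | 70.2 | 28.8 | 1.1 | 70.1 |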
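
The three outcome classes in the legend can be turned into a small classifier and tally. This is a sketch under the assumption that each experiment yields a final result, a fault-free reference result and an error-detection flag; the function names are hypothetical.

```python
from collections import Counter


def classify(final_result, golden_result, error_flag):
    """Map one FI experiment to the legend's classes."""
    if error_flag:
        return "D ERR"  # error detection signal activated
    if final_result != golden_result:
        return "U ERR"  # wrong final result that went undetected
    return "N ERR"      # application ended correctly


def percentages(outcomes):
    """Share of each class, in percent, rounded to one decimal place."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {k: round(100 * counts[k] / total, 1) for k in ("U ERR", "D ERR", "N ERR")}
```

Non-fault-tolerant versions have no detection signal, so for them `error_flag` would always be false and only U ERR and N ERR can occur, matching the two-column layout of the non-FT results.
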
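  13. Experimental results (%): Erroneous fault insertions <ul><li>Further experiments in RT scenarios were carried out to identify erroneous fault insertions, which were classified as Inconclusive (INC) </li></ul>

      | Configuration & Scenario | non-FT MAdder | non-FT VSorter | non-FT XControl | SW-FT MAdder | SW-FT VSorter | SW-FT XControl |
      |---|---|---|---|---|---|---|
      | BRT | 3.1 | 0.9 | Not possible | 4 | 2.2 | Not possible |
      | ERT | 1.4 | 0.6 | Not possible | 2.3 | 1.1 | Not possible |
      | OCD-FI | 0.2 | 0.1 | Not possible | 0.2 | 0.2 | Not possible |
      | BRT+ | 3 | 1.2 | 2.1 | 4.8 | 2.8 | 3.2 |
      | ERT+ | 2 | 0.8 | 1.5 | 3.7 | 2.1 | 2.4 |
      | OCD-FI+ | 0.4 | 0.2 | 0.3 | 1.7 | 1.2 | 1.3 |

      OFF: 0 (non-FT), 0 (SW-FT)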
  14. Experimental results: Pros and cons of FI methods <ul><li>Off-line configurations always produce the most reliable results </li></ul><ul><li>In real-time scenarios the CPU may overwrite the target memory cell before the FI is complete, giving an inconclusive (INC) result </li></ul><ul><li>INC results increase with the delay between fault triggering and fault insertion, and are mitigated by OCD-FI and predetermination </li></ul>
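
Why a longer insertion delay produces more INC results can be illustrated with a simple window check. The model is an assumption: an insertion is treated as inconclusive whenever a CPU write to the target cell falls between the trigger and the end of the insertion delay. The delays used in the usage lines (44 and 4 cycles) are the BRT+ and OCD-FI+ insertion delays from the scenarios slide.

```python
def is_inconclusive(trigger_cycle, insertion_delay, cpu_write_cycles):
    """True if the CPU writes the target cell while the insertion is in flight."""
    window_end = trigger_cycle + insertion_delay
    return any(trigger_cycle <= cycle < window_end for cycle in cpu_write_cycles)


# Usage: the same CPU write, 20 cycles after the trigger, under two delays.
slow = is_inconclusive(100, 44, [120])  # BRT+ insertion delay
fast = is_inconclusive(100, 4, [120])   # OCD-FI+ insertion delay
```
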
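  15. Experimental results (%): OCD-FI extensions for EDAC <ul><li>FT versions of the workload applications were not used due to EDAC </li></ul>
      Derr: percentage of detected errors that were corrected by EDAC

      | Workload | No predetermination Derr | No predetermination Uerr | No predetermination Nerr | No predetermination INC | Predetermination Derr | Predetermination Uerr | Predetermination Nerr | Predetermination INC |
      |---|---|---|---|---|---|---|---|---|
      | MAdder | 39.6 | 0 | 58.8 | 1.6 | 39.7 | 0 | 59.5 | 0.8 |
      | VSorter | 98.3 | 0 | 0.8 | 0.9 | 99 | 0 | 0.7 | 0.3 |
      | XControl | 29.9 | 0 | 69.1 | 1 | 30 | 0 | 69.5 | 0.5 |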
  16. Experimental results: Pros and cons of OCD-FI EDAC extensions <ul><li>EDAC (error detection and correction) mechanisms effectively eliminate the effects of single bit-flip errors on the target system </li></ul><ul><li>The OCD-FI EDAC extension enables FI into protected memory blocks </li></ul>
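
How an EDAC code removes a single bit-flip can be shown with a Hamming(7,4) code. The slides do not say which code the target memory uses, so Hamming(7,4) is chosen here purely for illustration, and both function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into 7 bits (positions 1..7, parity at 1, 2 and 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, corrected_position); the position is 0 if no error was seen."""
    code = [0] + list(bits)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions holding a 1
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    data = (code[3], code[5], code[6], code[7])
    return sum(bit << i for i, bit in enumerate(data)), syndrome
```

Any single flipped bit in the stored codeword is located by the syndrome and corrected on read, which is consistent with the Uerr column being 0 in the EDAC results.
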
  17. Experimental results (%): OCD-FI for RTREG <ul><li>RT register access requires a collision manager that degrades dynamic performance… </li></ul>

      | Workload | non-FT Uerr | non-FT Nerr | SW-FT Derr | SW-FT Uerr | SW-FT Nerr |
      |---|---|---|---|---|---|
      | MAdder | 89 | 11 | 62 | 22 | 16 |
      | VSorter | 60 | 40 | 46 | 14 | 40 |
  18. Experimental results: Pros and cons of OCD-FI RTREG extensions <ul><li>Due to their higher occurrence rate, INC results were explicitly avoided </li></ul><ul><li>Not all code lines qualify to trigger an FI experiment (45% of the code lines could be used for triggering accumulator FI) </li></ul><ul><li>FI results and software fault tolerance efficiency differ significantly between registers and memory </li></ul>
  19. Performance (FI rate) <ul><li>Maximum faults / second rates (single bit-flips on the same memory cell, 30 MHz clock frequency): </li></ul>

      | Configuration & Scenario | Real Time (faults/s) | Halted Access (faults/s) |
      |---|---|---|
      | BOF+ | Not possible | 400k |
      | EOF+ | Not possible | 1150k |
      | BRT+ | 454k | 400k |
      | ERT+ | 1250k | 1150k |
      | OCD-FI+ | 491k | 483k |
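
The real-time rates are numerically consistent with dividing the 30 MHz clock by the sum of the set-up and insertion delays listed on the experimental scenarios slide. That relation is an observation about the numbers, not something the slides state, and the sketch below only checks the arithmetic.

```python
CLOCK_HZ = 30_000_000  # clock frequency stated on the slide

# (set-up, insertion) delays in clock cycles, copied from the scenarios slide
DELAYS = {"BRT+": (22, 44), "ERT+": (6, 18), "OCD-FI+": (57, 4)}


def max_fi_rate(setup_cycles, insertion_cycles, clock_hz=CLOCK_HZ):
    """Faults per second if every fault costs set-up plus insertion cycles."""
    return clock_hz / (setup_cycles + insertion_cycles)


# Rates in thousands of faults per second, truncated as on the slide.
rates_k = {name: int(max_fi_rate(*cycles)) // 1000 for name, cycles in DELAYS.items()}
```

The halted-access figures are lower and do not follow from these delays alone, so they are left out of the sketch.
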
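  20. Performance (overhead, dynamic) <ul><li>Silicon overhead and maximum operating frequency on a Virtex-2 FPGA: </li></ul>

      | CPU Core | OCD | OCD-FI | EDAC | RTREG | Area [Eq Gates] | Overhead [%] | Max f [MHz] |
      |---|---|---|---|---|---|---|---|
      | x | | | | | 53926 | 75.4 | 37 |
      | x | | | x | | 55018 | 76.9 | 32 |
      | x | BRT | | | | 71527 | 100.0 | 36 |
      | x | BRT | | x | | 72619 | 101.5 | 32 |
      | x | ERT | | | | 76127 | 106.4 | 36 |
      | x | | x | | | 71842 | 100.4 | 36 |
      | x | | +EDAC | x | | 73184 | 102.3 | 32 |
      | x | | +RTREG | | x | 76392 | 106.8 | 27 |
      | x | | +BOTH | x | x | 77484 | 108.3 | 25 |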
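
The Overhead column appears to be each area expressed as a percentage of the CPU core plus basic OCD (BRT) area, which is the 100.0% row. This reading is inferred from the numbers rather than stated on the slide; the sketch recomputes a few entries under that assumption.

```python
BASELINE_GATES = 71527  # CPU core + basic OCD (BRT), the 100.0% entry


def relative_area(gates, baseline=BASELINE_GATES):
    """Area as a percentage of the baseline, rounded to one decimal place."""
    return round(100 * gates / baseline, 1)


# Usage: CPU core alone, OCD-FI, and OCD-FI with both extensions.
cpu_only = relative_area(53926)
ocd_fi = relative_area(71842)
ocd_fi_both = relative_area(77484)
```
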
  21. Conclusions <ul><li>Applicable to a wide spectrum of targets (FPGA, ASIC, etc.) </li></ul><ul><li>FI rate does not justify real-time operation </li></ul><ul><li>Low silicon overhead </li></ul><ul><li>Better controllability and observability (C&O) than radiation techniques </li></ul><ul><li>Less intrusive than software techniques </li></ul><ul><li>Should be used with the final HW and SW </li></ul><ul><li>Limitations in coverage, lack of standards </li></ul>
