SBCCI08

  1. A comparative analysis of fault injection methods via enhanced on-chip debug infrastructures
     J. M. Martins Ferreira [jmf@fe.up.pt], FEUP / DEEC, Rua Dr. Roberto Frias, 4200-465 Porto, PORTUGAL
     André Fidalgo, Gustavo R. Alves, Manuel Gericota [anf/gca/mgg@isep.ipp.pt], ISEP / DEE, Rua Ant. Bernardino Almeida, 431, 4200-072 Porto, PORTUGAL
     SBCCI'08: Gramado, Brazil, 1-4 September 2008
     These slides are available at http://www.slideshare.net/josemmf
  2. Outline of the presentation
     • Introduction and motivation
     • Setup, workbench, workflow
     • Experimental results
         • Basic, extended and OCD-FI
         • OCD-FI extensions (EDAC, RTREG)
     • Comparison and discussion
     • Conclusion
  3. Scope, focus, setup
     • Scope: use of on-chip debug (OCD) resources for fault tolerance validation / fault injection
     • Focus: comparative analysis of experimental results for various OCD configurations and debugging scenarios
     • Setup: (a) 32-bit Freescale MPC565, iSystem IC3000 (iTracePro), winIDEA 2005; (b) OCD enhancements in VHDL
  4. Motivation
     • OCD offers controllability and observability features (R/W access to registers and memory) that may be used to inject faults and observe their effects
     • Its usefulness for fault tolerance validation may be limited in bandwidth, coverage, and repeatability / representativeness of results
     • These limitations can be mitigated by enhancing the OCD
  5. Our approach
     • Configurations: basic (2:8), extended (8:8), OCD-FI (with a fault injection module)
     • Fault injection scenarios: off-line or real-time, predefined or on-the-fly
     • OCD-FI is able to cope with error detection / correction and real-time requirements
     • Comparison of results uses a common set of workload applications and FI campaigns
  6. NEXUS FI for the MPC565
     Trace data: program trace data output by the OCD
     Campaign data: scripts that describe the FI experiments

     | NEXUS debug feature | Class | Usability for FI |
     |---|---|---|
     | Run-Control | 1 | External triggering |
     | Breakpoints | 1 | Internal triggering |
     | Watchpoints | 1 | Real-time triggering |
     | Static register and memory access | 1 | Static fault insertion |
     | Program trace | 2 | Fault effects classification |
     | Dynamic register and memory access | 3 | Real-time fault insertion |
     | Data trace | 3 | Improved fault effects classification |
  7. OCD infrastructure developed to support this work
     • NEXUS class 2 compliant with real-time memory access
     • Adjustable data bus
     • OCD configurations
         • Basic (2:8)
         • Extended (8:8)
         • OCD-FI: comprises a fault injection module
  8. Fault injection: Workload applications
     • Workload applications:
         • Matrix adder (Madder)
         • Vector sorter (Vsorter)
         • LUT control algorithm (Xcontrol)
     • Each application was implemented in two versions: normal and fault tolerant
     • Fault tolerance is achieved by duplicating data in memory and repeating each operation
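The duplication-and-repetition scheme can be illustrated with a minimal Python sketch. The slides do not show the actual implementation, so the function name `ft_add` and its interface are hypothetical: each operand is stored twice, the operation is executed once per copy, and a mismatch between the two results raises the error detection signal.

```python
def ft_add(a_copy1: int, a_copy2: int, b_copy1: int, b_copy2: int):
    """Add two operands that are each held as two copies in memory.

    Hypothetical illustration of software fault tolerance by data
    duplication and operation repetition.
    Returns (result, error_detected).
    """
    first = a_copy1 + b_copy1    # operation on the primary copies
    second = a_copy2 + b_copy2   # same operation repeated on the duplicates
    return first, first != second


# Fault-free run: both executions agree.
print(ft_add(5, 5, 7, 7))
# A single bit-flip in one copy of the first operand is detected.
print(ft_add(5 ^ 0b100, 5, 7, 7))
```

A fault that hits both copies identically, or that strikes after the comparison, would escape this check, which is one way undetected errors can still occur in a fault-tolerant version.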
  9. Fault injection campaigns
     • Each campaign is a script that defines 10 FI experiments to be carried out during system operation
     • 100 campaigns were executed for each scenario using the three workload applications (Madder, Vsorter, Xcontrol)
     • FI campaigns mostly target memory positions and cause a bit-flip to emulate SEU effects
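A single bit-flip of the kind used to emulate an SEU amounts to XOR-ing the target word with a one-bit mask. The sketch below assumes a 32-bit word (matching the 32-bit target CPU); the helper name `flip_bit` is illustrative and not taken from the slides.

```python
def flip_bit(word: int, bit: int, width: int = 32) -> int:
    """Return `word` with bit position `bit` inverted (SEU-style bit-flip)."""
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in [0, {width})")
    mask = (1 << width) - 1          # keep the result within the word width
    return (word ^ (1 << bit)) & mask


faulty = flip_bit(0x000000FF, 0)     # clears the least significant bit
restored = flip_bit(faulty, 0)       # flipping the same bit again undoes the fault
```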
  10. Predetermination to improve performance of FI campaigns
      • Predetermination of the contents of the target memory cell at the FI instant may be done through a "gold run" or by ensuring:
          • Complete knowledge of the program flow
          • Full observability of external inputs
          • Precise control of the FI instant and location
      • Otherwise the target memory cell must be read "immediately" before the FI instant
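The difference between the two cases can be sketched in Python as follows. `DebugPort` is a toy stand-in for memory access through the OCD and only counts accesses; it is an assumption made for illustration, not the interface used in this work. With predetermination the faulty value is computed beforehand and needs a single write; without it, a read must precede the write.

```python
class DebugPort:
    """Toy model of debug memory access that counts reads and writes."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.memory[addr]

    def write(self, addr, value):
        self.writes += 1
        self.memory[addr] = value


def inject_predetermined(port, addr, expected_value, bit):
    """Cell content known in advance (e.g. from a gold run): one write only."""
    port.write(addr, expected_value ^ (1 << bit))


def inject_with_readback(port, addr, bit):
    """Cell content unknown: read it just before the FI instant, then write."""
    value = port.read(addr)
    port.write(addr, value ^ (1 << bit))
```

The extra read is the reason the scenarios without predetermination show longer insertion delays.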
  11. Experimental scenarios
      B: Basic; E: Extended; OCD-FI: OCD for Fault Injection; OF: Off-line; RT: Real-time; +: predetermination not required

      | Configur. & scenario | Bandwidth | Predetermination of the faulty value | Fault injection method | Set-up delay (clk cycles) | Insertion delay (clk cycles) |
      |---|---|---|---|---|---|
      | BOF | MDI=2 MDO=8 | YES | Offline | 22 | 35 |
      | BOF+ | MDI=2 MDO=8 | NO | Offline | 22 | 44 |
      | EOF | MDI=8 MDO=8 | YES | Offline | 6 | 9 |
      | EOF+ | MDI=8 MDO=8 | NO | Offline | 6 | 18 |
      | BRT | MDI=2 MDO=8 | YES | Real time | 22 | 35 |
      | BRT+ | MDI=2 MDO=8 | NO | Real time | 22 | 44 |
      | ERT | MDI=8 MDO=8 | YES | Real time | 6 | 9 |
      | ERT+ | MDI=8 MDO=8 | NO | Real time | 6 | 18 |
      | OCD-FI | MDI=2 MDO=8 | YES | Real time | 57 | 2 |
      | OCD-FI+ | MDI=2 MDO=8 | NO | Real time | 57 | 4 |
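  12. Experimental results (%): B, E, OCD-FI
      U ERR: Undetected errors (incorrect final result that goes undetected)
      D ERR: Detected errors (error detection signal activated)
      N ERR: No errors (application ended correctly)

      MAdder

      | Configur. & scenario | non-FT U ERR | non-FT N ERR | SW-FT D ERR | SW-FT U ERR | SW-FT N ERR |
      |---|---|---|---|---|---|
      | OFF | 19 | 81 | 28 | 13.9 | 58.1 |
      | BRT | 19.4 | 80.6 | 28.3 | 13.8 | 57.9 |
      | ERT | 19.2 | 80.8 | 28.1 | 13.9 | 58 |
      | OCD-FI | 19 | 81 | 28 | 13.9 | 58.1 |
      | BRT+ | 19.5 | 80.5 | 28.4 | 13.8 | 57.8 |
      | ERT+ | 19.3 | 80.7 | 28.2 | 13.8 | 58 |
      | OCD-FI+ | 19.1 | 80.9 | 28.1 | 13.9 | 58 |

      VSorter

      | Configur. & scenario | non-FT U ERR | non-FT N ERR | SW-FT D ERR | SW-FT U ERR | SW-FT N ERR |
      |---|---|---|---|---|---|
      | OFF | 98 | 2 | 97 | 2 | 1 |
      | BRT | 98.1 | 1.9 | 96.8 | 2 | 1.2 |
      | ERT | 98 | 2 | 96.9 | 2 | 1.1 |
      | OCD-FI | 98 | 2 | 97 | 2 | 1 |
      | BRT+ | 98.2 | 1.8 | 96.7 | 1.9 | 1.4 |
      | ERT+ | 98.1 | 1.9 | 96.8 | 1.9 | 1.3 |
      | OCD-FI+ | 98 | 2 | 96.9 | 1.9 | 1.2 |

      XControl (not possible for OFF, BRT, ERT and OCD-FI)

      | Configur. & scenario | non-FT U ERR | non-FT N ERR | SW-FT D ERR | SW-FT U ERR | SW-FT N ERR |
      |---|---|---|---|---|---|
      | BRT+ | 29.3 | 70.7 | 29.1 | 1.5 | 69.4 |
      | ERT+ | 29.6 | 70.4 | 28.9 | 1.2 | 69.9 |
      | OCD-FI+ | 29.8 | 70.2 | 28.8 | 1.1 | 70.1 |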
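The three outcome classes defined on this slide can be expressed as a small classifier. This is a sketch under assumptions: the function names, the use of a golden (fault-free) result for comparison, and the rounding to one decimal are illustrative choices, not details given in the slides.

```python
from collections import Counter


def classify(result, golden, error_detected):
    """Classify one FI experiment as D ERR, U ERR or N ERR."""
    if error_detected:
        return "D ERR"   # error detection signal activated
    if result != golden:
        return "U ERR"   # incorrect final result that went undetected
    return "N ERR"       # application ended correctly


def summarize(outcomes):
    """Return the percentage of each class over a list of outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no experiments to summarize")
    return {label: round(100 * counts[label] / total, 1)
            for label in ("N ERR", "U ERR", "D ERR")}
```

For a non-fault-tolerant application there is no detection signal, so `error_detected` is always false and only U ERR and N ERR can occur, which matches the two-column non-FT results.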
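  13. Experimental results (%): Erroneous fault insertions
      • Further experiments in RT scenarios were carried out to identify erroneous FI, which were classified as Inconclusive (INC)

      | Configur. & scenario | non-FT MAdder | non-FT VSorter | non-FT XControl | SW-FT MAdder | SW-FT VSorter | SW-FT XControl |
      |---|---|---|---|---|---|---|
      | BRT | 3.1 | 0.9 | Not possible | 4 | 2.2 | Not possible |
      | ERT | 1.4 | 0.6 | Not possible | 2.3 | 1.1 | Not possible |
      | OCD-FI | 0.2 | 0.1 | Not possible | 0.2 | 0.2 | Not possible |
      | BRT+ | 3 | 1.2 | 2.1 | 4.8 | 2.8 | 3.2 |
      | ERT+ | 2 | 0.8 | 1.5 | 3.7 | 2.1 | 2.4 |
      | OCD-FI+ | 0.4 | 0.2 | 0.3 | 1.7 | 1.2 | 1.3 |

      OFF: 0 for both non-FT and SW-FT (a single value per group on the slide)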
  14. Experimental results: Pros and cons of FI methods
      • Off-line configurations always produce the most reliable results
      • The CPU may overwrite the target memory cell before the FI is complete (INC)
      • INC results increase with the delay between fault triggering and fault insertion, and are mitigated by OCD-FI and predetermination
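One way such an inconclusive insertion can arise is sketched below in Python. The model is a deliberate simplification and an assumption: the debugger samples the cell, the CPU stores a new value during the insertion delay, and the debugger then writes back a value derived from the stale sample. The function name `run_injection` is hypothetical.

```python
def run_injection(initial, bit, cpu_write=None):
    """Simulate a read-modify-write fault insertion on one memory cell.

    cpu_write: value stored by the CPU between the debugger's read and
    its write-back, or None if the CPU leaves the cell untouched.
    Returns (final_cell_value, "OK" or "INC").
    """
    memory = initial
    sampled = memory                   # debugger reads the target cell
    if cpu_write is not None:
        memory = cpu_write             # CPU overwrites it during the delay
    intended = memory ^ (1 << bit)     # a true single bit-flip of the live value
    memory = sampled ^ (1 << bit)      # write-back based on the stale sample
    return memory, ("OK" if memory == intended else "INC")


print(run_injection(0x10, 0))                  # no interference
print(run_injection(0x10, 0, cpu_write=0x20))  # CPU wrote in between
```

In this model, a shorter window between sampling and write-back leaves less room for an interfering CPU write, which is consistent with INC results growing with the delay.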
  15. Experimental results (%): OCD-FI extensions for EDAC
      • FT versions of the workload applications were not used due to EDAC
      D ERR: Percentage of errors detected that were corrected by EDAC

      | Application | No predetermination: Derr | Uerr | Nerr | INC | Predetermination: Derr | Uerr | Nerr | INC |
      |---|---|---|---|---|---|---|---|---|
      | MAdder | 39.6 | 0 | 58.8 | 1.6 | 39.7 | 0 | 59.5 | 0.8 |
      | VSorter | 98.3 | 0 | 0.8 | 0.9 | 99 | 0 | 0.7 | 0.3 |
      | XControl | 29.9 | 0 | 69.1 | 1 | 30 | 0 | 69.5 | 0.5 |
  16. Experimental results: Pros and cons of OCD-FI EDAC extensions
      • EDAC mechanisms effectively eliminate the effects of single bit-flip errors on the target system
      • The OCD-FI EDAC extension enables FI into protected memory blocks
  17. Experimental results (%): OCD-FI for RTREG
      • RT register access requires a collision manager that degrades dynamic performance…

      | Application | non-FT Uerr | non-FT Nerr | SW-FT Derr | SW-FT Uerr | SW-FT Nerr |
      |---|---|---|---|---|---|
      | MAdder | 89 | 11 | 62 | 22 | 16 |
      | VSorter | 60 | 40 | 46 | 14 | 40 |
  18. Experimental results: Pros and cons of OCD-FI RTREG extensions
      • Due to their higher occurrence rate, INC results were explicitly avoided
      • Not all code lines qualify to trigger an FI experiment (45% of the code lines could be used for triggering accumulator FI)
      • FI results and software fault tolerance efficiency differ significantly between registers and memory
  19. Performance (FI rate)
      • Maximum faults / second rates (single bit-flips on the same memory cell, 30 MHz clock frequency):

      | Conf. & scenario | Real time | Halted access |
      |---|---|---|
      | BOF+ | Not possible | 400k |
      | EOF+ | Not possible | 1150k |
      | BRT+ | 454k | 400k |
      | ERT+ | 1250k | 1150k |
      | OCD-FI+ | 491k | 483k |
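The real-time figures appear consistent with dividing the 30 MHz clock by the sum of the set-up and insertion delays listed for each "+" scenario on the experimental scenarios slide. This relation is inferred from the numbers and is not stated on the slides; the Python sketch below simply checks the arithmetic, with truncation to thousands as an assumption about how the slide values were rounded.

```python
CLOCK_HZ = 30_000_000  # 30 MHz, as stated on the slide

# (set-up delay, insertion delay) in clock cycles, from the scenarios slide
DELAYS = {
    "BRT+": (22, 44),
    "ERT+": (6, 18),
    "OCD-FI+": (57, 4),
}


def max_fi_rate(setup_cycles, insertion_cycles, clock_hz=CLOCK_HZ):
    """Faults per second if each fault costs set-up plus insertion cycles."""
    return clock_hz // (setup_cycles + insertion_cycles)


for name, (setup, insertion) in DELAYS.items():
    rate = max_fi_rate(setup, insertion)
    print(f"{name}: {rate // 1000}k faults/s")
```

Under this reading, OCD-FI+ has the shortest insertion delay but not the highest rate, because its 57-cycle set-up dominates when faults are injected back to back.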
  20. Performance (overhead, dynamic)
      • Silicon overhead and maximum operating frequency on a Virtex-2 FPGA:

      | CPU core | OCD | OCD-FI | EDAC | RTREG | Area [Eq Gates] | Overhead [%] | Max f [MHz] |
      |---|---|---|---|---|---|---|---|
      | x | | | | | 53926 | 75.4 | 37 |
      | x | | | x | | 55018 | 76.9 | 32 |
      | x | BRT | | | | 71527 | 100.0 | 36 |
      | x | BRT | | x | | 72619 | 101.5 | 32 |
      | x | ERT | | | | 76127 | 106.4 | 36 |
      | x | | x | | | 71842 | 100.4 | 36 |
      | x | | +EDAC | x | | 73184 | 102.3 | 32 |
      | x | | +RTREG | | x | 76392 | 106.8 | 27 |
      | x | | +BOTH | x | x | 77484 | 108.3 | 25 |
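The Overhead column appears to express each area relative to the CPU core with the basic real-time OCD (BRT), which is the row listed as 100.0%. That reading is an inference from the figures rather than something the slide states; the short Python check below reproduces the percentages from the equivalent gate counts under that assumption.

```python
BASELINE_GATES = 71527  # CPU core + BRT OCD, the 100.0% row


def relative_area(gates, baseline=BASELINE_GATES):
    """Area as a percentage of the baseline, rounded to one decimal."""
    return round(100 * gates / baseline, 1)


print(relative_area(53926))  # CPU core alone
print(relative_area(71842))  # CPU core + OCD-FI
print(relative_area(77484))  # CPU core + OCD-FI with EDAC and RTREG
```

Read this way, replacing the basic OCD with OCD-FI costs about 0.4% extra area, and the full EDAC plus RTREG extension about 8.3%.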
  21. Conclusions
      • Wide spectrum (FPGA, ASIC, etc.)
      • FI rate does not justify real-time
      • Low overhead
      • Better controllability and observability (C&O) than radiation techniques
      • Less intrusive than software techniques
      • Should be used with the final HW and SW
      • Limitations in coverage, lack of standards
