Multi-faceted
Microarchitecture
Level Reliability
Characterization for
NVIDIA and AMD GPUs
A. Vallero*, S.
Tselonis,
D. Gizopoulos and
S. Di Carlo
26 IEEE VLSI Test Symposium, Hyatt Hotel,
San Francisco, CA, April 22-25, 2018
2
Licensing Note
This work is licensed under the Creative Commons Attribution-NonCommercial NoDerivatives
4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
You are free: to copy, distribute, display, and perform the work
Under the following conditions:
• Attribution. You must attribute the work in the manner specified by the author or licensor.
• Non-commercial. You may not use this work for commercial purposes.
• No Derivative Works. You may not alter, transform, or build upon this work.
• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
• GPUs are increasingly used in applications where reliability
is a top concern
✚ High computational power
⁃ Susceptibility to faults due to technology shrinking
• Reliability analysis for GPU based systems is a complex
task that requires dedicated tools.
MOTIVATIONS
Reliability analysis for design space exploration
in GPGPU systems
Super computing Automotive Biomedical
OUTLINE
Previous works
Contributions
The reliability evaluation framework
Experiment
Conclusions
• Reliability estimation framework for NVIDIA GPUs
⁃ GPGPU-Soda [Tan et al., JPC,2013] – microarchitecture-level ACE analysis
framework working on PTX assembly language
⁃ GUFI [Tselonis et al., ISPASS,2016] – microarchitecture-level fault injector
based on GPGPU-Sim
⁃ SASSIFI [Kumar et al., ISPASS, 2017] – very fast, fault injection by profiling
and debugging on real hardware
• Reliability estimation framework for AMD GPUs
⁃ [Farazmand et al., SELSE, 2012] - Evergreen AMD GPU fault injector
⁃ SIFI [Vallero et al., IOLTS, 2017] – Southern Islands microarchitecture-level
ACE analysis and fault injector framework
PREVIOUS WORKS
Reliability evaluation tools for NVIDIA and AMD
GPUs
• Reliability and Performance are jointly evaluated
• Two reliability analysis methodologies are adopted
and compared:
⁃ ACE analysis
⁃ Fault injection
• Software and hardware masking properties are
highlighted
CONTRIBUTION
Microarchitecture level comparison among
NVIDIA and AMD GPUs
THE RELIABILITY FRAMEWORK
Architecture of the tools
SIFI / GUFI
Multi2Sim / GPGPU-Sim
Fault Injector
ACE
Analyzer
Reliability and performance
report
Architecture Vulnerability
Factor (AVF)
Architecture Vulnerability
Factor Util (AVF Util)
Failures In Time (FIT)
THE GPU MICRO-ARCHITECTURE
An example for Southern Islands GPU
architecture
Branch Unit
Local Data Unit
Scalar Unit
SIMD
1
SIMD
2
SIMD
1
Local
Memory
Vector
Reg.
File
Scalar
Reg.
File
Front-end
Global memory
Compute unit (CU1) CU2 CU3
Ultra-thread dispatcher
Big area increases
susceptibility to soft-errors
Power consumption can be
saved by disabling the ECC
on these hardware
structures” [Fang et. al,
ISPASS 2014]
Soft-errors
THE FAULT INJECTION FRAMEWORK
Fault injection engine
Application
profiler
Computes time intervals in which kernels are executed and the output of
the golden simulation
Fault
generation
Generates the fault list - single bit flips uniformly distributed in time and
space
Fault profiler
Issues a simulation to profile if faults
are striking “Util” resources.
Marks simulations containing
these faults as “Util” injections
Parallel fault
injection
Performs “Util” injections
in parallel threads.
Non “Util” injections are always masked
and therefore not simulated
Error
Classification
Classifies errors based on the outcome of simulations and generates the
reliability report
describes the programming models and architectures of the
considered GPUs. Section III discusses the reliability
evaluation framework. Results are presented and discussed in
Section IV. Related works are reported in Section V, and
Section VI concludes the paper.
II. A. GPU ARCHITECTURES AND PROGRAMMING MODELS
In our study, we considered four GPU chips with different
architectures: AMD HD RadeonTM
7970 (Southern Islands
architecture), NVIDIA QuadroTM
FX 5600 (G80 architecture),
NVIDIA QuadroTM
FX5800 (GT200 architecture) and
NVIDIA GeforceTM
GTX 480 (Fermi architecture). We
performed our analysis using a set of 10 benchmarks coded
using the OpenCL programming language for the AMD
Southern Islands chip and the CUDA programming language
for the NVIDIA chips. OpenCL [13] and CUDA [14] are the
most widespread GPGPU programming languages and both
represent abstractions of the physical hardware. Both languages
rely on similar concepts and explore the Single Instruction
Multiple Data (SIMD) paradigm. Throughout the paper, we use
the OpenCL and AMD terminology. Readers more familiar
with CUDA can refer to Fig. 1 and Fig. 2 where
CUDA/NVIDIA terminology is reported in parentheses next to
the OpenCL/AMD terms.
From the hardware standpoint (Fig. 1), a GPU typically
consists of several compute units (CU) sharing a global
memory and managed by a dispatcher. Each CU contains
multiple processing elements (PEs), each one including a set of
functional units (e.g., integer unit, floating point unit, etc.) and
a register file (typically a portion of a global register file split
and assigned to the different PEs). The CU also includes a local
memory and a scheduler that distributes and coordinates the
work of the different PEs. The PEs are grouped into SIMD
Units. All PEs in a SIMD Unit share the instruction
fetch/dispatch logic and execute the same instruction
concurrently.
Fig. 1. A generic GPU architecture. Nomenclature follows the
OpenCL/AMD model with equivalent CUDA/NVIDIA terms in
parentheses (when different).
Fig. 2 graphically represents the OpenCL/CUDA
programming model that can be mapped on the GPU
architecture of Fig. 1. Parallel portions of the application are
executed in parallel on the PEs of a SIMD Unit (Fig. 1)
constitutes a wavefront. Work-items are also aggregated in
work-groups. Each work-group is independent from the others.
Work-items belonging to the same work-group can be
synchronized and can communicate with each other through a
common memory space named local memory. Every time a
work-group must be executed it is assigned to a compute unit
(CU). The availability of multiple CUs enables the
accommodation of a large number of work-groups. Finally, a
set of work-groups that are concurrently scheduled to run on a
GPU constitute a ND-Range. The ND-Range provides a global
memory shared by all the work-groups. Communication among
different work-groups is not allowed.
Fig. 2. The OpenCL/CUDA programming model for AMD and NVIDIA
GPUs.
III. RELIABILITY EVALUATION FRAMEWORK
This section introduces the framework we developed to
analyze reliability and performance of different GPU
architectures and workloads. The framework includes two tools
named GUFI and SIFI. GUFI, previously presented in [8] has
been developed to perform reliability analysis on NVIDIA
GPUs. It is based on the GPGPU-Sim (v3.2.2) [33] micro
architectural simulator and is able to perform complex fault-
injection campaigns supporting both the SASS and PTX
assembly languages. GUFI has been extended for the purposes
of our study to perform ACE-based analysis. Similarly, to
GUFI, SIFI is a fault injection and ACE analysis tool
developed to characterize AMD GPUs [31]. SIFI is built on top
of the Mult2Sim (v4.2) microarchitectural simulator and
supports the Southern Island assembly language [28]. For both
tools, reliability is analyzed looking at the low-level assembly
code running on the real hardware. Therefore, for NVIDIA
GPUs, the SASS assembly is used instead of the PTX. This
gives access to the real architectural registers and, therefore,
allows for a fair comparison of NVIDIA and AMD chips.
This study focuses on soft-errors, i.e., bit-flips of a memory
element mainly caused by radiations, thermal cycling,
transistor variability and erratic fluctuation of voltage.
Global memory (Global memory)
Compute Unit
(Streaming
Multiprocessor)
Ultra-Threaded Dispatcher (Thread Block Scheduler)
Wavefront(Warp)
scheduler
Local Memory (Shared memory)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
PE
(TP)
Register
File
Functional
Units
Register
File
SIMD Unit (SFU) SIMD Unit (SFU)
Private memory
(Local memory)
__kernel func {
…
…
}
Work-Item (Thread)
Work-item
Wavefront (Warp)
Work-item
Work-item
Work-item
...
Wavefront Wavefront
Wavefront Wavefront
Local memory (Shared memory)
Work-group (Block)
Global memory (Global memory)
NDRange (Grid)
Work-group Work-group
Work-group Work-group...
...
... ...
Comment [
CU1
WI0
R0 R1 Rn
WI1
WG0
WG1
THE ACE ANALYSIS FRAMEWORK
ACE analysis engine
• Profiles the application to identify ACE bits at each clock
cycleVector register file
computation
Register
access
profile for
each kernel
READ WRITE
WRITE WRITE
WRITE READ
READ READ
Un-ACE
Un-ACE
ACE
ACE
We do not consider dead instructionsand logic masking
• Pessimistic AVF
• Very fast analysis
AVF calculation
EXPERIMENTAL RESULTS
Experimental
setup
CHIP FX FX 5800 GTX
480
HD 7970
Technology 90nm 55nm 28 nm 28 nm
λ FIT/bit* 1E-3 0.72E-3 0.52E- 0.32E-3
Vendor NVIDIA NVIDIA NVIDIA AMD
Architecture G80 GT200 Fermi Southern
Islands
Vector
File
64KB 64KB 128KB 256 KB
Local Memory 16KB 16KB 64KB 64 KB
SIMD Units 1 2 4 4
(1) Backprop, (2) DWTHaar1D, (3) Gaussian, (4) Histogram, (5)Kmeans,
(6)MatrixMultiplication, (7) Reduction, (8) Scan (9) MatrixTranspose, (10) VectorAdd
10 benchmarks from AMD-APP SDK, CUDA SDK, RODINIA
Statistical fault sampling
e = 3% error margin
t = 95% confidence level
p = 0.5
N = structure size
2K injections per
structure* [Ibe et al. T-ED, 2010]
EXPERIMENTAL RESULTS
Computing AVF of register file
AVF for the Vector Register File
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0%
10%
20%
30%
40%
50%
60%
70%
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd average
Occupancy
AVF
Register File
AVF-FI AVF-ACE Occupancy
High overestimation of ACE analysis with respect to FI
Fault injection vs. ACE analysis
EXPERIMENTAL RESULTS
Computing AVF Util of register file
Differences of microarchitectures are reflected in the scheduling and in
vulnerability
To eliminate the contribution of occupancy
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd average
AVFUTIL
Register File
AVF UTIL - FI AVF UTIL - ACE
EXPERIMENTAL RESULTS
Computing Failures In Time
Register file impacts reliability more than Local Memory
0
50
100
150
200
250
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd
FIT
Failures in Time
FIT regfile - FI FIT lmem - FI
FIT does not take into account performance
1E+12
1E+13
1E+14
1E+15
1E+16
1E+17
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
Rad.7970
FX5600
FX5800
GTX480
backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd average
Executions per Failure
EPF-FI
EXPERIMENTAL RESULTS
Computing Execution Per Failure
Higher parallelism translates into more performance
A larger number of executions before a failure happens
•10 benchmarks AMD/OpenCL and
NVIDIA/CUDA
•Reliability analysis comparing fault injection
and ACE analysis
•Joint evaluation of reliability and
performance
CONCLUSIONS
Reliability and performance evaluation of AMD and
NVIDIA GPU architectures
Questions?
18
http://www.testgroup.polito.it
TestGroup
@TestGroupPolito

Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA and AMD GPUs

  • 1.
    Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIAand AMD GPUs A. Vallero*, S. Tselonis, D. Gizopoulos and S. Di Carlo 26 IEEE VLSI Test Symposium, Hyatt Hotel, San Francisco, CA, April 22-25, 2018
  • 2.
    2 Licensing Note This workis licensed under the Creative Commons Attribution-NonCommercial NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. You are free: to copy, distribute, display, and perform the work Under the following conditions: • Attribution. You must attribute the work in the manner specified by the author or licensor. • Non-commercial. You may not use this work for commercial purposes. • No Derivative Works. You may not alter, transform, or build upon this work. • For any reuse or distribution, you must make clear to others the license terms of this work. • Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above.
  • 3.
    • GPUs areincreasingly used in applications where reliability is a top concern ✚ High computational power ⁃ Susceptibility to faults due to technology shrinking • Reliability analysis for GPU based systems is a complex task that requires dedicated tools. MOTIVATIONS Reliability analysis for design space exploration in GPGPU systems Super computing Automotive Biomedical
  • 4.
    OUTLINE Previous works Contributions The reliabilityevaluation framework Experiment Conclusions
  • 5.
    • Reliability estimationframework for NVIDIA GPUs ⁃ GPGPU-Soda [Tan et al., JPC,2013] – microarchitecture-level ACE analysis framework working on PTX assembly language ⁃ GUFI [Tselonis et al., ISPASS,2016] – microarchitecture-level fault injector based on GPGPU-Sim ⁃ SASSIFI [Kumar et al., ISPASS, 2017] – very fast, fault injection by profiling and debugging on real hardware • Reliability estimation framework for AMD GPUs ⁃ [Farazmand et al., SELSE, 2012] - Evergreen AMD GPU fault injector ⁃ SIFI [Vallero et al., IOLTS, 2017] – Southern Islands microarchitecture-level ACE analysis and fault injector framework PREVIOUS WORKS Reliability evaluation tools for NVIDIA and AMD GPUs
  • 6.
    • Reliability andPerformance are jointly evaluated • Two reliability analysis methodologies are adopted and compared: ⁃ ACE analysis ⁃ Fault injection • Software and hardware masking properties are highlighted CONTRIBUTION Microarchitecture level comparison among NVIDIA and AMD GPUs
  • 7.
    THE RELIABILITY FRAMEWORK Architectureof the tools SIFI / GUFI Multi2Sim / GPGPU-Sim Fault Injector ACE Analyzer Reliability and performance report Architecture Vulnerability Factor (AVF) Architecture Vulnerability Factor Util (AVF Util) Failures In Time (FIT)
  • 8.
    THE GPU MICRO-ARCHITECTURE Anexample for Southern Islands GPU architecture Branch Unit Local Data Unit Scalar Unit SIMD 1 SIMD 2 SIMD 1 Local Memory Vector Reg. File Scalar Reg. File Front-end Global memory Compute unit (CU1) CU2 CU3 Ultra-thread dispatcher Big area increases susceptibility to soft-errors Power consumption can be saved by disabling the ECC on these hardware structures” [Fang et. al, ISPASS 2014] Soft-errors
  • 9.
    THE FAULT INJECTIONFRAMEWORK Fault injection engine Application profiler Computes time intervals in which kernels are executed and the output of the golden simulation Fault generation Generates the fault list - single bit flips uniformly distributed in time and space Fault profiler Issues a simulation to profile if faults are striking “Util” resources. Marks simulations containing these faults as “Util” injections Parallel fault injection Performs “Util” injections in parallel threads. Non “Util” injections are always masked and therefore not simulated Error Classification Classifies errors based on the outcome of simulations and generates the reliability report describes the programming models and architectures of the considered GPUs. Section III discusses the reliability evaluation framework. Results are presented and discussed in Section IV. Related works are reported in Section V, and Section VI concludes the paper. II. A. GPU ARCHITECTURES AND PROGRAMMING MODELS In our study, we considered four GPU chips with different architectures: AMD HD RadeonTM 7970 (Southern Islands architecture), NVIDIA QuadroTM FX 5600 (G80 architecture), NVIDIA QuadroTM FX5800 (GT200 architecture) and NVIDIA GeforceTM GTX 480 (Fermi architecture). We performed our analysis using a set of 10 benchmarks coded using the OpenCL programming language for the AMD Southern Islands chip and the CUDA programming language for the NVIDIA chips. OpenCL [13] and CUDA [14] are the most widespread GPGPU programming languages and both represent abstractions of the physical hardware. Both languages rely on similar concepts and explore the Single Instruction Multiple Data (SIMD) paradigm. Throughout the paper, we use the OpenCL and AMD terminology. Readers more familiar with CUDA can refer to Fig. 1 and Fig. 2 where CUDA/NVIDIA terminology is reported in parentheses next to the OpenCL/AMD terms. From the hardware standpoint (Fig. 1), a GPU typically consists of several compute units (CU) sharing a global memory and managed by a dispatcher. Each CU contains multiple processing elements (PEs), each one including a set of functional units (e.g., integer unit, floating point unit, etc.) and a register file (typically a portion of a global register file split and assigned to the different PEs). The CU also includes a local memory and a scheduler that distributes and coordinates the work of the different PEs. The PEs are grouped into SIMD Units. All PEs in a SIMD Unit share the instruction fetch/dispatch logic and execute the same instruction concurrently. Fig. 1. A generic GPU architecture. Nomenclature follows the OpenCL/AMD model with equivalent CUDA/NVIDIA terms in parentheses (when different). Fig. 2 graphically represents the OpenCL/CUDA programming model that can be mapped on the GPU architecture of Fig. 1. Parallel portions of the application are executed in parallel on the PEs of a SIMD Unit (Fig. 1) constitutes a wavefront. Work-items are also aggregated in work-groups. Each work-group is independent from the others. Work-items belonging to the same work-group can be synchronized and can communicate with each other through a common memory space named local memory. Every time a work-group must be executed it is assigned to a compute unit (CU). The availability of multiple CUs enables the accommodation of a large number of work-groups. Finally, a set of work-groups that are concurrently scheduled to run on a GPU constitute a ND-Range. The ND-Range provides a global memory shared by all the work-groups. Communication among different work-groups is not allowed. Fig. 2. The OpenCL/CUDA programming model for AMD and NVIDIA GPUs. III. RELIABILITY EVALUATION FRAMEWORK This section introduces the framework we developed to analyze reliability and performance of different GPU architectures and workloads. The framework includes two tools named GUFI and SIFI. GUFI, previously presented in [8] has been developed to perform reliability analysis on NVIDIA GPUs. It is based on the GPGPU-Sim (v3.2.2) [33] micro architectural simulator and is able to perform complex fault- injection campaigns supporting both the SASS and PTX assembly languages. GUFI has been extended for the purposes of our study to perform ACE-based analysis. Similarly, to GUFI, SIFI is a fault injection and ACE analysis tool developed to characterize AMD GPUs [31]. SIFI is built on top of the Mult2Sim (v4.2) microarchitectural simulator and supports the Southern Island assembly language [28]. For both tools, reliability is analyzed looking at the low-level assembly code running on the real hardware. Therefore, for NVIDIA GPUs, the SASS assembly is used instead of the PTX. This gives access to the real architectural registers and, therefore, allows for a fair comparison of NVIDIA and AMD chips. This study focuses on soft-errors, i.e., bit-flips of a memory element mainly caused by radiations, thermal cycling, transistor variability and erratic fluctuation of voltage. Global memory (Global memory) Compute Unit (Streaming Multiprocessor) Ultra-Threaded Dispatcher (Thread Block Scheduler) Wavefront(Warp) scheduler Local Memory (Shared memory) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) PE (TP) Register File Functional Units Register File SIMD Unit (SFU) SIMD Unit (SFU) Private memory (Local memory) __kernel func { … … } Work-Item (Thread) Work-item Wavefront (Warp) Work-item Work-item Work-item ... Wavefront Wavefront Wavefront Wavefront Local memory (Shared memory) Work-group (Block) Global memory (Global memory) NDRange (Grid) Work-group Work-group Work-group Work-group... ... ... ... Comment [ CU1 WI0 R0 R1 Rn WI1 WG0 WG1
  • 10.
    THE ACE ANALYSISFRAMEWORK ACE analysis engine • Profiles the application to identify ACE bits at each clock cycleVector register file computation Register access profile for each kernel READ WRITE WRITE WRITE WRITE READ READ READ Un-ACE Un-ACE ACE ACE We do not consider dead instructionsand logic masking • Pessimistic AVF • Very fast analysis AVF calculation
  • 11.
    EXPERIMENTAL RESULTS Experimental setup CHIP FXFX 5800 GTX 480 HD 7970 Technology 90nm 55nm 28 nm 28 nm λ FIT/bit* 1E-3 0.72E-3 0.52E- 0.32E-3 Vendor NVIDIA NVIDIA NVIDIA AMD Architecture G80 GT200 Fermi Southern Islands Vector File 64KB 64KB 128KB 256 KB Local Memory 16KB 16KB 64KB 64 KB SIMD Units 1 2 4 4 (1) Backprop, (2) DWTHaar1D, (3) Gaussian, (4) Histogram, (5)Kmeans, (6)MatrixMultiplication, (7) Reduction, (8) Scan (9) MatrixTranspose, (10) VectorAdd 10 benchmarks from AMD-APP SDK, CUDA SDK, RODINIA Statistical fault sampling e = 3% error margin t = 95% confidence level p = 0.5 N = structure size 2K injections per structure* [Ibe et al. T-ED, 2010]
  • 12.
    EXPERIMENTAL RESULTS Computing AVFof register file AVF for the Vector Register File 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 0% 10% 20% 30% 40% 50% 60% 70% Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd average Occupancy AVF Register File AVF-FI AVF-ACE Occupancy High overestimation of ACE analysis with respect to FI Fault injection vs. ACE analysis
  • 13.
    EXPERIMENTAL RESULTS Computing AVFUtil of register file Differences of microarchitectures are reflected in the scheduling and in vulnerability To eliminate the contribution of occupancy 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd average AVFUTIL Register File AVF UTIL - FI AVF UTIL - ACE
  • 14.
    EXPERIMENTAL RESULTS Computing FailuresIn Time Register file impacts reliability more than Local Memory 0 50 100 150 200 250 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 Rad.7970 FX5600 FX5800 GTX480 backprop dwtHaar1D gaussian histogram kmeans matrixMul reduction scan transpose vectoradd FIT Failures in Time FIT regfile - FI FIT lmem - FI FIT does not take into account performance
  • 15.
  • 16.
    •10 benchmarks AMD/OpenCLand NVIDIA/CUDA •Reliability analysis comparing fault injection and ACE analysis •Joint evaluation of reliability and performance CONCLUSIONS Reliability and performance evaluation of AMD and NVIDIA GPU architectures
  • 17.
  • 18.