Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA and AMD GPUs
A. Vallero*, S. Tselonis, D. Gizopoulos and S. Di Carlo
26 IEEE VLSI Test Symposium, Hyatt Hotel,
San Francisco, CA, April 22-25, 2018
Licensing Note
This work is licensed under the Creative Commons Attribution-NonCommercial NoDerivatives
4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
You are free: to copy, distribute, display, and perform the work
Under the following conditions:
• Attribution. You must attribute the work in the manner specified by the author or licensor.
• Non-commercial. You may not use this work for commercial purposes.
• No Derivative Works. You may not alter, transform, or build upon this work.
• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
• GPUs are increasingly used in applications where reliability
is a top concern
✚ High computational power
⁃ Susceptibility to faults due to technology shrinking
• Reliability analysis for GPU based systems is a complex
task that requires dedicated tools.
MOTIVATIONS
Reliability analysis for design space exploration
in GPGPU systems
Super computing Automotive Biomedical
OUTLINE
Previous works
Contributions
The reliability evaluation framework
Experiment
Conclusions
• Reliability estimation frameworks for NVIDIA GPUs
⁃ GPGPU-Soda [Tan et al., JPC, 2013] – microarchitecture-level ACE analysis framework working on PTX assembly language
⁃ GUFI [Tselonis et al., ISPASS, 2016] – microarchitecture-level fault injector based on GPGPU-Sim
⁃ SASSIFI [Kumar et al., ISPASS, 2017] – very fast fault injection by profiling and debugging on real hardware
• Reliability estimation frameworks for AMD GPUs
⁃ [Farazmand et al., SELSE, 2012] – Evergreen AMD GPU fault injector
⁃ SIFI [Vallero et al., IOLTS, 2017] – Southern Islands microarchitecture-level ACE analysis and fault injector framework
PREVIOUS WORKS
Reliability evaluation tools for NVIDIA and AMD
GPUs
• Reliability and Performance are jointly evaluated
• Two reliability analysis methodologies are adopted
and compared:
⁃ ACE analysis
⁃ Fault injection
• Software and hardware masking properties are
highlighted
CONTRIBUTION
Microarchitecture level comparison among
NVIDIA and AMD GPUs
THE RELIABILITY FRAMEWORK
Architecture of the tools
SIFI / GUFI
Multi2Sim / GPGPU-Sim
Fault Injector
ACE Analyzer
Reliability and performance report
Architectural Vulnerability Factor (AVF)
Architectural Vulnerability Factor Util (AVF Util)
Failures In Time (FIT)
THE GPU MICRO-ARCHITECTURE
An example for Southern Islands GPU
architecture
[Diagram: the ultra-thread dispatcher feeds the compute units (CU1, CU2, CU3), which share the global memory. A compute unit contains a front-end, a branch unit, a local data unit, a scalar unit, SIMD units, a local memory, a vector register file and a scalar register file.]
Soft-errors: big area increases susceptibility to soft-errors
“Power consumption can be saved by disabling the ECC on these hardware structures” [Fang et al., ISPASS 2014]
THE FAULT INJECTION FRAMEWORK
Fault injection engine
• Application profiler: computes the time intervals in which kernels are executed and the output of the golden simulation
• Fault generation: generates the fault list, i.e., single bit flips uniformly distributed in time and space
• Fault profiler: issues a simulation to profile whether faults strike “Util” resources, and marks simulations containing these faults as “Util” injections
• Parallel fault injection: performs “Util” injections in parallel threads. Non-“Util” injections are always masked and therefore not simulated
• Error classification: classifies errors based on the outcome of simulations and generates the reliability report
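The fault generation and fault profiling steps can be illustrated with a minimal Python sketch. It is not the GUFI or SIFI implementation: the `Fault` record, the `is_util` predicate and the sizes used below are hypothetical, chosen only to show single bit flips drawn uniformly in time and space and then split into “Util” and non-“Util” injections.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    cycle: int  # injection time, in simulated clock cycles
    bit: int    # flat bit index inside the target structure


def generate_fault_list(n_faults, total_cycles, structure_bits, seed=0):
    """Draw single bit flips uniformly distributed in time and space."""
    rng = random.Random(seed)
    return [Fault(rng.randrange(total_cycles), rng.randrange(structure_bits))
            for _ in range(n_faults)]


def split_util(faults, is_util):
    """Separate faults striking utilized ("Util") resources from the rest."""
    util = [f for f in faults if is_util(f)]
    non_util = [f for f in faults if not is_util(f)]
    return util, non_util


# Hypothetical numbers: a 64 KB structure observed for 10,000 cycles.
TOTAL_CYCLES = 10_000
STRUCTURE_BITS = 64 * 1024 * 8


def is_util(fault):
    # Hypothetical utilization profile: only the first quarter of the
    # structure is allocated, and only during the first half of the run.
    return fault.bit < STRUCTURE_BITS // 4 and fault.cycle < TOTAL_CYCLES // 2


faults = generate_fault_list(2000, TOTAL_CYCLES, STRUCTURE_BITS)
util, non_util = split_util(faults, is_util)
print(len(util), "injections to simulate;",
      len(non_util), "counted as masked without simulation")
```

Only the `util` list would need a full simulation, which is where the time saving of the fault profiler comes from.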
describes the programming models and architectures of the
considered GPUs. Section III discusses the reliability
evaluation framework. Results are presented and discussed in
Section IV. Related works are reported in Section V, and
Section VI concludes the paper.
II. A. GPU ARCHITECTURES AND PROGRAMMING MODELS
In our study, we considered four GPU chips with different
architectures: AMD Radeon™ HD 7970 (Southern Islands
architecture), NVIDIA Quadro™ FX 5600 (G80 architecture),
NVIDIA Quadro™ FX 5800 (GT200 architecture) and
NVIDIA GeForce™ GTX 480 (Fermi architecture). We
performed our analysis using a set of 10 benchmarks coded
in the OpenCL programming language for the AMD
Southern Islands chip and in the CUDA programming language
for the NVIDIA chips. OpenCL [13] and CUDA [14] are the
most widespread GPGPU programming languages, and both
represent abstractions of the physical hardware. Both languages
rely on similar concepts and exploit the Single Instruction
Multiple Data (SIMD) paradigm. Throughout the paper, we use
the OpenCL and AMD terminology. Readers more familiar
with CUDA can refer to Fig. 1 and Fig. 2, where
CUDA/NVIDIA terminology is reported in parentheses next to
the OpenCL/AMD terms.
From the hardware standpoint (Fig. 1), a GPU typically
consists of several compute units (CU) sharing a global
memory and managed by a dispatcher. Each CU contains
multiple processing elements (PEs), each one including a set of
functional units (e.g., integer unit, floating point unit, etc.) and
a register file (typically a portion of a global register file split
and assigned to the different PEs). The CU also includes a local
memory and a scheduler that distributes and coordinates the
work of the different PEs. The PEs are grouped into SIMD
Units. All PEs in a SIMD Unit share the instruction
fetch/dispatch logic and execute the same instruction
concurrently.
Fig. 1. A generic GPU architecture. Nomenclature follows the
OpenCL/AMD model with equivalent CUDA/NVIDIA terms in
parentheses (when different).
Fig. 2 graphically represents the OpenCL/CUDA
programming model that can be mapped on the GPU
architecture of Fig. 1. Parallel portions of the application are
coded as kernels, and each running instance of a kernel is a
work-item. The set of work-items executed in parallel on the
PEs of a SIMD Unit (Fig. 1) constitutes a wavefront.
Work-items are also aggregated in work-groups. Each
work-group is independent of the others. Work-items
belonging to the same work-group can be synchronized and can
communicate with each other through a common memory space
named local memory. Every time a work-group must be
executed, it is assigned to a compute unit (CU). The
availability of multiple CUs enables the accommodation of a
large number of work-groups. Finally, the set of work-groups
that are concurrently scheduled to run on a GPU constitutes an
ND-Range. The ND-Range provides a global memory shared by
all the work-groups. Communication among
different work-groups is not allowed.
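As a rough illustration of this hierarchy, the following Python sketch locates a work-item of a one-dimensional ND-Range inside its work-group and wavefront. The function name and the example sizes are hypothetical; the default of 64 work-items per wavefront matches AMD Southern Islands, while NVIDIA warps hold 32 threads.

```python
def locate_work_item(global_id, work_group_size, wavefront_size=64):
    """Map a 1-D global work-item id to its position in the hierarchy.

    Returns (work-group index, wavefront index inside the work-group,
    lane index inside the wavefront).
    """
    group = global_id // work_group_size
    local_id = global_id % work_group_size
    return group, local_id // wavefront_size, local_id % wavefront_size


# Work-item 300 with work-groups of 256 work-items:
print(locate_work_item(300, 256))                      # AMD-style wavefronts
print(locate_work_item(300, 256, wavefront_size=32))   # NVIDIA-style warps
```

With 64-wide wavefronts the work-item falls in work-group 1, wavefront 0, lane 44; with 32-wide warps the same work-item falls in warp 1, lane 12 of the same work-group.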
Fig. 2. The OpenCL/CUDA programming model for AMD and NVIDIA
GPUs.
III. RELIABILITY EVALUATION FRAMEWORK
This section introduces the framework we developed to
analyze the reliability and performance of different GPU
architectures and workloads. The framework includes two tools
named GUFI and SIFI. GUFI, previously presented in [8], has
been developed to perform reliability analysis on NVIDIA
GPUs. It is based on the GPGPU-Sim (v3.2.2) [33]
microarchitectural simulator and is able to perform complex
fault-injection campaigns supporting both the SASS and PTX
assembly languages. GUFI has been extended for the purposes
of our study to perform ACE-based analysis. Similarly to
GUFI, SIFI is a fault injection and ACE analysis tool
developed to characterize AMD GPUs [31]. SIFI is built on top
of the Multi2Sim (v4.2) microarchitectural simulator and
supports the Southern Islands assembly language [28]. For both
tools, reliability is analyzed looking at the low-level assembly
code running on the real hardware. Therefore, for NVIDIA
GPUs, the SASS assembly is used instead of PTX. This
gives access to the real architectural registers and, therefore,
allows for a fair comparison of NVIDIA and AMD chips.
This study focuses on soft errors, i.e., bit-flips of a memory
element mainly caused by radiation, thermal cycling,
transistor variability and erratic fluctuation of voltage.
Fig. 1 labels, OpenCL/AMD (CUDA/NVIDIA): Global memory (Global memory); Ultra-Threaded Dispatcher (Thread Block Scheduler); Compute Unit (Streaming Multiprocessor); Wavefront (Warp) scheduler; Local Memory (Shared memory); SIMD Unit (SFU); PE (TP); Register File; Functional Units.
Fig. 2 labels, OpenCL/AMD (CUDA/NVIDIA): __kernel function; Work-Item (Thread); Private memory (Local memory); Wavefront (Warp); Work-group (Block); Local memory (Shared memory); NDRange (Grid); Global memory (Global memory).
THE ACE ANALYSIS FRAMEWORK
ACE analysis engine
• Profiles the application to identify ACE bits at each clock cycle
Vector register file computation: register access profile for each kernel
First access   Next access   Interval between them
READ           WRITE         Un-ACE
WRITE          WRITE         Un-ACE
WRITE          READ          ACE
READ           READ          ACE
We do not consider dead instructions and logic masking
• Pessimistic AVF
• Very fast analysis
AVF calculation
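The formula shown on the slide is not preserved in this text. The sketch below uses the standard definition of AVF (ACE bit-cycles divided by total bit-cycles) together with the READ/WRITE rules listed above: an interval counts as ACE only when it ends with a read. The access profile, register names and sizes are hypothetical, and treating the time before the first access and after the last access as un-ACE is an assumption of this sketch.

```python
def ace_cycles(accesses):
    """Count ACE cycles for one register.

    accesses: list of (cycle, kind) sorted by cycle, kind is "R" or "W".
    WRITE->READ and READ->READ intervals are ACE; READ->WRITE and
    WRITE->WRITE intervals are un-ACE.
    """
    total = 0
    for (start, _), (end, kind) in zip(accesses, accesses[1:]):
        if kind == "R":
            total += end - start
    return total


def avf(profile, reg_bits, n_registers, total_cycles):
    """AVF = ACE bit-cycles / (structure bits * total cycles)."""
    ace_bit_cycles = sum(ace_cycles(acc) * reg_bits for acc in profile.values())
    return ace_bit_cycles / (n_registers * reg_bits * total_cycles)


# Hypothetical profile: 2 of 4 registers are accessed during 100 cycles.
profile = {
    "v0": [(10, "W"), (30, "R"), (50, "R"), (70, "W")],  # 20 + 20 ACE cycles
    "v1": [(5, "W"), (25, "W"), (40, "R")],              # 15 ACE cycles
}
print(avf(profile, reg_bits=32, n_registers=4, total_cycles=100))
```

Because every bit of a register is counted as ACE whenever the register is, without checking dead instructions or logic masking, the result is an upper bound, which matches the "pessimistic AVF" remark on the slide.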
EXPERIMENTAL RESULTS
Experimental setup
CHIP               FX 5600   FX 5800   GTX 480   HD 7970
Technology         90 nm     55 nm     28 nm     28 nm
λ FIT/bit*         1E-3      0.72E-3   0.52E-    0.32E-3
Vendor             NVIDIA    NVIDIA    NVIDIA    AMD
Architecture       G80       GT200     Fermi     Southern Islands
Vector Reg. File   64 KB     64 KB     128 KB    256 KB
Local Memory       16 KB     16 KB     64 KB     64 KB
SIMD Units         1         2         4         4
* [Ibe et al., T-ED, 2010]
10 benchmarks from AMD-APP SDK, CUDA SDK, RODINIA:
(1) Backprop, (2) DWTHaar1D, (3) Gaussian, (4) Histogram, (5) Kmeans,
(6) MatrixMultiplication, (7) Reduction, (8) Scan, (9) MatrixTranspose, (10) VectorAdd
Statistical fault sampling
e = 3% error margin
t = 95% confidence level
p = 0.5
N = structure size
2K injections per structure
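The sampling formula itself is not preserved in this text. Assuming it is the finite-population sample-size formula commonly used in statistical fault injection, the parameters above can be checked with the Python sketch below. The function names are hypothetical, `t = 1.96` is the normal quantile for a 95% confidence level, and the population is taken as the bit count of a 64 KB structure.

```python
import math


def sample_size(population, e=0.03, t=1.96, p=0.5):
    """Injections needed for error margin e at the confidence level given by t."""
    n = population / (1 + e**2 * (population - 1) / (t**2 * p * (1 - p)))
    return math.ceil(n)


def error_margin(n, population, t=1.96, p=0.5):
    """Error margin obtained with n injections (the inverse of sample_size)."""
    return t * math.sqrt(p * (1 - p) / n * (population - n) / (population - 1))


bits = 64 * 1024 * 8  # a 64 KB structure
print(sample_size(bits))         # minimum number of injections for e = 3%
print(error_margin(2000, bits))  # margin obtained with 2K injections
```

Under these assumptions, 2K injections per structure exceed the minimum sample size, so the resulting margin is tighter than the 3% target.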
EXPERIMENTAL RESULTS
Computing AVF of register file
[Figure: AVF for the Vector Register File. Series: AVF-FI, AVF-ACE, Occupancy. Chips: Rad.7970, FX5600, FX5800, GTX480. Benchmarks: backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd, average. Axes: Register File AVF and Occupancy, both in percent.]
High overestimation of ACE analysis with respect to FI
Fault injection vs. ACE analysis
EXPERIMENTAL RESULTS
Computing AVF Util of register file
Differences of microarchitectures are reflected in the scheduling and in
vulnerability
To eliminate the contribution of occupancy
[Figure: AVF Util for the Register File. Series: AVF UTIL - FI, AVF UTIL - ACE. Chips: Rad.7970, FX5600, FX5800, GTX480. Benchmarks: backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd, average. Axis: Register File AVF Util, 0% to 100%.]
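One way to read the difference between AVF and AVF Util for fault injection is sketched below in Python. It assumes, as a hypothesis not stated explicitly on the slides, that AVF is computed over all injections while AVF Util is computed only over the injections that strike utilized ("Util") resources. The counts are invented for illustration.

```python
def avf_fi(n_failures, n_injections):
    """AVF estimated by fault injection: failures over all injections."""
    return n_failures / n_injections


def avf_util_fi(n_failures, n_util_injections):
    """AVF Util: failures over injections striking utilized resources only."""
    if n_util_injections == 0:
        return 0.0
    return n_failures / n_util_injections


# Hypothetical campaign: 2000 injections, 500 on "Util" resources, 120 failures.
n_total, n_util, n_fail = 2000, 500, 120
print(avf_fi(n_fail, n_total))       # diluted by unused resources
print(avf_util_fi(n_fail, n_util))   # contribution of occupancy removed
```

Under this reading, AVF equals AVF Util multiplied by the fraction of injections that hit utilized resources, which is why a chip with low occupancy can show a low AVF and still a high AVF Util.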
EXPERIMENTAL RESULTS
Computing Failures In Time
Register file impacts reliability more than Local Memory
[Figure: Failures in Time. Series: FIT regfile - FI, FIT lmem - FI. Chips: Rad.7970, FX5600, FX5800, GTX480. Benchmarks: backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd. Axis: FIT, 0 to 250.]
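The FIT of a hardware structure is commonly obtained as the raw per-bit FIT rate multiplied by the number of bits and by the AVF. The Python sketch below applies that relation to the 64 KB vector register file and the 1E-3 FIT/bit rate listed for the FX 5600 in the setup table. The AVF value of 0.10 is a made-up input, not a result from the study.

```python
def fit(avf, raw_fit_per_bit, size_bytes):
    """Structure FIT = AVF * raw FIT per bit * number of bits."""
    return avf * raw_fit_per_bit * size_bytes * 8


# 64 KB vector register file, 1E-3 FIT/bit, hypothetical AVF of 10%.
print(fit(avf=0.10, raw_fit_per_bit=1e-3, size_bytes=64 * 1024))
```

Since the size in bits multiplies the result directly, a larger structure with the same AVF and technology contributes proportionally more failures.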
FIT does not take into account performance
[Figure: Executions per Failure. Series: EPF-FI. Chips: Rad.7970, FX5600, FX5800, GTX480. Benchmarks: backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd, average. Axis: Executions per Failure, 1E+12 to 1E+17, logarithmic.]
EXPERIMENTAL RESULTS
Computing Execution Per Failure
Higher parallelism translates into more performance
A larger number of executions before a failure happens
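A minimal Python sketch of how such a metric can be computed is given below. It assumes that Executions per Failure is the mean time to failure divided by the execution time of one run, with FIT expressed as failures per 10^9 hours of operation. The FIT and execution-time values are hypothetical.

```python
SECONDS_PER_HOUR = 3600
FIT_HOURS = 1e9  # one FIT is one failure per 10^9 hours of operation


def executions_per_failure(fit, exec_time_s):
    """Number of complete executions expected before one failure occurs."""
    mean_time_to_failure_s = FIT_HOURS * SECONDS_PER_HOUR / fit
    return mean_time_to_failure_s / exec_time_s


# Hypothetical values: 50 FIT and 1 ms per execution.
print(executions_per_failure(fit=50, exec_time_s=1e-3))
```

With the FIT fixed, halving the execution time doubles the result, which is how a faster chip can complete more executions before a failure.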
• 10 benchmarks AMD/OpenCL and NVIDIA/CUDA
• Reliability analysis comparing fault injection and ACE analysis
• Joint evaluation of reliability and performance
CONCLUSIONS
Reliability and performance evaluation of AMD and
NVIDIA GPU architectures
Questions?
http://www.testgroup.polito.it
TestGroup
@TestGroupPolito

Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 

Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA and AMD GPUs

  • 1. Multi-faceted Microarchitecture Level Reliability Characterization for NVIDIA and AMD GPUs. A. Vallero*, S. Tselonis, D. Gizopoulos and S. Di Carlo. 36th IEEE VLSI Test Symposium, Hyatt Hotel, San Francisco, CA, April 22-25, 2018
  • 2. Licensing Note
      This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
      You are free to copy, distribute, display, and perform the work under the following conditions:
      • Attribution. You must attribute the work in the manner specified by the author or licensor.
      • Non-commercial. You may not use this work for commercial purposes.
      • No Derivative Works. You may not alter, transform, or build upon this work.
      • For any reuse or distribution, you must make clear to others the license terms of this work.
      • Any of these conditions can be waived if you get permission from the copyright holder.
      Your fair use and other rights are in no way affected by the above.
  • 3. MOTIVATIONS: Reliability analysis for design space exploration in GPGPU systems
      • GPUs are increasingly used in applications where reliability is a top concern
        ✚ High computational power
        ⁃ Susceptibility to faults due to technology shrinking
      • Reliability analysis for GPU-based systems is a complex task that requires dedicated tools.
      Application domains: supercomputing, automotive, biomedical
  • 4. OUTLINE
      • Previous works
      • Contributions
      • The reliability evaluation framework
      • Experiment
      • Conclusions
  • 5. PREVIOUS WORKS: Reliability evaluation tools for NVIDIA and AMD GPUs
      • Reliability estimation frameworks for NVIDIA GPUs
        ⁃ GPGPU-Soda [Tan et al., JPC, 2013] – microarchitecture-level ACE analysis framework working on PTX assembly language
        ⁃ GUFI [Tselonis et al., ISPASS, 2016] – microarchitecture-level fault injector based on GPGPU-Sim
        ⁃ SASSIFI [Kumar et al., ISPASS, 2017] – very fast fault injection by profiling and debugging on real hardware
      • Reliability estimation frameworks for AMD GPUs
        ⁃ [Farazmand et al., SELSE, 2012] – Evergreen AMD GPU fault injector
        ⁃ SIFI [Vallero et al., IOLTS, 2017] – Southern Islands microarchitecture-level ACE analysis and fault injector framework
  • 6. CONTRIBUTION: Microarchitecture-level comparison among NVIDIA and AMD GPUs
      • Reliability and performance are jointly evaluated
      • Two reliability analysis methodologies are adopted and compared:
        ⁃ ACE analysis
        ⁃ Fault injection
      • Software and hardware masking properties are highlighted
  • 7. THE RELIABILITY FRAMEWORK: Architecture of the tools
      • Tools: SIFI / GUFI
      • Underlying simulators: Multi2Sim / GPGPU-Sim
      • Components: Fault Injector, ACE Analyzer
      • Output: reliability and performance report
        ⁃ Architecture Vulnerability Factor (AVF)
        ⁃ Architecture Vulnerability Factor Util (AVF Util)
        ⁃ Failures In Time (FIT)
  • 8. THE GPU MICRO-ARCHITECTURE: An example for Southern Islands GPU architecture
      [Figure: Southern Islands GPU. An ultra-thread dispatcher feeds the compute units (CU1, CU2, CU3), which share a global memory. Each compute unit contains a front-end, a branch unit, a local data unit, a scalar unit, SIMD units, a local memory, a vector register file and a scalar register file. Soft-errors are shown striking the memory structures.]
      • Big area increases susceptibility to soft-errors
      • “Power consumption can be saved by disabling the ECC on these hardware structures” [Fang et al., ISPASS 2014]
  • 9. THE FAULT INJECTION FRAMEWORK: Fault injection engine
      • Application profiler: computes the time intervals in which kernels are executed and the output of the golden simulation
      • Fault generation: generates the fault list, i.e., single bit flips uniformly distributed in time and space
      • Fault profiler: issues a simulation to profile whether faults strike “Util” resources, and marks simulations containing these faults as “Util” injections
      • Parallel fault injection: performs “Util” injections in parallel threads; non-“Util” injections are always masked and are therefore not simulated
      • Error classification: classifies errors based on the outcome of the simulations and generates the reliability report

      [...] describes the programming models and architectures of the considered GPUs. Section III discusses the reliability evaluation framework. Results are presented and discussed in Section IV. Related works are reported in Section V, and Section VI concludes the paper.

      II. GPU ARCHITECTURES AND PROGRAMMING MODELS

      In our study, we considered four GPU chips with different architectures: AMD Radeon™ HD 7970 (Southern Islands architecture), NVIDIA Quadro™ FX 5600 (G80 architecture), NVIDIA Quadro™ FX 5800 (GT200 architecture) and NVIDIA GeForce™ GTX 480 (Fermi architecture). We performed our analysis using a set of 10 benchmarks coded in the OpenCL programming language for the AMD Southern Islands chip and in the CUDA programming language for the NVIDIA chips. OpenCL [13] and CUDA [14] are the most widespread GPGPU programming languages and both represent abstractions of the physical hardware. Both languages rely on similar concepts and exploit the Single Instruction Multiple Data (SIMD) paradigm. Throughout the paper, we use the OpenCL and AMD terminology. Readers more familiar with CUDA can refer to Fig. 1 and Fig. 2, where CUDA/NVIDIA terminology is reported in parentheses next to the OpenCL/AMD terms.

      From the hardware standpoint (Fig. 1), a GPU typically consists of several compute units (CUs) sharing a global memory and managed by a dispatcher. Each CU contains multiple processing elements (PEs), each one including a set of functional units (e.g., integer unit, floating point unit, etc.) and a register file (typically a portion of a global register file split and assigned to the different PEs). The CU also includes a local memory and a scheduler that distributes and coordinates the work of the different PEs. The PEs are grouped into SIMD Units. All PEs in a SIMD Unit share the instruction fetch/dispatch logic and execute the same instruction concurrently.

      Fig. 1. A generic GPU architecture. Nomenclature follows the OpenCL/AMD model with equivalent CUDA/NVIDIA terms in parentheses (when different). Figure labels: Global memory (Global memory), Compute Unit (Streaming Multiprocessor), Ultra-Threaded Dispatcher (Thread Block Scheduler), Wavefront (Warp) scheduler, Local Memory (Shared memory), PE (TP), Register File, Functional Units, SIMD Unit (SFU).

      Fig. 2 graphically represents the OpenCL/CUDA programming model that can be mapped on the GPU architecture of Fig. 1. Parallel portions of the application are executed [...] in parallel on the PEs of a SIMD Unit (Fig. 1) constitutes a wavefront. Work-items are also aggregated in work-groups. Each work-group is independent of the others. Work-items belonging to the same work-group can be synchronized and can communicate with each other through a common memory space named local memory. Every time a work-group must be executed, it is assigned to a compute unit (CU). The availability of multiple CUs enables the accommodation of a large number of work-groups. Finally, a set of work-groups that are concurrently scheduled to run on a GPU constitutes an ND-Range. The ND-Range provides a global memory shared by all the work-groups. Communication among different work-groups is not allowed.

      Fig. 2. The OpenCL/CUDA programming model for AMD and NVIDIA GPUs. Figure labels: Private memory (Local memory), Work-Item (Thread), Wavefront (Warp), Local memory (Shared memory), Work-group (Block), Global memory (Global memory), NDRange (Grid).

      III. RELIABILITY EVALUATION FRAMEWORK

      This section introduces the framework we developed to analyze the reliability and performance of different GPU architectures and workloads. The framework includes two tools named GUFI and SIFI.

      GUFI, previously presented in [8], has been developed to perform reliability analysis on NVIDIA GPUs. It is based on the GPGPU-Sim (v3.2.2) [33] microarchitectural simulator and is able to perform complex fault-injection campaigns supporting both the SASS and PTX assembly languages. GUFI has been extended for the purposes of our study to perform ACE-based analysis. Similarly to GUFI, SIFI is a fault injection and ACE analysis tool developed to characterize AMD GPUs [31]. SIFI is built on top of the Multi2Sim (v4.2) microarchitectural simulator and supports the Southern Islands assembly language [28]. For both tools, reliability is analyzed by looking at the low-level assembly code running on the real hardware. Therefore, for NVIDIA GPUs, the SASS assembly is used instead of PTX. This gives access to the real architectural registers and, therefore, allows for a fair comparison of NVIDIA and AMD chips. This study focuses on soft-errors, i.e., bit-flips of a memory element mainly caused by radiation, thermal cycling, transistor variability and erratic fluctuation of voltage.
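The fault generation and fault profiler steps of slide 9 can be illustrated with a minimal Python sketch. It is not the GUFI or SIFI implementation: the `Fault` record, the function names and the `is_util` predicate are hypothetical, and the only assumption taken from the slide is that faults are single bit flips drawn uniformly in time (clock cycle) and space (bit position).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    cycle: int  # clock cycle at which the bit flip is injected
    bit: int    # flat index of the flipped bit inside the target structure


def generate_fault_list(n_faults, total_cycles, structure_bits, seed=0):
    """Draw single-bit-flip faults uniformly in time and space."""
    rng = random.Random(seed)
    return [
        Fault(rng.randrange(total_cycles), rng.randrange(structure_bits))
        for _ in range(n_faults)
    ]


def flip_bit(word, bit_in_word):
    """Return the word with one bit inverted (the soft-error model)."""
    return word ^ (1 << bit_in_word)


def split_util(faults, is_util):
    """Separate faults striking used ("Util") resources from the others.

    is_util is a hypothetical predicate supplied by a profiling run.
    """
    util = [f for f in faults if is_util(f)]
    non_util = [f for f in faults if not is_util(f)]
    return util, non_util
```

Only the faults in the first list returned by `split_util` would need a simulation, since the slide states that non-“Util” injections are always masked.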
  • 10. THE ACE ANALYSIS FRAMEWORK: ACE analysis engine
      • Profiles the application to identify ACE bits at each clock cycle
      • Vector register file computation: register access profile for each kernel
      [Figure: timeline of READ and WRITE accesses to a register, with the intervals between accesses labeled ACE or Un-ACE]
      • We do not consider dead instructions and logic masking
        ⁃ Pessimistic AVF
        ⁃ Very fast analysis
      • AVF calculation
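A register access profile can be turned into an AVF estimate as in the sketch below. It assumes the usual lifetime rule of ACE analysis, which the slide's figure appears to follow but does not spell out: the interval that ends with a READ is ACE, the interval that ends with a WRITE is un-ACE, and time before the first or after the last access is un-ACE. The function names and the profile format are hypothetical.

```python
def ace_cycles(accesses):
    """Count ACE cycles of one register.

    accesses: list of (cycle, kind) tuples sorted by cycle,
    with kind equal to "READ" or "WRITE".
    """
    ace = 0
    prev_cycle = None
    for cycle, kind in accesses:
        # The interval since the previous access is ACE only if the value
        # held in the register is consumed by this access.
        if prev_cycle is not None and kind == "READ":
            ace += cycle - prev_cycle
        prev_cycle = cycle
    return ace


def register_file_avf(profiles, n_registers, total_cycles):
    """AVF = ACE register-cycles / total register-cycles.

    profiles maps a register name to its access list; registers that are
    never accessed contribute zero ACE cycles. All registers are assumed
    to have the same width, so the bit count cancels out.
    """
    total_ace = sum(ace_cycles(p) for p in profiles.values())
    return total_ace / (n_registers * total_cycles)


profile = [(10, "WRITE"), (30, "READ"), (50, "READ"),
           (70, "WRITE"), (90, "WRITE"), (95, "READ")]
avf = register_file_avf({"v0": profile}, n_registers=2, total_cycles=100)
```

Because every bit of a live register is counted as ACE, with no check for dead instructions or logic masking, the estimate is pessimistic, which matches the slide's remark.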
  • 11. EXPERIMENTAL RESULTS: Experimental setup

      CHIP                   FX 5600   FX 5800   GTX 480   HD 7970
      Technology             90 nm     55 nm     28 nm     28 nm
      λ FIT/bit*             1E-3      0.72E-3   0.52E-3   0.32E-3
      Vendor                 NVIDIA    NVIDIA    NVIDIA    AMD
      Architecture           G80       GT200     Fermi     Southern Islands
      Vector Register File   64 KB     64 KB     128 KB    256 KB
      Local Memory           16 KB     16 KB     64 KB     64 KB
      SIMD Units             1         2         4         4

      * [Ibe et al., T-ED, 2010]

      • 10 benchmarks from AMD-APP SDK, CUDA SDK, RODINIA: (1) Backprop, (2) DWTHaar1D, (3) Gaussian, (4) Histogram, (5) Kmeans, (6) MatrixMultiplication, (7) Reduction, (8) Scan, (9) MatrixTranspose, (10) VectorAdd
      • Statistical fault sampling
        ⁃ e = 3% error margin
        ⁃ t = 95% confidence level
        ⁃ p = 0.5
        ⁃ N = structure size
        ⁃ 2K injections per structure
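The sampling parameters on slide 11 match the symbols of the finite-population sample-size formula commonly used in statistical fault injection, n = N / (1 + e²(N − 1) / (t² p(1 − p))). The slide does not state which formula the authors applied, so the sketch below is an assumption, and it uses the normal quantile 1.96 for the 95% confidence level.

```python
import math


def sample_size(population, error_margin=0.03, t=1.96, p=0.5):
    """Minimum number of injections for a finite fault population.

    population: number of possible faults (N), e.g. bits in the structure
    error_margin: e, t: normal quantile for the confidence level,
    p: estimated failure probability (0.5 is the worst case).
    """
    denom = 1 + error_margin ** 2 * (population - 1) / (t ** 2 * p * (1 - p))
    return math.ceil(population / denom)


# 64 KB vector register file, counting one possible fault per bit.
n_min = sample_size(64 * 1024 * 8)
```

Under these assumptions the required sample is a little over 1,000 faults, so the 2K injections per structure reported on the slide would sit comfortably above the minimum.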
  • 12. EXPERIMENTAL RESULTS: Computing AVF of register file. Fault injection vs. ACE analysis
      [Figure: AVF for the Vector Register File. One group of bars per benchmark (backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd, average) for Rad. 7970, FX5600, FX5800 and GTX480. Series: AVF-FI, AVF-ACE, Occupancy. Axes: AVF Register File and Occupancy, both in percent.]
      • High overestimation of ACE analysis with respect to FI
  • 13. EXPERIMENTAL RESULTS: Computing AVF Util of register file
      • AVF Util is computed to eliminate the contribution of occupancy
      [Figure: AVF Util for the Register File. One group of bars per benchmark (backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd, average) for Rad. 7970, FX5600, FX5800 and GTX480. Series: AVF UTIL - FI, AVF UTIL - ACE. Axis: AVF Util Register File, 0% to 100%.]
      • Differences between microarchitectures are reflected in the scheduling and in vulnerability
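The link between AVF and AVF Util in a fault injection campaign can be sketched as follows. The sketch assumes that AVF is the fraction of all sampled faults that cause a failure and that AVF Util is the same count divided by the “Util” injections only; the slides do not give these definitions explicitly, and the function and variable names are hypothetical.

```python
def campaign_metrics(n_total, n_util, n_failures):
    """Derive AVF, AVF Util and the Util fraction from injection counts.

    n_total: faults sampled over the whole structure
    n_util: faults that struck used ("Util") resources
    n_failures: injections whose outcome was classified as a failure
    """
    avf = n_failures / n_total
    avf_util = n_failures / n_util if n_util else 0.0
    # With uniform sampling, this fraction estimates how much of the
    # structure is in use, i.e. it plays the role of occupancy.
    util_fraction = n_util / n_total
    return avf, avf_util, util_fraction


avf, avf_util, util_fraction = campaign_metrics(2000, 500, 100)
```

Since non-“Util” injections are always masked, AVF equals AVF Util multiplied by the Util fraction, which is why dividing by the “Util” injections removes the effect of occupancy from the comparison.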
  • 14. EXPERIMENTAL RESULTS: Computing Failures In Time
      [Figure: Failures in Time. One group of bars per benchmark (backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd) for Rad. 7970, FX5600, FX5800 and GTX480. Series: FIT regfile - FI, FIT lmem - FI. Axis: FIT, 0 to 250.]
      • Register file impacts reliability more than Local Memory
      • FIT does not take into account performance
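A FIT value for one hardware structure is conventionally obtained as the raw failure rate per bit, times the number of bits, times the AVF. The sketch below applies that standard relation to the HD 7970 column of slide 11; the AVF value of 0.1 is a made-up placeholder, not a result from the slides, and treating the 256 KB figure as the size of the whole structure is also an assumption.

```python
def structure_fit(fit_per_bit, size_bytes, avf):
    """FIT of a structure = raw FIT/bit x number of bits x AVF."""
    return fit_per_bit * size_bytes * 8 * avf


# HD 7970 vector register file from slide 11: 0.32E-3 FIT/bit, 256 KB.
# avf=0.1 is a hypothetical value chosen only for illustration.
fit_regfile = structure_fit(0.32e-3, 256 * 1024, avf=0.1)
```

The same function can be called for the local memory, and the two values can be summed when the structures are assumed to fail independently.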
  • 15. EXPERIMENTAL RESULTS: Computing Executions Per Failure
      [Figure: Executions per Failure. One group of bars per benchmark (backprop, dwtHaar1D, gaussian, histogram, kmeans, matrixMul, reduction, scan, transpose, vectoradd, average) for Rad. 7970, FX5600, FX5800 and GTX480. Series: EPF-FI. Axis: Executions per Failure, logarithmic, 1E+12 to 1E+17.]
      • Higher parallelism translates into more performance
      • A larger number of executions before a failure happens
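Executions per failure (EPF) combines the failure rate with execution time. The slides do not give the formula, so the sketch below is an assumption derived from the definition of FIT as failures per 10^9 device-hours: the number of executions that fit into 10^9 hours, divided by the number of failures expected over the same period. The input values are hypothetical.

```python
SECONDS_PER_BILLION_HOURS = 1e9 * 3600


def executions_per_failure(fit, exec_time_s):
    """Expected number of application runs between two failures.

    fit: failures per 1e9 hours of operation
    exec_time_s: duration of one execution in seconds
    """
    executions = SECONDS_PER_BILLION_HOURS / exec_time_s
    return executions / fit


# Hypothetical workload: 100 FIT and 1 ms per execution.
epf = executions_per_failure(fit=100.0, exec_time_s=1e-3)
```

With these inputs the result is on the order of 1E+13, inside the range of the plot's axis. A faster chip completes more executions in the same time, so its EPF grows even when its FIT is unchanged.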
  • 16. CONCLUSIONS: Reliability and performance evaluation of AMD and NVIDIA GPU architectures
      • 10 benchmarks for AMD/OpenCL and NVIDIA/CUDA
      • Reliability analysis comparing fault injection and ACE analysis
      • Joint evaluation of reliability and performance