Filipo Novo Mór
Advisors:
Dr. César Augusto Missio Marcon
Dr. Andrew Rau-Chaplin
GPU Performance Prediction
Using High-level Application
Models
ERAD 2014 presentation
March 2014
Pontifical Catholic University of Rio Grande do Sul
Faculty of Informatics
Postgraduate Programme in Computer Science
Outline
• Objectives
• Related Works
• Graphic Processor Units
• Methodology
• Performance Prediction Engine
• Work Schedule
Objectives
• To model applications at a high level in order to
predict their behaviour when running on a GPU.
– Secondary goals:
• To create a description of a high-level model for the target
GPU architecture.
• To evaluate the impact of using different cache sizes on
the tested applications
3 / 17
Related Works
• Theoretical works:
work | authors | modelling (app. / arch.) | inputs (CUDA / HLRA) | outputs
An Adaptive Performance Modeling Tool for GPU Architectures | Baghsorkhi et al. | no / yes | source code / no | performance prediction and bottleneck indicators
Cross-architecture Performance Predictions for Scientific Applications Using Parameterized Models | Marin and Mellor-Crummey | yes / yes | source code / no | performance prediction
An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness | Hong and Kim | no / no | source code / no | performance prediction; also proposed two new metrics for GPU modelling, MWP and CWP
Exploring the multiple-GPU design space | Schaa and Kaeli | no / yes | source code / no | performance benchmark
A Quantitative Performance Analysis Model for GPU Architectures | Zhang and Owens | no / yes | source code / no | performance benchmark
this work | — | yes / yes | no / yes | performance prediction
4 / 17
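Of the works above, Hong and Kim's MWP (memory warp parallelism) and CWP (computation warp parallelism) metrics capture, roughly, how much memory latency the warp scheduler can hide behind computation. The sketch below is a loose illustration of that idea only; the simplified formulas and all parameter values are assumptions, not the exact model from their paper:

```python
def mwp(mem_latency, departure_delay, active_warps):
    """Memory Warp Parallelism (simplified): how many warps can have
    memory requests in flight at once, capped by the active warps."""
    return min(mem_latency / departure_delay, active_warps)

def cwp(mem_cycles, comp_cycles, active_warps):
    """Computation Warp Parallelism (simplified): how many warps' worth of
    computation fits under one memory access period."""
    return min((mem_cycles + comp_cycles) / comp_cycles, active_warps)

# Made-up cycle counts for illustration:
print(mwp(400, 40, 8))   # 8
print(cwp(400, 100, 8))  # 5.0
```

Intuitively, when MWP exceeds CWP there is enough memory-level parallelism to overlap the latency; otherwise warps stall waiting on memory.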
Related Works
• Application tools:
work | authors | inputs | outputs | target architecture
Barra | Collange et al. | CUDA source code | execution measurements | NVIDIA Tesla
GPU_Sim | Bakhoda et al. | CUDA source code | execution measurements | NVIDIA Tesla and GT200
GPU Ocelot | Diamos et al. | CUDA source code | execution measurements | PTX 2.3 (CUDA 4.0)
this work | — | HLRA | execution measurements | NVIDIA GK110

Source: gpgpu-sim.org
5 / 17
Graphic Processor Unit
Simplified architecture of a NVIDIA GPU
6 / 17
Graphic Processor Unit
Simplified architecture of a NVIDIA GPU showing the
internal structure of streaming multiprocessors
7 / 17
Graphic Processor Unit
When a thread block is assigned to a streaming
multiprocessor, it is divided into units called warps.
8 / 17
Figure: Mohamed Zahran
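On NVIDIA GPUs a warp is 32 threads, so the number of warps a block produces is the ceiling of its thread count over 32. A minimal sketch:

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def warps_per_block(threads_per_block):
    """A block of N threads is split into ceil(N / 32) warps;
    a partial last warp still occupies a full warp slot."""
    return math.ceil(threads_per_block / WARP_SIZE)

print(warps_per_block(256))  # 8
print(warps_per_block(100))  # 4 (3 full warps plus 1 partial warp)
```

The partial-warp case matters for a performance model: a block of 100 threads costs scheduler slots for 4 warps, not 3.125.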
Graphic Processor Unit
SIMT vs SIMD
• Single Instruction, Multiple Register Sets: each thread has its own register
set, consequently, instructions may process different data simultaneously on
different parallel running threads.
• Single Instruction, Multiple Addresses: each thread is permitted to freely
access non-coalesced memory addresses, giving more flexibility to the
programmer. However, this is an unsafe technique because parallel accesses to
non-coalesced addresses may be serialized into multiple transactions, which
reduces performance significantly.
• Single Instruction, Multiple Flow Paths: the control flow of different parallel
running threads can diverge.
9 / 17
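The coalescing point above can be made concrete with a toy transaction counter. This is a deliberate simplification (one transaction per distinct 128-byte segment touched by a warp; the 128-byte granularity is an assumption, and real hardware rules are more involved):

```python
WARP_SIZE = 32
SEGMENT_BYTES = 128  # assumed memory transaction granularity

def transactions(addresses):
    """Simplified coalescing model: one transaction per distinct
    128-byte segment touched by the warp's addresses."""
    return len({addr // SEGMENT_BYTES for addr in addresses})

coalesced = [4 * i for i in range(WARP_SIZE)]     # consecutive 4-byte words
scattered = [4096 * i for i in range(WARP_SIZE)]  # one word every 4 KiB

print(transactions(coalesced))  # 1
print(transactions(scattered))  # 32
```

The same 32 loads cost one memory transaction when consecutive but up to 32 when scattered, which is the serialization penalty the slide warns about.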
Graphic Processor Unit
Branch Divergence
10 / 17
Graphic Processor Unit
Branch Divergence
11 / 17
Graphic Processor Unit
The Key Challenges for GPU Programming
• Data transfer between CPU and GPU
• Memory access
• Branch divergence
• No recursion
12 / 17
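The first challenge in the list, CPU-GPU data transfer, is often modelled to first order as a fixed latency plus bytes over sustained bandwidth. The sketch below uses assumed PCIe-like figures, not measured values:

```python
def transfer_time(n_bytes, latency_s=10e-6, bandwidth_Bps=6e9):
    """First-order host<->device copy model: fixed per-transfer latency
    plus payload size over sustained bandwidth (assumed figures)."""
    return latency_s + n_bytes / bandwidth_Bps

small = transfer_time(4 * 1024)           # latency-dominated
large = transfer_time(256 * 1024 * 1024)  # bandwidth-dominated
print(f"{small * 1e6:.1f} us, {large * 1e3:.1f} ms")  # 10.7 us, 44.7 ms
```

The two regimes explain a common GPU programming guideline: batch many small transfers into one large one so the fixed latency is paid once.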
Methodology
13 / 17
Methodology
Validating
• Applications will be implemented in CUDA as well as in
HLRA.
• Applications will be chosen according to their profiles:
– Computation vs Communication
– Sizing
14 / 17
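Validation of the dual CUDA/HLRA implementations will presumably compare predicted against measured execution times; a relative-error metric is the usual choice. The numbers below are hypothetical, for illustration only:

```python
def relative_error(predicted_s, measured_s):
    """Prediction accuracy as relative error against the CUDA measurement."""
    return abs(predicted_s - measured_s) / measured_s

# Hypothetical prediction (0.92 s) vs measurement (1.00 s):
print(f"{relative_error(0.92, 1.00):.0%}")  # 8%
```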
Performance Prediction Engine
Aspects to be considered by the engine
• Branch divergence
• Memory access
– Local, global, and shared memory, and per-thread registers.
• Thread synchronization
• Loops
15 / 17
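One way an engine covering the aspects above might combine per-kernel estimates is sketched below. Every parameter name, weight, and the linear combination itself are assumptions for illustration; the actual engine design is the subject of the thesis, not this toy:

```python
def predict_kernel_time(n_warps, comp_cycles, mem_transactions,
                        divergence_factor=1.0, sync_cycles=0,
                        loop_trips=1, mem_latency=400, clock_hz=1e9):
    """Toy cost model over the aspects listed above: compute work scaled
    by branch divergence, plus memory transactions, synchronization,
    all repeated across loop iterations."""
    per_iter = (comp_cycles * divergence_factor
                + mem_transactions * mem_latency
                + sync_cycles)
    total_cycles = n_warps * loop_trips * per_iter
    return total_cycles / clock_hz

t = predict_kernel_time(n_warps=64, comp_cycles=200, mem_transactions=4,
                        divergence_factor=1.5, sync_cycles=50, loop_trips=10)
print(f"{t * 1e3:.2f} ms")  # 1.25 ms
```

Even a crude model like this exposes the levers the slides list: divergence inflates compute cycles, memory transactions dominate when latency is high, and loops multiply everything.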
Work Schedule
16 / 17
Questions
Filipo Novo Mór
filipo.mor at acad.pucrs.br
