Graphics Processing Unit

DHAN V SAGAR
CB.EN.P2CSE13007
Introduction
It is a processor optimized for 2D/3D graphics, video, visual computing, and display.
It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.
It provides real-time visual interaction with computed objects via graphics, images, and video.
History
● Up to the late '90s
  – No GPUs
  – Much simpler VGA controllers
● Consisted of
  – A memory controller
  – Display generator + DRAM
● DRAM was either shared with the CPU or private
History
● By 1997
  – More complex VGA controllers
    ● Incorporated 3D acceleration functions in hardware
      – Triangle setup and rasterization
      – Texture mapping and shading
● Rasterization: combining shapes (lines, polygons, letters, …) into an image consisting of individual pixels
History
● By 2000
  – Single-chip graphics processors incorporated nearly all the functions of the graphics pipeline of high-end workstations
    ● The beginning of the end of the high-end workstation market
  – The VGA controller was renamed the Graphics Processing Unit (GPU)
Current Trends
● Well-defined APIs
  – OpenGL: an open standard for 3D graphics programming
  – WebGL: an OpenGL extension for the web
  – DirectX: a set of Microsoft multimedia programming interfaces (Direct3D for 3D graphics)
● Can implement novel graphics algorithms
● Use GPUs for non-conventional applications
Current Trends
● Combining the powers of the CPU and GPU – heterogeneous architectures
● GPUs are becoming scalable parallel processors
● Moving from hardware-defined pipeline architectures to more flexible programmable architectures
Architecture Evolution
[Diagram: CPU with memory, graphics card, and display]
● Floating-point co-processors attached to microprocessors
● Interest in providing hardware support for displays led to graphics cards, and eventually to graphics processing units (GPUs)
GPUs with dedicated pipelines
[Diagram: input stage → vertex shader stage → geometry shader stage → rasterizer stage → pixel shading stage → frame buffer, with graphics memory accessible to the stages]
● Graphics chips generally had a pipeline structure, with individual stages performing specialized operations and finally loading the frame buffer for display
● Individual stages may have access to graphics memory for storing intermediate computed data
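To make concrete what these stages compute, here is a toy software analogy in plain C (illustrative only; real GPUs implement this in parallel hardware): a "vertex stage" places a triangle in screen space, a "rasterizer stage" tests each pixel against the triangle's edges, and covered pixels are written into a frame buffer.

#include <stdio.h>
#include <string.h>

#define W 32
#define H 16
static char fb[H][W];  /* the "frame buffer" */

/* Edge function: positive when point (px,py) lies inside the edge a->b */
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

int main(void)
{
    memset(fb, '.', sizeof fb);
    /* "Vertex stage": three vertices already transformed to screen space */
    float v[3][2] = { {4, 2}, {28, 5}, {12, 14} };
    /* "Rasterizer + pixel shading stages": test every pixel against all three edges */
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float px = x + 0.5f, py = y + 0.5f;
            if (edge(v[0][0], v[0][1], v[1][0], v[1][1], px, py) >= 0 &&
                edge(v[1][0], v[1][1], v[2][0], v[2][1], px, py) >= 0 &&
                edge(v[2][0], v[2][1], v[0][0], v[0][1], px, py) >= 0)
                fb[y][x] = '#';  /* load the frame buffer */
        }
    for (int y = 0; y < H; ++y)
        printf("%.*s\n", W, fb[y]);  /* "display" the frame buffer as ASCII */
    return 0;
}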
Programming GPUs
• Will focus on parallel computing applications
• Must decompose the problem into a set of parallel computations
• Ideally a two-level decomposition, to match the GPU organization (see the example below)
Example
[Diagram: the data in one big array are decomposed into small arrays, each of which is further decomposed into tiny pieces]
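A minimal CUDA sketch of this two-level decomposition (the kernel name and sizes are illustrative): the grid of thread blocks carves the big array into the small per-block chunks, and each thread within a block handles one tiny element.

// Each thread block handles one "small array" of 256 elements;
// each thread handles one "tiny" element within it.
__global__ void scale(double *data, int n, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)                                      // guard the partial last block
        data[i] *= factor;
}

// Launch with one block per 256-element chunk of the big array:
// int nblocks = (n + 255) / 256;
// scale<<<nblocks, 256>>>(d_data, n, 2.0);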
GPGPU and CUDA
GPGPU
● General-Purpose computing on GPUs
● Uses the traditional graphics API and graphics pipeline
CUDA
● Compute Unified Device Architecture
● A parallel computing platform and programming model
● Invented by NVIDIA
● Single Program Multiple Data (SPMD) approach
CUDA
➢ CUDA programs are written in C
➢ Within C programs, call SIMT “kernel” routines that are executed on the GPU
➢ Provides three abstractions (see the sketch below)
  ➢ A hierarchy of thread groups
  ➢ Shared memory
  ➢ Barrier synchronization
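A minimal kernel sketch (illustrative, not from the slides) that exercises all three abstractions: the thread hierarchy via blockIdx/threadIdx, shared memory via __shared__, and barrier synchronization via __syncthreads().

// Reverses each 256-element tile of d[] in place
// (assumes the array length is a multiple of the block size).
__global__ void reverse_each_block(int *d)
{
    __shared__ int tile[256];            // shared memory, visible to the whole block
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;  // thread hierarchy: grid > block > thread
    tile[t] = d[base + t];               // each thread stages one element
    __syncthreads();                     // barrier: all loads finish before any store
    d[base + t] = tile[blockDim.x - 1 - t];
}

// Launch: reverse_each_block<<<n / 256, 256>>>(d);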
CUDA (cont.)
● Lowest level of parallelism – the CUDA thread
● The compiler and hardware can gang thousands of CUDA threads together, yielding several levels of parallelism within the GPU: MIMD, SIMD, and instruction-level parallelism
● Single Instruction, Multiple Thread (SIMT)
Conventional C Code
// Invoke DAXPY
daxpy(n, 2.0, x, y);

// DAXPY in C
void daxpy(int n, double a, double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
Corresponding CUDA Code
// Invoke DAXPY with 256 threads per Thread Block
__host__
int nblocks = (n + 255) / 256;
daxpy<<<nblocks, 256>>>(n, 2.0, x, y);

// DAXPY in CUDA (a kernel launched with <<<...>>> must be __global__)
__global__
void daxpy(int n, double a, double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
Cont...
● __global__ (or __device__) --- functions that run on the GPU
● __host__ --- functions that run on the system processor
● CUDA variables declared with the __device__ qualifier are allocated in GPU memory, which is accessible by all the multithreaded SIMD processors
● The call syntax for a function that runs on the GPU is name<<<dimGrid,dimBlock>>>(..parameter list..)
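Putting these pieces together, a fuller host-side sketch (illustrative; error checking omitted) shows how the DAXPY kernel above would actually be invoked, including allocating GPU memory with cudaMalloc and copying data with cudaMemcpy:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void daxpy(int n, double a, double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
    int n = 1 << 20;
    size_t bytes = n * sizeof(double);
    double *x = (double *)malloc(bytes), *y = (double *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    double *d_x, *d_y;                      // device (GPU) copies of x and y
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    int nblocks = (n + 255) / 256;          // dimGrid: one block per 256 elements
    daxpy<<<nblocks, 256>>>(n, 2.0, d_x, d_y);

    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);            // expect 2.0 * 1.0 + 2.0 = 4.0
    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}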

GPU Hardware handles Threads
● Threads are grouped into Thread Blocks and executed in groups of 32 threads (warps)
● The hardware that executes a whole block of threads is called a Multithreaded SIMD Processor
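These hardware parameters can be queried at run time through the CUDA runtime API; a small sketch, assuming device 0:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // properties of GPU 0
    printf("Multithreaded SIMD processors (SMs): %d\n", prop.multiProcessorCount);
    printf("Warp size (threads executed together): %d\n", prop.warpSize);
    printf("Max threads per thread block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}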
Thank You..
