Graphics Processing Unit

DHAN V SAGAR
CB.EN.P2CSE13007
Introduction
It is a processor optimized for 2D/3D graphics, video, visual computing, and display.
It is a highly parallel, highly multithreaded multiprocessor optimized for visual computing.
It provides real-time visual interaction with computed objects via graphics, images, and video.
History
● Up to the late '90s
  – No GPUs
  – Much simpler VGA controllers
● Consisted of
  – A memory controller
  – Display generator + DRAM
● DRAM was either shared with the CPU or private
History
● By 1997
  – More complex VGA controllers
    ● Incorporated 3D acceleration functions in hardware
      – Triangle setup and rasterization
      – Texture mapping and shading
● Rasterization: combining shapes (lines, polygons, letters, …) into an image consisting of individual pixels
History
● By 2000
  – Single-chip graphics processors incorporated nearly all the functions of the graphics pipeline of high-end workstations
    ● The beginning of the end of the high-end workstation market
  – The VGA controller was renamed the Graphics Processing Unit (GPU)
Current Trends
● Well-defined APIs
  – OpenGL: an open standard for 3D graphics programming
  – WebGL: an OpenGL extension for the web
  – DirectX: a set of Microsoft multimedia programming interfaces (Direct3D for 3D graphics)
● Can implement novel graphics algorithms
● Use GPUs for non-conventional applications
Current Trends
● Combining the powers of the CPU and GPU – heterogeneous architectures
● GPUs are becoming scalable parallel processors
● Moving from hardware-defined pipeline architectures to more flexible programmable architectures
Architecture Evolution
[Diagram: CPU with memory, graphics card, and display]
● Floating-point co-processors attached to microprocessors
● Interest in providing hardware support for displays led to graphics cards, and eventually to graphics processing units (GPUs)
GPUs with dedicated pipelines
[Diagram: input stage → vertex shader stage → geometry shader stage → rasterizer stage → pixel shading stage → frame buffer, with graphics memory accessible to the stages]
● Graphics chips generally had a pipeline structure, with individual stages performing specialized operations and finally loading the frame buffer for display
● Individual stages may have access to graphics memory for storing intermediate computed data
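To make concrete what these stages compute, here is a toy software analogy in plain C (illustrative only; real GPUs implement this in parallel hardware): a "vertex stage" places a triangle in screen space, a "rasterizer stage" tests each pixel against the triangle's edges, and covered pixels are written into a frame buffer.

#include <stdio.h>
#include <string.h>

#define W 32
#define H 16
static char fb[H][W];  /* the "frame buffer" */

/* Edge function: positive when point (px,py) lies inside the edge a->b */
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

int main(void)
{
    memset(fb, '.', sizeof fb);
    /* "Vertex stage": three vertices already transformed to screen space */
    float v[3][2] = { {4, 2}, {28, 5}, {12, 14} };
    /* "Rasterizer + pixel shading stages": test every pixel against all three edges */
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float px = x + 0.5f, py = y + 0.5f;
            if (edge(v[0][0], v[0][1], v[1][0], v[1][1], px, py) >= 0 &&
                edge(v[1][0], v[1][1], v[2][0], v[2][1], px, py) >= 0 &&
                edge(v[2][0], v[2][1], v[0][0], v[0][1], px, py) >= 0)
                fb[y][x] = '#';  /* load the frame buffer */
        }
    for (int y = 0; y < H; ++y)
        printf("%.*s\n", W, fb[y]);  /* "display" the frame buffer as ASCII */
    return 0;
}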
Programming GPUs
• Will focus on parallel computing applications
• Must decompose the problem into a set of parallel computations
• Ideally a two-level decomposition, to match the GPU organization (see the example below)
Example
[Diagram: the data in one big array are decomposed into small arrays, each of which is further decomposed into tiny pieces]
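A minimal CUDA sketch of this two-level decomposition (the kernel name and sizes are illustrative): the grid of thread blocks carves the big array into the small per-block chunks, and each thread within a block handles one tiny element.

// Each thread block handles one "small array" of 256 elements;
// each thread handles one "tiny" element within it.
__global__ void scale(double *data, int n, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)                                      // guard the partial last block
        data[i] *= factor;
}

// Launch with one block per 256-element chunk of the big array:
// int nblocks = (n + 255) / 256;
// scale<<<nblocks, 256>>>(d_data, n, 2.0);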
GPGPU and CUDA
GPGPU
● General-Purpose computing on GPUs
● Uses the traditional graphics API and graphics pipeline
CUDA
● Compute Unified Device Architecture
● A parallel computing platform and programming model
● Invented by NVIDIA
● Single Program Multiple Data (SPMD) approach
CUDA
➢ CUDA programs are written in C
➢ Within C programs, call SIMT “kernel” routines that are executed on the GPU
➢ Provides three abstractions (see the sketch below)
  ➢ A hierarchy of thread groups
  ➢ Shared memory
  ➢ Barrier synchronization
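A minimal kernel sketch (illustrative, not from the slides) that exercises all three abstractions: the thread hierarchy via blockIdx/threadIdx, shared memory via __shared__, and barrier synchronization via __syncthreads().

// Reverses each 256-element tile of d[] in place
// (assumes the array length is a multiple of the block size).
__global__ void reverse_each_block(int *d)
{
    __shared__ int tile[256];            // shared memory, visible to the whole block
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;  // thread hierarchy: grid > block > thread
    tile[t] = d[base + t];               // each thread stages one element
    __syncthreads();                     // barrier: all loads finish before any store
    d[base + t] = tile[blockDim.x - 1 - t];
}

// Launch: reverse_each_block<<<n / 256, 256>>>(d);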
CUDA (cont.)
● Lowest level of parallelism – the CUDA thread
● The compiler and hardware can gang thousands of CUDA threads together, yielding several levels of parallelism within the GPU: MIMD, SIMD, and instruction-level parallelism
● Single Instruction, Multiple Thread (SIMT)
Conventional C Code
// Invoke DAXPY
daxpy(n, 2.0, x, y);

// DAXPY in C
void daxpy(int n, double a, double *x, double *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
Corresponding CUDA Code
// Invoke DAXPY with 256 threads per Thread Block
__host__
int nblocks = (n + 255) / 256;
daxpy<<<nblocks, 256>>>(n, 2.0, x, y);

// DAXPY in CUDA (a kernel launched with <<<...>>> must be __global__)
__global__
void daxpy(int n, double a, double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
Cont...
● __global__ (or __device__) --- functions that run on the GPU
● __host__ --- functions that run on the system processor
● CUDA variables declared with the __device__ qualifier are allocated in GPU memory, which is accessible by all the multithreaded SIMD processors
● The call syntax for a function that runs on the GPU is name<<<dimGrid,dimBlock>>>(..parameter list..)
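Putting these pieces together, a fuller host-side sketch (illustrative; error checking omitted) shows how the DAXPY kernel above would actually be invoked, including allocating GPU memory with cudaMalloc and copying data with cudaMemcpy:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void daxpy(int n, double a, double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
    int n = 1 << 20;
    size_t bytes = n * sizeof(double);
    double *x = (double *)malloc(bytes), *y = (double *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    double *d_x, *d_y;                      // device (GPU) copies of x and y
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    int nblocks = (n + 255) / 256;          // dimGrid: one block per 256 elements
    daxpy<<<nblocks, 256>>>(n, 2.0, d_x, d_y);

    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);            // expect 2.0 * 1.0 + 2.0 = 4.0
    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}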

GPU Hardware handles Threads
● Threads are grouped into Thread Blocks and executed in groups of 32 threads (warps)
● The hardware that executes a whole block of threads is called a Multithreaded SIMD Processor
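These hardware parameters can be queried at run time through the CUDA runtime API; a small sketch, assuming device 0:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // properties of GPU 0
    printf("Multithreaded SIMD processors (SMs): %d\n", prop.multiProcessorCount);
    printf("Warp size (threads executed together): %d\n", prop.warpSize);
    printf("Max threads per thread block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}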
Thank You..
