Computing using GPUs

Computing Using Graphics Cards

Shree Kumar, Hewlett Packard
http://www.shreekumar.in/

Speaker Intro

• High Performance Computing @ Hewlett‐Packard
– VizStack (http://vizstack.sourceforge.net)
– GPU Computing
• Big 3D enthusiast
• Travels a lot
• Blogs at http://www.shreekumar.in/

What we will cover

• GPUs and their history
• Why use GPUs
• Architecture
• Getting Started with GPU Programming
• Challenges, Techniques & Pitfalls
• Where not to use GPUs ?
• Resources
• The Future

What is a GPU

• Graphics Processing Unit
– Coined in 1999 by NVidia
– Specialized add‐on board
• Accelerates interactive 3D rendering
– 60 image updates per second (or more) on large data
– Solves an embarrassingly parallel problem
– Game driven volume economics
• NVidia vs. ATI, just like Intel vs. AMD
• Demand for better effects led to
– programmable GPUs
– floating point capabilities
– and, through these, General Purpose GPU (GPGPU) computation

History of GPUs : a GPGPU Perspective
Date  Product          Trans  Cores  Flops (SP / DP)  Technology
1997  RIVA 128         3 M    -      -                Rasterization
1999  GeForce 256      25 M   -      -                Transform & Lighting
2001  GeForce 3        60 M   -      -                Programmable shaders
2002  GeForce FX       125 M  -      -                16, 32 bit FP, long shaders
2004  GeForce 6800     222 M  -      -                Infinite length shaders, branching
2006  GeForce 8800     681 M  128    -                Unified graphics & compute, CUDA, 64 bit FP
2008  GeForce GTX 280  1.4 B  240    933 G / 78 G     IEEE FP, CUDA C, OpenCL and DirectCompute, PCI-express Gen 2
2009  Tesla M2050      3.0 B  512    1.03 T / 515 G   Improved 64 bit perf, caching, ECC memory, 64-bit unified addressing, asynchronous bidirectional data transfer, multiple kernels
Source: Nickolls J., Dally W.J., “The GPU Computing Era”, IEEE Micro, March-April 2010

The GPU Advantage

• 30x CPU FLOPS on the latest GPUs
• 10x memory bandwidth
• Add to these a 3x performance/$
• Energy efficient: 5x performance/watt
All graphs from GPU4Vision: http://gpu4vision.icg.tugraz.at/

People use GPUs for…

Source: Nickolls J., Dally W.J., “The GPU Computing Era”, IEEE Micro, March-April 2010

More “why to use GPUs”

• Proliferation of GPUs
– Mobile devices will have capable GPUs soon !
• Make more things possible
– Make things real‐time
• From seconds to real‐time interactive performance
– Reduce offline processing overhead
• Research Opportunities
– New & efficient algorithms
– Pairing multi-core CPUs and massively multi-threaded GPUs

GPU Computing 1‐2‐3

A GPU isn’t a CPU replacement!


There ain’t no such thing as a FREE Lunch!


You don’t always “port” a CPU algorithm to a GPU!

CPU versus GPU

• CPU
– Optimized for latency
– Speedup techniques
• Vectorization (MMX, SSE, …)
• Coarse Grained Parallelism using multiple CPUs and cores
– Memory approaching a TB
• GPU
– Optimized for throughput
– Speedup techniques
• Massive multithreading
• Fine grained parallelism
– A few GBs of memory max

Getting Started

• Software
– CUDA (NVidia specific)
– OpenCL (Cross‐platform, GPU/CPU)
– DirectCompute (MS specific)
• Hardware
– A system equipped with GPU
• OS no bar
– But Windows and Red Hat Enterprise Linux seem better supported

CUDA
• Compute Unified Device Architecture
• Most popular GPGPU toolkit
• CUDA C extends C with constructs
– Easy to write programs
• Lower level “driver” API is available
– Provides more control
– Use multiple GPUs in the same application
– Mix graphics & compute code
• Language bindings available
– PyCUDA, Java, .NET
• Toolkit provides conveniences
Source: NVIDIA CUDA Architecture, Introduction and Overview

CUDA Toolkit

CUDA Architecture
• 1 or more streaming multiprocessors (“cores”)
• Thread Blocks
– Single Instruction, Multiple Thread (SIMT)
– Hide latency by parallelism
• Memory Hierarchy
– Fermi GPUs can access system memory
• Primitives for
– Thread synchronization
– Atomic operations on memory

Source : The GPU Computing Era

Simple Example : Vector Addition
C/C++ ‐ serial code
// Add the N-element arrays A and B element by element into C.
void VecAdd(const float *A, const float *B, float *C, int N) {
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
}

VecAdd(A, B, C, N);

C/C++ with OpenMP – thread level parallelism
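void VecAdd(const float *A, const float *B, float *C, int N) {
    // "parallel for" creates the thread team and splits the loop across it;
    // a bare "omp for" outside a parallel region would run on a single thread.
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        C[i] = A[i] + B[i];
}

VecAdd(A, B, C, N);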

Vector Addition using CUDA
CUDA C – element level parallelism
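__global__ void VecAdd(const float *A, const float *B, float *C, int N) {
    // Each thread computes one element; i is its global index.
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    // The last block may have more threads than remaining elements.
    if (i < N)
        C[i] = A[i] + B[i];
}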

Invoking the function
// size is the length of each array in bytes
size_t size = N * sizeof(float);
float *d_A, *d_B, *d_C;

// Allocate memory on the GPU
cudaMalloc((void**)&d_A, size);
cudaMalloc((void**)&d_B, size);
cudaMalloc((void**)&d_C, size);

// Copy arrays to the GPU
cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);

// Invoke the function: one thread per element, block count rounded up
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);

// Copy the result back to main memory
cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);

// Free GPU memory
cudaFree(d_A);
cudaFree(d_B);
cudaFree(d_C);
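
To make the launch arithmetic concrete, here is a minimal CPU-side sketch in plain C++. It is not CUDA code: the two nested loops only stand in for the hardware's blockIdx.x and threadIdx.x, and the helper names blocksPerGridFor and emulateVecAdd are hypothetical, introduced purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: the smallest block count whose threads cover n elements.
int blocksPerGridFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of the VecAdd launch (illustrative helper, not a CUDA API).
// Returns how many emulated threads failed the bounds check and did no work.
int emulateVecAdd(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, int threadsPerBlock) {
    assert(a.size() == b.size() && a.size() == c.size());
    const int n = static_cast<int>(a.size());
    const int blocks = blocksPerGridFor(n, threadsPerBlock);
    int idleThreads = 0;
    for (int blockIdx = 0; blockIdx < blocks; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < threadsPerBlock; ++threadIdx) {
            // Same index formula as the kernel.
            int i = threadsPerBlock * blockIdx + threadIdx;
            if (i < n)
                c[i] = a[i] + b[i];
            else
                ++idleThreads;  // thread past the end of the arrays
        }
    }
    return idleThreads;
}
```

With N = 1000 and 256 threads per block, rounding up gives 4 blocks, that is 1024 threads, so 24 threads fail the i < N test. This is why the kernel carries the bounds check.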

Compilation
# nvcc vectorAdd.cu -I ../../common/inc

GPU Programming Challenges

• Need high “occupancy” for best performance
• Extracting parallelism with limited resources
– Limited Registers
– Limited Shared Memory
• Preferred Approach
– Small Kernels
– Multiple Passes if needed
• Decompose Problem into Parallel Pieces
– Write once, scale performance everywhere!
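
The "small kernels, multiple passes" approach can be illustrated with a sum reduction. The sketch below is plain C++ running on the CPU, not GPU code: each call to reducePass plays the role of one kernel launch, and each slice of blockSize inputs plays the role of one thread block. The names reducePass and multiPassSum are hypothetical helpers chosen for this example.

```cpp
#include <cstddef>
#include <vector>

// One pass: every "block" sums its own slice of up to blockSize inputs
// into a single partial result.
std::vector<long long> reducePass(const std::vector<long long>& in,
                                  std::size_t blockSize) {
    if (blockSize == 0) blockSize = 1;  // avoid a zero stride
    std::vector<long long> out;
    for (std::size_t start = 0; start < in.size(); start += blockSize) {
        long long partial = 0;
        for (std::size_t i = start; i < in.size() && i < start + blockSize; ++i)
            partial += in[i];
        out.push_back(partial);
    }
    return out;
}

// Repeat the pass on the partial results until a single value remains.
// passes receives the number of passes that were needed.
long long multiPassSum(std::vector<long long> data, std::size_t blockSize,
                       int& passes) {
    passes = 0;
    if (data.empty()) return 0;
    if (blockSize < 2) blockSize = 2;  // each pass must shrink the data
    while (data.size() > 1) {
        data = reducePass(data, blockSize);
        ++passes;
    }
    return data[0];
}
```

For 1000 inputs and a block size of 256, the first pass leaves 4 partial sums and the second pass leaves the final result. No single pass needs more than one block's worth of working storage.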

GPU Programming

• Use Shared Memory when possible
– Cooperation between threads in a block
– Reduce access to global memory
• Reduce Data Transfer over the Bus
• It’s still a GPU !
– use textures to your advantage
– use vector data types if you can
• Watch out for GPU capability differences!

Enough Theory!

Demo Time
&
Let’s do some programming

Watch out for

• Portability of programs across GPUs
– Capabilities vary from GPU to GPU
– Memory usage
• Arithmetic differences in the result
• Pay careful attention to demos…
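
One common cause of arithmetic differences is the order of floating point additions. The sketch below is plain C++ on the CPU and makes no claim about any particular GPU. It compares a left-to-right sum, as a serial loop computes it, with a pairwise (tree) sum, which is the order a parallel reduction typically uses. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Left-to-right sum, as a serial CPU loop would compute it.
float serialSum(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Pairwise (tree) sum over the half-open range [lo, hi): split the range
// in half, sum each half, then add the two partial results.
float pairwiseSum(const std::vector<float>& v, std::size_t lo, std::size_t hi) {
    if (hi <= lo) return 0.0f;
    if (hi - lo == 1) return v[lo];
    std::size_t mid = lo + (hi - lo) / 2;
    return pairwiseSum(v, lo, mid) + pairwiseSum(v, mid, hi);
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f}, and assuming IEEE single precision evaluation, the serial loop returns 1 and the pairwise sum returns 0, while the exact answer is 2. Neither order is "the correct one", so compare CPU and GPU results with a tolerance rather than expecting bit-identical output.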

Resources

• CUDA
– Tools on NVIDIA Developer Site
http://developer.nvidia.com/object/gpucomputing.html
– CUDPP
http://code.google.com/p/cudpp/
• OpenCL
• Google Search !

The Future

• Better throughput
– More GPU cores, scaling by Moore’s law
– PCIe Gen 3
• Easier to program
• Arbitrary control and data access patterns

Questions ?

shree.shree@gmail.com
