Research in GPU Computing




                     Cao Thanh Tung
Outline

 ●   Introduction to GPU Computing
              –   Past:      Graphics Processing and GPGPU
              –   Present:   CUDA and OpenCL
              –   A bit on the architecture
 ●   Why GPU?
 ●   GPU vs. Multi-core and Distributed
 ●   Open problems
 ●   Where does this go?

Introduction to GPU Computing

 ●   Who has access to 1,000 processors?




                                                   YOU




Introduction to GPU Computing

 ●   In the past
              –   GPU = Graphics Processing Unit




Introduction to GPU Computing

 ●   In the past
              –   GPGPU = General Purpose computation using GPUs




Introduction to GPU Computing

 ●   Now
                –   GPU = General Processing Unit (no longer just Graphics)

              __device__ float3 collideCell(int3 gridPos, uint index...
              {
                  uint gridHash = calcGridHash(gridPos);
                  ...
                  for(uint j=startIndex; j<endIndex; j++) {
                      if (j != index) {
                          ...
                          force += collideSpheres(...);
                      }
                  }
                  return force;
              }

Introduction to GPU Computing

 ●   Now
               –   We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
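
              For readers who have not seen CUDA before, the sketch below shows what a minimal
              complete CUDA program looks like: a hypothetical vector-add kernel plus the host code
              that launches it. This is an illustrative sketch only (the kernel name, sizes and launch
              configuration are my own choices), not code from this talk.

              // vecAdd.cu -- a minimal, hypothetical CUDA example (not from the talk).
              #include <cstdio>
              #include <cuda_runtime.h>

              // Kernel: one GPU thread computes one output element.
              __global__ void vecAdd(const float* a, const float* b, float* c, int n)
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n)
                      c[i] = a[i] + b[i];
              }

              int main()
              {
                  const int n = 1 << 20;                 // one million elements
                  const size_t bytes = n * sizeof(float);

                  // Host-side input/output arrays.
                  float *hA = new float[n], *hB = new float[n], *hC = new float[n];
                  for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

                  // Device-side copies.
                  float *dA, *dB, *dC;
                  cudaMalloc((void**)&dA, bytes);
                  cudaMalloc((void**)&dB, bytes);
                  cudaMalloc((void**)&dC, bytes);
                  cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
                  cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

                  // Launch enough blocks of 256 threads to cover all n elements.
                  int threads = 256;
                  int blocks  = (n + threads - 1) / threads;
                  vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

                  cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
                  printf("c[0] = %f\n", hC[0]);          // expect 3.0

                  cudaFree(dA); cudaFree(dB); cudaFree(dC);
                  delete[] hA; delete[] hB; delete[] hC;
                  return 0;
              }

              Compiled with NVIDIA's nvcc compiler, the <<<blocks, threads>>> launch starts roughly
              a million lightweight threads, one per array element.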


Introduction to GPU Computing

 ●   A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
        –   Very simple cores (even simpler than an Intel Atom)
        –   Little cache




Why GPU?

 ●   Performance




Why GPU?

 ●   People have used it, and it works.
              –   Bio-Informatics
              –   Finance
              –   Fluid Dynamics
              –   Data-mining
              –   Computer Vision
              –   Medical Imaging
              –   Numerical Analytics



Why GPU?

 ●   A new, promising area
              –   Fast growing
              –   Ubiquitous
              –   New paradigm → new problems, new challenges




GPU vs. Multi-core

 ●   A lot more threads of computation are required:
               –   The GPU has a lot more “cores” than a multi-core CPU.
               –   A GPU core is nowhere near as powerful as a CPU core.




GPU vs. Multi-core

 ●   Challenges:
               –   Not all problems can easily be broken into many small sub-problems
                     to be solved in parallel.
               –   Race conditions are much more serious.
               –   Atomic operations are still doable, but locking is a performance killer;
                     lock-free algorithms are much preferable (see the sketch after this list).
               –   Memory access is a bottleneck (memory is not that parallel).
               –   Debugging is a nightmare.
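
              To make the point about atomics concrete, here is a minimal sketch (a hypothetical
              histogram kernel, not from the talk): thousands of threads increment a small set of
              shared counters, so a plain bins[v]++ would silently lose updates, while atomicAdd
              keeps the counts correct without any lock.

              // Hypothetical sketch: build a 256-bin histogram of byte values.
              __global__ void histogram256(const unsigned char* data, int n,
                                           unsigned int* bins)   // bins[256], zero-initialised
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n) {
                      // Racy version: bins[data[i]]++;          (lost updates)
                      atomicAdd(&bins[data[i]], 1u);             // safe, lock-free update
                  }
              }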




GPU vs. Distributed

 ●   The GPU allows much cheaper communication between different threads
     (see the sketch after this list).
 ●   GPU memory is still limited compared to a distributed system.
 ●   GPU cores are not completely independent processors
               –   Fine-grained parallelism is needed.
               –   Reaching the scalability of a distributed system is difficult.
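
              A small sketch of the “cheap communication” point (hypothetical kernel, not from the
              talk): all 256 threads of a block exchange data through on-chip shared memory in a
              handful of cycles, whereas on a distributed system the same exchange would be a
              network message.

              // Hypothetical sketch: reverse each 256-element tile of an array.
              // Assumes 256 threads per block and an array length that is a multiple of 256.
              __global__ void reverseTiles(float* data)
              {
                  __shared__ float tile[256];                 // on-chip, shared by the block

                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  tile[threadIdx.x] = data[i];                // each thread writes its element
                  __syncthreads();                            // wait for the whole tile
                  data[i] = tile[255 - threadIdx.x];          // read what another thread wrote
              }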




Open problems

 ●   Data-structures
 ●   Algorithms
 ●   Tools
 ●   Theory




Open problems

 ●   Data-structures
               –   Requirement: able to handle a very high level of concurrent access
                     (a small sketch of one approach follows this list).
               –   Common data-structures like dynamic arrays, priority queues or
                     hash tables are not very suitable for the GPU.
               –   Some existing work: kd-trees, quad-trees, read-only hash tables...
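
              To give an idea of what “a very high level of concurrent access” means in practice,
              here is a hedged sketch (hypothetical code, not one of the cited works) of inserting
              keys into a fixed-capacity open-addressing hash table, where atomicCAS lets thousands
              of threads claim slots concurrently. It assumes 32-bit keys, no deletion, and a table
              that never fills up.

              #define EMPTY_SLOT 0xFFFFFFFFu
              #define TABLE_SIZE (1 << 20)     // fixed capacity, must stay partly empty

              // Hypothetical sketch: lock-free insert with linear probing.
              __device__ void insertKey(unsigned int* table, unsigned int key)
              {
                  unsigned int slot = (key * 2654435761u) % TABLE_SIZE;   // simple multiplicative hash
                  while (true) {
                      // Atomically claim the slot only if it is still empty.
                      unsigned int prev = atomicCAS(&table[slot], EMPTY_SLOT, key);
                      if (prev == EMPTY_SLOT || prev == key)
                          return;                              // inserted, or already present
                      slot = (slot + 1) % TABLE_SIZE;          // slot taken: probe the next one
                  }
              }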




Open problems

 ●   Algorithms
               –   Most sequential algorithms need a serious re-design to make good
                    use of such a huge number of cores (a minimal example follows this list).
                         ●   Our computational geometry research: use discrete-space
                              computation to approximate the continuous-space result.
               –   Traditional parallel algorithms may or may not work.
                         ●   Usual assumption: an infinite number of processors.
                         ●   No serious study on this so far!
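
              A textbook instance of such a re-design (a hedged sketch, not the geometry work
              mentioned above): summing an array is a trivial sequential loop, but on the GPU it
              becomes a tree-shaped reduction so that hundreds of threads per block stay busy.
              The hypothetical kernel below produces one partial sum per block, which a second
              pass (or the CPU) then combines.

              // Hypothetical sketch: each block of 256 threads reduces 256 elements.
              __global__ void blockSum(const float* in, float* blockSums, int n)
              {
                  __shared__ float buf[256];                   // assumes blockDim.x == 256

                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
                  __syncthreads();

                  // Tree reduction: halve the number of active threads each step.
                  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
                      if (threadIdx.x < stride)
                          buf[threadIdx.x] += buf[threadIdx.x + stride];
                      __syncthreads();
                  }

                  if (threadIdx.x == 0)
                      blockSums[blockIdx.x] = buf[0];          // one partial sum per block
              }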



Open problems

 ●   Tools
               –   Programming language: a better language or model to express
                     parallel algorithms?
               –   Compiler: optimizing GPU code? Auto-parallelization?
                         ●   There is some work on translating OpenMP to CUDA
                              (a sketch of the idea follows this list).
               –   Debugging tools? Maybe a whole new “art of debugging” is needed.

               –   Software engineering is currently far behind hardware development.
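
              To give a flavour of what such an OpenMP-to-CUDA translation has to do, here is a
              hand-written sketch (hypothetical, not the output of any actual tool): in a
              data-parallel loop, the loop disappears and each iteration becomes one GPU thread.

              // OpenMP version of SAXPY:
              //     #pragma omp parallel for
              //     for (int i = 0; i < n; ++i)
              //         y[i] = a * x[i] + y[i];

              // Hand-translated CUDA version (hypothetical sketch):
              __global__ void saxpy(int n, float a, const float* x, float* y)
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per iteration
                  if (i < n)
                      y[i] = a * x[i] + y[i];
              }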


Open problems

 ●   Theory
               –   Some traditional approaches:
                         ●   PRAM: CRCW, EREW. Too general.
                         ●   SIMD: Too restricted.
               –   Big-O analysis may not be good enough.
                         ●   Time complexity is relevant, but work complexity is more
                               important (see the bound after this list).
                         ●   Most GPU computing papers only report actual running time.
               –   Performance modeling for the GPU, anyone?
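
              One candidate starting point, stated here from the classic work-depth style of
              analysis rather than from this talk: if an algorithm performs W operations in total
              and its critical path (depth) is D, then Brent's bound estimates its time on p
              processors as

                  T_p \le \frac{W}{p} + D

              With p in the thousands, the W/p term shrinks quickly, so an algorithm that inflates
              its total work W just to reduce its depth D can easily lose to a work-efficient one.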

Where does this go?

 ●   Intel/AMD already have 6-core, 12-thread processors (maybe more).
 ●   SeaMicro has a server with 512 dual-core Atom processors.
 ●   AMD Fusion: CPU + GPU.


 ●   The GPU may not stay forever, but massively multithreaded computing
     is definitely the future.


Where to start?

 ●   Check your PC.
               –   If it's not old enough to go to primary school, there's
                       a high chance it has a GPU.
 ●   Go to the NVIDIA/ATI website, download a development toolkit,
     and you're ready to go.




THANK YOU

 ●   Any questions? Just ask.
 ●   Any suggestions? What are you waiting for?
 ●   Any problem or solution to discuss? Let's have a private talk
     somewhere (j/k).




