2.
Outline
● Introduction to GPU Computing
  – Past: Graphics Processing and GPGPU
  – Present: CUDA and OpenCL
  – A bit on the architecture
● Why GPU?
● GPU vs. Multi-core and Distributed
● Open problems
● Where does this go?
19-Jan-2011 Computing Students talk 2
3.
Introduction to GPU Computing
● Who has access to 1,000 processors?
5.
Introduction to GPU Computing
● Who has access to 1,000 processors? YOU
6.
Introduction to GPU Computing
● In the past
  – GPU = Graphics Processing Unit
11.
Introduction to GPU Computing
● In the past
  – GPGPU = General Purpose computation using GPUs
12.
Introduction to GPU Computing
● Now
  – GPU = General Processing Unit

```
__device__ float3 collideCell(int3 gridPos, uint index, ...)
{
    uint gridHash = calcGridHash(gridPos);
    ...
    for (uint j = startIndex; j < endIndex; j++) {
        if (j != index) {
            ...
            force += collideSpheres(...);
        }
    }
    return force;
}
```
13.
Introduction to GPU Computing
● Now
  – We have CUDA (NVIDIA, proprietary) and OpenCL (open standard).
14.
Introduction to GPU Computing
● A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
  – Very simple cores (even simpler than the Intel Atom)
  – Little cache
15.
Why GPU?
16.
Why GPU?
● Performance
17.
Why GPU?
● People have used it, and it works:
  – Bio-informatics
  – Finance
  – Fluid dynamics
  – Data mining
  – Computer vision
  – Medical imaging
  – Numerical analytics
18.
Why GPU?
● A new, promising area
  – Fast growing
  – Ubiquitous
  – New paradigm → new problems, new challenges
19.
GPU vs. Multi-core
● A lot more threads of computation are required:
  – The GPU has many more cores than a multi-core CPU.
  – A GPU core is nowhere near as powerful as a CPU core.
20.
GPU vs. Multi-core
● Challenges:
  – Not all problems can easily be broken into many small sub-problems to be solved in parallel.
  – Race conditions are much more serious.
  – Atomic operations are still doable, but locking is a performance killer. Lock-free algorithms are much preferable.
  – Memory access is a bottleneck (memory is not that parallel).
  – Debugging is a nightmare.
21.
GPU vs. Distributed
● The GPU allows much cheaper communication between different threads.
● GPU memory is still limited compared to a distributed system.
● GPU cores are not completely independent processors.
  – Need fine-grained parallelism
  – Reaching the scalability of a distributed system is difficult.
22.
Open problems
● Data structures
● Algorithms
● Tools
● Theory
23.
Open problems
● Data structures
  – Requirement: able to handle a very high level of concurrent access.
  – Common data structures like dynamic arrays, priority queues, or hash tables are not very suitable for the GPU.
  – Some existing work: kd-trees, quad-trees, read-only hash tables...
24.
Open problems
● Algorithms
  – Most sequential algorithms need serious re-design to make good use of such a huge number of cores.
    ● Our computational geometry research: use discrete-space computation to approximate the continuous-space result.
  – Traditional parallel algorithms may or may not work.
    ● Usual assumption: an infinite number of processors.
    ● No serious study on this so far!
25.
Open problems
● Tools
  – Programming language: a better language or model to express parallel algorithms?
  – Compiler: optimize GPU code? Auto-parallelization?
    ● There's some work on OpenMP to CUDA.
  – Debugging tools? Maybe a whole new "art of debugging" is needed.
  – Software engineering is currently far behind the hardware development.
26.
Open problems
● Theory
  – Some traditional approaches:
    ● PRAM: CRCW, EREW. Too general.
    ● SIMD: Too restricted.
  – Big-O analysis may not be good enough.
    ● Time complexity is relevant, but work complexity is more important.
    ● Most GPU computing papers only report actual running time.
  – Performance modeling for the GPU, anyone?
27.
Where does this go?
● Intel/AMD already have 6-core, 12-thread processors (maybe more).
● SeaMicro has a server with 512 dual-core Atom processors.
● AMD Fusion: CPU + GPU.
● The GPU may not stay forever, but massively multithreaded computing is definitely the future.
28.
Where to start?
● Check your PC.
  – If it's not old enough to go to primary school, there's a high chance it has a usable GPU.
● Go to the NVIDIA/ATI website, download the development toolkit, and you're ready to go.
29.
THANK YOU
● Any questions? Just ask.
● Any suggestions? What are you waiting for?
● Any problem or solution to discuss? Let's have a private talk somewhere (j/k)