Research in GPU Computing




                     Cao Thanh Tung
Outline

 ●   Introduction to GPU Computing
              –   Past:      Graphics Processing and GPGPU
              –   Present:   CUDA and OpenCL
              –   A bit on the architecture
 ●   Why GPU?
 ●   GPU vs. Multi-core and Distributed
 ●   Open problems
 ●   Where does this go?

Introduction to GPU Computing

 ●   Who has access to 1,000 processors?




                                                   YOU




Introduction to GPU Computing

 ●   In the past
              –   GPU = Graphics Processing Unit




Introduction to GPU Computing

 ●   In the past
              –   GPGPU = General Purpose computation using GPUs




Introduction to GPU Computing

 ●   Now
                –   GPU = General Processing Unit (no longer just Graphics)

              __device__ float3 collideCell(int3 gridPos, uint index...
              {
                  uint gridHash = calcGridHash(gridPos);
                  ...
                  for(uint j=startIndex; j<endIndex; j++) {
                      if (j != index) {
                          ...
                          force += collideSpheres(...);
                      }
                  }
                  return force;
              }

Introduction to GPU Computing

 ●   Now
               –   We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
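
              For readers who have not seen CUDA before, the sketch below shows what a minimal
              complete CUDA program looks like: a hypothetical vector-add kernel plus the host code
              that launches it. This is an illustrative sketch only (the kernel name, sizes and launch
              configuration are my own choices), not code from this talk.

              // vecAdd.cu -- a minimal, hypothetical CUDA example (not from the talk).
              #include <cstdio>
              #include <cuda_runtime.h>

              // Kernel: one GPU thread computes one output element.
              __global__ void vecAdd(const float* a, const float* b, float* c, int n)
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n)
                      c[i] = a[i] + b[i];
              }

              int main()
              {
                  const int n = 1 << 20;                 // one million elements
                  const size_t bytes = n * sizeof(float);

                  // Host-side input/output arrays.
                  float *hA = new float[n], *hB = new float[n], *hC = new float[n];
                  for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

                  // Device-side copies.
                  float *dA, *dB, *dC;
                  cudaMalloc((void**)&dA, bytes);
                  cudaMalloc((void**)&dB, bytes);
                  cudaMalloc((void**)&dC, bytes);
                  cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
                  cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

                  // Launch enough blocks of 256 threads to cover all n elements.
                  int threads = 256;
                  int blocks  = (n + threads - 1) / threads;
                  vecAdd<<<blocks, threads>>>(dA, dB, dC, n);

                  cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
                  printf("c[0] = %f\n", hC[0]);          // expect 3.0

                  cudaFree(dA); cudaFree(dB); cudaFree(dC);
                  delete[] hA; delete[] hB; delete[] hC;
                  return 0;
              }

              Compiled with NVIDIA's nvcc compiler, the <<<blocks, threads>>> launch starts roughly
              a million lightweight threads, one per array element.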


Introduction to GPU Computing

 ●   A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
        –   Very simple cores (even simpler than an Intel Atom)
        –   Little cache




Why GPU?

 ●   Performance




Why GPU?

 ●   People have used it, and it works.
              –   Bio-Informatics
              –   Finance
              –   Fluid Dynamics
              –   Data-mining
              –   Computer Vision
              –   Medical Imaging
              –   Numerical Analytics



Why GPU?

 ●   A new, promising area
              –   Fast growing
              –   Ubiquitous
              –   New paradigm → new problems, new challenges




GPU vs. Multi-core

 ●   A lot more threads of computation are required:
               –   The GPU has a lot more “cores” than a multi-core CPU.
               –   A GPU core is nowhere near as powerful as a CPU core.




GPU vs. Multi-core

 ●   Challenges:
               –   Not all problems can easily be broken into many small sub-problems
                     to be solved in parallel.
               –   Race conditions are much more serious.
               –   Atomic operations are still doable, but locking is a performance killer;
                     lock-free algorithms are much preferable (see the sketch after this list).
               –   Memory access is a bottleneck (memory is not that parallel).
               –   Debugging is a nightmare.
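
              To make the point about atomics concrete, here is a minimal sketch (a hypothetical
              histogram kernel, not from the talk): thousands of threads increment a small set of
              shared counters, so a plain bins[v]++ would silently lose updates, while atomicAdd
              keeps the counts correct without any lock.

              // Hypothetical sketch: build a 256-bin histogram of byte values.
              __global__ void histogram256(const unsigned char* data, int n,
                                           unsigned int* bins)   // bins[256], zero-initialised
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  if (i < n) {
                      // Racy version: bins[data[i]]++;          (lost updates)
                      atomicAdd(&bins[data[i]], 1u);             // safe, lock-free update
                  }
              }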




GPU vs. Distributed

 ●   The GPU allows much cheaper communication between different threads
     (see the sketch after this list).
 ●   GPU memory is still limited compared to a distributed system.
 ●   GPU cores are not completely independent processors
               –   Fine-grained parallelism is needed.
               –   Reaching the scalability of a distributed system is difficult.
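
              A small sketch of the “cheap communication” point (hypothetical kernel, not from the
              talk): all 256 threads of a block exchange data through on-chip shared memory in a
              handful of cycles, whereas on a distributed system the same exchange would be a
              network message.

              // Hypothetical sketch: reverse each 256-element tile of an array.
              // Assumes 256 threads per block and an array length that is a multiple of 256.
              __global__ void reverseTiles(float* data)
              {
                  __shared__ float tile[256];                 // on-chip, shared by the block

                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  tile[threadIdx.x] = data[i];                // each thread writes its element
                  __syncthreads();                            // wait for the whole tile
                  data[i] = tile[255 - threadIdx.x];          // read what another thread wrote
              }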




Open problems

 ●   Data-structures
 ●   Algorithms
 ●   Tools
 ●   Theory




Open problems

 ●   Data-structures
               –   Requirement: able to handle a very high level of concurrent access
                     (a small sketch of one approach follows this list).
               –   Common data-structures like dynamic arrays, priority queues or
                     hash tables are not very suitable for the GPU.
               –   Some existing work: kd-trees, quad-trees, read-only hash tables...
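
              To give an idea of what “a very high level of concurrent access” means in practice,
              here is a hedged sketch (hypothetical code, not one of the cited works) of inserting
              keys into a fixed-capacity open-addressing hash table, where atomicCAS lets thousands
              of threads claim slots concurrently. It assumes 32-bit keys, no deletion, and a table
              that never fills up.

              #define EMPTY_SLOT 0xFFFFFFFFu
              #define TABLE_SIZE (1 << 20)     // fixed capacity, must stay partly empty

              // Hypothetical sketch: lock-free insert with linear probing.
              __device__ void insertKey(unsigned int* table, unsigned int key)
              {
                  unsigned int slot = (key * 2654435761u) % TABLE_SIZE;   // simple multiplicative hash
                  while (true) {
                      // Atomically claim the slot only if it is still empty.
                      unsigned int prev = atomicCAS(&table[slot], EMPTY_SLOT, key);
                      if (prev == EMPTY_SLOT || prev == key)
                          return;                              // inserted, or already present
                      slot = (slot + 1) % TABLE_SIZE;          // slot taken: probe the next one
                  }
              }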




Open problems

 ●   Algorithms
               –   Most sequential algorithms need a serious re-design to make good
                    use of such a huge number of cores (a minimal example follows this list).
                         ●   Our computational geometry research: use discrete-space
                              computation to approximate the continuous-space result.
               –   Traditional parallel algorithms may or may not work.
                         ●   Usual assumption: an infinite number of processors.
                         ●   No serious study on this so far!
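
              A textbook instance of such a re-design (a hedged sketch, not the geometry work
              mentioned above): summing an array is a trivial sequential loop, but on the GPU it
              becomes a tree-shaped reduction so that hundreds of threads per block stay busy.
              The hypothetical kernel below produces one partial sum per block, which a second
              pass (or the CPU) then combines.

              // Hypothetical sketch: each block of 256 threads reduces 256 elements.
              __global__ void blockSum(const float* in, float* blockSums, int n)
              {
                  __shared__ float buf[256];                   // assumes blockDim.x == 256

                  int i = blockIdx.x * blockDim.x + threadIdx.x;
                  buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
                  __syncthreads();

                  // Tree reduction: halve the number of active threads each step.
                  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
                      if (threadIdx.x < stride)
                          buf[threadIdx.x] += buf[threadIdx.x + stride];
                      __syncthreads();
                  }

                  if (threadIdx.x == 0)
                      blockSums[blockIdx.x] = buf[0];          // one partial sum per block
              }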



Open problems

 ●   Tools
               –   Programming language: a better language or model to express
                     parallel algorithms?
               –   Compiler: optimizing GPU code? Auto-parallelization?
                         ●   There is some work on translating OpenMP to CUDA
                              (a sketch of the idea follows this list).
               –   Debugging tools? Maybe a whole new “art of debugging” is needed.

               –   Software engineering is currently far behind hardware development.
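
              To give a flavour of what such an OpenMP-to-CUDA translation has to do, here is a
              hand-written sketch (hypothetical, not the output of any actual tool): in a
              data-parallel loop, the loop disappears and each iteration becomes one GPU thread.

              // OpenMP version of SAXPY:
              //     #pragma omp parallel for
              //     for (int i = 0; i < n; ++i)
              //         y[i] = a * x[i] + y[i];

              // Hand-translated CUDA version (hypothetical sketch):
              __global__ void saxpy(int n, float a, const float* x, float* y)
              {
                  int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per iteration
                  if (i < n)
                      y[i] = a * x[i] + y[i];
              }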


Open problems

 ●   Theory
               –   Some traditional approaches:
                         ●   PRAM: CRCW, EREW. Too general.
                         ●   SIMD: Too restricted.
               –   Big-O analysis may not be good enough.
                         ●   Time complexity is relevant, but work complexity is more
                               important (see the bound after this list).
                         ●   Most GPU computing papers only report actual running time.
               –   Performance modeling for the GPU, anyone?
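
              One candidate starting point, stated here from the classic work-depth style of
              analysis rather than from this talk: if an algorithm performs W operations in total
              and its critical path (depth) is D, then Brent's bound estimates its time on p
              processors as

                  T_p \le \frac{W}{p} + D

              With p in the thousands, the W/p term shrinks quickly, so an algorithm that inflates
              its total work W just to reduce its depth D can easily lose to a work-efficient one.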

Where does this go?

 ●   Intel/AMD already have 6-core, 12-thread processors (maybe more).
 ●   SeaMicro has a server with 512 dual-core Atom processors.
 ●   AMD Fusion: CPU + GPU.


 ●   The GPU may not stay forever, but massively multithreaded computing
     is definitely the future.


Where to start?

 ●   Check your PC.
               –   If it's not old enough to go to primary school, there's
                       a high chance it has a GPU.
 ●   Go to the NVIDIA/ATI website, download a development toolkit,
     and you're ready to go.




THANK YOU

 ●   Any questions? Just ask.
 ●   Any suggestions? What are you waiting for?
 ●   Any problem or solution to discuss? Let's have a private talk
     somewhere (j/k).




