GPU Computing
  Motivation
Computing Challenge



        [Graphic: Task Computing and Data Computing]



© NVIDIA Corporation 2007
Extreme Growth in Raw Data
[Figure: four charts of raw data growth]
     YouTube Bandwidth Growth (y-axis: Millions). Source: Alexa, YouTube 2006
     Walmart Transaction Tracking (y-axis: Millions). Source: Hedburg, CPI, Walmart
     BP Oil and Gas Active Data (y-axis: Terabytes). Source: Jim Farnsworth, BP May 2005
     NOAA NASA Weather Data in Petabytes (y-axis: 0 to 90 Petabytes; x-axis: 2002 to 2017). Source: John Bates, NOAA Nat. Climate Center
Computational Horsepower


         GPU is a massively parallel computation engine
                   High memory bandwidth (5-10x CPU)
                   High floating-point performance (5-10x CPU)




Benchmarking: CPU vs. GPU Computing




G80 vs. Core2 Duo 2.66 GHz
Measured against commercial CPU benchmarks when possible

“Free” Massively Parallel Processors




   It’s not science fiction, it’s just funded by them
                   Asst Master Chief Harvard
Success Stories
Success Stories: Data to Design
       Acceleware EM Field simulation technology for the GPU
                3D Finite-Difference Time-Domain (FDTD) and Finite-Element
                Modeling of:
                          Cell phone irradiation
                          MRI Design / Modeling
                          Printed Circuit Boards
                          Radar Cross Section (Military)

       [Chart: Performance (Mcells/s), y-axis 0 to 700]
                 CPU, 3.2 GHz     1X
                 1 GPU            5X
                 2 GPUs           10X
                 4 GPUs           20X
       [Image: Pacemaker with Transmit Antenna]
EvolvedMachines
130X speedup
Simulate brain circuitry
Sensory computing: vision, olfactory
MATLAB: Language of Science

10X with MATLAB CPU+GPU




    Pseudo-spectral simulation of 2D Isotropic turbulence
      http://developer.nvidia.com/object/matlab_cuda.html
    http://www.amath.washington.edu/courses/571-winter-2006/matlab/FS_2Dturb.m
MATLAB Example:
Advection of an elliptic vortex
256x256 mesh, 512 RK4 steps, Linux, MATLAB file
http://www.amath.washington.edu/courses/571-winter-2006/matlab/FS_vortex.m



                                                        MATLAB
                                                        168 seconds




                                                        MATLAB with CUDA
                                                        (single precision FFTs)
                                                        20 seconds


MATLAB Example:
Pseudo-spectral simulation of 2D Isotropic turbulence

 512x512 mesh, 400 RK4 steps, Windows XP, MATLAB file
 http://www.amath.washington.edu/courses/571-winter-2006/matlab/FS_2Dturb.m



                                                         MATLAB
                                                         992 seconds




                                                         MATLAB with CUDA
                                                         (single precision FFTs)
                                                         93 seconds
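
The two MATLAB timings above can be turned into speedup factors with a line of arithmetic. The sketch below is only a check on the numbers quoted on the slides; the dictionary keys and the `speedup` helper are illustrative names, not part of the original benchmark scripts.

```python
# Wall-clock timings quoted on the two MATLAB example slides, in seconds:
# (plain MATLAB, MATLAB with CUDA single precision FFTs).
timings = {
    "elliptic vortex, 256x256 mesh, 512 RK4 steps": (168.0, 20.0),
    "2D isotropic turbulence, 512x512 mesh, 400 RK4 steps": (992.0, 93.0),
}


def speedup(baseline_s, accelerated_s):
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


# Speedup per example, keyed by the same illustrative labels.
speedups = {name: speedup(cpu, gpu) for name, (cpu, gpu) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

The results, roughly 8.4x and 10.7x, are in line with the "10X with MATLAB CPU+GPU" headline.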


NAMD/VMD Molecular Dynamics

 240X speedup
 Computational biology




 http://www.ks.uiuc.edu/Research/vmd/projects/ece498/lecture/
Molecular Dynamics Example


         Case study: molecular dynamics research
         at U. Illinois Urbana-Champaign
               (Scientist-sponsored) course project for ECE 498AL:
               Programming Massively Parallel Processors (Kirk/Hwu)
               Next slides taken from a nice description of the problem,
               algorithms, and iterative optimization process available at:
           http://www.ks.uiuc.edu/Research/vmd/projects/ece498/lecture/




Molecular Modeling: Ion Placement

         Biomolecular simulations attempt to replicate in vivo conditions in silico
         Model structures are initially constructed in vacuum
         Solvent (water) and ions are added as necessary for the required biological conditions
         Computational requirements scale with the size of the simulated structure


Evolution of Ion Placement Code
             First implementation was sequential
             Virus structure with 10^6 atoms would require 10 CPU days
             Tuned for Intel C/C++ vectorization + SSE: ~20x speedup
             Parallelized w/ pthreads: high data parallelism = linear speedup
             Parallelized GPU-accelerated implementation: 3 GeForce 8800GTX cards outrun ~300 Itanium2 CPUs!
             Virus structure now runs in 25 seconds on 3 GPUs!
             Further speedups should still be possible…
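
For a rough sense of scale, the first and last figures in this list can be compared directly. The sketch below is only arithmetic on the quoted numbers and assumes the "10 CPU days" and "25 seconds" refer to the same virus structure; it compares CPU time for the sequential code with wall-clock time on three GPUs, so it is an order-of-magnitude indication and not a controlled benchmark.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide.
sequential_cpu_days = 10      # original sequential implementation
gpu_wall_seconds = 25         # 3 GeForce 8800GTX cards

sequential_seconds = sequential_cpu_days * SECONDS_PER_DAY
overall_speedup = sequential_seconds / gpu_wall_seconds

print(f"sequential: {sequential_seconds} s, GPU: {gpu_wall_seconds} s")
print(f"overall speedup: about {overall_speedup:.0f}x")
```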


Multi-GPU CUDA
Coulombic Potential Map Performance



         Host: Intel Core 2 Quad, 8GB RAM, ~$3,000
         3 GPUs: NVIDIA GeForce 8800GTX, ~$550 each
         32-bit RHEL4 Linux (want 64-bit CUDA!!)
         235 GFLOPS per GPU for current version of coulombic potential map kernel
         705 GFLOPS total for multithreaded multi-GPU version

         [Photo: Three GeForce 8800GTX GPUs in a single machine, cost ~$4,650]
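
To make the workload concrete, here is a minimal sketch of a Coulombic potential map computed by direct summation: each lattice point accumulates charge / distance over all atoms. This is plain, unoptimized Python, not the CUDA kernel measured above. The function name, the argument layout, the single z-plane, and the omission of the Coulomb constant are all assumptions made for illustration.

```python
import math


def coulomb_potential_map(atoms, nx, ny, spacing, z=0.0):
    """Direct Coulomb summation on one plane of a regular lattice.

    atoms   -- list of (x, y, z, charge) tuples (hypothetical layout)
    nx, ny  -- number of lattice points along x and y
    spacing -- distance between neighbouring lattice points
    z       -- height of the plane being evaluated

    Returns ny rows of nx values, each the sum of charge / distance
    over all atoms (Coulomb constant omitted).
    """
    grid = []
    for j in range(ny):
        y = j * spacing
        row = []
        for i in range(nx):
            x = i * spacing
            potential = 0.0
            for ax, ay, az, charge in atoms:
                r = math.sqrt((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2)
                if r > 0.0:  # skip an atom sitting exactly on the lattice point
                    potential += charge / r
            row.append(potential)
        grid.append(row)
    return grid


# Tiny example: one unit charge placed one unit above the lattice origin.
demo = coulomb_potential_map([(0.0, 0.0, 1.0, 1.0)], nx=2, ny=2, spacing=1.0)
print(demo)
```

Every lattice point is computed independently of the others, which is the data parallelism the previous slide credits for the linear pthreads speedup; the same structure lets the lattice be divided among threads or GPUs.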

Professor Partnership
NVIDIA Professor Partnership

         Support faculty research & teaching efforts
                   Easy:
                            Small equipment gifts (1-2 GPUs)
                            Significant discounts on GPU purchases
                                     Especially Quadro, Tesla equipment
                                     Useful for cost matching
                   Competitive:
                            Research contracts
                            Small cash grants (typically ~$25K gifts)
                            Medium-scale equipment donations (10-30 GPUs)
         Informal proposals, reviewed quarterly
                   Focus areas: GPU computing, especially with an educational mission or component

           http://www.nvidia.com/page/professor_partnership.html

Example Application of GPU
