GPU Computing
  • - GPU GDRAM is further subdivided, corresponding to the physical architecture of the processing unit
  • Transcript

    • 1. Parallel Computing on GPUs
      Christian Kehl
      01.01.2011
    • 2. Overview
      Basics of Parallel Computing
      Brief History of SIMD vs. MIMD Architectures
      OpenCL
      Common Application Domain
      Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
      2
    • 3. Basics of Parallel Computing
      Ref.: René Fink, „Untersuchungen zur Parallelverarbeitung mit wissenschaftlich-technischen Berechnungsumgebungen“, Diss Uni Rostock 2007
      3
    • 4. Basics of Parallel Computing
      4
    • 5. Overview
      Basics of Parallel Computing
      Brief History of SIMD vs. MIMD Architectures
      OpenCL
      Common Application Domain
      Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
      5
    • 6. Brief History of SIMD vs. MIMD Architectures
      6
    • 7. Brief History of SIMD vs. MIMD Architectures
      7
    • 8. Brief History of SIMD vs. MIMD Architectures
      8
    • 9. Brief History of SIMD vs. MIMD Architectures
      2004 – programmable GPU core via shader technology
      2007 – CUDA (Compute Unified Device Architecture) Release 1.0
      December 2008 – first OpenCL (Open Computing Language) specification
      March 2009 – uniform shaders, first beta releases of OpenCL
      August 2009 – release and implementation of OpenCL 1.0
      9
    • 10. Brief History of SIMD vs. MIMD Architectures
      SIMD technologies in GPUs:
      Vector processing (ILLIAC IV)
      mathematical operation units (ILLIAC IV)
      Pipelining (CRAY-1)
      local memory caching (CRAY-1)
      atomic instructions (CRAY-1)
      synchronized instruction execution and memory access (MASPAR)
      10
    • 11. Overview
      Basics of Parallel Computing
      Brief History of SIMD vs. MIMD Architectures
      OpenCL
      Common Application Domain
      Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
      11
    • 12. Platform Model
      OpenCL
      One host plus one or more Compute Devices
      Each Compute Device is composed of one or more Compute Units
      Each Compute Unit is further divided into one or more Processing Elements
      12
    • 13. Kernel Execution
      OpenCL
      Total number of work-items = Gx * Gy
      Size of each work-group = Sx * Sy
      Global ID can be computed from work-group ID and local ID
      13
    • 14. Memory Management
      OpenCL
      14
    • 15. Memory Management
      OpenCL
      15
    • 16. Memory Model
      OpenCL
      Address spaces
      Private - private to a work-item
      Local - local to a work-group
      Global - accessible by all work-items in all work-groups
      Constant - read only global space
      16
    • 17. Programming Language
      OpenCL
      Every GPU computing technology is natively programmed in C/C++ (host code)
      Host-code bindings to several other languages exist (Fortran, Java, C#, Ruby)
      Device code is written exclusively in standard C plus extensions
      17
    • 18. Language Restrictions
      OpenCL
      Pointers to functions not allowed
      Pointers to pointers allowed within a kernel, but not as an argument
      Bit-fields not supported
      Variable-length arrays and structures not supported
      Recursion not supported
      Writes to a pointer of types less than 32-bit not supported
      Double types not supported, but reserved
      3D Image writes not supported
      Some restrictions are addressed through extensions
      18
    • 19. Overview
      Basics of Parallel Computing
      Brief History of SIMD vs. MIMD Architectures
      OpenCL
      Common Application Domain
      Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
      19
    • 20. Common Application Domain
      Multimedia data and tasks are best suited for SIMD processing
      Multimedia data – sequential byte streams; each byte independent
      Image processing is particularly suited for GPUs
      original GPU task: „compute <several FLOPs> for every pixel of the screen“ ( computer graphics)
      same task for images, only the FLOPs are different
      20
    • 21. Common Application Domain –
      Image Processing
      possible features realizable on the GPU
      contrast and luminance adjustment
      gamma scaling
      (pixel-by-pixel) histogram scaling
      convolution filtering
      edge highlighting
      negative image / image inversion

      21
    • 22. Inversion
      Image Processing
      simple example: inversion
      implementation and use of a framework for switching between different GPGPU technologies
      creation of a command queue for each GPU
      reading the GPU kernel from a kernel file on the fly
      creation of buffers for the input and output image
      memory copy of the input image data to global GPU memory
      setting of kernel arguments and kernel execution
      memory copy of the GPU output buffer data to a new image
      22
    • 23. Image Processing
      Inversion
      evaluated and confirmed minimum speedup – G80 GPU OpenCL vs. 8-core CPU OpenMP:
      4 : 1
      23
    • 24. GPU Computing
      Case Study: Monte Carlo Study of a Spring-Mass System on GPUs
    • 25. Overview
      Basics of Parallel Computing
      Brief History of SIMD vs. MIMD Architectures
      OpenCL
      Common Application Domain
      Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
      25
    • 26. MC Study of a SMS using OpenCL and OpenMP
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
      26
    • 27. Task
      Spring-mass system defined by a differential equation
      Behavior of the system must be simulated over varying damping values
      Therefore: numerical solution in t; t ∈ [0.0 … 2] sec for a step size h = 1/1000
      Analysis of computation time and speed-up for different compute architectures
      27
    • 28. Task
      based on Simulation News Europe (SNE) CP2:
      1000 simulation iterations over the simulation horizon with generated damping values (Monte Carlo study)
      consecutive averaging for s(t)
      t ∈ [0 … 2] sec; h = 0.01 → 200 steps
      28
    • 29. Task
      too lightweight on present architectures
      -> Modification:
      5000 iterations with Monte Carlo
      h = 0.001 → 2000 steps
      Aim of analysis: knowledge about spring behavior for different damping values (trajectory array)
      29
    • 30. Task
      Simple spring-mass system
      d … damping constant
      c … spring constant
      Equation of motion derived from Newton's 2nd axiom
      Modelling needed -> free-body diagram of the mass („Massenfreischnitt“)
      mass is moved
      force balance equation
      30
    • 31. MC Study of a SMS using OpenCL and OpenMP
      31
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 32. Modelling
      numerical integration based on a 2nd-order differential equation
      DE of order n → n DEs of 1st order
      32
    • 33. Modelling
      Transformation by substitution
      33
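The substitution referred to here can be written out explicitly (d and c as introduced on slide 30; the mass symbol m is an assumption of notation, since the slides do not name it):

```latex
% equation of motion of the spring-mass system:
%   m \ddot{s} + d \dot{s} + c s = 0
% substitution v := \dot{s} yields the first-order system
\begin{aligned}
  \dot{s} &= v, \\
  \dot{v} &= -\frac{d}{m}\, v - \frac{c}{m}\, s
\end{aligned}
```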
      • random damping parameter d for interval limits [800; 1200]
      • 5000 iterations
    • 34. MC Study of a SMS using OpenCL and OpenMP
      34
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 35. Euler as simple ODE solver
      numerical integration by the explicit Euler method
      35
    • 36. MC Study of a SMS using OpenCL and OpenMP
      36
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 37. Existing MIMD Solutions
      37
    • 38. Existing MIMD Solutions
      Approach cannot be applied to GPU architectures
      MIMD requirements:
      each PE with its own instruction flow
      each PE can access RAM individually
      GPU architecture -> SIMD
      each PE computes the same instruction at the same time
      each PE has to be at the same instruction for accessing RAM
      → Therefore: development of an SIMD approach
      38
    • 39. MC Study of a SMS using OpenCL and OpenMP
      39
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 40. An SIMD Approach
      S.P./R.F.:
      simultaneous execution of the sequential simulation with varying d-parameter on spatially distributed PEs
      averaging depending on trajectories
      C.K.:
      simultaneous computation with all d-parameters for time t_n, iterative repetition until t_end
      averaging depending on steps
      40
    • 41. An SIMD-Approach
      41
    • 42. MC Study of a SMS using OpenCL and OpenMP
      42
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 43. OpenMP
      Parallelization technology based on the shared-memory principle
      synchronization hidden from the developer
      thread management controllable
      For System V-based OS:
      parallelization by process forking
      For Windows-based OS:
      parallelization by WinThread creation (AMD study/Intel tech paper)
      43
    • 44. OpenMP
      in C/C++: pragma-based preprocessor directives
      in C# represented by Parallel Loops
      more than just parallelizing loops (AMD tech report)
      Literature:
      AMD/Intel Tech Papers
      Thomas Rauber, „Parallele Programmierung“
      Barbara Chapman, „Using OpenMP: Portable Shared Memory Parallel Programming“
      44
    • 45. MC Study of a SMS using OpenCL and OpenMP
      45
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plot
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 46. Result Plot
      resulting trajectory for all technologies
      46
    • 47. MC Study of a SMS using OpenCL and OpenMP
      47
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 48. Speed-Up Study
      48
      OpenMP – own Study – Comparison CPU/GPU
      SIMD Single: presented SIMD approach on CPU
      SIMD OpenMP: presented SIMD approach parallelized on CPU
      SIMD OpenCL: control of the number of executing units not possible, therefore only one value
    • 49. Speed-Up Study
      49
      (speed-up plot comparing SIMD OpenCL, SIMD single, MIMD single, SIMD OpenMP and MIMD OpenMP)
    • 50. MC Study of a SMS using OpenCL and OpenMP
      50
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 51. Parallelization Conclusions
      problem unsuited for SIMD parallelization
      on-GPU reduction too time-expensive,
      Therefore:
      Euler computation on GPU
      average computation on CPU
      most time-intensive operation: MemCopy between GPU and main memory
      for more complex problems or different ODE solver procedures the speed-up behavior can change
      51
    • 52. Parallelization Conclusions
      MIMD approach S.P./R.F. efficient for SNE CP2
      OpenMP realization possible for both the MIMD and the SIMD approach (and done)
      OpenMP MIMD realization achieves almost linear speedup
      setting more threads than physically available PEs leads to significant thread overhead
      OpenMP automatically matches the number of threads to the physically available PEs for dynamic assignment
      52
    • 53. MC Study of a SMS using OpenCL and OpenMP
      53
      Task
      Modelling
      Euler as simple ODE solver
      Existing MIMD Solutions
      An SIMD-Approach
      OpenMP
      Result Plots
      Speed-Up-Study
      Parallelization Conclusions
      Résumé
    • 54. Résumé
      task can be solved on CPUs and GPUs
      for GPU computing, new approaches and algorithm porting are required
      although GPUs have a massive number of parallel operating cores, speed-up is not possible for every application domain
      54
    • 55. Résumé
      Advantages of GPU computing:
      for suited problems (e.g. multimedia) very fast and scalable
      cheap HPC technology in comparison to scientific supercomputers
      energy-efficient
      massive computing power in a small form factor
      Disadvantages of GPU computing:
      limited instruction set
      strictly SIMD
      SIMD algorithm development is hard
      no execution supervision (e.g. segmentation/page faults)
      55
    • 56. Overview
      Basics of Parallel Computing
      Brief History of SIMD vs. MIMD Architectures
      OpenCL
      Common Application Domain
      Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
      56