Parallel Computing on GPUs
Christian Kehl
01.01.2011

Overview
- Basics of Parallel Computing
- Brief History of SIMD vs. MIMD Architectures
- OpenCL
- Common Application Domain
- Monte Carlo Study of a Spring-Mass System using OpenCL and OpenMP
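
Basics of Parallel Computing
Ref.: René Fink, „Untersuchungen zur Parallelverarbeitung mit wissenschaftlich-technischen Berechnungsumgebungen“ [Investigations into parallel processing with scientific-technical computing environments], dissertation, University of Rostock, 2007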

Brief History of SIMD vs. MIMD Architectures
- 2004: programmable GPU cores via shader technology
- 2007: CUDA (Compute Unified Device Architecture) release 1.0
- December 2008: first Open Computing Language (OpenCL) specification
- March 2009: uniform shader; first beta releases of OpenCL
- August 2009: release and implementation of OpenCL 1.0

Brief History of SIMD vs. MIMD Architectures
SIMD technologies in GPUs (historical precursor in parentheses):
- vector processing (ILLIAC IV)
- mathematical operation units (ILLIAC IV)
- pipelining (CRAY-1)
- local memory caching (CRAY-1)
- atomic instructions (CRAY-1)
- synchronized instruction execution and memory access (MasPar)

Platform Model (OpenCL)
- one host plus one or more compute devices
- each compute device is composed of one or more compute units
- each compute unit is further divided into one or more processing elements
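
The hierarchy above boils down to two nested counts per device. A minimal sketch in C, assuming a hypothetical `ComputeDevice` struct whose fields (`compute_units`, `pes_per_compute_unit`) are illustrative only: the real OpenCL API reports the number of compute units through `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS`, but has no portable query for the number of processing elements per compute unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one compute device (not an OpenCL type). */
typedef struct {
    const char *name;
    int compute_units;        /* compute units (CUs) in this device   */
    int pes_per_compute_unit; /* processing elements (PEs) in each CU */
} ComputeDevice;

/* Total number of PEs a host can address across all of its devices:
   host -> devices -> compute units -> processing elements. */
int total_processing_elements(const ComputeDevice *devices, int device_count) {
    assert(device_count == 0 || devices != NULL);
    int total = 0;
    for (int i = 0; i < device_count; ++i)
        total += devices[i].compute_units * devices[i].pes_per_compute_unit;
    return total;
}
```

With made-up numbers, a host holding a GPU with 16 compute units of 8 PEs each and a CPU device with 8 single-PE compute units addresses 136 PEs.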

Kernel Execution (OpenCL)
- total number of work-items = Gx × Gy
- size of each work-group = Sx × Sy
- the global ID can be computed from the work-group ID and the local ID
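
Per dimension the relation is g = w · S + s (global ID from work-group ID w, work-group size S and local ID s). The following plain-C emulation of a 2D NDRange checks that this mapping covers every work-item exactly once. It is a sketch: the helper names are hypothetical (in a real kernel the values come from the built-ins `get_global_id`, `get_group_id`, `get_local_id` and `get_local_size`), and it assumes no global offset and a global size that is a multiple of the work-group size, as OpenCL 1.x requires.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID in one dimension, assuming no global work offset. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id) {
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Number of work-groups in one dimension (G must be a multiple of S). */
size_t num_groups(size_t global_size, size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Enumerate a Gx x Gy range split into Sx x Sy work-groups. Each work-item
   marks its linear global index gy * Gx + gx in 'seen' (Gx * Gy entries,
   zero-initialized by the caller); returns the number of work-items. */
size_t enumerate_work_items(size_t Gx, size_t Gy, size_t Sx, size_t Sy,
                            unsigned char *seen) {
    size_t count = 0;
    for (size_t wy = 0; wy < num_groups(Gy, Sy); ++wy)
        for (size_t wx = 0; wx < num_groups(Gx, Sx); ++wx)
            for (size_t sy = 0; sy < Sy; ++sy)
                for (size_t sx = 0; sx < Sx; ++sx) {
                    size_t gx = global_id(wx, Sx, sx);
                    size_t gy = global_id(wy, Sy, sy);
                    seen[gy * Gx + gx] += 1;
                    ++count;
                }
    return count;
}
```

For Gx = 8, Gy = 4 and 4 × 2 work-groups this yields (8/4) · (4/2) = 4 work-groups and 32 work-items, each visited once.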

Memory Management (OpenCL)

Memory Model (OpenCL)
Address spaces:
- private: private to a work-item
- local: local to a work-group
- global: accessible by all work-items in all work-groups
- constant: read-only global space
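
To show where the four address spaces typically appear, here is a plain-C emulation of one work-group computing a partial sum; the comments mark the OpenCL address space each variable would use. It is a sketch under stated assumptions: the work-group size `WG_SIZE` and the function names are hypothetical, and the sequential loops stand in for work-items that would run concurrently and synchronize with `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4 /* hypothetical work-group size, a power of two */

/* One work-group: load, reduce in local memory, write one result. */
static void work_group_sum(const float *input,  /* __global, read       */
                           float *partial_sums, /* __global, written    */
                           const float *scale,  /* __constant           */
                           size_t group_id) {
    float local_buf[WG_SIZE]; /* __local: shared within the work-group */

    /* Phase 1: each work-item loads one element into local memory. */
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        size_t gid = group_id * WG_SIZE + lid; /* __private per work-item */
        local_buf[lid] = input[gid] * scale[0];
    }
    /* OpenCL: barrier(CLK_LOCAL_MEM_FENCE) here. */

    /* Phase 2: tree reduction; step 'stride' only reads slots >= stride. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            local_buf[lid] += local_buf[lid + stride];
        /* OpenCL: another barrier after each step. */
    }

    /* Phase 3: work-item 0 writes the group result to global memory. */
    partial_sums[group_id] = local_buf[0];
}

/* Serial stand-in for launching n_groups work-groups. */
void run_ndrange(const float *input, float *partial_sums,
                 const float *scale, size_t n_groups) {
    assert(input != NULL && partial_sums != NULL && scale != NULL);
    for (size_t g = 0; g < n_groups; ++g)
        work_group_sum(input, partial_sums, scale, g);
}
```

The host would still have to add up the per-group partial sums; only the global and constant buffers are visible to it.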

Programming Language (OpenCL)
- every GPU computing technology is natively written in C/C++ (host)
- host-code bindings exist for several other languages (Fortran, Java, C#, Ruby)
- device code is written exclusively in OpenCL C, i.e. standard C (C99) with restrictions and extensions
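
Language Restrictions (OpenCL)
- pointers to functions are not allowed
- pointers to pointers are allowed within a kernel, but not as kernel arguments
- bit-fields are not supported
- variable-length arrays and structures with flexible (unsized) arrays are not supported
- recursion is not supported
- writes through pointers to types smaller than 32 bits are not supported
- double types are not supported, but the type name is reserved
- 3D image writes are not supported
- some restrictions are addressed through extensions (e.g. cl_khr_fp64 for double precision, cl_khr_byte_addressable_store for byte-sized writes, cl_khr_3d_image_writes)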

Common Application Domain
- multimedia data and tasks are best suited for SIMD processing
- multimedia data: sequential byte streams; each byte is independent
- image processing is particularly suited for GPUs
  - original GPU task (computer graphics): "compute <several FLOP> for every pixel of the screen"
  - same task for images, only the FLOPs are different

Common Application Domain: Image Processing
Possible features realizable on the GPU:
- contrast and luminance configuration
- gamma scaling
- (pixel-by-pixel) histogram scaling
- convolution filtering
- edge highlighting
- negative image / image inversion
- …
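
Most of these features are per-pixel operations on a small neighbourhood. As a sketch, assuming a single-channel float image and clamp-to-edge borders, a 3×3 convolution filter can be written as a per-pixel function (one GPU work-item per pixel) plus a serial loop standing in for the 2D NDRange; the function names are hypothetical. The filter is applied without flipping (correlation), which equals convolution for symmetric filters such as the Laplacian often used for edge highlighting.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an index to [0, n-1] (clamp-to-edge border handling). */
static int clamp_index(int i, int n) {
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Per-pixel body: output pixel (x, y) of a width x height image,
   filtered with a row-major 3x3 filter. */
float convolve3x3_pixel(const float *in, int width, int height,
                        const float filter[9], int x, int y) {
    float acc = 0.0f;
    for (int fy = -1; fy <= 1; ++fy)
        for (int fx = -1; fx <= 1; ++fx) {
            int sx = clamp_index(x + fx, width);
            int sy = clamp_index(y + fy, height);
            acc += filter[(fy + 1) * 3 + (fx + 1)] * in[sy * width + sx];
        }
    return acc;
}

/* Serial stand-in for the 2D NDRange: one call per pixel. */
void convolve3x3(const float *in, float *out, int width, int height,
                 const float filter[9]) {
    assert(in != NULL && out != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = convolve3x3_pixel(in, width, height, filter, x, y);
}
```

Because every output pixel reads only the input image, all pixels can be computed independently, which is exactly the SIMD-friendly pattern described above.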

Image Processing: Inversion
Simple example: inversion
- implementation and use of a framework for switching between different GPGPU technologies
- creation of a command queue for each GPU
- reading the GPU kernel from a kernel file
- on-the-fly creation of buffers for the input and output image
- memory copy of the input image data to global GPU memory
- setting of kernel arguments and kernel execution
- memory copy of the GPU output buffer data to a new image
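
The inversion kernel itself is a one-liner. The following is a sketch, not the framework from the slide: the OpenCL C kernel is shown in a comment, and the C function emulates the NDRange with a loop so the logic can be checked on the CPU; the names are hypothetical. On OpenCL 1.0 devices, byte stores into a `uchar` buffer need the `cl_khr_byte_addressable_store` extension (compare the restriction on writes to types smaller than 32 bits); OpenCL 1.1 made byte-addressable stores part of the core.

```c
#include <assert.h>
#include <stddef.h>

/* Corresponding OpenCL C kernel (sketch, one work-item per byte):
 *
 *   #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
 *   __kernel void invert(__global const uchar *in, __global uchar *out) {
 *       size_t i = get_global_id(0);
 *       out[i] = 255 - in[i];
 *   }
 */

/* Work-item body: invert one 8-bit channel value. */
static unsigned char invert_value(unsigned char v) {
    return (unsigned char)(255u - v);
}

/* Serial stand-in for a 1D NDRange of n work-items. */
void invert_image(const unsigned char *in, unsigned char *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t gid = 0; gid < n; ++gid)
        out[gid] = invert_value(in[gid]);
}
```

On the GPU the loop disappears: the host enqueues the kernel with a global size equal to the number of bytes, and each work-item handles one byte.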

Image Processing: Inversion
Evaluated and confirmed minimum speed-up, G80 GPU (OpenCL) vs. 8-core CPU (OpenMP): 4 : 1

GPU Computing Case Study: Monte Carlo Study of a Spring-Mass System on GPUs

MC Study of a Spring-Mass System (SMS) using OpenCL and OpenMP
- Task
- Modelling
- Euler as a Simple ODE Solver
- Existing MIMD Solutions
- An SIMD Approach
- OpenMP
- Result Plots
- Speed-Up Study
- Parallelization Conclusions
- Summary

Task
- spring-mass system defined by a differential equation
- the behavior of the system must be simulated over varying damping values
- therefore: numerical solution in t, t ∈ [0.0, 2] s, with step size h = 1/1000 s
- analysis of computation time and speed-up for different compute architectures

Task
Based on Simulation News Europe (SNE) CP2:
- 1000 simulation iterations over the simulation horizon with generated damping values (Monte Carlo study)
- consecutive averaging for s(t), t ∈ [0, 2] s; h = 0.01 s → 200 steps

Task
- too lightweight for present-day architectures → modification: 5000 Monte Carlo iterations, h = 0.001 s → 2000 steps
- aim of the analysis: knowledge about the spring behavior for different damping values (trajectory array)

Task
Simple spring-mass system:
- d: damping constant
- c: spring constant
- equation of motion derived from Newton's second law
- modelling procedure: free-body diagram of the mass (German: „Massenfreischnitt“), displace the mass, balance the forces → equation of motion
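
The equation itself appears only as an image on the original slide. Assuming the usual free damped oscillator with mass m (a symbol not named on the slide), displacement s(t) and no external force, the force balance on the displaced mass would read:

```latex
m\,\ddot{s}(t) = -\,c\,s(t) - d\,\dot{s}(t)
\quad\Longleftrightarrow\quad
m\,\ddot{s}(t) + d\,\dot{s}(t) + c\,s(t) = 0
```

The spring force is proportional to the displacement and the damping force to the velocity; both act against the motion, hence the negative signs.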

Modelling
- numerical integration based on a 2nd-order differential equation
- a DE of order n → n DEs of 1st order

Modelling
- transformation by substitution
- 5000 iterations
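
Assuming the equation of motion has the free damped form m s'' + d s' + c s = 0 (m being the mass, which the slides do not name), the substitution x_1 = s, x_2 = s' turns the single second-order DE into two first-order DEs, which is the form an explicit ODE solver works on:

```latex
x_1 := s,\qquad x_2 := \dot{s}
\\[4pt]
\dot{x}_1 = x_2,\qquad
\dot{x}_2 = -\frac{c}{m}\,x_1 - \frac{d}{m}\,x_2
```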

Euler as a Simple ODE Solver
- numerical integration by the explicit Euler method
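
The explicit Euler method advances the state with the derivative evaluated at the old time step: x_{k+1} = x_k + h · f(x_k). A minimal C sketch for the substituted system, assuming the free damped form x1' = x2, x2' = -(c/m)·x1 - (d/m)·x2; the mass m, the initial values and the function name are assumptions, not values from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit Euler for x1' = x2, x2' = -(c/m) x1 - (d/m) x2.
   Writes n_steps + 1 displacement values s(t_k) = x1, t_k = k * h,
   into s_out (s_out[0] is the initial displacement). */
void euler_trajectory(double m, double c, double d,
                      double s0, double v0, double h,
                      size_t n_steps, double *s_out) {
    assert(m > 0.0 && h > 0.0 && s_out != NULL);
    double x1 = s0; /* displacement s  */
    double x2 = v0; /* velocity ds/dt  */
    s_out[0] = x1;
    for (size_t k = 0; k < n_steps; ++k) {
        /* Both derivatives use the old state (explicit scheme). */
        double dx1 = x2;
        double dx2 = -(c / m) * x1 - (d / m) * x2;
        x1 += h * dx1;
        x2 += h * dx2;
        s_out[k + 1] = x1;
    }
}
```

For the modified task, h = 0.001 and n_steps = 2000 cover t ∈ [0, 2] s, so the output array needs 2001 entries per damping value.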

Existing MIMD Solutions
The approach cannot be applied to GPU architectures.
- MIMD requirements:
  - each PE has its own instruction flow
  - each PE can access RAM individually
- GPU architecture → SIMD:
  - each PE computes the same instruction at the same time
  - each PE has to be at the same instruction for accessing RAM
- therefore: development of an SIMD approach

An SIMD Approach
- S.P./R.F.: simultaneous execution of sequential simulations with varying d parameter on spatially distributed PEs; averaging depends on the trajectories
- C.K.: simultaneous computation with all d parameters for time t_n, iterative repetition until t_end; averaging depends on the steps
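
The difference between the two approaches is mainly the loop order. A sketch of the step-parallel (C.K.) ordering in C, under the same assumed model as for the Euler step (free damped oscillator, mass m not given on the slides); the function names are hypothetical. The inner loop over damping values executes the same instructions for every d, which is what SIMD hardware needs; the `#pragma omp parallel for` marks it as data-parallel and is ignored by compilers invoked without OpenMP support.

```c
#include <assert.h>
#include <stddef.h>

/* One explicit Euler step for one damping value: the work-item body. */
static void euler_step(double m, double c, double d, double h,
                       double *x1, double *x2) {
    double dx1 = *x2;
    double dx2 = -(c / m) * (*x1) - (d / m) * (*x2);
    *x1 += h * dx1;
    *x2 += h * dx2;
}

/* Step-parallel ordering: at each time t_n advance ALL damping values by
   one step, then average that step. x1/x2 are caller-provided state arrays
   of length n_d; mean_s receives n_steps + 1 averaged displacements. */
void simd_mean_trajectory(double m, double c, const double *d, size_t n_d,
                          double s0, double v0, double h, size_t n_steps,
                          double *x1, double *x2, double *mean_s) {
    assert(n_d > 0 && m > 0.0);
    for (size_t i = 0; i < n_d; ++i) { x1[i] = s0; x2[i] = v0; }
    mean_s[0] = s0;
    for (size_t k = 0; k < n_steps; ++k) {
        /* Data-parallel part: same instruction stream for every d[i]. */
        #pragma omp parallel for
        for (long i = 0; i < (long)n_d; ++i)
            euler_step(m, c, d[i], h, &x1[i], &x2[i]);
        /* Per-step averaging (done on the CPU in the case study). */
        double sum = 0.0;
        for (size_t i = 0; i < n_d; ++i) sum += x1[i];
        mean_s[k + 1] = sum / (double)n_d;
    }
}
```

Averaging inside the time loop matches "averaging depends on the steps": only the current state of each trajectory has to be kept, not the full trajectory array.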
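
OpenMP
- parallelization technology based on the shared-memory principle
- synchronization hidden from the developer
- thread management controllable
- on System-V-based (Unix-like) operating systems: parallelization by threads (common implementations such as GCC's libgomp use POSIX threads)
- on Windows-based operating systems: parallelization by WinThread creation (AMD study / Intel tech paper)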

OpenMP
- in C/C++: pragma-based preprocessor directives
- in C#: represented by parallel loops
- more than just parallelizing loops (AMD tech report)
Literature:
- AMD/Intel tech papers
- Thomas Rauber, „Parallele Programmierung“ [Parallel Programming]
- Barbara Chapman, "Using OpenMP: Portable Shared Memory Parallel Programming"
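
For comparison with the SIMD ordering, the trajectory-parallel (S.P./R.F.-style MIMD) ordering maps naturally onto an OpenMP pragma: one loop iteration integrates a whole trajectory for its own damping value. A sketch under the same assumed model (free damped oscillator, mass m not given on the slides); the function names and the row-major `traj` layout are hypothetical. `omp.h` is only included when the compiler defines `_OPENMP`, so the code also compiles and runs serially.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Each iteration integrates one complete trajectory (explicit Euler) for
   d[i]; traj is n_d x (n_steps + 1), row-major, and row i is written only
   by iteration i, so the parallel loop needs no synchronization. */
void mimd_mean_trajectory(double m, double c, const double *d, size_t n_d,
                          double s0, double v0, double h, size_t n_steps,
                          double *traj, double *mean_s) {
    assert(n_d > 0 && m > 0.0);
    const size_t row = n_steps + 1;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n_d; ++i) {
        double x1 = s0, x2 = v0; /* private per thread */
        double *s = traj + (size_t)i * row;
        s[0] = x1;
        for (size_t k = 0; k < n_steps; ++k) {
            double dx1 = x2;
            double dx2 = -(c / m) * x1 - (d[i] / m) * x2;
            x1 += h * dx1;
            x2 += h * dx2;
            s[k + 1] = x1;
        }
    }
    /* Serial averaging over all trajectories, one value per time step. */
    for (size_t k = 0; k < row; ++k) {
        double sum = 0.0;
        for (size_t i = 0; i < n_d; ++i) sum += traj[i * row + k];
        mean_s[k] = sum / (double)n_d;
    }
}

/* Threads OpenMP would use by default (1 when compiled without OpenMP). */
int default_thread_count(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

`schedule(static)` splits the iterations evenly across threads. When no thread count is set, the default is implementation-defined and usually equals the number of available processors, which is what `default_thread_count` reports.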

Result Plot
Resulting trajectory for all technologies (plot)

Speed-Up Study
OpenMP (own study), comparison CPU/GPU:
- SIMD single: presented SIMD approach on the CPU
- SIMD OpenMP: presented SIMD approach parallelized on the CPU
- SIMD OpenCL: control of the number of executing units not possible, therefore only one value
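
For reading the plots, the usual definitions (standard definitions, not taken from the slides) with T_1 the run time of the single-unit reference version and T_p the run time on p executing units:

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
```

Linear speed-up means S(p) ≈ p, i.e. a parallel efficiency E(p) close to 1.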

Speed-Up Study (plot; series: SIMD OpenCL, SIMD single, MIMD single, SIMD OpenMP, MIMD OpenMP)

Parallelization Conclusions
- problem unsuited for SIMD parallelization
- on-GPU reduction too time-expensive; therefore: Euler computation on the GPU, average computation on the CPU
- most time-intensive operation: memory copy between GPU and main memory
- for more complex problems or a different ODE solver procedure, the speed-up behavior can change

Parallelization Conclusions
- MIMD approach (S.P./R.F.) efficient for SNE CP2
- OpenMP realization for the MIMD and SIMD approaches possible (and done)
- OpenMP MIMD realization: almost linear speed-up
- setting more threads than physically available PEs leads to significant thread overhead
- OpenMP automatically chooses the number of threads according to the physically available PEs (dynamic assignment)

Summary
- the task can be solved on CPUs and GPUs
- GPU computing requires new approaches and algorithm porting
- although GPUs have a massive number of parallel cores, a speed-up is not possible for every application domain

Summary
Advantages of GPU computing:
- very fast and scalable for suited problems (e.g. multimedia)
- cheap HPC technology compared to scientific supercomputers
- energy-efficient
- massive computing power in a small size
Disadvantages of GPU computing:
- limited instruction set
- strictly SIMD
- SIMD algorithm development is hard
- no execution supervision (e.g. segmentation faults/page faults)

Christian Kehl
 
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
Christian Kehl
 
LiDAR acquisition
LiDAR acquisitionLiDAR acquisition
LiDAR acquisition
Christian Kehl
 
Fluid simulation
Fluid simulationFluid simulation
Fluid simulation
Christian Kehl
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
Christian Kehl
 
Depth image recognition using isomorphic graph theory
Depth image recognition using isomorphic graph theoryDepth image recognition using isomorphic graph theory
Depth image recognition using isomorphic graph theory
Christian Kehl
 
Graph theory - Traveling Salesman and Chinese Postman
Graph theory - Traveling Salesman and Chinese PostmanGraph theory - Traveling Salesman and Chinese Postman
Graph theory - Traveling Salesman and Chinese Postman
Christian Kehl
 
Computer Graphics Modellering engels
Computer Graphics Modellering engelsComputer Graphics Modellering engels
Computer Graphics Modellering engels
Christian Kehl
 
Video-Konvertierung über GPGPU mit RIA-FrontEnd
Video-Konvertierung über GPGPU mit RIA-FrontEndVideo-Konvertierung über GPGPU mit RIA-FrontEnd
Video-Konvertierung über GPGPU mit RIA-FrontEndChristian Kehl
 

More from Christian Kehl (20)

From noisy object surface scans to conformal unstructured grids of multiple m...
From noisy object surface scans to conformal unstructured grids of multiple m...From noisy object surface scans to conformal unstructured grids of multiple m...
From noisy object surface scans to conformal unstructured grids of multiple m...
 
Cuberilles Statistical Volume Visualisation for Medical and Geological Data
Cuberilles Statistical Volume Visualisation for Medical and Geological DataCuberilles Statistical Volume Visualisation for Medical and Geological Data
Cuberilles Statistical Volume Visualisation for Medical and Geological Data
 
Mobile Outcrop Geology using tablets
Mobile Outcrop Geology using tabletsMobile Outcrop Geology using tablets
Mobile Outcrop Geology using tablets
 
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...
Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieva...
 
Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...
Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...
Distributed Rendering and Collaborative User Navigation- and Scene Manipulati...
 
Conformal multi-material mesh generation from labelled medical volumes (Dec 2...
Conformal multi-material mesh generation from labelled medical volumes (Dec 2...Conformal multi-material mesh generation from labelled medical volumes (Dec 2...
Conformal multi-material mesh generation from labelled medical volumes (Dec 2...
 
Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...
Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...
Interactive Simulation and Visualization of Large-Scale Flooding Scenarios (J...
 
Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)
Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)
Efficient Navigation in Temporal, Multi-Dimensional Point Sets (April 2013)
 
Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...
Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...
Smooth, Interactive Rendering and On-line Modification of Large-Scale, Geospa...
 
WP 4 – Interactive simulation and 3D visualization for water policy developme...
WP 4 – Interactive simulation and 3D visualization for water policy developme...WP 4 – Interactive simulation and 3D visualization for water policy developme...
WP 4 – Interactive simulation and 3D visualization for water policy developme...
 
Topology-conform segmented volume meshing of volume images (Oct 2012)
Topology-conform segmented volume meshing of volume images (Oct 2012)Topology-conform segmented volume meshing of volume images (Oct 2012)
Topology-conform segmented volume meshing of volume images (Oct 2012)
 
Master Thesis: Conformal multi-material mesh generation from labelled medical...
Master Thesis: Conformal multi-material mesh generation from labelled medical...Master Thesis: Conformal multi-material mesh generation from labelled medical...
Master Thesis: Conformal multi-material mesh generation from labelled medical...
 
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
nteractive visual analysis of flood scnarios using large-scale LiDAR point cl...
 
LiDAR acquisition
LiDAR acquisitionLiDAR acquisition
LiDAR acquisition
 
Fluid simulation
Fluid simulationFluid simulation
Fluid simulation
 
MPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video EncodingMPEG-1 Part 2 Video Encoding
MPEG-1 Part 2 Video Encoding
 
Depth image recognition using isomorphic graph theory
Depth image recognition using isomorphic graph theoryDepth image recognition using isomorphic graph theory
Depth image recognition using isomorphic graph theory
 
Graph theory - Traveling Salesman and Chinese Postman
Graph theory - Traveling Salesman and Chinese PostmanGraph theory - Traveling Salesman and Chinese Postman
Graph theory - Traveling Salesman and Chinese Postman
 
Computer Graphics Modellering engels
Computer Graphics Modellering engelsComputer Graphics Modellering engels
Computer Graphics Modellering engels
 
Video-Konvertierung über GPGPU mit RIA-FrontEnd
Video-Konvertierung über GPGPU mit RIA-FrontEndVideo-Konvertierung über GPGPU mit RIA-FrontEnd
Video-Konvertierung über GPGPU mit RIA-FrontEnd
 

Recently uploaded

Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5
sayalidalavi006
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
RAHUL
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
WaniBasim
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
simonomuemu
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 

Recently uploaded (20)

Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5Community pharmacy- Social and preventive pharmacy UNIT 5
Community pharmacy- Social and preventive pharmacy UNIT 5
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Liberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdfLiberal Approach to the Study of Indian Politics.pdf
Liberal Approach to the Study of Indian Politics.pdf
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 

GPU Computing

  • 17. Programming Language (OpenCL): every GPU computing technology is natively written in C/C++ on the host side; host-code bindings exist for several other languages (Fortran, Java, C#, Ruby); device code is written exclusively in standard C (C99) plus extensions
  • 18. Language Restrictions (OpenCL 1.0): pointers to functions are not allowed; pointers to pointers are allowed within a kernel, but not as a kernel argument; bit-fields are not supported; variable-length arrays and structures with flexible (unsized) arrays are not supported; recursion is not supported; writes to pointers of types smaller than 32 bits are not supported; double types are not supported, but the type name is reserved; 3D image writes are not supported; some restrictions are addressed through extensions (e.g. cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_3d_image_writes)
  • 20. Common Application Domain: multimedia data and tasks are best suited for SIMD processing; multimedia data are sequential byte streams in which each byte is independent; image processing is particularly suited for GPUs; the original GPU task is "compute <several FLOPs> for every pixel of the screen" (computer graphics); image processing is the same task, only the FLOPs differ
  • 21. Common Application Domain – Image Processing: possible features realizable on the GPU: contrast and luminance configuration; gamma scaling; (pixel-by-pixel) histogram scaling; convolution filtering; edge highlighting; negative image / image inversion; …
  • 22. Image Processing – Inversion: a simple example; implementation and use of a framework for switching between different GPGPU technologies: create a command queue for each GPU; read the GPU kernel from a kernel file; create buffers for the input and output image on the fly; copy the input image data to global GPU memory; set the kernel arguments and execute the kernel; copy the GPU output buffer data into a new image
  • 23. Image Processing – Inversion: evaluated and confirmed minimum speed-up of a G80 GPU (OpenCL) vs. an 8-core CPU (OpenMP): 4 : 1
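Most of the features listed above are per-pixel operations. As a hedged illustration (not code from the slides), the following C sketch implements a linear contrast/luminance adjustment on 8-bit grayscale data; the function name `adjust_contrast_luminance` and the gain/offset parametrization are assumptions. Because each output pixel depends only on its own input pixel, the loop body would map directly to one OpenCL work-item per pixel.

```c
#include <stddef.h>

/* Hypothetical helper: applies out = gain * in + offset to every 8-bit
 * grayscale pixel and clamps the result to [0, 255]. Each pixel is
 * processed independently, which is what makes the task SIMD-friendly. */
void adjust_contrast_luminance(const unsigned char *in, unsigned char *out,
                               size_t n_pixels, float gain, float offset)
{
    for (size_t i = 0; i < n_pixels; ++i) {
        float v = gain * (float)in[i] + offset;
        if (v < 0.0f)
            v = 0.0f;
        else if (v > 255.0f)
            v = 255.0f;
        out[i] = (unsigned char)(v + 0.5f); /* round to nearest */
    }
}
```

A gain above 1 increases contrast, a positive offset raises luminance; gain 1 and offset 0 leave the image unchanged.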
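The slides give no source code for the inversion. A minimal sketch of a CPU reference in C with OpenMP, assuming 8-bit channel values stored contiguously, might look as follows; `invert_image` is a hypothetical name. The loop index is a signed `long` because OpenMP versions before 3.0 require a signed loop variable.

```c
/* Sketch of a CPU counterpart to the inversion kernel: every value is
 * replaced by 255 - value. The pragma distributes the loop iterations
 * across threads when compiled with OpenMP support (e.g. -fopenmp in GCC);
 * without it the pragma is ignored and the loop runs sequentially. */
void invert_image(const unsigned char *in, unsigned char *out, long n_bytes)
{
    #pragma omp parallel for
    for (long i = 0; i < n_bytes; ++i)
        out[i] = (unsigned char)(255 - in[i]);
}
```

On the GPU, the kernel body would reduce to the same single statement, executed once per work-item, with the work-item's global ID taking the place of `i`.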
  • 24. GPU Computing Case Study: Monte Carlo-Study of a Spring-Mass-System on GPUs
  • 26. MC Study of a Spring-Mass System (SMS) using OpenCL and OpenMP: Task; Modelling; Euler as simple ODE solver; Existing MIMD Solutions; An SIMD Approach; OpenMP; Result Plots; Speed-Up Study; Parallelization Conclusions; Summary
  • 27. Task: the spring-mass system is defined by a differential equation; the behavior of the system must be simulated over varying damping values; therefore: numerical solution in t, t ∈ [0.0, 2] s, for a step size h = 1/1000; analysis of computation time and speed-up for different compute architectures
  • 28. Task based on Simulation News Europe (SNE) CP2: 1000 simulation iterations over the simulation horizon with generated damping values (Monte Carlo study); consecutive averaging for s(t), t ∈ [0, 2] s; h = 0.01 → 200 steps
  • 29. Task: too lightweight for present architectures, therefore a modification: 5000 Monte Carlo iterations with h = 0.001 → 2000 steps; aim of the analysis: knowledge about the spring behavior for different damping values (trajectory array)
  • 30. Task: simple spring-mass system; d: damping constant, c: spring constant; the equation of motion is derived from Newton's second law; modelling required: free-body diagram of the mass (the mass is displaced) and force balance → equation of motion
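The step counts quoted above follow directly from the simulation horizon and the step size. Writing s_i(t_k) for the result of Monte Carlo run i at time t_k (notation introduced here, not taken from the slides), the averaged trajectory is the per-step mean over all N runs:

```latex
K = \frac{t_{\mathrm{end}} - t_0}{h}, \qquad
K = \frac{2 - 0}{0.01} = 200, \qquad
K = \frac{2 - 0}{0.001} = 2000

\bar{s}(t_k) = \frac{1}{N} \sum_{i=1}^{N} s_i(t_k), \qquad
k = 1, \dots, K, \quad N = 1000 \text{ (SNE CP2)}, \; N = 5000 \text{ (modified task)}
```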
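The slide's own equation was an image and is not preserved in the text. Assuming a mass m, displacement x(t) and no external excitation, the force balance from Newton's second law (spring force proportional to c, damping force proportional to d) gives the standard damped oscillator:

```latex
m\,\ddot{x}(t) = -c\,x(t) - d\,\dot{x}(t)
\quad\Longleftrightarrow\quad
m\,\ddot{x}(t) + d\,\dot{x}(t) + c\,x(t) = 0
```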
  • 32. Modelling: numerical integration based on a 2nd-order differential equation; a DE of order n → n DEs of 1st order
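Applied to the damped spring-mass equation m ẍ + d ẋ + c x = 0 (assuming free oscillation without external force), the substitution x₁ = x, x₂ = ẋ turns the single 2nd-order DE into two 1st-order DEs:

```latex
x_1 := x, \qquad x_2 := \dot{x}

\dot{x}_1 = x_2, \qquad
\dot{x}_2 = -\frac{c\,x_1 + d\,x_2}{m}
```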
  • 35. Euler as simple ODE solver: numerical integration by the explicit Euler method
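A minimal C sketch of the explicit Euler method, x_{k+1} = x_k + h·f(x_k), applied to the first-order form of the damped spring-mass system (ẋ₁ = x₂, ẋ₂ = −(c·x₁ + d·x₂)/m). The absence of an external force is an assumption, and the type and function names `sms_state`, `euler_step` and `simulate_trajectory` are illustrative, not taken from the slides.

```c
/* State of one spring-mass system: position x1 = x and velocity x2 = dx/dt. */
typedef struct {
    double x1;
    double x2;
} sms_state;

/* One explicit Euler step of  m*x'' + d*x' + c*x = 0  written as
 *   x1' = x2
 *   x2' = -(c*x1 + d*x2) / m
 * Both derivatives are evaluated at the old state before updating. */
sms_state euler_step(sms_state s, double m, double c, double d, double h)
{
    sms_state next;
    next.x1 = s.x1 + h * s.x2;
    next.x2 = s.x2 + h * (-(c * s.x1 + d * s.x2) / m);
    return next;
}

/* Integrates n_steps steps from s0 and stores the position x1 after every
 * step in traj[0..n_steps-1], i.e. the trajectory for one damping value. */
void simulate_trajectory(sms_state s0, double m, double c, double d,
                         double h, int n_steps, double *traj)
{
    sms_state s = s0;
    for (int k = 0; k < n_steps; ++k) {
        s = euler_step(s, m, c, d, h);
        traj[k] = s.x1;
    }
}
```

For the modified task on slide 29, `simulate_trajectory` would be called with h = 0.001 and n_steps = 2000 once per generated damping value.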
  • 37. Existing MIMD Solutions
  • 38. Existing MIMD Solutions: the approach cannot be applied to GPU architectures; MIMD requirements: each processing element (PE) has its own instruction flow, and each PE can access RAM individually; GPU architecture → SIMD: each PE computes the same instruction at the same time, and each PE has to be at the same instruction for accessing RAM; therefore: development of an SIMD approach
  • 40. An SIMD Approach: S.P./R.F.: simultaneous execution of the sequential simulation with varying d parameter on spatially distributed PEs; averaging depends on trajectories. C.K.: simultaneous computation with all d parameters for time t_n, iterative repetition until t_end; averaging depends on steps
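The two approaches differ mainly in loop order. The C sketch below contrasts them under stated assumptions: explicit Euler as on the previous slides, no external force, and one damping value per run supplied by the caller; all names are hypothetical, and "trajectory-major"/"step-major" are labels introduced here. The `#pragma omp parallel for` lines only take effect when compiled with OpenMP support (e.g. `-fopenmp` in GCC); otherwise they are ignored.

```c
#include <stdlib.h>

/* Trajectory-major variant (S.P./R.F.): each run integrates its full
 * trajectory independently, so the outer loop can be spread over threads;
 * averaging happens afterwards over the stored trajectories.
 * Returns 0 on success, -1 if allocation fails. */
int mean_trajectory_mimd(const double *damping, int n_runs, double m, double c,
                         double x0, double v0, double h, int n_steps,
                         double *mean)
{
    double *traj = malloc((size_t)n_runs * (size_t)n_steps * sizeof *traj);
    if (traj == NULL)
        return -1;

    #pragma omp parallel for
    for (int i = 0; i < n_runs; ++i) {
        double x = x0, v = v0;
        for (int k = 0; k < n_steps; ++k) {
            double a = -(c * x + damping[i] * v) / m; /* uses old x, v */
            x += h * v;
            v += h * a;
            traj[(size_t)i * n_steps + k] = x;
        }
    }

    for (int k = 0; k < n_steps; ++k) {
        double sum = 0.0;
        for (int i = 0; i < n_runs; ++i)
            sum += traj[(size_t)i * n_steps + k];
        mean[k] = sum / n_runs;
    }
    free(traj);
    return 0;
}

/* Step-major variant (C.K.): at every time step t_n all runs are advanced
 * together with the same instructions, then the mean over all runs is taken
 * for that step before moving on to t_{n+1}. */
int mean_trajectory_simd(const double *damping, int n_runs, double m, double c,
                         double x0, double v0, double h, int n_steps,
                         double *mean)
{
    double *pos = malloc((size_t)n_runs * sizeof *pos);
    double *vel = malloc((size_t)n_runs * sizeof *vel);
    if (pos == NULL || vel == NULL) {
        free(pos);
        free(vel);
        return -1;
    }
    for (int i = 0; i < n_runs; ++i) {
        pos[i] = x0;
        vel[i] = v0;
    }

    for (int k = 0; k < n_steps; ++k) {
        #pragma omp parallel for
        for (int i = 0; i < n_runs; ++i) {
            double a = -(c * pos[i] + damping[i] * vel[i]) / m;
            pos[i] += h * vel[i];
            vel[i] += h * a;
        }
        double sum = 0.0;
        for (int i = 0; i < n_runs; ++i)
            sum += pos[i];
        mean[k] = sum / n_runs;
    }
    free(pos);
    free(vel);
    return 0;
}
```

Both variants produce the same mean trajectory; the step-major one only needs the current state per run instead of the full trajectory array, at the cost of one parallel region (and its synchronization) per time step. On a GPU, the inner loop would correspond to one work-item per damping value.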
  • 43. OpenMP: parallelization technology based on the shared-memory principle; synchronization hidden from the developer; controllable thread management; on System V-based operating systems: parallelization by process forking; on Windows-based operating systems: parallelization by WinThread creation (AMD study/Intel tech paper)
  • 44. OpenMP: in C/C++: pragma-based preprocessor directives; in C#: represented by parallel loops; more than just parallelizing loops (AMD tech report); literature: AMD/Intel tech papers; Thomas Rauber, "Parallele Programmierung"; Barbara Chapman, "Using OpenMP: Portable Shared Memory Parallel Programming"
  • 46. Result Plot: resulting trajectory for all technologies
  • 48. Speed-Up Study: OpenMP (own study); comparison CPU/GPU; SIMD single: the presented SIMD approach on the CPU; SIMD OpenMP: the presented SIMD approach parallelized on the CPU; SIMD OpenCL: the number of executing units cannot be controlled, therefore only one value
  • 49. Speed-Up Study (plot; series: SIMD OpenCL, SIMD single, MIMD single, SIMD OpenMP, MIMD OpenMP)
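For reading the speed-up plot, the usual definitions (standard, not stated on the slides) are as follows, with T₁ the run time on one processing element and T_p the run time on p PEs. "Almost linear speed-up" means S(p) ≈ p, i.e. an efficiency close to 1; the 4 : 1 figure for the image inversion is a relative speed-up of the same form, T_CPU / T_GPU = 4.

```latex
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
\text{linear speed-up: } S(p) = p \;\Leftrightarrow\; E(p) = 1
```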
  • 51. Parallelization Conclusions: the problem is unsuited for SIMD parallelization; on-GPU reduction is too time-expensive, therefore: Euler computation on the GPU, average computation on the CPU; the most time-intensive operation is the memory copy between GPU and main memory; for more complex problems or different ODE solver procedures, the speed-up behavior can change
  • 52. Parallelization Conclusions: the MIMD approach of S.P./R.F. is efficient for SNE CP2; an OpenMP realization of both the MIMD and the SIMD approach is possible (and was done); the OpenMP MIMD realization shows almost linear speed-up; setting more threads than physically available PEs leads to significant thread overhead; with dynamic assignment, OpenMP automatically matches the number of threads to the physically available PEs
  • 54. Summary: the task can be solved on CPUs and GPUs; GPU computing requires new approaches and algorithm porting; although GPUs have a massive number of cores operating in parallel, a speed-up is not possible for every application domain
  • 55. Summary: advantages of GPU computing: very fast and scalable for suited problems (e.g. multimedia); cheap HPC technology in comparison to scientific supercomputers; energy-efficient; massive computing power in a small size. Disadvantages of GPU computing: limited instruction set; strictly SIMD; SIMD algorithm development is hard; no execution supervision (e.g. segmentation/page faults)

Editor's Notes

  1. - GPU GDRAM is further subdivided according to the physical architecture of the processing unit