Centre for Development of Advanced Computing
OpenMP
Samir Shaikh
C-DAC Pune
Centre for Development of Advanced Computing
Contents
➢ Basic Architecture
➢ Sequential Program Execution
➢ Introduction to OpenMP
➢ OpenMP Programming Model
➢ OpenMP Syntax and Execution
➢ OpenMP Directives and Clauses
➢ Race Condition
➢ Environment variables
➢ OpenMP examples
➢ Good things and Limitation of OpenMP
➢ References
Centre for Development of Advanced Computing
Parallel Programming Architectures/Models
❏ Shared-memory Model
❖ OpenMP
❖ Pthreads ...
❏ Distributed-memory Model
❖ MPI - Message Passing Interface
❏ Hybrid Model
Centre for Development of Advanced Computing
Basic Architecture
Centre for Development of Advanced Computing
Sequential Program Execution
When you run a sequential program:
➢ Instructions are executed serially
➢ Other cores sit idle
This wastes available resources...
We want all cores to be used to execute the program.
HOW ?
Centre for Development of Advanced Computing
OpenMP Introduction
➢ Open Specification for Multi Processing.
➢ OpenMP is an Application Program Interface (API) for writing multi-threaded, shared-memory parallel programs.
➢ Easy to create multi-threaded programs in C, C++, and Fortran.
Centre for Development of Advanced Computing
OpenMP Versions
➢ Fortran 1.0 released in October 1997
➢ C/C++ 1.0 approved in November 1998
➢ Version 2.5 joint specification in May 2005
➢ OpenMP 3.0 API released in May 2008
➢ Version 4.0 released in July 2013
➢ Version 4.5 released in November 2015
Centre for Development of Advanced Computing
Why Choose OpenMP ?
➢ Portable
○ standardized for shared memory architectures
➢ Simple and Quick
○ Relatively easy to do parallelization for small parts of
an application at a time
○ Incremental parallelization
○ Supports both fine grained and coarse grained
parallelism
➢ Compact API
○ Simple and limited set of directives
○ Not an automatic parallelization tool
Centre for Development of Advanced Computing
Execution Model
➢ An OpenMP program starts single-threaded
➢ To create additional threads, the user starts a parallel region
○ additional threads are launched to form a team
○ the original (master) thread is part of the team
○ threads “go away” at the end of the parallel region: they usually sleep or spin
➢ Repeat parallel regions as necessary
Fork-join model
Centre for Development of Advanced Computing
OpenMP Programming Model
#pragma omp parallel
{
……
…...
}
main (..)
{
#pragma omp parallel
{
……
…...
}
}
Centre for Development of Advanced Computing
Communication Between Threads
➢ Shared Memory Model
○ threads read and write shared variable no need for explicit
message passing
○ use synchronization to protect against race conditions
○ change storage attributes for minimizing synchronization and
improving cache reuse
Centre for Development of Advanced Computing
When to use OpenMP ?
➢ Target platform is multi-core or multiprocessor.
➢ Application is cross-platform.
➢ Loops are parallelizable.
➢ Last-minute optimization.
Centre for Development of Advanced Computing
OpenMP Basic Syntax
➢ Header file
#include <omp.h>
➢ Parallel region:
#pragma omp construct_name [clause, clause...]
{
// .. Do some work here
}
// end of parallel region/block
➢ Environment variables and runtime library routines
➢ In Fortran:
!$OMP construct [clause [clause] .. ]
C$OMP construct [clause [clause] ..]
*$OMP construct [clause [clause] ..]
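Putting the pieces together, a minimal C example (illustrative, not from the original slides):
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Fork a team of threads; each thread executes the block below. */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();       /* this thread's id  */
            int nthreads = omp_get_num_threads(); /* size of the team  */
            printf("Hello from thread %d of %d\n", tid, nthreads);
        }   /* implicit barrier; only the master thread continues */
        return 0;
    }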
Centre for Development of Advanced Computing
Executing OpenMP Program
Compilation
gcc -fopenmp <program name> -o <executable>
gfortran -fopenmp <program name> -o <executable>
ifort <program name> -qopenmp -o <executable>
icc <program name> -qopenmp -o <executable>
Execution:
./ <executable-name>
Centre for Development of Advanced Computing
Compiler support for OpenMP
Compiler Flag
GNU (gcc/g++) -fopenmp
Intel (icc) -qopenmp
Sun -xopenmp
Many more….
Centre for Development of Advanced Computing
OpenMP Language extensions
Centre for Development of Advanced Computing
OpenMP Constructs
➢ Parallel Regions
#pragma omp parallel
➢ Worksharing
#pragma omp for, #pragma omp sections
➢ Data Environment
#pragma omp parallel shared/private (...)
➢ Synchronization
#pragma omp barrier, #pragma omp critical
Centre for Development of Advanced Computing
Parallel Region
➢ Fork a team of N threads {0 .... N-1}
➢ Without it, the code executes sequentially
Centre for Development of Advanced Computing
Loop Constructs: Parallel do/for
In C:
#pragma omp parallel for
for(i=0; i<n; i++)
    a[i] = b[i] + c[i];
In Fortran:
!$omp parallel do
do i=1, n
    a(i) = b(i) + c(i)
enddo
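A complete, compilable version of the C loop above (array size and values are arbitrary):
    #include <stdio.h>
    #include <omp.h>
    #define N 1000

    int main(void)
    {
        double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) {   /* initialize inputs sequentially */
            b[i] = i;
            c[i] = 2.0 * i;
        }

        /* The loop iterations are divided among the team of threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }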
Centre for Development of Advanced Computing
Clauses for Parallel constructs
#pragma omp parallel [clause, clause, ...]
➢ shared
➢ private
➢ firstprivate
➢ lastprivate
➢ nowait
➢ if
➢ reduction
➢ schedule
➢ default
Centre for Development of Advanced Computing
private Clause
➢ The values of private data are undefined upon entry to and exit from the specific construct.
➢ The loop iteration variable is private by default.
Example:
#pragma omp parallel for private(a)
for(i=0; i<n; i++)
{
    a = i + 1;
    printf("Thread %d has value of a=%d for i=%d\n", omp_get_thread_num(), a, i);
}
Centre for Development of Advanced Computing
firstprivate Clause
➢ The clause combines the behavior of the private clause with automatic initialization of the variables in its list.
Example:
b=50, n=1858;
printf("Before parallel loop: b=%d, n=%d\n", b, n);
#pragma omp parallel for private(i), firstprivate(b), lastprivate(a)
for(i=0; i<n; i++)
{
    a = i + b;
}
c = a + b;
Centre for Development of Advanced Computing
lastprivate Clause
➢ Performs finalization of private variables: the value from the sequentially last iteration is copied back to the original variable.
➢ Each thread has its own copy.
Example:
b=50, n=1858;
printf("Before parallel loop: b=%d, n=%d\n", b, n);
#pragma omp parallel for private(i), firstprivate(b), lastprivate(a)
for(i=0; i<n; i++)
{
    a = i + b;
}
c = a + b;
Centre for Development of Advanced Computing
shared Clause
➢ Shared among team of threads.
➢ Each thread can modify shared variables.
➢ Data corruption is possible when multiple threads
attempt to update the same memory location.
➢ Data correctness is user’s responsibility.
Centre for Development of Advanced Computing
default Clause
➢ Defines the default data-sharing attribute for variables in the parallel region.
➢ default(shared | private | none) — note that default(private) is available in Fortran only.
Centre for Development of Advanced Computing
nowait Clause
➢ Allow threads that finish earlier to proceed
without waiting.
➢ If specified, then threads do not synchronize
at the end of parallel loop.
#pragma omp parallel nowait
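In context, nowait removes the implicit barrier after the first loop; a minimal sketch, safe only because the two loops touch different arrays:
    #include <omp.h>
    #define N 1000

    int main(void)
    {
        double a[N], b[N];

        #pragma omp parallel
        {
            /* nowait: no barrier here, threads move straight to the next loop */
            #pragma omp for nowait
            for (int i = 0; i < N; i++)
                a[i] = 1.0 * i;

            /* The implicit barrier at the end of this loop is kept. */
            #pragma omp for
            for (int i = 0; i < N; i++)
                b[i] = 2.0 * i;
        }
        return 0;
    }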
Centre for Development of Advanced Computing
reduction Clause
➢ Performs a reduction on the listed variables using the given operator; each thread works on a private copy that is combined at the end.
#pragma omp parallel for reduction(+ : result)
for (unsigned i = 0; i < v.size(); i++)
{
    result += v[i];
    ...
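A complete C sketch of a sum reduction (values are illustrative):
    #include <stdio.h>
    #include <omp.h>
    #define N 100000

    int main(void)
    {
        double sum = 0.0;

        /* Each thread accumulates into a private copy of sum;           */
        /* the private copies are added together at the end of the loop. */
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < N; i++)
            sum += 0.5 * i;

        printf("sum = %f\n", sum);
        return 0;
    }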
Centre for Development of Advanced Computing
schedule Clause
➢ Specifies how loop iterations are divided among the team of threads.
➢ Supported scheduling kinds:
○ static
○ dynamic
○ guided
○ runtime
#pragma omp parallel for schedule(type, [chunk_size])
for(i=0; i<n; i++)
{
    // ... some work
}
Centre for Development of Advanced Computing
schedule Clause.. cont
schedule(static, [n])
➢ Chunks of n iterations are assigned to the threads in “round robin” fashion, known as block-cyclic scheduling.
➢ If n is not specified, each thread gets at most one chunk of roughly CEILING(number_of_iterations / number_of_threads) iterations.
○ Example: loop of length 16, 3 threads, chunk size 2:
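With these numbers, the eight chunks of two iterations are handed out round-robin (iterations numbered 0-15):
    Thread 0: iterations 0-1, 6-7, 12-13
    Thread 1: iterations 2-3, 8-9, 14-15
    Thread 2: iterations 4-5, 10-11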
Centre for Development of Advanced Computing
schedule Clause.. cont
schedule(dynamic, [n])
➢ Iteration of loop are divided into chunks containing n iterations
each.
➢ Default chunk size is 1.
!$omp do schedule (dynamic)
do i=1,8
… (loop body)
end do
!$omp end do
Centre for Development of Advanced Computing
schedule(guided, [n])
➢ If you specify n, it is the minimum chunk size that each thread should possess.
➢ The size of each successive chunk decreases exponentially.
➢ Initial chunk size:
max((num_of_iterations / num_of_threads), n)
Subsequent chunks consist of
max(remaining_iterations / number_of_threads, n) iterations.
schedule(runtime): export OMP_SCHEDULE="static,4"
➢ The scheduling type is determined at run time from the OMP_SCHEDULE environment variable.
schedule Clause.. cont
Centre for Development of Advanced Computing
Work sharing : Section Directive
➢ Each section is executed exactly once, and each section is executed by one thread of the team
#pragma omp parallel
#pragma omp sections
{
#pragma omp section
x_calculation();
#pragma omp section
y_calculation();
#pragma omp section
z_calculation();
}
Centre for Development of Advanced Computing
The designated block is executed by only one thread of the team.
#pragma omp single
{
    a = 10;
}
#pragma omp for
for (i=0; i<N; i++)
    b[i] = a;
Work sharing : Single Directive
Centre for Development of Advanced Computing
➢ Similar to single, but the code block is executed by the master thread only.
➢ Unlike single, master has no implied barrier, so an explicit barrier is needed before other threads read a.
#pragma omp master
{
    a = 10;
}
#pragma omp barrier
#pragma omp for
for (i=0; i<N; i++)
    b[i] = a;
Work sharing : Master
C/C++:
#pragma omp master
----- block of code--
Centre for Development of Advanced Computing
Race condition
Finding the largest element in a list of numbers.
Max = 10
!$omp parallel do
do i=1,n
    if(a(i) .gt. Max) then
        Max = a(i)
    endif
enddo

Thread 0:
    Read a(i) value = 12
    Read Max value = 10
    If (a(i) > Max) (12 > 10)
    Max = a(i) (i.e. Max = 12)
Thread 1:
    Read a(i) value = 11
    Read Max value = 10
    If (a(i) > Max) (11 > 10)
    Max = a(i) (i.e. Max = 11)

Both threads saw Max = 10, so the final value depends on which thread writes last: the update of Max must be protected.
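One way to remove this race, sketched here in C with illustrative values, is to keep a per-thread maximum and protect only the final update with the critical directive described on the next slide:
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        double a[] = {3.0, 12.0, 7.0, 11.0, 5.0};
        int n = 5;
        double max_val = a[0];

        #pragma omp parallel
        {
            double local_max = a[0];

            /* Each thread scans its share of the iterations privately. */
            #pragma omp for
            for (int i = 0; i < n; i++)
                if (a[i] > local_max)
                    local_max = a[i];

            /* Only the final combine step needs mutual exclusion. */
            #pragma omp critical
            if (local_max > max_val)
                max_val = local_max;
        }
        printf("max = %f\n", max_val);
        return 0;
    }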
Centre for Development of Advanced Computing
Synchronization: Critical Section
The critical directive restricts access to the enclosed code to one thread at a time.
cnt = 0; f = 7;
#pragma omp parallel
{
    #pragma omp for
    for(i=0; i<20; i++)
    {
        if(b[i] == 0)
        {
            #pragma omp critical
            cnt++;
        }
        a[i] = b[i] + f*(i+1);
    }
}
Centre for Development of Advanced Computing
Synchronization: Barrier Directive
Synchronizes all the threads in a team.
Centre for Development of Advanced Computing
Synchronization: Barrier Directive
int x=2;
#pragma omp parallel shared(x)
{
    int tid = omp_get_thread_num();
    if(tid == 0)
        x = 5;
    else
        printf("[1] thread %2d: x=%d\n", tid, x);  /* some threads may still see x=2 here */
    #pragma omp barrier                            /* cache flush + thread synchronization */
    printf("[2] thread %2d: x=%d\n", tid, x);      /* all threads see x=5 here */
}
Centre for Development of Advanced Computing
➢ A lightweight alternative to a critical section.
➢ Used when a specific memory location must be updated atomically.
Synchronization: Atomic Directive
C:
#pragma omp atomic
----- single update statement (e.g. x++ or x += expr) --
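A short sketch protecting a shared counter with atomic (variable names are illustrative):
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int hits = 0;

        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
        {
            if (i % 3 == 0)
            {
                /* The increment of the shared counter is performed atomically. */
                #pragma omp atomic
                hits++;
            }
        }
        printf("hits = %d\n", hits);
        return 0;
    }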
Centre for Development of Advanced Computing
Environment Variable
➢ To set the number of threads used during execution:
export OMP_NUM_THREADS=4
➢ To allow the run-time system to adjust the number of threads dynamically:
export OMP_DYNAMIC=TRUE
➢ To allow nesting of parallel regions:
export OMP_NESTED=TRUE
➢ To get the thread ID (run-time routine, not an environment variable):
omp_get_thread_num()
Centre for Development of Advanced Computing
Control the Number of Threads
➢ Parallel region
#pragma omp parallel num_threads(integer)
➢ Run-time function/ICV
omp_set_num_threads(integer)
➢ Environment Variable
OMP_NUM_THREADS
(Listed from highest to lowest priority: the num_threads clause overrides omp_set_num_threads(), which overrides OMP_NUM_THREADS.)
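A short sketch illustrating the precedence (thread counts are arbitrary):
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_num_threads(8);             /* run-time routine: request 8 threads */

        #pragma omp parallel num_threads(2) /* clause takes precedence: request 2   */
        {
            #pragma omp single
            printf("team size = %d\n", omp_get_num_threads());
        }
        return 0;
    }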
Centre for Development of Advanced Computing
Good Things about OpenMP
➢ Incremental parallelization of sequential code.
➢ Thread management is left to the compiler and runtime.
➢ Directly supported by the compiler.
Centre for Development of Advanced Computing
Limitations
➢ Internal details are hidden
➢ Runs efficiently only on shared-memory multiprocessor platforms, not on distributed memory
➢ Performance is limited by the memory architecture
Centre for Development of Advanced Computing
References
https://computing.llnl.gov/tutorials/openMP/
http://wiki.scinethpc.ca/wiki/images/9/9b/Ds-openmp.pdf
www.openmp.org/
http://openmp.org/sc13/OpenMP4.0_Intro_YonghongYan_SC13.pdf
Centre for Development of Advanced Computing
Thank You
Centre for Development of Advanced Computing
Task Construct
int fib(int n)
{
    int i, j;
    if (n < 2) return n;
    else
    {
        #pragma omp task shared(i) firstprivate(n)
        i = fib(n-1);
        #pragma omp task shared(j) firstprivate(n)
        j = fib(n-2);
        #pragma omp taskwait
        return i + j;
    }
}
● Tasks in OpenMP are code blocks that the compiler wraps up and
makes available to be executed in parallel.
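To run the tasks in parallel, fib() is typically called from inside a parallel region by a single thread; a minimal sketch (not part of the original slide):
    #include <stdio.h>
    #include <omp.h>

    int fib(int n);   /* the task-based fib from the slide above */

    int main(void)
    {
        int result;

        #pragma omp parallel
        {
            /* One thread creates the top-level task; the whole team executes the tasks. */
            #pragma omp single
            result = fib(20);
        }
        printf("fib(20) = %d\n", result);
        return 0;
    }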
Centre for Development of Advanced Computing
Task Construct
1. Taskgroup Construct
   • #pragma omp taskgroup
2. Taskloop Construct
   • #pragma omp taskloop num_tasks (x)
3. Taskyield Construct
   • #pragma omp taskyield
4. Taskwait Construct
   • #pragma omp taskwait
5. Task dependency (see the sketch after this list)
   • #pragma omp task depend(in: x)
6. Task priority
   • #pragma omp task priority(1)
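A small sketch of task dependences (variable names are illustrative): the second task waits for the first because both reference x:
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int x = 0;

        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: x)
            x = 42;                    /* producer task                     */

            #pragma omp task depend(in: x)
            printf("x = %d\n", x);     /* consumer: runs after the producer */
        }
        return 0;
    }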
Centre for Development of Advanced Computing
Taskloop Construct
OpenMP 4.0 (manual tasking):
#pragma omp taskgroup
{
    for (int tmp = 0; tmp < 32; tmp++)
        #pragma omp task
        for (long l = tmp * 32; l < tmp * 32 + 32; l++)
            do_something (l);
}

OpenMP 4.5 (taskloop):
#pragma omp taskloop num_tasks (32)
for (long l = 0; l < 1024; l++)
    do_something (l);
Centre for Development of Advanced Computing
OpenMP 4.5 Features
• Explicit map clause
• Target Construct
• Target data Construct
• Target Update Construct
• Declare target
• Teams construct
• Device run time routines
• Target Memory and device pointer routines
• Thread Affinity
• Linear clause in loop construct
• Asynchronous target execution with nowait clause
• Doacross loop construct
• SIMD construct
Centre for Development of Advanced Computing
OPENMP 3.1 TO 4+ FEATURES

OpenMP 3.0:
• Parallel construct
• Worksharing construct
• Synchronization construct
• Data sharing clauses
• Task construct

OpenMP 4.0 to 4.0.2:
• Proc_bind clause
• Task group construct
• Task dependences
• Target construct
• Target update construct
• Declare target construct
• Teams construct
• Device runtime routines
• Distribute construct

OpenMP 4.0.2 to 4.5:
• Task priority
• Task loop construct
• Device memory routines and device pointers
• Doacross loop construct
• Linear clause in loop construct
• Cancellation construct
Centre for Development of Advanced Computing
References
https://www.openmp.org/#
https://www.openmp.org/resources/openmp-benchmarks/
https://www.openmp.org/resources/openmp-presentations/
https://www.openmp.org/resources/openmp-presentations/sc-19-booth-talks/
https://www.openmp.org/events/sc21/
Centre for Development of Advanced Computing
Thank You