Section J
OpenMP and Auto-Parallelization
Parallelization
Two means to obtain coarse-grained parallelization in
  Open64:
1.       OpenMP
     
          Option to...
Support of Coarse-grained Parallelism

Parallelized code outlined to its own function
Outlined function nested inside or...
Parallel Runtime Execution
Effected by spawning threads to execute parallel regions

Default number of threads is number o...
OpenMP Compilation Process
1.       Front-ends compile OpenMP directives into WHIRL OMP pragmas and
         region nodes
...
Early MP
Refers to when MP lowering is applied relative to LNO
Default compilation uses late MP
 MP lowering applied after...
J Openmp

Published in: Education

  1. Section J: OpenMP and Auto-Parallelization
  2. Parallelization
     Two means to obtain coarse-grained parallelization in Open64:
       1. OpenMP
          • Option to enable directives: -mp
          • OpenMP 2.5 in Fortran, C, and C++
       2. Autoparallelization
          • Option to enable: -apo
          • LNO detects parallel loops and inserts directives
     Both options can be specified in the same compile.
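To make the two routes concrete, here is a minimal sketch of the kind of loop involved. The function name `sum_squares` is hypothetical. With directives enabled (the slide's -mp option), the pragma asks for the loop to be run in parallel; without it, the pragma is ignored and the same source compiles as serial code. Under -apo the expectation would be that LNO finds such a loop on its own, without the pragma being written by hand.

```c
#include <assert.h>

/* Hypothetical example: sum of squares of 0..n-1.
 * The directive only takes effect when OpenMP is enabled at compile time;
 * otherwise the loop runs serially and gives the same result. */
long sum_squares(int n) {
    long sum = 0;
    int i;
    assert(n >= 0);
    /* reduction(+:sum) gives each thread a private partial sum,
     * combined into sum when the loop ends */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (long)i * i;
    }
    return sum;
}
```

The reduction clause is what keeps the parallel version correct: without it, all threads would update `sum` concurrently.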
  3. Support of Coarse-grained Parallelism
     • Parallelized code is outlined to its own function
     • Outlined function is nested inside the original routine
     • Local variables in the original routine are accessed via static links
     • Pointer to the outlined function is passed to synchronization routines in libopenmp to schedule parallel execution
     • A copy of the parallelized code is left in place for serial execution
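The outlining scheme on this slide can be approximated by hand in plain C. This is only an illustrative sketch, not the code the compiler generates: C has no nested functions, so a `struct parent_frame` (hypothetical) stands in for the static link, and `sketch_fork` (also hypothetical) stands in for the libopenmp entry point. To stay self-contained it runs each "thread" in turn instead of spawning real threads.

```c
#include <assert.h>

/* Hypothetical stand-in for the static link: the parent's locals
 * that the outlined code needs to reach. */
struct parent_frame {
    const double *a;
    double *b;
    int n;
};

typedef void (*outlined_fn)(int tid, int nthreads, void *frame);

/* Outlined loop body: each thread id handles one contiguous chunk. */
static void scale_outlined(int tid, int nthreads, void *link) {
    struct parent_frame *f = link;
    int chunk = (f->n + nthreads - 1) / nthreads;
    int lo = tid * chunk;
    int hi = (lo + chunk < f->n) ? lo + chunk : f->n;
    for (int i = lo; i < hi; i++)
        f->b[i] = 2.0 * f->a[i];
}

/* Hypothetical runtime entry point. A real runtime would hand fn to a
 * team of threads; this sketch simply calls it once per thread id. */
static void sketch_fork(outlined_fn fn, int nthreads, void *frame) {
    for (int tid = 0; tid < nthreads; tid++)
        fn(tid, nthreads, frame);
}

/* Original routine after the hand-written "outlining". */
void scale(const double *a, double *b, int n, int nthreads) {
    assert(n >= 0);
    if (nthreads > 1) {
        struct parent_frame f = { a, b, n };
        sketch_fork(scale_outlined, nthreads, &f);
    } else {
        /* serial copy of the loop left in place */
        for (int i = 0; i < n; i++)
            b[i] = 2.0 * a[i];
    }
}
```

Passing a function pointer plus a frame pointer mirrors the two things the slide lists: the pointer to the outlined function, and the link back to the parent's locals.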
  4. Parallel Runtime Execution
     • Carried out by spawning threads to execute parallel regions
     • Default number of threads is the number of CPUs
     • libopenmp (PathScale proprietary) contains:
         • thread handling and synchronization routines
         • routines corresponding to OpenMP intrinsics
     • Runtime behaviour is controlled by environment variables
         • E.g. specify affinity among threads and processors
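A small sketch of how a program can observe the runtime's thread count. It uses only the standard OpenMP routine `omp_get_max_threads()` and is guarded by `_OPENMP` so that it still builds when directives are disabled. The helper names `max_team_size` and `observed_team_size` are hypothetical. The standard `OMP_NUM_THREADS` environment variable is one way to override the default; the affinity variables mentioned on the slide are specific to the runtime and are not shown here.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on the team size for the next parallel region;
 * 1 when the program was built without OpenMP. */
int max_team_size(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Count how many threads actually execute a parallel region.
 * Without OpenMP the block runs once, so the result is 1. */
int observed_team_size(void) {
    int count = 0;
    #pragma omp parallel reduction(+:count)
    {
        count += 1;
    }
    assert(count >= 1);
    return count;
}
```

Running the same binary with different `OMP_NUM_THREADS` settings should change both values without recompiling.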
  5. OpenMP Compilation Process
       1. Front-ends compile OpenMP directives into WHIRL OMP pragmas and region nodes
       2. OpenMP Pre-lowering (be/be/omp_lower.cxx):
          • Run after VHO and before IPL
          • Perform preliminary lowering of OpenMP pragmas
              • Replace intrinsics by calls to OpenMP library routines
       3. MP Lowering (be/com/wn_mp.cxx):
          • Run after LNO and before WOPT
          • Parallel regions outlined into separate routines
              • Outlined routines are nested inside the original function
              • Contain up-level references to the parent's locals
          • Insert code to decide between serial and parallel execution at runtime
              • If parallel, call the OpenMP runtime library to spawn threads to execute the outlined parallel code
              • If serial, execute the serial copy of the code left in place
          • Most OpenMP pragmas deleted
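The runtime choice between serial and parallel execution has a source-level counterpart in the standard OpenMP `if` clause. The sketch below is an illustration at the source level, not a description of what MP lowering emits. The routine name `saxpy_guarded` and the cutoff `PAR_THRESHOLD` are hypothetical.

```c
#include <assert.h>

/* Hypothetical cutoff below which threading is assumed not to pay off. */
#define PAR_THRESHOLD 10000

/* y = alpha * x + y. When n is below the cutoff the if clause is false
 * and the region is executed by a single thread. */
void saxpy_guarded(int n, float alpha, const float *x, float *y) {
    int i;
    assert(n >= 0);
    #pragma omp parallel for if(n >= PAR_THRESHOLD)
    for (i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

Either way the results are the same; the condition only selects how the loop is executed.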
  6. Early MP
     • Refers to when MP lowering is applied relative to LNO
     • Default compilation uses late MP
         • MP lowering applied after LNO
         • LNO runs in the presence of MP pragmas
         • Presence of MP pragmas makes LNO's transformations more conservative
     • -OPT:early_mp=on enables early MP
         • LNO invoked for outlined parallel routines
         • Absence of MP pragmas enables more aggressive optimizations
         • Additional calls may suppress some optimizations
         • Does not apply under -apo
