Presentation on Shared Memory Parallel Programming

This presentation shows how one can utilize multiple cores in C/C++ applications using an API called OpenMP. It is a shared-memory programming model built on top of POSIX threads. The fork-join model and parallel design patterns are also discussed using Petri nets.

1. Introduction to OpenMP
   Presenter: Vengada Karthik Rangaraju
   Fall 2012 Term
   September 13th, 2012
2. What is openMP?
   • Open Standard for Shared Memory Multiprocessing
   • Goal: Exploit multicore hardware with shared memory
   • Programmer's view: The openMP API
   • Structure: Three primary API components:
     – Compiler directives
     – Runtime Library routines
     – Environment Variables
3. Shared Memory Architecture in a Multi-Core Environment
4. The key components of the API and its functions
   • Compiler Directives
     - Spawning parallel regions (threads)
     - Synchronizing
     - Dividing blocks of code among threads
     - Distributing loop iterations
5. The key components of the API and its functions
   • Runtime Library Routines
     - Setting & querying no. of threads
     - Nested parallelism
     - Control over locks
     - Thread information
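   A minimal sketch of a few of these runtime routines (the routine names are from the standard openMP library; the 4-thread count is just an example):

       #include <stdio.h>
       #include <omp.h>

       int main(void)
       {
           omp_set_num_threads(4);                /* request 4 threads for the next parallel region */

           #pragma omp parallel
           {
               int tid  = omp_get_thread_num();   /* this thread's id within the team */
               int nthr = omp_get_num_threads();  /* number of threads in the current team */
               printf("Hello from thread %d of %d\n", tid, nthr);
           }
           return 0;
       }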
6. The key components of the API and its functions
   • Environment Variables
     - Setting no. of threads
     - Specifying how loop iterations are divided
     - Thread processor binding
     - Enabling/Disabling dynamic threads
     - Nested parallelism
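   A small sketch of how a program can observe these settings (OMP_NUM_THREADS, OMP_SCHEDULE, OMP_DYNAMIC and OMP_NESTED are the standard variables; the command line in the comment is only an example):

       /* Example run:  OMP_NUM_THREADS=8 OMP_SCHEDULE="dynamic,4" OMP_DYNAMIC=true ./a.out */
       #include <stdio.h>
       #include <omp.h>

       int main(void)
       {
           printf("max threads        : %d\n", omp_get_max_threads()); /* controlled by OMP_NUM_THREADS */
           printf("dynamic adjustment : %d\n", omp_get_dynamic());     /* controlled by OMP_DYNAMIC */
           printf("nested parallelism : %d\n", omp_get_nested());      /* controlled by OMP_NESTED */
           return 0;
       }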
7. Goals
   • Standardization
   • Ease of Use
   • Portability
8. Paradigm for using openMP
   Write sequential program
     → Find parallelizable portions of program
     → Insert directives/pragmas into existing code + insert runtime library
       routines and modify environment variables, if desired
     → Use openMP's extended compiler (what happens here?)
     → Compile and run!
9. Compiler translation
   #pragma omp <directive-type> <directive-clauses>
   {
     … // Block of code executed as per instruction!
   }
10. Basic Example in C
    {
      … // Sequential
    }
    #pragma omp parallel   // fork
    {
      printf("Hello from thread %d.\n", omp_get_thread_num());
    }   // join
    {
      … // Sequential
    }
11. What exactly happens when lines of code are executed in parallel?
    • A team of threads is created
    • Each thread can have its own set of private variables
    • All threads can have shared variables
    • Original thread: Master Thread
    • Fork-Join Model
    • Nested Parallelism
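    An illustrative sketch of private and shared data in a parallel region (the variable names are invented for the example):

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            int shared_counter = 0;                /* shared: visible to every thread */

            #pragma omp parallel                   /* fork: a team of threads is created */
            {
                int tid = omp_get_thread_num();    /* private: each thread has its own copy */

                #pragma omp critical               /* serialize updates to the shared variable */
                shared_counter += 1;

                printf("Thread %d incremented the counter\n", tid);
            }                                      /* join: execution continues on the master thread */

            printf("Counter = %d\n", shared_counter);
            return 0;
        }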
12. openMP LifeCycle – Petri net model
13. Compiler directives – The Multi Core Magic Spells!
    <directive type>   Description
    parallel           Each thread performs the same computation as the others
                       (replicated computations)
    for / sections     These are called workshare directives. Portions of the
                       overall work are divided among threads (different
                       computations). They do not create threads; they have to
                       be enclosed inside a parallel directive for threads to
                       take over the divided work.
14. Compiler directives – The Multi Core Magic Spells!
    • Types of workshare directives
      for        Countable iteration [static]
      sections   One or more sequential sections of code, executed by a single thread
      single     Serializes a section of code
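    A short illustrative sketch of the single directive inside a parallel region:

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            #pragma omp parallel
            {
                #pragma omp single   /* exactly one thread runs this block; the others wait at its implicit barrier */
                printf("Setup done once, by thread %d\n", omp_get_thread_num());

                printf("Thread %d does the replicated work\n", omp_get_thread_num());
            }
            return 0;
        }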
15. Compiler directives – The Multi Core Magic Spells!
    • Clauses associated with each directive
      <directive type>   <directive clause>
      parallel           if(expression)
                         private(var1, var2, …)
                         firstprivate(var1, var2, …)
                         shared(var1, var2, …)
                         num_threads(integer value)
16. Compiler directives – The Multi Core Magic Spells!
    • Clauses associated with each directive
      <directive type>   <directive clause>
      for                schedule(type, chunk)
                         private(var1, var2, …)
                         firstprivate(var1, var2, …)
                         lastprivate(var1, var2, …)
                         shared(var1, var2, …)
                         collapse(n)
                         nowait
                         reduction(operator:list)
17. Compiler directives – The Multi Core Magic Spells!
    • Clauses associated with each directive
      <directive type>   <directive clause>
      sections           private(var1, var2, …)
                         firstprivate(var1, var2, …)
                         lastprivate(var1, var2, …)
                         reduction(operator:list)
                         nowait
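    A small sketch showing how the firstprivate and lastprivate clauses behave on a loop (the variables x and last are invented for the example):

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            int x = 10, last = -1, i;

            #pragma omp parallel for firstprivate(x) lastprivate(last)
            for (i = 0; i < 8; i++)
            {
                x += i;     /* each thread starts from its own copy of x, initialized to 10 */
                last = i;   /* after the loop, last holds the value from the final iteration */
            }

            printf("last = %d, x outside the loop is still %d\n", last, x);
            return 0;
        }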
18. Matrix Multiplication using loop directive
    #pragma omp parallel private(i,j,k)
    {
      #pragma omp for
      for (i = 0; i < N; i++)
        for (k = 0; k < K; k++)
          for (j = 0; j < M; j++)
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
    }
19. Scheduling Parallel Loops
    • Static
    • Dynamic
    • Guided
    • Automatic
    • Runtime
20. Scheduling Parallel Loops
    • Static
      - Amount of work per iteration: same
      - Set of contiguous chunks assigned to threads in round-robin (RR) fashion
      - 1 chunk = x iterations
21. Scheduling Parallel Loops
    • Dynamic
      - Amount of work per iteration: varies
      - Each thread grabs a chunk of iterations and returns to grab another
        chunk when it has executed them
    • Guided
      - Same as dynamic; the only difference is that the chunk size is a
        proportion of the iterations still remaining, so chunks shrink as the
        loop progresses
22. Scheduling Parallel Loops
    • Runtime
      - Schedule determined using an environment variable; a library routine is
        also provided
    • Automatic
      - Implementation chooses any schedule
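    An illustrative sketch of the dynamic and runtime schedules (the array size and chunk sizes are arbitrary examples):

        #include <stdio.h>
        #include <omp.h>

        #define N 1000

        int main(void)
        {
            double work[N];
            int i;

            /* dynamic: threads grab chunks of 16 iterations at a time,
               useful when iterations take unequal amounts of work */
            #pragma omp parallel for schedule(dynamic, 16)
            for (i = 0; i < N; i++)
                work[i] = i * 0.5;

            /* runtime: the policy is taken from the OMP_SCHEDULE environment
               variable (e.g. OMP_SCHEDULE="guided,8") or from omp_set_schedule() */
            #pragma omp parallel for schedule(runtime)
            for (i = 0; i < N; i++)
                work[i] += 1.0;

            printf("work[N-1] = %f\n", work[N - 1]);
            return 0;
        }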
23. Matrix Multiplication using loop directive – with a schedule
    #pragma omp parallel private(i,j,k)
    {
      #pragma omp for schedule(static)
      for (i = 0; i < N; i++)
        for (k = 0; k < K; k++)
          for (j = 0; j < M; j++)
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
    }
24. openMP workshare directive – sections
    int g;
    void foo(int m, int n)
    {
      int p, i;
      #pragma omp sections firstprivate(g) nowait
      {
        #pragma omp section
        {
          p = f1(g);
          for (i = 0; i < m; i++)
            do_stuff;
        }
        #pragma omp section
        {
          p = f2(g);
          for (i = 0; i < n; i++)
            do_other_stuff;
        }
      }
      return;
    }
25. Parallelizing when the no. of iterations is unknown [dynamic]!
    • openMP has a directive called task
26. Explicit Tasks
    void processList(Node* list)
    {
      #pragma omp parallel
      #pragma omp single
      {
        Node *currentNode = list;
        while (currentNode)
        {
          #pragma omp task firstprivate(currentNode)
          doWork(currentNode);
          currentNode = currentNode->next;
        }
      }
    }
27. Explicit Tasks – Petri net Model
28. Synchronization
    • Barrier
    • Critical
    • Atomic
    • Flush
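    An illustrative sketch contrasting these constructs (the variables are invented for the example):

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            int hits = 0;
            double total = 0.0;

            #pragma omp parallel
            {
                #pragma omp atomic        /* lightweight: protects one simple update */
                hits++;

                #pragma omp critical      /* general mutual exclusion for a larger block */
                {
                    total += omp_get_thread_num() * 0.5;
                }

                #pragma omp barrier       /* every thread waits here before continuing */

                #pragma omp flush(total)  /* make the latest value of total visible to all threads */
            }

            printf("hits = %d, total = %f\n", hits, total);
            return 0;
        }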
29. Performing Reductions
    • A loop containing a reduction appears inherently sequential, since each
      iteration forms a result that depends on the previous iteration
    • openMP allows these loops to be parallelized as long as the developer
      states that the loop contains a reduction and indicates the variable and
      the kind of reduction via "clauses"
30. Without using reduction
    sum = 0;
    local_sum = 0;
    #pragma omp parallel shared(array, sum) firstprivate(local_sum)
    {
      #pragma omp for private(i,j)
      for (i = 0; i < max_i; i++)
      {
        for (j = 0; j < max_j; ++j)
          local_sum += array[i][j];
      }
      #pragma omp critical
      sum += local_sum;
    }
31. Using Reductions in openMP
    sum = 0;
    #pragma omp parallel shared(array)
    {
      #pragma omp for reduction(+:sum) private(i,j)
      for (i = 0; i < max_i; i++)
      {
        for (j = 0; j < max_j; ++j)
          sum += array[i][j];
      }
    }
32. Programming for performance
    • Use of the IF clause before creating parallel regions
    • Understanding cache coherence
    • Judicious use of parallel and flush
    • Critical and atomic - know the difference!
    • Avoid unnecessary computations in the critical region
    • Use of barrier - a starvation alert!
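    An illustrative sketch of the IF clause (the threshold and the scale() function are invented for the example):

        #include <stdio.h>
        #include <omp.h>

        #define THRESHOLD 10000   /* arbitrary cutoff for "enough work" */

        void scale(double *v, int n, double factor)
        {
            int i;
            /* fork threads only when there is enough work; for small n the
               region runs sequentially on the calling thread */
            #pragma omp parallel for if(n > THRESHOLD)
            for (i = 0; i < n; i++)
                v[i] *= factor;
        }

        int main(void)
        {
            double data[100] = {0};
            scale(data, 100, 2.0);    /* n <= THRESHOLD, so this runs sequentially */
            printf("data[0] = %f\n", data[0]);
            return 0;
        }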
33. References
    • NUMA / UMA
      http://vvirtual.wordpress.com/2011/06/13/what-is-numa/
      http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/
    • openMP basics
      https://computing.llnl.gov/tutorials/openMP/
    • Workshop on openMP SMP, by Tim Mattson from Intel (video)
      http://www.youtube.com/watch?v=TzERa9GA6vY
34. Interesting links
    • openMP official page
      http://openmp.org/wp/
    • 32 openMP Traps for C++ Developers
      http://www.viva64.com/en/a/0054/#ID0EMULM