SlideShare a Scribd company logo
1 of 34
Introduction to OpenMP


Presenter: Vengada Karthik Rangaraju

           Fall 2012 Term

       September 13th, 2012
What is openMP?

•   Open Standard for Shared Memory Multiprocessing
•   Goal: Exploit multicore hardware with shared memory
•   Programmer’s view: The openMP API
•   Structure: Three primary API components:
    – Compiler directives,
    – Runtime Library routines and
    – Environment Variables
Shared Memory Architecture in a
    Multi-Core Environment
The key components of the API and its
             functions

• Compiler Directives
   - Spawning parallel regions (threads)
   - Synchronizing
   - Dividing blocks of code among threads
   - Distributing loop iterations
The key components of the API and its
             functions

• Runtime Library Routines
   - Setting & querying no. of threads
   - Nested parallelism
   - Control over locks
   - Thread information
The key components of the API and its
             functions

• Environment Variables
   - Setting no. of threads
   - Specifying how loop iterations are divided
   - Thread processor binding
   - Enabling/Disabling dynamic threads
   - Nested parallelism
Goals
• Standardization
• Ease of Use
• Portability
Paradigm for using openMP
          Write sequential
              program


         Find parallelizable
        portions of program

                                       Insert calls to
               Insert                 runtime library
        directives/pragmas     +   routines and modify
         into existing code            environment
                                    variables, if desired

          Use openMP’s
        extended Compiler
                                      What happens
                                         here?

        Compile and run !
Compiler translation


#pragma omp <directive-type> <directive-clauses></n>
{
……
…..// Block of code executed as per instruction !
}
Basic Example in C
{
… //Sequential
}
 #pragma omp parallel //fork
{
printf(“Hello from thread
   %d.n”,omp_get_thread_num());
} //join
{
… //Sequential
}
What exactly happens when lines of
    code are executed in parallel?


• A team of threads are created
• Each thread can have its own set of private
  variables
• All threads can have shared variables
• Original thread : Master Thread
• Fork-Join Model
• Nested Parallelism
openMP LifeCycle – Petrinet model
Compiler directives – The Multi Core
           Magic Spells !
  <directive type>   Description
  parallel           Each thread will perform
                     same computation as
                     others(replicated
                     computations)
  for / sections     These are called workshare
                     directives. Portions of
                     overall work divided among
                     threads(different
                     computations). They don’t
                     create threads. It has to be
                     enclosed inside a parallel
                     directive for threads to
                     takeover the divided work.
Compiler directives – The Multi Core
             Magic Spells !

• Types of workshare directives

   for                      Countable iteration[static]

   sections                 One or more sequential
                            sections of code, executed
                            by a single thread

   single                   Serializes a section of code
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive


    <directive type>       <directive clause>
    parallel               If(expression)
                           private(var1,var2,…)
                           firstprivate(var1,var2,..)
                           lastprivate(var1,var2,..)
                           shared(var1,var2,..)
                           NUM_THREADS(integer value)
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive

   <directive type>       <directive clause>
   for                    schedule(type, chunk)
                          private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          shared(var1,var2,..)
                          collapse(n)
                          nowait
                          Reduction(operator:list)
Compiler directives – The Multi Core
             Magic Spells !
• Clauses associated with each directive



   <directive type>       <directive clause>
   sections               private(var1,var2,…)
                          firstprivate(var1,var2,..)
                          lastprivate(var1,var2,..)
                          reduction(operator:list)
                          nowait
Matrix Multiplication using loop
                directive
 #pragma omp parallel private(i,j,k)
{
  #pragma omp for
  for(i=0;i<N;i++)
      for(k=0;k<K;k++)
            for(j=0;j<M;j++)
                  C[i][j]=C[i][j]+A[i][k]*B[k][j];
}
Scheduling Parallel Loops
•   Static
•   Dynamic
•   Guided
•   Automatic
•   Runtime
Scheduling Parallel Loops
• Static - Amount of work/iteration - same
         - Set of contiguous chunks in RR fashion
         - 1 Chunk = x iterations
Scheduling Parallel Loops
• Dynamic - Amount of work/iteration - Varies
           - Each thread will grab chunk of
             iterations and return to grab another
             chunk when it has executed them.
• Guided - Same as dynamic, only difference,
         - a good proportion of iterations
            remaining are shared among each
            thread.
Scheduling Parallel Loops
• Runtime - Schedule determined using an
            environment variable. Library
            routine provided !
• Automatic - Implementation chooses any
               schedule
Matrix Multiplication using loop
      directive – with a schedule
 #pragma omp parallel private(i,j,k)
{
  #pragma omp for schedule(static)
  for(i=0;i<N;i++)
      for(k=0;k<K;k++)
            for(j=0;j<M;j++)
                  C[i][j]=C[i][j]+A[i][k]*B[k][j];
}
openMP worshare directive – sections
 int g;
 void foo(int m, int n)
{
      int p,i;
        #pragma omp sections firstprivate(g) nowait
        {
            #pragma omp section
            {
               p=f1(g);
               for(i=0;i<m;i++)
               do_stuff;
            }
            #pragma omp section
            {
               p=f2(g);
               for(i=0;i<n;i++)
               do_other_stuff;
            }
        }
return;
}
Parallelizing when the no.of Iterations
        is unknown[dynamic] !


• openMP has a directive called task
Explicit Tasks
 void processList(Node* list)
{
    #pragma omp parallel
    pragma omp single
    {
       Node *currentNode = list;
       while(currentNode)
        {
           #pragma omp task firstprivate(currentNode)
           doWork(currentNode);
          currentNode=currentNode->next;
        }
     }
}
Explicit Tasks – Petrinet Model
Synchronization
•   Barrier
•   Critical
•   Atomic
•   Flush
Performing Reductions
• A loop containing reduction will always be
  sequential, since each iteration would form a
  result depending on previous iteration.
• openMP allows these loops to be parallelized
  as long as the developer says, loop contains
  reduction and indicates the variable and kind
  of reduction via “Clauses”
Without using reduction
#pragma omp parallel shared(array,sum)
firstprivate(local_sum)
{
    #pragma omp for private(i,j)
    for(i=0;i<max_i;i++)
    {
          for(j=0;j<max_j;++j)
          local_sum+=array[i][j];
    }
}
#pragma omp critical
sum+=local_sum;
}
Using Reductions in openMP
sum=0;
#pragma omp parallel shared(array)
{
  #pragma omp for reduction(+:sum) private(i,j)
  for(i=0;i<max_i;i++)
  {
       for(j=0;j<max_j;++j)
       sum+=array[i][j];
  }
}
Programming for performance
• Use of IF clause before creating parallel
  regions
• Understanding Cache Coherence
• Judicious use of parallel and flush
• Critical and atomic - know the difference !
• Avoid unnecessary computations in critical
  region
• Use of barrier - a starvation alert !
References
• NUMA UMA

   http://vvirtual.wordpress.com/2011/06/13/what-is-numa/

   http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/

• openMP basics

   https://computing.llnl.gov/tutorials/openMP/

• Workshop on openMP SMP, by Tim Mattson from Intel (video)

  http://www.youtube.com/watch?v=TzERa9GA6vY
Interesting links

• openMP official page

   http://openmp.org/wp/

• 32 openMP Traps for C++ Developers

   http://www.viva64.com/en/a/0054/#ID0EMULM

More Related Content

What's hot

Virtual memory
Virtual memoryVirtual memory
Virtual memory
Anuj Modi
 

What's hot (20)

Parallel and distributed Computing
Parallel and distributed Computing Parallel and distributed Computing
Parallel and distributed Computing
 
Distributed shred memory architecture
Distributed shred memory architectureDistributed shred memory architecture
Distributed shred memory architecture
 
Asymptotic analysis of parallel programs
Asymptotic analysis of parallel programsAsymptotic analysis of parallel programs
Asymptotic analysis of parallel programs
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machines
 
Parallel Processing & Pipelining in Computer Architecture_Prof.Sumalatha.pptx
Parallel Processing & Pipelining in Computer Architecture_Prof.Sumalatha.pptxParallel Processing & Pipelining in Computer Architecture_Prof.Sumalatha.pptx
Parallel Processing & Pipelining in Computer Architecture_Prof.Sumalatha.pptx
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
 
Distributed Computing ppt
Distributed Computing pptDistributed Computing ppt
Distributed Computing ppt
 
Lamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusionLamport’s algorithm for mutual exclusion
Lamport’s algorithm for mutual exclusion
 
Data decomposition techniques
Data decomposition techniquesData decomposition techniques
Data decomposition techniques
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computing
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updated
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
 
Synchronization Pradeep K Sinha
Synchronization Pradeep K SinhaSynchronization Pradeep K Sinha
Synchronization Pradeep K Sinha
 
6.distributed shared memory
6.distributed shared memory6.distributed shared memory
6.distributed shared memory
 
Mainframe systems
Mainframe systemsMainframe systems
Mainframe systems
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
advanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelismadvanced computer architesture-conditions of parallelism
advanced computer architesture-conditions of parallelism
 

Viewers also liked

Biref Introduction to OpenMP
Biref Introduction to OpenMPBiref Introduction to OpenMP
Biref Introduction to OpenMP
JerryHe
 

Viewers also liked (14)

Intro to OpenMP
Intro to OpenMPIntro to OpenMP
Intro to OpenMP
 
OpenMP Tutorial for Beginners
OpenMP Tutorial for BeginnersOpenMP Tutorial for Beginners
OpenMP Tutorial for Beginners
 
OpenMp
OpenMpOpenMp
OpenMp
 
OpenMP
OpenMPOpenMP
OpenMP
 
Open mp intro_01
Open mp intro_01Open mp intro_01
Open mp intro_01
 
Open mp
Open mpOpen mp
Open mp
 
Openmp combined
Openmp combinedOpenmp combined
Openmp combined
 
Wolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat DresdenWolfgang Lehner Technische Universitat Dresden
Wolfgang Lehner Technische Universitat Dresden
 
Biref Introduction to OpenMP
Biref Introduction to OpenMPBiref Introduction to OpenMP
Biref Introduction to OpenMP
 
Openmp
OpenmpOpenmp
Openmp
 
Parallel-kmeans
Parallel-kmeansParallel-kmeans
Parallel-kmeans
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
OpenMP
OpenMPOpenMP
OpenMP
 
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
Hadoop installation and Running KMeans Clustering with MapReduce Program on H...
 

Similar to Presentation on Shared Memory Parallel Programming

Programming using Open Mp
Programming using Open MpProgramming using Open Mp
Programming using Open Mp
Anshul Sharma
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
Jeff Larkin
 

Similar to Presentation on Shared Memory Parallel Programming (20)

Lecture7
Lecture7Lecture7
Lecture7
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
 
MPI n OpenMP
MPI n OpenMPMPI n OpenMP
MPI n OpenMP
 
Open MP cheet sheet
Open MP cheet sheetOpen MP cheet sheet
Open MP cheet sheet
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)
 
Lecture8
Lecture8Lecture8
Lecture8
 
Programming using Open Mp
Programming using Open MpProgramming using Open Mp
Programming using Open Mp
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Lecture6
Lecture6Lecture6
Lecture6
 
Nbvtalkataitamimageprocessingconf
NbvtalkataitamimageprocessingconfNbvtalkataitamimageprocessingconf
Nbvtalkataitamimageprocessingconf
 
Parallel and Distributed Computing Chapter 5
Parallel and Distributed Computing Chapter 5Parallel and Distributed Computing Chapter 5
Parallel and Distributed Computing Chapter 5
 
openmp.ppt
openmp.pptopenmp.ppt
openmp.ppt
 
openmp.ppt
openmp.pptopenmp.ppt
openmp.ppt
 
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix CheungScalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
Scalable Data Science with SparkR: Spark Summit East talk by Felix Cheung
 
Parllelizaion
ParllelizaionParllelizaion
Parllelizaion
 
(3) cpp procedural programming
(3) cpp procedural programming(3) cpp procedural programming
(3) cpp procedural programming
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
Cc module 3.pptx
Cc module 3.pptxCc module 3.pptx
Cc module 3.pptx
 

Recently uploaded

Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdfContoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
cupulin
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MysoreMuleSoftMeetup
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
Peter Brusilovsky
 
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lessonQUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
httgc7rh9c
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
AnaAcapella
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
EADTU
 

Recently uploaded (20)

OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17How to Manage Call for Tendor in Odoo 17
How to Manage Call for Tendor in Odoo 17
 
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdfPUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
 
OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learning
 
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdfContoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
Pharmaceutical Biotechnology VI semester.pdf
Pharmaceutical Biotechnology VI semester.pdfPharmaceutical Biotechnology VI semester.pdf
Pharmaceutical Biotechnology VI semester.pdf
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lessonQUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
QUATER-1-PE-HEALTH-LC2- this is just a sample of unpacked lesson
 
How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17How to Add a Tool Tip to a Field in Odoo 17
How to Add a Tool Tip to a Field in Odoo 17
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
Ernest Hemingway's For Whom the Bell Tolls
Ernest Hemingway's For Whom the Bell TollsErnest Hemingway's For Whom the Bell Tolls
Ernest Hemingway's For Whom the Bell Tolls
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 
Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111
 

Presentation on Shared Memory Parallel Programming

  • 1. Introduction to OpenMP Presenter: Vengada Karthik Rangaraju Fall 2012 Term September 13th, 2012
  • 2. What is openMP? • Open Standard for Shared Memory Multiprocessing • Goal: Exploit multicore hardware with shared memory • Programmer’s view: The openMP API • Structure: Three primary API components: – Compiler directives, – Runtime Library routines and – Environment Variables
  • 3. Shared Memory Architecture in a Multi-Core Environment
  • 4. The key components of the API and its functions • Compiler Directives - Spawning parallel regions (threads) - Synchronizing - Dividing blocks of code among threads - Distributing loop iterations
  • 5. The key components of the API and its functions • Runtime Library Routines - Setting & querying no. of threads - Nested parallelism - Control over locks - Thread information
  • 6. The key components of the API and its functions • Environment Variables - Setting no. of threads - Specifying how loop iterations are divided - Thread processor binding - Enabling/Disabling dynamic threads - Nested parallelism
  • 7. Goals • Standardization • Ease of Use • Portability
  • 8. Paradigm for using openMP Write sequential program Find parallelizable portions of program Insert calls to Insert runtime library directives/pragmas + routines and modify into existing code environment variables, if desired Use openMP’s extended Compiler What happens here? Compile and run !
  • 9. Compiler translation #pragma omp <directive-type> <directive-clauses></n> { …… …..// Block of code executed as per instruction ! }
  • 10. Basic Example in C { … //Sequential } #pragma omp parallel //fork { printf(“Hello from thread %d.n”,omp_get_thread_num()); } //join { … //Sequential }
  • 11. What exactly happens when lines of code are executed in parallel? • A team of threads are created • Each thread can have its own set of private variables • All threads can have shared variables • Original thread : Master Thread • Fork-Join Model • Nested Parallelism
  • 12. openMP LifeCycle – Petrinet model
  • 13. Compiler directives – The Multi Core Magic Spells ! <directive type> Description parallel Each thread will perform same computation as others(replicated computations) for / sections These are called workshare directives. Portions of overall work divided among threads(different computations). They don’t create threads. It has to be enclosed inside a parallel directive for threads to takeover the divided work.
  • 14. Compiler directives – The Multi Core Magic Spells ! • Types of workshare directives for Countable iteration[static] sections One or more sequential sections of code, executed by a single thread single Serializes a section of code
  • 15. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> parallel If(expression) private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) shared(var1,var2,..) NUM_THREADS(integer value)
  • 16. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> for schedule(type, chunk) private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) shared(var1,var2,..) collapse(n) nowait Reduction(operator:list)
  • 17. Compiler directives – The Multi Core Magic Spells ! • Clauses associated with each directive <directive type> <directive clause> sections private(var1,var2,…) firstprivate(var1,var2,..) lastprivate(var1,var2,..) reduction(operator:list) nowait
  • 18. Matrix Multiplication using loop directive #pragma omp parallel private(i,j,k) { #pragma omp for for(i=0;i<N;i++) for(k=0;k<K;k++) for(j=0;j<M;j++) C[i][j]=C[i][j]+A[i][k]*B[k][j]; }
  • 19. Scheduling Parallel Loops • Static • Dynamic • Guided • Automatic • Runtime
  • 20. Scheduling Parallel Loops • Static - Amount of work/iteration - same - Set of contiguous chunks in RR fashion - 1 Chunk = x iterations
  • 21. Scheduling Parallel Loops • Dynamic - Amount of work/iteration - Varies - Each thread will grab chunk of iterations and return to grab another chunk when it has executed them. • Guided - Same as dynamic, only difference, - a good proportion of iterations remaining are shared among each thread.
  • 22. Scheduling Parallel Loops • Runtime - Schedule determined using an environment variable. Library routine provided ! • Automatic - Implementation chooses any schedule
  • 23. Matrix Multiplication using loop directive – with a schedule #pragma omp parallel private(i,j,k) { #pragma omp for schedule(static) for(i=0;i<N;i++) for(k=0;k<K;k++) for(j=0;j<M;j++) C[i][j]=C[i][j]+A[i][k]*B[k][j]; }
  • 24. openMP worshare directive – sections int g; void foo(int m, int n) { int p,i; #pragma omp sections firstprivate(g) nowait { #pragma omp section { p=f1(g); for(i=0;i<m;i++) do_stuff; } #pragma omp section { p=f2(g); for(i=0;i<n;i++) do_other_stuff; } } return; }
  • 25. Parallelizing when the no.of Iterations is unknown[dynamic] ! • openMP has a directive called task
  • 26. Explicit Tasks void processList(Node* list) { #pragma omp parallel pragma omp single { Node *currentNode = list; while(currentNode) { #pragma omp task firstprivate(currentNode) doWork(currentNode); currentNode=currentNode->next; } } }
  • 27. Explicit Tasks – Petrinet Model
  • 28. Synchronization • Barrier • Critical • Atomic • Flush
  • 29. Performing Reductions • A loop containing reduction will always be sequential, since each iteration would form a result depending on previous iteration. • openMP allows these loops to be parallelized as long as the developer says, loop contains reduction and indicates the variable and kind of reduction via “Clauses”
  • 30. Without using reduction #pragma omp parallel shared(array,sum) firstprivate(local_sum) { #pragma omp for private(i,j) for(i=0;i<max_i;i++) { for(j=0;j<max_j;++j) local_sum+=array[i][j]; } } #pragma omp critical sum+=local_sum; }
  • 31. Using Reductions in openMP sum=0; #pragma omp parallel shared(array) { #pragma omp for reduction(+:sum) private(i,j) for(i=0;i<max_i;i++) { for(j=0;j<max_j;++j) sum+=array[i][j]; } }
  • 32. Programming for performance • Use of IF clause before creating parallel regions • Understanding Cache Coherence • Judicious use of parallel and flush • Critical and atomic - know the difference ! • Avoid unnecessary computations in critical region • Use of barrier - a starvation alert !
  • 33. References • NUMA UMA http://vvirtual.wordpress.com/2011/06/13/what-is-numa/ http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/ • openMP basics https://computing.llnl.gov/tutorials/openMP/ • Workshop on openMP SMP, by Tim Mattson from Intel (video) http://www.youtube.com/watch?v=TzERa9GA6vY
  • 34. Interesting links • openMP official page http://openmp.org/wp/ • 32 openMP Traps for C++ Developers http://www.viva64.com/en/a/0054/#ID0EMULM