Multi-Processor Computing with OpenMP
T: 051 401 9700 coetzeesj@ufs.ac.za http://www.ufs.ac.za
Progress on Multi-Core Processors
Dual Core
● April 16, 2005 - Intel releases Pentium Extreme Edition 840
● June 5, 2005 - AMD releases Athlon 64 X2
Quad Core
● November 19, 2007 - AMD releases Phenom X4
● November 17, 2008 - Intel releases Core i7
Hex Core
● April 27, 2010 - AMD releases Phenom II X6
Octa Core
● Yesterday (October 12, 2011) - AMD releases the FX-8150 "Bulldozer"
Why?
● Previously, multiple processors were only available in high-end servers
● It is difficult to scale to higher clock speeds with current transistor technologies
● Better manufacturing processes create smaller transistors
Limitations
● Software that scales across multiple processors is difficult to develop
● Memory access must be governed to protect data that is accessed by different parts of the program at the same point in time
What is Parallelization?
● "Something" is parallel if there is a certain level of independence in the order of operations
● Parallelization is an optimization technique to reduce the execution time of an application or part thereof
Scalability
The more independent the parts of the application are, the more scalable the application becomes. Applications that scale almost linearly are called "embarrassingly parallel" applications.
Amdahl's Law
Assume our program has a parallel fraction "f".
The execution time on one processor can then be written as T(1) = f*T(1) + (1-f)*T(1).
On P processors, T(P) = (f/P)*T(1) + (1-f)*T(1).
The speedup is therefore given by Amdahl's Law: S(P) = T(1)/T(P) = 1/(f/P + 1-f).
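To make the formula concrete, here is a small sketch (not from the slides) that evaluates S(P) for an assumed parallel fraction of f = 0.95:

    #include <stdio.h>

    int main(void)
    {
        double f = 0.95;                   /* assumed parallel fraction */
        int counts[] = {1, 2, 4, 8, 16};   /* processor counts to evaluate */

        for (int i = 0; i < 5; i++) {
            int P = counts[i];
            double S = 1.0 / (f / P + (1.0 - f));   /* Amdahl's Law */
            printf("P = %2d  S(P) = %.2f\n", P, S);
        }
        return 0;
    }

Even with 16 processors the speedup stays below 10, because the remaining 5% serial fraction dominates.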
Amdahl's Law
Parallel Programming
Distributed Memory:
● Sockets
● PVM - Parallel Virtual Machine (obsolete)
● MPI - Message Passing Interface
Shared Memory:
● POSIX Threads
● OpenMP
● Automatic parallelization (compiler optimizations)
OpenMP
● De-facto standard Application Programming Interface to write shared-memory parallel applications in C, C++, and Fortran
● Consists of (see the sketch after this list):
  ○ Compiler directives
  ○ Run-time routines
  ○ Environment variables
● Specification maintained by the OpenMP Architecture Review Board
● Release dates:
  ○ Version 1.0 - October 1997
  ○ Version 2.0 - November 2000
  ○ Version 3.0 - May 2008
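A minimal sketch showing all three ingredients at once (the file name is hypothetical; assume a compiler with OpenMP support, e.g. gcc -fopenmp ingredients.c):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Environment variable: OMP_NUM_THREADS=4 sets the default team size. */
        printf("Default team size: %d threads\n", omp_get_max_threads());   /* run-time routine */

        #pragma omp parallel                                                 /* compiler directive */
        printf("Hello from thread %d\n", omp_get_thread_num());             /* run-time routine */

        return 0;
    }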
Advantages of OpenMP
● Good performance
● Mature standard
● Supported by all major compilers
  ○ GNU Compiler Collection (GCC)
  ○ Intel Compiler (ICC)
  ○ Microsoft Visual C++ (2005 and up)
  ○ Portland Group Compiler
● Requires little programming effort and few changes to existing code
● Allows the program to be parallelized incrementally
OpenMP Execution Model
OpenMP uses a fork-join model. The application runs serially until execution reaches a region that can run in parallel. In that parallel region, OpenMP creates worker threads that execute concurrently with the master thread. At the end of the parallel region, OpenMP synchronises the threads' data and execution continues on the master thread alone.
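A minimal sketch of this fork-join behaviour (the printed messages are illustrative only):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("Serial part: only the master thread runs\n");

        #pragma omp parallel    /* fork: worker threads are created here */
        {
            printf("Parallel region: thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }                       /* join: implicit barrier, workers finish */

        printf("Serial part again: execution continues on the master thread\n");
        return 0;
    }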
Data-sharing
● In OpenMP, data needs to be "labeled"
  ○ Shared
    ■ All threads can read and write the data, unless it is protected through a specific OpenMP construct
    ■ Changes made are visible to all threads
    ■ Not necessarily immediately, unless forced through a specific OpenMP construct
  ○ Private
    ■ Data is only available to one thread
    ■ Changes are only visible to the thread owning the data
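A small sketch of the two labels (the variable names are hypothetical); the critical construct used here is one example of the "specific OpenMP construct" mentioned above:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        int counter = 0;   /* shared: one copy, visible to all threads */
        int tid;           /* made private below: one copy per thread */

        #pragma omp parallel private(tid) shared(counter)
        {
            tid = omp_get_thread_num();

            #pragma omp critical   /* protect the update of the shared variable */
            counter++;

            printf("Thread %d incremented the shared counter\n", tid);
        }

        printf("Final counter value: %d\n", counter);
        return 0;
    }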
OpenMP example
For-loop with independent iterations:

    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

The same for-loop parallelized using OpenMP:

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
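As a usage sketch (assuming GCC and a hypothetical file name vector_add.c; other compilers use their own flags, for example /openmp for Visual C++), the parallel version builds and runs as follows, with the thread count controlled by the OMP_NUM_THREADS environment variable:

    gcc -fopenmp vector_add.c -o vector_add
    OMP_NUM_THREADS=4 ./vector_add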
OpenMP computing Pi
A commonly preferred approach for calculating pi is numerical integration, for example of the function 4/(1 + x^2) over [0, 1], an integral that evaluates exactly to pi.
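A sketch of that approach using the midpoint rule and an OpenMP reduction (the number of steps is an assumption):

    #include <stdio.h>

    int main(void)
    {
        const long n = 100000000;        /* number of rectangles (assumed) */
        const double h = 1.0 / n;
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < n; i++) {
            double x = (i + 0.5) * h;    /* midpoint of rectangle i */
            sum += 4.0 / (1.0 + x * x);
        }

        printf("pi ~= %.10f\n", sum * h);
        return 0;
    }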
Monte Carlo Approach
By using a pseudo-random number generator, you can calculate pi by determining the percentage of randomly thrown darts that land inside the circle. To make the calculation simpler, we only use the top-right quadrant and multiply our result by 4.
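A sketch of this dart-throwing approach with OpenMP (the sample count and the seeding scheme are assumptions; rand_r is used so that each thread has its own random stream):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(void)
    {
        const long samples = 10000000;   /* number of darts (assumed) */
        long hits = 0;

        #pragma omp parallel reduction(+:hits)
        {
            unsigned int seed = omp_get_thread_num() + 1;   /* per-thread seed */

            #pragma omp for
            for (long i = 0; i < samples; i++) {
                double x = (double)rand_r(&seed) / RAND_MAX;
                double y = (double)rand_r(&seed) / RAND_MAX;
                if (x * x + y * y <= 1.0)
                    hits++;              /* dart landed inside the quarter circle */
            }
        }

        printf("pi ~= %f\n", 4.0 * hits / samples);
        return 0;
    }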
T: 051 401 9700 coetzeesj@ufs.ac.za http://www.ufs.ac.za
