PROGRAMMING USING MPI
AND OPENMP
Topics Covered
▶ MPI
▶ MPI Principles
▶ Building blocks
▶ The Message Passing Interface (MPI)
▶ Overlapping Communication and Computation
▶ Collective Communication Operations
▶ Composite Synchronization Constructs
▶ Pros and Cons of MPI
▶ OpenMP
▶ Threading
▶ Parallel Programming Model
▶ Combining MPI and OpenMP
▶ Shared Memory Programming
▶ Pros and Cons of OpenMP
What is MPI???
▶ Message Passing Interface (MPI) is a language-independent
communications protocol used to program parallel computers. Both
point-to-point and collective communication are supported.
▶ MPI "is a message-passing application programmer interface, together
with protocol and semantic specifications for how its features must
behave in any implementation." So, MPI is a specification, not an
implementation.
▶ MPI's goals are high performance, scalability, and portability.
MPI Principles
▶ MPI-1 model has no shared memory concept.
▶ MPI-2 has only a limited distributed shared memory concept.
▶ MPI-3 includes new Fortran 2008 bindings, while it removes deprecated C++ bindings as well as many deprecated routines and MPI objects.
MPI Building Blocks
▶ Since interactions are accomplished by sending and receiving messages,
the basic operations in the message-passing programming paradigm are
SEND and RECEIVE.
▶ In their simplest form, the prototypes of these operations are defined as
follows:
▶ send(void *sendbuf, int nelems, int dest)
▶ receive(void *recvbuf, int nelems, int source)
▶ The sendbuf points to a buffer that stores the data to be sent, recvbuf
points to a buffer that stores the data to be received, nelems is the
number of data units to be sent and received, dest is the identifier of the
process that receives the data, and source is the identifier of the process
that sends the data.
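For reference, the actual MPI routines that correspond to these simplified prototypes carry a few extra arguments (an element datatype, a message tag, a communicator, and a receive status). A sketch of their standard C signatures:

/* Blocking send: buf holds count elements of type datatype, delivered to rank dest. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);

/* Blocking receive: fills buf with up to count elements sent by rank source. */
int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status);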
MPI: the Message Passing Interface
▶ MPI defines a standard library for message-passing that can be used to
develop portable message-passing programs using either C or Fortran.
▶ The MPI standard defines both the syntax as well as the semantics of a
core set of library routines that are very useful in writing message-
passing programs.
▶ The MPI library contains over 125 routines.
▶ These routines are used to initialize and terminate the MPI library, to
get information about the parallel computing environment, and to send
and receive messages.
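A minimal sketch of such a program (the file name mpi_hello.c and the message value are made up for the example; it assumes the program is launched with at least two processes):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);               /* initialize the MPI library */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* identifier of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    if (rank == 0) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1 */
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d of %d received %d\n", rank, size, msg);
    }

    MPI_Finalize();                       /* terminate the MPI library */
    return 0;
}

With a typical MPI implementation this would be compiled with mpicc and launched with mpirun (or mpiexec), e.g. mpirun -np 2 ./mpi_hello.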
Pros and Cons of MPI
▶ Pros
▶ Does not require shared memory architectures which are more expensive
than distributed memory architectures
▶ Can be used on a wider range of problems since it exploits both task
parallelism and data parallelism
▶ Can run on both shared memory and distributed memory architectures
▶ Highly portable with specific optimization for the implementation on most
hardware
▶ Cons
▶ Requires more programming changes to go from serial to parallel version
▶ Can be harder to debug
What is OpenMP???
▶ OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures and operating systems, including Solaris, AIX, HP-UX, Linux, MacOS, and Windows.
▶ OpenMP uses a portable, scalable model that gives programmers a
simple and flexible interface for developing parallel applications for
platforms ranging from the standard desktop computer to the
supercomputer.
What is OpenMP???
▶ OpenMP is essentially a compiler extension. It is available in GCC (the GNU compiler), the Intel compiler, and other compilers.
▶ OpenMP targets shared memory systems, i.e. systems where the processors share the main memory.
▶ OpenMP is based on a threading approach: it launches a single process which in turn can create as many threads as desired. It follows the "fork and join" model, i.e. depending on the task at hand it launches the desired number of threads as directed by the user.
Threading
▶ A thread is a single stream of control in the flow of a program.
▶ Static Threads
▶ All work is allocated and assigned at runtime
▶ Dynamic Threads
▶ Consists of one Master and a pool of threads
▶ The pool is assigned some of the work at runtime, but not all of it
▶ When a thread from the pool becomes idle, the Master gives it a new
assignment
▶ “Round-robin assignments”
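OpenMP exposes a related idea through the schedule clause on parallel loops. The analogy with the static/dynamic thread pools above is loose, but a small sketch (with a made-up work function) illustrates the difference:

#include <stdio.h>
#include <omp.h>

/* Hypothetical unit of work, just for the illustration. */
static void work(int i) {
    printf("iteration %d handled by thread %d\n", i, omp_get_thread_num());
}

int main(void) {
    int n = 16;

    /* static: iterations are divided among the threads up front */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) work(i);

    /* dynamic: an idle thread grabs the next chunk at run time, much like the
       master handing a new assignment to an idle thread in the pool */
    #pragma omp parallel for schedule(dynamic, 2)
    for (int i = 0; i < n; i++) work(i);

    return 0;
}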
Parallel Programming Model
▶ OpenMP uses the fork-join model of parallel execution.
▶ All OpenMP programs begin with a single master thread.
▶ The master thread executes sequentially until a parallel region is
encountered, when it creates a team of parallel threads (FORK).
▶ When the team threads complete the parallel region, they synchronize and
terminate, leaving only the master thread that executes sequentially
(JOIN).
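A minimal sketch of this fork-join behaviour; a file like the hello_omp.c compiled later in these slides could look roughly like this (the exact output order varies between runs):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("serial part: only the master thread is active\n");

    #pragma omp parallel        /* FORK: a team of threads is created */
    {
        printf("parallel region: hello from thread %d\n", omp_get_thread_num());
    }                           /* JOIN: the team synchronizes and terminates */

    printf("serial part again: back to the master thread\n");
    return 0;
}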
Variables
▶ 2 types of Variables
▶ Private
▶ Shared
▶ Private Variables
▶ Variables in a thread’s private space can only be accessed by the thread
▶ A private variable has a different address in the execution context of every thread.
▶ Clause: private(variable-list)
▶ Shared Variables
▶ Variables in the global data space are accessed by all parallel threads.
▶ A shared variable has the same address in the execution context of every thread. All threads have access to shared variables.
Variables
▶ A thread can access its own private variables, but cannot access the
private variable of another thread.
▶ In the parallel for pragma, variables are shared by default, except the loop index variable, which is private (see the sketch below).
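A small sketch of these defaults and clauses (the array and variable names are made up for the example):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int a[8];      /* shared: one copy, visible to every thread */
    int tmp;       /* listed as private: each thread gets its own copy */

    #pragma omp parallel for private(tmp) shared(a)
    for (int i = 0; i < 8; i++) {   /* loop index i is private by default */
        tmp = i * i;
        a[i] = tmp;
    }

    for (int i = 0; i < 8; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}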
OpenMP Functions
▶ omp_get_num_procs ()
▶ Returns the number of CPUs in the multiprocessor on which this thread is executing.
▶ The integer returned by this function may be less than the total number of
physical processors in the multiprocessor, depending on how the run-time
system gives processes access to processors.
▶ e.g. int t= omp_get_num_procs();
▶ omp_get_num_threads()
▶ Returns the number of threads active in the current parallel region
▶ t=omp_get_num_threads();
OpenMP Functions Contd.
▶ omp_set_num_threads()
▶ Sets the number of threads that execute the parallel sections of code
▶ Typically set equal to the number of available CPUs
▶ e.g. omp_set_num_threads(t);
▶ omp_get_thread_num()
▶ Returns the thread identification number, from 0 to n-1, where n is the number of active threads.
▶ tid = omp_get_thread_num();
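A short sketch combining the four runtime functions above (the counts reported depend on the machine and run-time system):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int t = omp_get_num_procs();   /* CPUs the run-time system makes available */
    omp_set_num_threads(t);        /* request one thread per CPU */

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();   /* 0 .. n-1 */
        int n   = omp_get_num_threads();  /* threads active in this parallel region */
        printf("thread %d of %d (machine reports %d processors)\n", tid, n, t);
    }
    return 0;
}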
Compiling and running OpenMP
Compiling:
$ gcc -o hello_omp hello_omp.c -fopenmp
Running:
$ ./hello_omp
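The number of threads can also be chosen at launch time through the standard OMP_NUM_THREADS environment variable, e.g.:
$ OMP_NUM_THREADS=4 ./hello_omp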
Shared Memory Programming
▶ The underlying hardware is assumed to be a collection
of processors, each with access to the same shared
memory.
▶ Because they have access to the same memory locations, processors can interact and synchronize with each other through shared variables.
▶ The standard view of parallelism in a shared memory program is
fork/join parallelism.
▶ When the program begins execution, only a single thread, called the
master thread, is active.
▶ The master thread executes the sequential portions of the algorithm. At those points where parallel operations are required, the master thread forks (creates or awakens) additional threads.
▶ The master thread and the created threads work concurrently through the parallel section. At the end of the parallel code the created threads die or are suspended, and the flow of control returns to the single master thread. This is called a join.
Shared Memory Programming
▶ The shared-memory model is characterized by fork/join parallelism, in which parallelism comes and goes.
▶ At the beginning of execution only a
single thread, called the master thread,
is active.
▶ The master thread executes the serial portions of the program. It forks additional threads to help it execute parallel portions of the program.
▶ These threads are deactivated when
serial execution resumes.
Shared Memory Programming
▶ A key difference, then, between the shared-memory model and the
message passing model is that in the message-passing model all
processes typically remain active throughout the execution of the
program, whereas in the shared-memory model the number of
active threads is one at the program's start and finish and may
change dynamically throughout the execution of the program.
▶ Parallel shared-memory programs range from those with only a single fork/join around a single loop to those in which most of the code segments are executed in parallel. Hence the shared-memory model supports incremental parallelization, the process of transforming a sequential program into a parallel program one block of code at a time.
Pros and Cons of OpenMP
▶ Pros
▶ Considered by some to be easier to program and debug (compared to
MPI)
▶ Data layout and decomposition is handled automatically by directives.
▶ Allows incremental parallelism: directives can be added incrementally,
so the program can be parallelized one portion after another and thus
no dramatic change to code is needed.
▶ Unified code for both serial and parallel applications: OpenMP
constructs are treated as comments when sequential compilers are
used.
▶ Original (serial) code statements need not, in general, be modified when
parallelized with OpenMP. This reduces the chance of inadvertently
introducing bugs and helps maintenance as well.
▶ Both coarse-grained and fine-grained parallelism are possible
Pros and Cons of OpenMP
▶ Cons
▶ Currently only runs efficiently in shared-memory multiprocessor
platforms
▶ Requires a compiler that supports OpenMP.
▶ Scalability is limited by memory architecture.
▶ Reliable error handling is missing.
▶ Lacks fine-grained mechanisms to control thread-processor
mapping.
▶ Synchronization between subsets of threads is not allowed.
▶ Mostly used for loop parallelization
▶ Can be difficult to debug, due to implicit communication between
threads via shared variables.
