The problem of multi threading
● Creating multi threaded application is a pain.
● Code is very verbose.
● Compare a theoretical algorithm for merge sort
of an array with a practical implementation of it
using multi threading.
● The theoretical algorithm could be written in a
couple of lines
● The implementation is riddled with thread
creation, termination, waiting and the algorithm
is hidden inside the code.
● We want to leave multi threaded concerns
aside and write the code the way it is in
● Let the system do all the rest.
● The code will be easy to maintain.
● We lost some control of threading concerns but
gain in maintainability, flexibility and long term
health of our system.
How to deliver the goal...
● One way would be message passing.
● Languages like Erlang follow the “actor model”
and simplify creating of large scalable systems
by sharing nothing and using message passing.
● Languages like Java or Python try to reach the
goal by simplifying memory management and
● Open Mp tries to do the same for C and C++.
● Wikipedia: “OpenMP (Open Multi-Processing) is
an API (application programming interface) that
supports multi-platform shared memory
multiprocessing programming in C, C++, and
Fortran, on most processor architectures and
operating systems, including Linux, Unix, AIX,
Solaris, Mac OS X, and Microsoft Windows
platforms. It consists of a set of compiler
directives, library routines, and environment
variables that influence run-time behavior.”
● The openmp is a non-profit consortium.
● The consortium includes an Architecture
Review Board which comprises personnel
assigned by many corporate players (AMD,
IBM, Intel, Microsoft and Oracle to name just a
● The standard could be reviewed at
● Each compiler vendor implements the standard.
Open mp versions
● Latest official version of openmp is 3.1 (at the
time of writing these slides)
● Most vendors are behind implementing 3.0 or
maybe even 2.x.
The openmp implementation
● An openmp implementation is comprised of
both modification to the compiler to respect a
set of pragmas and a library of functions that
could be used by the end user.
● For instance, the free source implementation of
open mp on Linux is called gomp and is
implementing version 3.0
● It is developed at by the FSF at
How do you use it?
● You mark off parts of your code using various
#pragma directives to indicate parallelization
● The compiler and environment take care of the
rest: Creation of thread pools, Avoids some
data race conditions, Running the threads,
Waiting for the threads to complete (join),
Various multi core concerns and optimizations,
Copying data to the threads (if you say so),
creating the right kind of locks and more
● See example
● #pragma omp parallel – forms a team of
threads and starts parallel execution
● #pragma omp for – specifies that the next loop
should be distributed among and executed by a
team of threads
● #pragma omp sections – a set of blocks of code
to be executed in parallel.
● #pragma omp single – execute the following
block by one thread
#pragma directives (cont)
● #pragma omp critical – next block is a critical
● #pragma omp barrier – a multi threaded barrier
(wait for all threads to complete)
● #pragma omp taskwait – wait for all child tasks
● #pragma omp atomic – next instruction is a
simple arithmetic that should be implemented
#pragma directives (cont)
● #pragma omp flush – makes data of the current
thread available to other threads (machine
● #pragma omp ordered – sub block inside the
loop that should be run in “loop order”.
● #pragma omp threadprivate(list) – replicate the
variables in list for each thread.
Data in openmp
● Since OpenMP is a shared memory
programming model, most variables in OpenMP
code are visible to all threads by default.
● sometimes private variables are necessary to
avoid race conditions and there is a need to
pass values between the sequential part and
the parallel region
● Open mp supplies data sharing directive to say
which data is not shared, shared or how.
● shared(list): declares data in list to be shared
● private(list): each thread will get it's own value
● firstprivate(list): like private but with initialization
from current value
● lastprivate(list): like private but causes the original
item to be updated after the parallel block
● reduction(list): accumulates value from the various
Open MP API
● The open mp supplies a library of two dozen
functions or so to control and get data from the
open mp implementation.
● These are C functions so they can be used in
either C or C++ programs.
● They are called omp_* and implemented in a
library to be linked with.
● The functions deal with number of threads,
scheduling, lock creation/deletion/usage, timing,
level of parallelization and more.
● Leaving number of threads in your openmp application
is a bad practice.
● What happens if you move to a machine with more
● Or if the end user wants to constrain your program to
use only a subset of her resources?
● A common practice in open mp is to leave various bit of
information out of the code.
● In that case the open mp implementation will pick up
various environment variables and will use them as
parameters to control the parallelization.
The environment – few examples
● OMP_NUM_THREADS - Specifies the number of threads to
● OMP_STACKSIZE - Set default thread stack size
● OMP_SCHEDULE - How threads are scheduled
● OMP_MAX_ACTIVE_LEVELS - Set the maximum number of
nested parallel regions
● GOMP_CPU_AFFINITY - Bind threads to specific CPUs
● GOMP_STACKSIZE - Set default thread stack size
Open mp advantages
● Simple: need not deal with message passing as
● Data layout and decomposition is handled
automatically by directives.
● Incremental parallelism: can work on one portion
of the program at one time, no dramatic change
to code is needed.
● Unified code for both serial and parallel
applications: OpenMP constructs are treated as
comments when sequential compilers are used.
Open mp advantages (cont)
● Original (serial) code statements need not, in
general, be modified when parallelized with
OpenMP. This reduces the chance of
inadvertently introducing bugs.
● Both coarse-grained and fine-grained
parallelism are possible
● Biggest advantage: simpler code
Open mp disadvantages
● Risk of introducing difficult to debug
synchronization bugs and race conditions.
● Currently only runs efficiently in shared-memory
multiprocessor platforms (solutions such as
MPI bridges exist but are not transparent)
● Requires a compiler that supports OpenMP.
● Scalability is limited by memory architecture.
● no support for compare-and-swap.
Open mp disadvantages (cont)
● Reliable error handling is missing.
● Can't be used on GPU.
● High chance of accidentally writing false sharing code.
● Multi threaded executables often incur longer startup
times so, they can actually run much slower than if
compiled single-threaded, so, there needs to be a
benefit to being multi threaded.
● Often multi threading is used when there is no benefit
yet the downsides still exist.