
Published in: Technology


  1. 1. OpenMP intro, Mark Veltzer
  2. 2. The problem of multi-threading ● Creating a multi-threaded application is a pain. ● The code is very verbose. ● Compare a theoretical algorithm for merge sort of an array with a practical multi-threaded implementation of it. ● The theoretical algorithm can be written in a couple of lines. ● The implementation is riddled with thread creation, termination and waiting, and the algorithm is hidden inside the code.
  3. 3. The goal ● We want to leave multi-threading concerns aside and write the code the way it appears in algorithms books. ● Let the system do all the rest. ● The code will be easy to maintain. ● We lose some control over threading concerns but gain in maintainability, flexibility and the long-term health of our system.
  4. 4. How to deliver the goal... ● One way would be message passing. ● Languages like Erlang follow the “actor model” and simplify the creation of large scalable systems by sharing nothing and using message passing. ● Languages like Java or Python try to reach the goal by simplifying memory management and the language itself. ● OpenMP tries to do the same for C and C++.
  5. 5. Definition ● Wikipedia: “OpenMP (Open Multi-Processing) is an API (application programming interface) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most processor architectures and operating systems, including Linux, Unix, AIX, Solaris, Mac OS X, and Microsoft Windows platforms. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.”
  6. 6. Development ● OpenMP is managed by a non-profit consortium. ● The consortium includes an Architecture Review Board made up of personnel assigned by many corporate players (AMD, IBM, Intel, Microsoft and Oracle, to name just a few). ● The standard can be reviewed at ● Each compiler vendor implements the standard.
  7. 7. OpenMP versions ● The latest official version of OpenMP is 3.1 (at the time of writing these slides). ● Most vendors lag behind, implementing 3.0 or maybe even 2.x.
  8. 8. The OpenMP implementation ● An OpenMP implementation consists of both modifications to the compiler, so that it respects a set of pragmas, and a library of functions that the end user can call. ● For instance, the free software implementation of OpenMP on Linux is called gomp and implements version 3.0. ● It is developed by the FSF at
  9. 9. How do you use it? ● You mark off parts of your code with various #pragma directives to indicate parallelization. ● The compiler and environment take care of the rest: creating thread pools, avoiding some data races, running the threads, waiting for the threads to complete (join), handling various multi-core concerns and optimizations, copying data to the threads (if you say so), creating the right kind of locks and more. ● See example
  10. 10. #pragma directives ● #pragma omp parallel – forms a team of threads and starts parallel execution ● #pragma omp for – specifies that the iterations of the next loop should be distributed among, and executed by, the team of threads ● #pragma omp sections – a set of blocks of code to be executed in parallel, each block by one thread ● #pragma omp single – the following block is executed by only one thread of the team
  11. 11. #pragma directives (cont) ● #pragma omp critical – the next block is a critical section (one thread at a time) ● #pragma omp barrier – a multi-threaded barrier (each thread waits until all threads of the team reach this point) ● #pragma omp taskwait – wait for all child tasks to complete ● #pragma omp atomic – the next statement is a simple arithmetic update (for example x += 1) that must be performed atomically.
  12. 12. #pragma directives (cont) ● #pragma omp flush – makes the current thread's view of memory consistent with main memory, so its writes become visible to other threads (a memory barrier) ● #pragma omp ordered – a sub-block inside the loop that should be run in “loop order”. ● #pragma omp threadprivate(list) – replicates the variables in list for each thread.
  13. 13. Data in OpenMP ● Since OpenMP is a shared memory programming model, most variables in OpenMP code are visible to all threads by default. ● Sometimes private variables are necessary to avoid race conditions, and there is a need to pass values between the sequential part and the parallel region. ● OpenMP supplies data-sharing clauses to say which data is shared, which is not, and how.
  14. 14. Data directives ● shared(list): declares the data in list to be shared (default) ● private(list): each thread gets its own, uninitialized copy ● firstprivate(list): like private but initialized from the current value ● lastprivate(list): like private, but the original item is updated with the value from the last iteration (or section) after the parallel block ● reduction(operator: list): combines the values from the various threads using the given operator.
  15. 15. OpenMP API ● OpenMP supplies a library of two dozen or so functions to control, and get data from, the OpenMP implementation. ● These are C functions, so they can be used in either C or C++ programs. ● They are named omp_*, declared in omp.h and implemented in a library that your program links with. ● The functions deal with the number of threads, scheduling, lock creation/deletion/usage, timing, level of parallelization and more.
  16. 16. The Environment ● Hard-coding the number of threads in your OpenMP application is bad practice. ● What happens if you move to a machine with more cores? ● Or if the end user wants to constrain your program to use only a subset of her resources? ● A common practice in OpenMP is to leave various bits of information out of the code. ● In that case the OpenMP implementation will pick up various environment variables and use them as parameters to control the parallelization.
  17. 17. The environment – a few examples ● OMP_NUM_THREADS - Specifies the number of threads to use ● OMP_STACKSIZE - Sets the default thread stack size ● OMP_SCHEDULE - How loop iterations are scheduled for loops that use schedule(runtime) ● OMP_MAX_ACTIVE_LEVELS - Sets the maximum number of nested active parallel regions ● GOMP_CPU_AFFINITY - Binds threads to specific CPUs (GCC-specific) ● GOMP_STACKSIZE - Sets the default thread stack size (GCC-specific)
  18. 18. OpenMP advantages ● Simple: no need to deal with message passing, as with MPI. ● Data layout and decomposition are handled automatically by directives. ● Incremental parallelism: you can work on one portion of the program at a time; no dramatic change to the code is needed. ● Unified code for both serial and parallel applications: OpenMP directives are ignored, like comments, when a compiler without OpenMP support is used.
  19. 19. OpenMP advantages (cont) ● Original (serial) code statements need not, in general, be modified when parallelized with OpenMP. This reduces the chance of inadvertently introducing bugs. ● Both coarse-grained and fine-grained parallelism are possible. ● Biggest advantage: simpler code.
  20. 20. OpenMP disadvantages ● Risk of introducing difficult-to-debug synchronization bugs and race conditions. ● Currently only runs efficiently on shared-memory multiprocessor platforms (solutions such as MPI bridges exist but are not transparent). ● Requires a compiler that supports OpenMP. ● Scalability is limited by the memory architecture. ● No support for compare-and-swap (as of version 3.1).
  21. 21. OpenMP disadvantages (cont) ● Reliable error handling is missing. ● Can't be used on a GPU (as of version 3.1). ● High chance of accidentally writing code that suffers from false sharing. ● Multi-threaded executables often incur longer startup times, so they can actually run much slower than the same code compiled single-threaded; there needs to be a real benefit to being multi-threaded. ● Multi-threading is often used where it brings no benefit, yet the downsides still apply.