Programming Using OpenMP



  1. ITCS4145/5145, Parallel Programming, B. Wilkinson, Fall 2010 (Oct 27, 2010)
     Programming with Shared Memory, Part 2: Introduction to OpenMP
  2. OpenMP
     An accepted standard developed in the late 1990s by a group of industry specialists.
     Consists of a small set of compiler directives, augmented with a small set of library routines and environment variables, using the base languages Fortran and C/C++.
     Several OpenMP compilers are available.
  3. Wikipedia: OpenMP
  4. OpenMP
     • Uses a thread-based shared memory programming model
     • OpenMP programs create multiple threads
     • All threads have access to global memory
     • Data can be shared among all threads or private to one thread
     • Synchronization occurs but is often implicit
  5. OpenMP uses a “fork-join” model, but thread-based.
     Initially, the program is executed by a single master thread.
     The parallel directive creates a team of threads, with a specified block of code executed by the multiple threads in parallel.
     The exact number of threads in the team is determined in one of several ways.
     Other directives are used within a parallel construct to specify parallel for loops and different blocks of code for different threads.
  6. Fork/join model
     [Figure labels: master thread, parallel region, multiple threads, synchronization]
  7. For C/C++, the OpenMP directives are contained in #pragma statements.
     Format:
         #pragma omp directive_name ...
     where omp is an OpenMP keyword.
     There may be additional parameters (clauses) after the directive name for different options.
     Some directives require code to be specified in a structured block that follows the directive; the directive and structured block then form a “construct”.
  8. Parallel Directive
         #pragma omp parallel
             structured_block
     creates multiple threads, each one executing the specified structured_block (a single statement, or a compound statement created with { ... }, with a single entry point and a single exit point).
     There is an implicit barrier at the end of the construct.
     The directive corresponds to a forall construct.
  9. Hello world example
         #pragma omp parallel
         {
             printf("Hello World from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
         }
     Slide callouts: "#pragma omp parallel" is the OpenMP directive for a parallel region; the opening brace must be on a new line.
     Output from an 8-processor/core machine:
         Hello World from thread 0 of 8
         Hello World from thread 4 of 8
         Hello World from thread 3 of 8
         Hello World from thread 2 of 8
         Hello World from thread 7 of 8
         Hello World from thread 1 of 8
         Hello World from thread 6 of 8
         Hello World from thread 5 of 8
  10. Private and shared variables
      Variables could be declared within each parallel region, but OpenMP provides the private clause:
          int tid;
          …
          #pragma omp parallel private(tid)
          {
              tid = omp_get_thread_num();
              printf("Hello World from thread = %d\n", tid);
          }
      Each thread has a local variable tid.
      A shared clause is also available.
  12. Number of threads in a team
      Established by one of the following, in the order given:
      1. a num_threads clause after the parallel directive, or
      2. the omp_set_num_threads() library routine having been called previously, or
      3. the environment variable OMP_NUM_THREADS being defined;
      it is system dependent if none of the above applies.
      The number of threads available can also be altered dynamically to achieve best use of system resources.
  13. Work-Sharing
      Three constructs in this classification:
      • sections
      • for
      • single
      In all cases, there is an implicit barrier at the end of the construct unless a nowait clause is included, which overrides the barrier.
      Note: These constructs do not start a new team of threads. That is done by an enclosing parallel construct.
  14. Sections
      The construct
          #pragma omp sections
          {
              #pragma omp section
                  structured_block
              ...
              #pragma omp section
                  structured_block
          }
      causes the structured blocks to be shared among the threads in the team; the blocks are executed by available threads.
      The first section directive is optional.
  15. Example
          #pragma omp parallel shared(a,b,c,d,nthreads) private(i,tid)
          {
              tid = omp_get_thread_num();
              #pragma omp sections nowait
              {
                  #pragma omp section
                  {   /* one thread does this */
                      printf("Thread %d doing section 1\n",tid);
                      for (i=0; i<N; i++) {
                          c[i] = a[i] + b[i];
                          printf("Thread %d: c[%d]= %f\n",tid,i,c[i]);
                      }
                  }
                  #pragma omp section
                  {   /* another thread does this */
                      printf("Thread %d doing section 2\n",tid);
                      for (i=0; i<N; i++) {
                          d[i] = a[i] * b[i];
                          printf("Thread %d: d[%d]= %f\n",tid,i,d[i]);
                      }
                  }
              } /* end of sections */
          } /* end of parallel section */
  16. For Loop
          #pragma omp for
          for ( i = 0; …. )
      causes the for loop to be divided into parts and the parts shared among the threads in the team. The for loop must be of a simple form.
      The way the for loop is divided can be specified by an additional “schedule” clause.
      Example:
          schedule (static, chunk_size)
      The for loop is divided into chunks of the size specified by chunk_size, which are allocated to threads in round-robin fashion.
  17. Example
          #pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
          {
              tid = omp_get_thread_num();
              if (tid == 0) {   /* executed by one thread */
                  nthreads = omp_get_num_threads();
                  printf("Number of threads = %d\n", nthreads);
              }
              printf("Thread %d starting...\n",tid);
              #pragma omp for schedule(dynamic,chunk)
              for (i=0; i<N; i++) {   /* for loop */
                  c[i] = a[i] + b[i];
                  printf("Thread %d: c[%d]= %f\n",tid,i,c[i]);
              }
          } /* end of parallel section */
  18. Single
      The directive
          #pragma omp single
              structured_block
      causes the structured block to be executed by one thread only.
  19. Combined Parallel Work-sharing Constructs
      If a parallel directive is followed by a single for directive, it can be combined into:
          #pragma omp parallel for
          <for loop>
      with similar effects.
  20. If a parallel directive is followed by a single sections directive, it can be combined into
          #pragma omp parallel sections
          {
              #pragma omp section
                  structured_block
              #pragma omp section
                  structured_block
              ...
          }
      with similar effect. (In both cases, the nowait clause is not allowed.)
  21. Master Directive
      The master directive:
          #pragma omp master
              structured_block
      causes the master thread to execute the structured block.
      It differs from the directives in the work-sharing group in that there is no implied barrier at the end of the construct (nor at the beginning). Other threads encountering this directive will ignore it and the associated structured block, and will move on.
  22. Loop Scheduling and Partitioning
      OpenMP offers scheduling clauses to add to the for construct:
      • Static
            #pragma omp parallel for schedule (static,chunk_size)
        Partitions loop iterations into equal-sized chunks specified by chunk_size. Chunks are assigned to threads in round-robin fashion.
      • Dynamic
            #pragma omp parallel for schedule (dynamic,chunk_size)
        Uses an internal work queue. Chunk-sized blocks of the loop are assigned to threads as they become available.
  23. • Guided
            #pragma omp parallel for schedule (guided,chunk_size)
        Similar to dynamic, but the chunk size starts large and gets smaller, to reduce the time threads spend going to the work queue.
            chunk size = (number of iterations remaining) / (2 * number of threads)
      • Runtime
            #pragma omp parallel for schedule (runtime)
        Uses the OMP_SCHEDULE environment variable to specify which of static, dynamic or guided should be used.
  24. Reduction clause
      Used to combine the results of the iterations into a single value, c.f. MPI_Reduce().
      Can be used with parallel, for, and sections.
      Example:
          sum = 0;
          #pragma omp parallel for reduction(+:sum)
          for (k = 0; k < 100; k++ ) {
              sum = sum + funct(k);
          }
      In reduction(+:sum), + is the operation and sum is the variable.
      A private copy of sum is created for each thread by the compiler. Each private copy is added to sum at the end. This eliminates the need for critical sections here.
  25. Private variables
      • private clause: creates private copies of variables for each thread.
      • firstprivate clause: as private, but initializes each copy to the value held immediately prior to the parallel construct.
      • lastprivate clause: as private, but “the value of each lastprivate variable from the sequentially last iteration of the associated loop, or the lexically last section directive, is assigned to the variable’s original object.”
  26. Synchronization Constructs: Critical
      The critical directive allows only one thread at a time to execute the associated structured block. When one or more threads reach the critical directive:
          #pragma omp critical (name)
              structured_block
      they will wait until no other thread is executing the same critical section (one with the same name), and then one thread will proceed to execute the structured block.
      The name is optional. All critical sections with no name map to one undefined name.
  27. Barrier
      When a thread reaches the barrier
          #pragma omp barrier
      it waits until all threads have reached the barrier, and then they all proceed together.
      There are restrictions on the placement of the barrier directive in a program. In particular, all threads must be able to reach the barrier.
  28. Atomic
      The atomic directive
          #pragma omp atomic
              expression_statement
      implements a critical section efficiently when the critical section simply updates a variable (adds one, subtracts one, or does some other simple arithmetic operation as defined by expression_statement).
  29. Flush
      A synchronization point which causes a thread to have a “consistent” view of certain or all shared variables in memory. All current read and write operations on the variables are allowed to complete and values are written back to memory, but any memory operations in code after the flush are not started until the flush completes.
      Format:
          #pragma omp flush (variable_list)
      Only applies to the thread executing the flush, not to all threads in the team.
      A flush occurs automatically at the entry and exit of parallel and critical directives, and at the exit of for, sections, and single (if a nowait clause is not present).
  30. Ordered clause
      Used in conjunction with for and parallel for directives to cause an iteration to be executed in the order in which it would have occurred if written as a sequential loop.
  31. More information
      Full information on OpenMP at