OpenMP
Speaker: 呂宗螢
Date: 2007/06/01
Embedded and Parallel Systems Lab2
Outline
OpenMP
 OpenMP 2.5
 Multi-threaded & shared memory
 Fortran, C / C++
 Basic syntax
 #pragma omp directive [clause]
 Required and supported environments
 Windows
 Visual Studio 2005 Standard
 Intel® C++ Compiler 9.1
 Linux
 gcc 4.2.0
 Omni
 Xbox 360 & PS3
Windows
 Add #include <omp.h> at the top of the program
 Visual Studio 2005 Standard
Project / Project Properties / Configuration Properties / C/C++ / Language
Set "OpenMP Support" to Yes
Linux
 gcc 4.2
 If it is not installed, download gcc from GNU:
http://gcc.gnu.org/
Using gcc 4.2.1 as an example:
1. Extract gcc
tar -zxvf gcc-4.2.1.tar.gz
2. Enter the directory
cd gcc-4.2.1
3. Configure, installing to /opt/gcc-4.2.1
./configure --prefix=/opt/gcc-4.2.1/
4. Compile
make
5. Install
make install
OpenMP Constructs
Types of Work-Sharing Constructs
 Loop: shares iterations of a loop across the team. Represents a type of "data parallelism".
 Sections: breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".
Source: http://www.llnl.gov/computing/tutorials/openMP/
Types of Work-Sharing Constructs
 single: runs the enclosed block on exactly one thread of the team (not necessarily the master thread)
Source : http://www.llnl.gov/computing/tutorials/openMP/
Loop work sharing
#pragma omp parallel for
for( int i = 0 ; i < 10000 ; i++ )
    for( int j = 0 ; j < 100 ; j++ )
        function(i);

is equivalent to

#pragma omp parallel
{ // the opening brace must start a new line; it cannot follow the directive
#pragma omp for
for( int i = 0 ; i < 10000 ; i++ )
    for( int j = 0 ; j < 100 ; j++ )
        function(i);
}

parallel for requires the loop index to be an int, and the iteration count must be known before the loop starts.

On a dual-core CPU the loop is split like this:
Thread 0 (master)
for( i = 0 ; i < 5000 ; i++ )
    for( int j = 0 ; j < 100 ; j++ )
        function(i);
Thread 1
for( i = 5000 ; i < 10000 ; i++ )
    for( int j = 0 ; j < 100 ; j++ )
        function(i);
OpenMP example: log.cpp
#include <omp.h>
#pragma omp parallel for num_threads(2) private(x, z, addr, ans) // split the outer loop evenly across 2 threads; x, z, addr, and ans must be private or the threads race on them
for (y=2;y<BufSizeY-2;y++)
for (x=2;x<BufSizeX-2;x++)
for (z=0;z<BufSizeBand;z++) {
addr=(y*BufSizeX+x)*BufSizeBand+z;
ans = (BYTE)(*(InBuf+addr))*16+
(BYTE)(*(InBuf+((y*BufSizeX+x+1)*BufSizeBand+z)))*(-2) +
(BYTE)(*(InBuf+((y*BufSizeX+x-1)*BufSizeBand+z)))*(-2) +
(BYTE)(*(InBuf+(((y+1)*BufSizeX+x)*BufSizeBand+z)))*(-2)+
(BYTE)(*(InBuf+(((y-1)*BufSizeX+x)*BufSizeBand+z)))*(-2)+
(BYTE)(*(InBuf+((y*BufSizeX+x+2)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+((y*BufSizeX+x-2)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y+2)*BufSizeX+x)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y-2)*BufSizeX+x)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y+1)*BufSizeX+x+1)*BufSizeBand+z)))*(-1) +
(BYTE)(*(InBuf+(((y+1)*BufSizeX+x-1)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y-1)*BufSizeX+x+1)*BufSizeBand+z)))*(-1)+
(BYTE)(*(InBuf+(((y-1)*BufSizeX+x-1)*BufSizeBand+z)))*(-1);
*(OutBuf+addr)=abs(ans)/8;
}
[Figure: source image and the output image produced by the Log filter]
Sections work sharing
int main(int argc, char* argv[]) {
#pragma omp parallel sections
{
#pragma omp section
{
toPNG();
}
#pragma omp section
{
toJPG();
}
#pragma omp section
{
toTIF();
}
}
}
[Figure: one input image converted concurrently by the toPNG, toJPG, and toTIF sections]
OpenMP notice
int Fe[10];
Fe[0] = 0;
Fe[1] = 1;
#pragma omp parallel for num_threads(2)
for( i = 2; i < 10; ++ i )
Fe[i] = Fe[i-1] + Fe[i-2];
Data dependence: iteration i reads Fe[i-1] and Fe[i-2], which another thread may not have written yet, so the parallel result is wrong.
#pragma omp parallel
{
#pragma omp for
for( int i = 0; i < 1000000; ++ i )
sum += i;
}
Race condition: multiple threads update the shared variable sum at the same time.
OpenMP notice
 Deadlock
int me;
#pragma omp parallel private(me)
{
    me = omp_get_thread_num();
    if (me == 0) goto Master;   // thread 0 skips the barrier...
    #pragma omp barrier         // ...so the other threads wait here forever
Master:
    #pragma omp single
    printf("done\n");
}
OpenMP example: matrix(1)
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define RANDOM_SEED 2882 // random seed
#define VECTOR_SIZE 4 // square matrix: width equals height
#define MATRIX_SIZE (VECTOR_SIZE * VECTOR_SIZE) // total number of matrix elements
int main(int argc, char *argv[]){
int i,j,k;
int node_id;
int *AA; // sequential use & for checking whether d2mce is right or faulty
int *BB; // sequential use
int *CC; // sequential use
int computing;
int _vector_size = VECTOR_SIZE;
int _matrix_size = MATRIX_SIZE;
char c[10];
OpenMP example: matrix(2)
if(argc > 1){
for( i = 1 ; i < argc ; ){
if(strcmp(argv[i],"-s") == 0){
_vector_size = atoi(argv[i+1]);
_matrix_size = _vector_size * _vector_size;
i += 2;
}
else{
printf("the only valid argument is:\n");
printf("-s: the vector size, e.g. -s 256\n");
return 0;
}
}
}
AA = (int *)malloc(sizeof(int) * _matrix_size);
BB = (int *)malloc(sizeof(int) * _matrix_size);
CC = (int *)malloc(sizeof(int) * _matrix_size);
OpenMP example: matrix(3)
srand( RANDOM_SEED );
/* create matrix A and Matrix B */
for( i=0 ; i< _matrix_size ; i++){
AA[i] = rand()%10;
BB[i] = rand()%10;
}
/* computing C = A * B */
#pragma omp parallel for private(computing, j , k)
for( i=0 ; i < _vector_size ; i++){
for( j=0 ; j < _vector_size ; j++){
computing =0;
for( k=0 ; k < _vector_size ; k++)
computing += AA[ i*_vector_size + k ] *
BB[ k*_vector_size + j ];
CC[ i*_vector_size + j ] = computing;
}
}
OpenMP example: matrix(4)
printf("\nVector_size:%d\n", _vector_size);
printf("Matrix_size:%d\n", _matrix_size);
printf("Processing time:%f\n", time);
return 0;
}
OpenMP Directive Table
Directive Description
atomic Specifies a memory location that will be updated atomically.
barrier Synchronizes all threads in a team; all threads pause at the barrier until every thread has executed it.
critical Specifies that code is executed by only one thread at a time.
flush Specifies that all threads have the same view of memory for all shared objects.
for Causes the work done in a for loop inside a parallel region to be divided among threads.
master Specifies that only the master thread should execute a section of the program.
ordered Specifies that code under a parallelized for loop should be executed like a sequential loop.
parallel Defines a parallel region, which is code that will be executed by multiple threads in parallel.
sections Identifies code sections to be divided among all threads.
single Lets you specify that a section of code should be executed on a single thread, not necessarily the master thread.
threadprivate Specifies that a variable is private to a thread.
Source :http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
OpenMP Clause Table
Clause Description
copyin Allows threads to access the master thread's value of a threadprivate variable.
copyprivate Specifies that one or more variables should be shared among all threads.
default Specifies the behavior of unscoped variables in a parallel region.
firstprivate Specifies that each thread should have its own instance of a variable, initialized with the variable's value as it exists before the parallel construct.
if Specifies whether a loop should be executed in parallel or in serial.
lastprivate Specifies that the enclosing context's version of the variable is set equal to the private version of whichever thread executes the final iteration (for-loop construct) or last section (#pragma sections).
nowait Overrides the barrier implicit in a directive.
num_threads Sets the number of threads in a thread team.
ordered Required on a parallel for statement if an ordered directive is to be used in the loop.
private Specifies that each thread should have its own instance of a variable.
reduction Specifies that one or more variables that are private to each thread are the subject of a reduction operation at the end of the parallel region.
schedule Applies to the for directive. Has four methods: static, dynamic, guided, runtime.
shared Specifies that one or more variables should be shared among all threads.
Source :http://msdn2.microsoft.com/zh-tw/library/0ca2w8dk(VS.80).aspx
Reference
 Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP"
 Introduction to Parallel Computing: http://www.llnl.gov/computing/tutorials/parallel_comp/
 OpenMP standard: http://www.openmp.org/drupal/
 OpenMP MSDN tutorial: http://msdn2.microsoft.com/en-us/library/tt15eb9t(VS.80).aspx
 OpenMP tutorial: http://www.llnl.gov/computing/tutorials/openMP/#DO
 Kang Su Gatlin, Pete Isensee, "Reap the Benefits of Multithreading without All the Work", MSDN Magazine