Introduction to OpenMP

© All Rights Reserved

Presentation Transcript

  • What is OpenMP?
    "Open specifications for Multi-Processing": an API (Application Programming Interface) for developing parallel applications in C/C++ and Fortran on shared-memory architectures.
    It is simple to program and supports incremental parallelization: a serial program can be parallelized one part at a time.
  • OpenMP provides:
    - a fork-join programming model
    - multi-threaded execution
  • OpenMP consists of:
    - Compiler directives:
        #pragma omp directive [clauses]
        {
            ...
        }
    - Library functions, e.g. omp_get_num_threads()
    - Environment variables, e.g. export OMP_NUM_THREADS=4
  • Compiler directives cover:
    - Parallel regions
    - Work sharing
    - Synchronization
    - Data-scope (data-sharing) attributes:
      - private
      - firstprivate
      - lastprivate
      - shared
      - reduction
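A short sketch of two of these attributes, firstprivate and lastprivate, assuming compilation with -fopenmp (last_square is a hypothetical name). firstprivate copies the value from before the construct into every thread; lastprivate copies the value from the sequentially last iteration back out.

```c
#include <assert.h>
#include <omp.h>

/* firstprivate(offset): each thread's copy starts with the value
   offset had before the loop.
   lastprivate(last): after the loop, last holds the value written
   by the sequentially last iteration (i == n - 1). */
int last_square(int n, int offset)
{
    int last = -1;
    assert(n > 0);

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = i * i + offset;   /* each thread uses its own offset copy */
    }

    return last;                 /* (n - 1) * (n - 1) + offset */
}
```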
  • Library functions for:
    - the number of threads
    - the thread ID
    - dynamic adjustment of the number of threads
    - nested parallelism
    - timers
    - the locking API
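  • Environment variables set:
    - the number of threads (OMP_NUM_THREADS)
    - the scheduling type (OMP_SCHEDULE)
    - dynamic adjustment of the number of threads (OMP_DYNAMIC)
    - nested parallelism (OMP_NESTED; deprecated since OpenMP 5.0 in favour of OMP_MAX_ACTIVE_LEVELS)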
  • Parallel regions
    A parallel region is a block of code executed by several threads at the same time.
        #pragma omp parallel [clause[[,] clause] ...]
        {
            /* this block is executed in parallel */
        }   /* implicit barrier at the end of the region */
  • Exercise: Hello World (serial)
        #include <stdio.h>

        int main(void)
        {
            int ID = 0;
            printf("Hello World (%d)\n", ID);
            return 0;
        }
    Compile: gcc Hello.c -o Hello
    Output: Hello World (0)
  • Exercise: Hello World (parallel)
        #include <stdio.h>
        #include <omp.h>                        /* OpenMP library functions */

        int main(void)
        {
            #pragma omp parallel                /* OpenMP compiler directive */
            {
                int ID = omp_get_thread_num();  /* OpenMP library function */
                printf("Hello World (%d)\n", ID);
            }
            return 0;
        }
    Compile: gcc -fopenmp Hello.c -o Hello
    Output with 4 threads (the order varies from run to run):
        Hello World (2)
        Hello World (1)
        Hello World (3)
        Hello World (0)
  • OpenMP clauses
    Many OpenMP directives accept clauses.
    Clauses specify additional information for a directive.
    Certain directives have their own specific clauses.
  • The if / private / shared clauses
    - if (exp): the region is executed in parallel only if exp evaluates to true; otherwise it is executed serially.
    - private (list): every reference inside the region is to the thread's local copy. The local copies are uninitialized on entry, and their values are not carried out of the region.
    - shared (list): the data are accessed by all threads.
  • Variable scoping: private or shared
    - Private variables can be accessed only by the thread that owns them.
    - Shared variables can be accessed by any thread.
    Clauses: private(...) / shared(...) / default(shared|none)
        #pragma omp parallel
        #pragma omp parallel private(a,c)
        #pragma omp parallel default(private), shared(a,b)
        #pragma omp parallel default(none), shared(a,b,c)
    Note: in C/C++, default(private) is accepted only from OpenMP 5.1 on; earlier versions allow only default(shared) or default(none) (default(private) was Fortran-only).
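With default(none) the compiler rejects any variable used in the region whose sharing attribute is not listed, which catches scoping mistakes at compile time. A minimal sketch (scale is a hypothetical function).

```c
#include <assert.h>

/* default(none): v, n, factor and i must all be listed explicitly.
   Omitting one of them from the clauses is a compile-time error. */
void scale(double *v, int n, double factor)
{
    int i;
    assert(n >= 0);

    #pragma omp parallel for default(none) shared(v, n, factor) private(i)
    for (i = 0; i < n; i++)
        v[i] *= factor;
}
```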
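  • Variable scoping examples
        int x = 1;
        #pragma omp parallel shared(x) num_threads(2)
        {
            x++;
            printf("%d\n", x);
        }
        printf("%d\n", x);
    Possible outputs: 2 3 3, 3 2 3 or 3 3 3. The two unsynchronized x++ on a shared variable are a data race, so a lost update is also possible.

        int x = 1;
        #pragma omp parallel private(x) num_threads(2)
        {
            x++;
            printf("%d\n", x);
        }
        printf("%d\n", x);
    Inside the region each thread may print anything, because its private copy of x is uninitialized; at the end the program prints 1.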
  • Work-sharing constructs
    - for directive:
        #pragma omp for
        for (i = 0; i < N; i++)
            a[i] = a[i] + 1;
    - sections directive:
        #pragma omp sections
        {
            #pragma omp section
            { /* ... code of section 1 ... */ }
            #pragma omp section
            { /* ... code of section 2 ... */ }
        }
    - single directive:
        #pragma omp single
        { /* ... */ }
    The work (code) is divided among the threads.
    - These directives must be placed inside a parallel region to divide work (outside one they are executed by a single thread).
    - There is an implied (implicit) barrier at the exit, unless a nowait clause is given.
    - Work-sharing constructs do not create new threads.
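A minimal sketch of sections and single inside one parallel region (sections_demo is hypothetical). The two sections are independent and may run on different threads; the single block runs exactly once.

```c
#include <assert.h>

void sections_demo(int *sum_even, int *sum_odd, int *single_runs)
{
    assert(sum_even && sum_odd && single_runs);
    *single_runs = 0;

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {   /* section 1: sum of the even numbers below 100 */
                int s = 0;
                for (int i = 0; i < 100; i += 2) s += i;
                *sum_even = s;
            }
            #pragma omp section
            {   /* section 2: sum of the odd numbers below 100 */
                int s = 0;
                for (int i = 1; i < 100; i += 2) s += i;
                *sum_odd = s;
            }
        }   /* implicit barrier: both sections are done */

        #pragma omp single
        (*single_runs)++;   /* executed by exactly one thread */
    }
}
```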
  • The for directive
    #pragma omp for [clause ...]
    Used to parallelize a for loop:
        int a[N], i;
        ...
        #pragma omp parallel
        {
            ...
            #pragma omp for
            for (i = 0; i < N; i++)
                a[i] = a[i] + 1;
        }
  • The for directive (continued)
    Not every for loop can be parallelized: the iterations must be independent.
        for (i = 0; i < N; i++)      /* independent iterations: omp for is fine */
            a[i] = a[i] + 1;

        for (i = 1; i < N; i++)      /* iteration i reads a[i-1]: cannot use omp for as written */
            a[i] = a[i-1] + 1;
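The dependent loop above can often be rewritten so that each iteration computes its value directly. For a[i] = a[i-1] + 1 the closed form is a[i] = a[0] + i, as in this sketch (function names are hypothetical).

```c
#include <assert.h>

/* Serial version: iteration i reads a[i-1], written by iteration i-1
   (a loop-carried dependence), so omp for cannot be added as is. */
void fill_serial(int *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + 1;
}

/* Same result without the dependence: a[i] = a[0] + i, so every
   iteration is independent and the loop can be parallelized. */
void fill_parallel(int *a, int n)
{
    assert(n >= 1);
    int a0 = a[0];

    #pragma omp parallel for
    for (int i = 1; i < n; i++)
        a[i] = a0 + i;
}
```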
  • Exercise: vector addition
        #include <stdio.h>
        #include <omp.h>
        #define N 10

        int main(void)
        {
            float a[N], b[N], c[N];
            int i, TID, nthreads;

            omp_set_num_threads(4);

            /* initialize the input vectors in parallel */
            #pragma omp parallel default(none) private(i) shared(a,b)
            {
                #pragma omp for
                for (i = 0; i < N; i++) {
                    a[i] = (i+1) * 1.0;
                    b[i] = (i+1) * 2.0;
                }
            }

            /* add them: c = a + b */
            #pragma omp parallel default(none) private(i,TID) shared(a,b,c,nthreads)
            {
                TID = omp_get_thread_num();
                if (TID == 0) {
                    nthreads = omp_get_num_threads();
                    printf("Number of threads = (%d)\n", nthreads);
                }
                printf("Thread %d starting\n", TID);

                #pragma omp for
                for (i = 0; i < N; i++) {
                    c[i] = a[i] + b[i];
                    printf("%d, %d, %f, %f, %f\n", TID, i+1, a[i], b[i], c[i]);
                }
            }
            return 0;
        }
  • Exercise: vector addition (sample output)
        Thread 2 starting
        2, 7, 7.000000, 14.000000, 21.000000
        2, 8, 8.000000, 16.000000, 24.000000
        2, 9, 9.000000, 18.000000, 27.000000
        Number of threads = (4)
        Thread 0 starting
        0, 1, 1.000000, 2.000000, 3.000000
        0, 2, 2.000000, 4.000000, 6.000000
        0, 3, 3.000000, 6.000000, 9.000000
        Thread 3 starting
        3, 10, 10.000000, 20.000000, 30.000000
        Thread 1 starting
        1, 4, 4.000000, 8.000000, 12.000000
        1, 5, 5.000000, 10.000000, 15.000000
        1, 6, 6.000000, 12.000000, 18.000000
    The order of the lines varies from run to run.
  • Parallelizing for loops
    General steps to follow:
    - find the most compute-intensive for loops;
    - convert them into loops with independent iterations;
    - place the appropriate OpenMP directive in the right position.
    Before (j carries a value from one iteration to the next):
        int i, j, A[MAX];
        j = 5;
        for (i = 0; i < MAX; i++) {
            j += 2;
            A[i] = big(j);
        }
    After (j is computed from i, so the iterations are independent):
        int i, A[MAX];
        #pragma omp parallel for
        for (i = 0; i < MAX; i++) {
            int j = 5 + 2*(i+1);
            A[i] = big(j);
        }
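A compilable version of the two loops above, to check that the transformed loop gives the same result. big() is not defined in the slide, so here it is replaced by a hypothetical pure function; MAX is likewise an assumed size.

```c
#include <assert.h>

#define MAX 1000   /* assumed array size */

/* Hypothetical stand-in for the slide's big(): any pure function works. */
static int big(int j) { return j * j % 97; }

/* Original loop: j depends on the previous iteration. */
void fill_dependent(int A[MAX])
{
    int j = 5;
    for (int i = 0; i < MAX; i++) {
        j += 2;
        A[i] = big(j);
    }
}

/* Transformed loop: j is the value the original loop has in iteration i. */
void fill_independent(int A[MAX])
{
    #pragma omp parallel for
    for (int i = 0; i < MAX; i++) {
        int j = 5 + 2 * (i + 1);
        A[i] = big(j);
    }
}

/* Runs both versions and checks that they agree element by element. */
void compare_versions(void)
{
    static int A[MAX], B[MAX];
    fill_dependent(A);
    fill_independent(B);
    for (int i = 0; i < MAX; i++)
        assert(A[i] == B[i]);
}
```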
  • The reduction clause
    reduction(operator : list)
    - The variables this clause is applied to must be shared in the enclosing context.
    - It is used when the value of a variable is accumulated inside a for loop (operators: +, *, -, &, ^, |, &&, ||).
    - A private copy of the variable is created for each thread and initialized with the identity value of the operator (e.g. 0 for +, 1 for *).
    - At the end of the region or construct, the operator is applied to the private copies of all threads and the original variable, and the result is stored in the shared variable.
        int i, a[n], b[n], result = 0;
        ... initialization of a, b ...
        #pragma omp parallel for default(shared) private(i) reduction(+:result)
        for (i = 0; i < n; i++)
            result = result + (a[i] * b[i]);
        printf("Final result = %d\n", result);
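A reduction clause can cover several variables and operators at once. This sketch (the function stats is hypothetical) combines + for a sum with && for a check that every element is positive; each thread's private copy starts at the operator's identity value (0 for +, 1 for &&).

```c
#include <assert.h>

void stats(const int *v, int n, long *sum, int *all_positive)
{
    long s = 0;     /* combined with + at the end of the loop */
    int pos = 1;    /* combined with && at the end of the loop */
    assert(n >= 0);

    #pragma omp parallel for reduction(+:s) reduction(&&:pos)
    for (int i = 0; i < n; i++) {
        s += v[i];
        pos = pos && (v[i] > 0);
    }

    *sum = s;
    *all_positive = pos;
}
```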
  • The schedule clause (on the for directive)
    Determines how the iterations of the for loop are divided among the threads.
    - schedule(static[,chunk]): divides the iterations into blocks of size "chunk" and assigns them to the threads round-robin, in thread order (without chunk: roughly equal blocks, one per thread). The assignment is predetermined; it depends only on the iteration count and the number of threads.
    - schedule(dynamic[,chunk]): at run time each thread is given a block of size "chunk" (default 1) from a queue; when a thread finishes its block, it takes another one from the queue.
    - schedule(guided[,chunk]): like dynamic, except that the block size decreases as the blocks are processed; "chunk" is then the minimum block size.
    - schedule(runtime): the schedule type and "chunk" are taken from the environment variable OMP_SCHEDULE.
  • Synchronization directives
    OpenMP provides several synchronization mechanisms:
    - barrier: synchronizes all threads at one point in the code
    - master: only the master (main) thread executes the block
    - critical: only one thread at a time executes the block
    - atomic: like critical, but only for a single update of one variable
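The difference between atomic and critical in a short sketch (count_both is hypothetical): atomic protects one simple update of one variable, while critical can protect a block of any length, usually at a higher cost.

```c
#include <assert.h>

void count_both(int n, long *atomic_count, long *critical_sum)
{
    long a = 0, c = 0;   /* shared by the team */
    assert(n >= 0);

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        a++;             /* single update of one variable */

        #pragma omp critical
        {
            c += i;      /* a critical block may hold several statements */
        }
    }

    *atomic_count = a;
    *critical_sum = c;
}
```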
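  • Synchronization examples
        int x = 1;
        #pragma omp parallel num_threads(2)
        {
            x++;                     /* both threads increment x */
            #pragma omp barrier      /* wait until both increments are done */
            foo(x);
        }
    Output: foo(3), foo(3). (The two unsynchronized x++ are a data race; #pragma omp atomic on them rules out a lost update.)

        int x = 1;
        #pragma omp parallel num_threads(2)
        {
            #pragma omp master
            {
                x++;                 /* only the master thread increments */
            }
            foo(x);                  /* master has no implied barrier */
        }
    Output: foo(2), foo(2) or foo(1), foo(2).

        int x = 1;
        #pragma omp parallel num_threads(2)
        {
            #pragma omp critical
            {
                x++;
                foo(x);
            }
        }
    Output: foo(2), foo(3).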
  • Exercises
    a) Write a program that computes the scalar product of two vectors (dot product) with OpenMP, first without the reduction clause and then with it.
    b) First write a serial C program that finds and prints the maximum of a vector with a large number of elements (~10000).
       - Parallelize this program over several threads with OpenMP and measure the speedup (the time gained compared with the serial version).
       - Use the library functions to measure the time.
    c) Write a serial and then a parallel program that finds and prints the maximum of the elements of two matrices.