Parallel Programming

  • A deadlock refers to a specific condition in which two or more processes are each waiting for the other to release a resource. Deadlock is a common problem in multiprocessing, where many processes share mutually exclusive resources.
  • Deadlock will occur if both processes perform the send first, using synchronous routines (or blocking routines without sufficient buffering). Neither send will return, because each process waits for a matching receive that is never reached. A minimal illustration follows below.
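    A minimal MPI sketch of this situation (my own illustration; the presentation does not include this code). MPI_Ssend is synchronous, so if both ranks send first, each blocks waiting for a receive that never starts; ordering the calls so one side receives first is one standard fix (MPI_Sendrecv is another):

        /* Sketch: two ranks; the commented-out pattern deadlocks, the
           ordered pattern below it does not. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank, peer, val = 42, got;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            peer = 1 - rank;                  /* assumes exactly 2 ranks */

            /* Deadlocks: both ranks block in the synchronous send.
               MPI_Ssend(&val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
               MPI_Recv(&got, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                        MPI_STATUS_IGNORE);                            */

            if (rank == 0) {                  /* fix: one side receives first */
                MPI_Ssend(&val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
                MPI_Recv(&got, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(&got, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Ssend(&val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
            }
            printf("rank %d received %d\n", rank, got);
            MPI_Finalize();
            return 0;
        }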
  • Sorting is a fundamental operation that appears in many applications. Sorting is also often done because it makes searches and other operations easier, but it has a processing cost. Now we would like to show that Quicksort, a well-known fast algorithm, can be improved using parallel programming, reducing that processing cost.
  • Quicksort works by first selecting one number, called the pivot, which is compared to the other numbers in the list. If a number is less than the pivot, it is placed in one sublist; otherwise, it is placed in the other sublist. The procedure is repeated on the sublists. Quicksort is usually described by a recursive algorithm. One obvious way to parallelize quicksort is to start with one processor and pass one of the recursive calls to another processor while keeping the other recursive call for itself.
  • In the quick_sort function, the original array is partitioned into two parts. Each part is handled by the quick_sort function recursively. Since the two parts of the array are manipulated independently, the work can execute concurrently using OpenMP parallel tasks. The par_quick_sort function has a parallel construct that contains a single construct; in the single construct, there is a call to the quick_sort function. Two tasks are generated in the quick_sort function, which is called recursively, so many tasks are generated until the low_limit threshold is reached. The execution model of the qsort_task program is a single-producer, multiple-consumer model: the thread executing the single region generates tasks, and the threads in the team execute them. All generated tasks are guaranteed to complete by the time the threads exit the single region. When a thread finishes executing a task, it grabs a new one. In this way, all threads can execute available tasks without barrier synchronization, thereby improving load balancing. A sketch of this structure follows below.
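    A compact sketch of that structure, reusing the names par_quick_sort, quick_sort, and low_limit from the text; the partitioning details are illustrative choices of mine, not the actual qsort_task.c source:

        #include <omp.h>

        static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

        static void quick_sort(int *a, int lo, int hi, int low_limit) {
            if (lo >= hi) return;
            int pivot = a[hi], i = lo;       /* simple last-element pivot */
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) swap(&a[i++], &a[j]);
            swap(&a[i], &a[hi]);

            if (hi - lo < low_limit) {       /* small range: stay sequential */
                quick_sort(a, lo, i - 1, low_limit);
                quick_sort(a, i + 1, hi, low_limit);
            } else {                         /* two independent tasks */
                #pragma omp task shared(a)
                quick_sort(a, lo, i - 1, low_limit);
                #pragma omp task shared(a)
                quick_sort(a, i + 1, hi, low_limit);
                #pragma omp taskwait
            }
        }

        static void par_quick_sort(int *a, int n, int low_limit) {
            #pragma omp parallel             /* team of consumer threads */
            #pragma omp single               /* one producer generates tasks */
            quick_sort(a, 0, n - 1, low_limit);
        }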
  • One of the first things we can see is that, with the parallel program, both CPUs are working.
  • We ran some tests and found that the parallel quicksort performs better than the sequential quicksort.

Parallel Programming: Presentation Transcript

  • Parallel Programming
    Recent Trends in Software Engineering, Prof. Peter Stoehr
    By Jorge Ortiz, Chirag Setty, Uday Sharma, Kristal Lucero
    June 9th, 2011
  • Agenda
    • Basic concepts and motivational considerations
      ◦ Definition and advantages
      ◦ The ever-present need for computational speed
      ◦ Flynn's taxonomy
      ◦ Types of parallel computers
      ◦ Programming shared memory multiprocessors
    • Main issues in parallel programming
      ◦ Threads
      ◦ Synchronization
      ◦ Deadlock
    • Sorting algorithms: parallel and sequential implementations
      ◦ Quicksort
    • Conclusions
  • Why use Parallel Programming?
    Parallel programming (definition): "a form of computation in which many calculations are carried out simultaneously."
    Advantages: it usually provides
      ◦ more computational power
      ◦ fault tolerance
      ◦ a larger amount of memory
      ◦ a speed-up factor
  • There will always be demand for computational speed!
    "Will mankind one day without the net expenditure of energy be able to restore the sun to its full youthfulness even after it had died of old age?"
    The Last Question (1956), Isaac Asimov
    Areas such as numerical modeling of scientific and engineering problems, for example the motion of astronomical bodies, simulation of large DNA structures, and global weather forecasting, require greater computational speed than is currently available.
  • The Grand Challenge Problems (1): Modeling Motion of Astronomical Bodies
    • Gravitational forces act among N bodies, so their movement can be predicted by computing the total force on each body.
      ◦ For N bodies, N-1 forces per body, i.e., O(N^2) calculations overall
      ◦ Optimized implementations: O(N log2 N)
      ◦ Calculations are repeated once new positions are obtained.
      ◦ One galaxy contains almost 10^11 stars.
      ◦ At O(N log2 N), each iteration takes almost one year.
  • Flynn's taxonomy
    (Diagram: Flynn's classification of computer architectures, with the SPMD and MPMD programming structures.)
  • Parallel Computers (1)
    Nowadays there are two main approaches:
    • Shared Memory Multiprocessor (considerations)
      ◦ Data sharing and synchronization issues appear.
        ▪ How is data shared among processors at execution time?
        ▪ Larger shared-memory machines do not satisfy UMA. Why?
          • Some processors are "nearer to" the memory and can access it faster.
    Processors keep getting faster; memory access is still not as fast.
  • Parallel Computers (2)
    ...Shared Memory Multiprocessor (considerations)
      ◦ Vendors have built computers with hierarchical memory systems.
      ◦ SMPs have some memory that is not shared.
  • Parallel Computers (3)
    • Networked computers as a computing platform.
      Efforts to build parallel computer systems using networked computers, as a cheaper alternative to expensive supercomputers, started in the early 1990s.
  • Programming Shared Memory Multiprocessors (1)
    1. Thread libraries: the programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads.
    2. Higher-level library functions and preprocessor compiler directives to declare shared variables and specify parallelism.
    3. Use a modified sequential programming language, with added syntax to declare shared variables and specify parallelism, e.g. UPC (Unified Parallel C), which needs a UPC compiler.
  • Programming Shared Memory Multiprocessors (2)
    4. Use a specially designed parallel programming language with syntax to express parallelism; the compiler automatically creates executable code for each processor (not now common).
    5. Use a regular sequential programming language such as C and ask a parallelizing compiler to convert it into parallel executable code (not now common).
  • Thread
    • Threads that execute independently of each other are called asynchronous threads.
    • Problems:
      ▪ Two or more threads share the same resource while only one of them can access it at a time.
      ▪ If a producer and a consumer share the same data in a program, the producer may produce the data faster than it is consumed, or the consumer may try to retrieve and process data before it exists.
  • Thread
    (Diagram: after Start, Thread1 and Thread2 both access a shared variable and method.)
    • Java uses the keyword synchronized to synchronize threads and let them intercommunicate.
    • Synchronization is a mechanism that allows two or more threads to share the available resources in a sequential manner.
  • Lock
    • The term lock refers to the access granted to a particular thread to use the shared resources.
    • Java has a built-in lock that only comes into action when an object has synchronized method code.
    • No other thread can acquire the lock until it is released by the first thread.
    • Acquiring the lock means the thread is currently inside the synchronized method; releasing the lock means exiting the synchronized method.
  • Important Points
    • Points for synchronization and locks:
      ▪ Only methods (or blocks) can be synchronized.
      ▪ Each object has just one lock.
      ▪ Not all methods in a class need to be synchronized.
      ▪ If a thread goes to sleep, it holds any locks it has; it does not release them.
    • Two ways to synchronize the execution of code (see the sketch after this list):
      ▪ Synchronized methods
      ▪ Synchronized blocks
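    The slides use Java's synchronized keyword; a rough analogue of the same idea in C (the language of the later examples) uses a pthreads mutex standing in for the per-object lock. This is my own illustrative sketch, not code from the slides:

        #include <pthread.h>
        #include <stdio.h>

        /* The mutex plays the role of the object's single lock. */
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static long counter = 0;

        /* Analogue of a synchronized method: the whole body holds the lock. */
        static void increment(void) {
            pthread_mutex_lock(&lock);
            counter++;                        /* critical section */
            pthread_mutex_unlock(&lock);
        }

        static void *worker(void *arg) {
            for (int i = 0; i < 100000; i++)
                increment();
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, worker, NULL);
            pthread_create(&t2, NULL, worker, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            printf("counter = %ld\n", counter);  /* 200000 with the lock held */
            return 0;
        }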
  • Synchronization: Barrier
    • We could start multiple threads each time around the loop and wait for them all to complete.
    • This is inefficient, since we are continually spawning new processes.
    • It is much less efficient than having n processes loop and implementing synchronization among them.
  • Synchronization: Barrier
    • A barrier is a basic mechanism for synchronizing processes; it is inserted at the point in each process where the process must wait.
    • All processes can continue from this point once all of them have reached it.
    • In message-passing systems, barriers are often provided as library routines:
      ▪ MPI has the barrier routine MPI_Barrier().
      ▪ PVM has a similar barrier routine, pvm_barrier().
    A minimal usage sketch follows below.
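    A minimal MPI_Barrier() usage sketch (illustrative, not from the slides): no rank's "after" line can print before every rank has reached the barrier.

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            printf("rank %d: before barrier\n", rank); /* any order */
            MPI_Barrier(MPI_COMM_WORLD);    /* blocks until all ranks arrive */
            printf("rank %d: after barrier\n", rank);  /* only after everyone */

            MPI_Finalize();
            return 0;
        }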
  • Barrier Example
    (Diagram: processes P0, P1, P2, ..., Pn-1 over time; each process is active until it reaches the barrier, then waits until all processes have arrived.)
  • Counter Implementation
    A centralized counter implementation (sometimes called a linear barrier).
    (Diagram: processes P0, P1, ..., Pn-1 each increment a shared counter at the barrier and check for it to reach n.)
    A pthreads sketch of this idea follows below.
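    A pthreads sketch of the centralized counter barrier (my own reconstruction of the diagram's idea; the round counter guards against spurious wakeups and lets the barrier be reused):

        #include <pthread.h>

        typedef struct {
            pthread_mutex_t mutex;
            pthread_cond_t  all_arrived;
            int count;        /* arrivals in the current round */
            int n;            /* total threads expected */
            unsigned round;   /* incremented each time the barrier opens */
        } counter_barrier_t;

        void barrier_init(counter_barrier_t *b, int n) {
            pthread_mutex_init(&b->mutex, NULL);
            pthread_cond_init(&b->all_arrived, NULL);
            b->count = 0;
            b->n = n;
            b->round = 0;
        }

        void barrier_wait(counter_barrier_t *b) {
            pthread_mutex_lock(&b->mutex);
            unsigned my_round = b->round;
            if (++b->count == b->n) {         /* last arrival: release all */
                b->count = 0;
                b->round++;
                pthread_cond_broadcast(&b->all_arrived);
            } else {
                while (b->round == my_round)  /* wait for the round to end */
                    pthread_cond_wait(&b->all_arrived, &b->mutex);
            }
            pthread_mutex_unlock(&b->mutex);
        }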
  • Tree Implementation
    • More efficient. Suppose there are eight processes, P0 through P7:
      ▪ First stage: P1 sends a message to P0; P3 to P2; P5 to P4; P7 to P6.
      ▪ Second stage: P2 sends a message to P0; P6 to P4.
      ▪ Third stage: P4 sends a message to P0, and P0 terminates the arrival phase.
  • Tree Implementation
    (Diagram: arrival at the barrier, with synchronization messages converging from P1 to P7 toward P0, followed by departure from the barrier fanning back out.)
  • Data parallel computations
    • Synchronization is required.
    • The same operation is applied to different data elements.
    • Data parallel programming is more convenient because of:
      ▪ ease of programming
      ▪ easy scaling to larger problems
  • Data parallel computations
    • Many numeric and non-numeric problems can be cast in data parallel form.
    • Example: SIMD computers
      ▪ The same instruction is executed on different processors, each on different data.
      ▪ Synchronization is built into the hardware.
  • Example
      for (i = 0; i < n; i++)
          a[i] = a[i] + k;
    (Diagram: the whole-array operation a[] = a[] + k is split so each processor performs one element update: a[0] = a[0] + k, a[1] = a[1] + k, ..., a[n-1] = a[n-1] + k.)
  • Barrier Requirement
    • The data parallel technique is applied to a multiprocessor or multicomputer.
    • The whole construct should not complete before all of its instances do, so a barrier is required.
      forall (i = 0; i < n; i++)
          a[i] = a[i] + k;
    An OpenMP version of this construct is sketched below.
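    The forall above maps directly onto OpenMP's parallel for, whose implicit barrier at the end of the loop plays exactly the role just described (illustrative sketch):

        #include <stdio.h>

        int main(void) {
            enum { N = 8 };
            int a[N] = {0, 1, 2, 3, 4, 5, 6, 7};
            int k = 10;

            #pragma omp parallel for      /* implicit barrier at loop end */
            for (int i = 0; i < N; i++)
                a[i] = a[i] + k;

            /* All element updates are guaranteed complete here. */
            for (int i = 0; i < N; i++)
                printf("%d ", a[i]);
            printf("\n");
            return 0;
        }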
  • Butterfly barrier
    (Diagram: Process 1 sends a message to Process 2 and receives a confirmation message.)
    • Send a message to the partner process.
    • Wait until a message is received from that process.
  • Stages of the Butterfly Barrier
    If we have n = 2^k processes, we build the barrier in k stages 1, 2, ..., k.
    At stage s, each process synchronizes with a partner that is 2^(s-1) steps away.
    The stages are interleaved so that no process can pass through all stages of the barrier until all processes have reached it.
    If n is not a power of 2, we can use the next largest 2^k, but this is not efficient and the system is no longer symmetric.
    (A sketch of the exchange pattern appears after the diagrams below.)
  • Working model of the Butterfly Barrier
    (Diagram: eight processes 0 to 7. In round 0 each process pairs with the neighbor 1 step away, in round 1 with the partner 2 steps away, and in round 2 with the partner 4 steps away.)
  • Virtual processes in the Butterfly Barrier
    What if the number of threads is not a power of 2? Use virtual processes.
    (Diagram: five real processes 0 to 4 plus virtual processes (2), (1), (0) standing in for the missing ranks across rounds 0, 1, and 2.)
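    An illustrative butterfly barrier over MPI point-to-point messages, assuming the number of ranks is an exact power of 2 (the partner in round r is rank XOR 2^r, matching the diagrams); this is my own sketch, not slide code:

        #include <mpi.h>

        void butterfly_barrier(MPI_Comm comm) {
            int rank, n, out = 0, in;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &n);      /* assumed to be a power of 2 */

            /* Round r: exchange with the partner 2^r steps away. */
            for (int dist = 1; dist < n; dist <<= 1) {
                int partner = rank ^ dist;
                /* Send to the partner and wait for its message in one call. */
                MPI_Sendrecv(&out, 1, MPI_INT, partner, 0,
                             &in,  1, MPI_INT, partner, 0,
                             comm, MPI_STATUS_IGNORE);
            }
        }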
  • Local Synchronization
    Useful when calculations take varying amounts of time and delays can occur randomly at any processor or task.
      Process Pi-1:  recv(Pi); send(Pi);
      Process Pi:    send(Pi-1); send(Pi+1); recv(Pi-1); recv(Pi+1);
      Process Pi+1:  recv(Pi); send(Pi);
    Note: this is not a perfect three-process barrier; Pi-1 only synchronizes with Pi and continues as soon as Pi allows.
  • Synchronous Iteration (Synchronous Parallelism)
    • Synchronous iteration: this term describes a situation where a problem is solved by iteration and
      ▪ each iteration step is composed of several processes that start together at the beginning of the step, and
      ▪ the next iteration step cannot begin until all processes have finished the current step.
    (Diagram: iterations 1 through n, each containing steps 0 to n-1.)
  • Example 1: Synchronous Iteration
    Equation: (4 + 6) - (2 * 3)
    (Diagram: the expression as a tree. Sequential solution: solve the equation linearly, operator by operator in priority order. Parallel solution: solve both sides of the tree at once, computing 4 + 6 = 10 and 2 * 3 = 6 in parallel, then 10 - 6 = 4.)
  • Example 2: Synchronous Iteration
    • Solving a general system of linear equations by iteration.
    • Suppose the equations are of a general form with n equations and n unknowns:
        a[i][0]x0 + a[i][1]x1 + ... + a[i][n-1]x(n-1) = b[i]
      where the unknowns are x0, x1, x2, ..., x(n-1) (0 <= i < n).
  • Illustrating Jacobi Iteration
    System:
      7x1 + 3x2 +  x3 = 18
      2x1 - 9x2 + 4x3 = 12
       x1 - 4x2 + 12x3 = 6
    Rearranged for iteration:
      x1 = 18/7 - (3/7)x2 - (1/7)x3
      x2 = -12/9 + (2/9)x1 + (4/9)x3
      x3 = 6/12 - (1/12)x1 + (4/12)x2
    Use x1(0) = x2(0) = x3(0) = 0 as the initial estimates; inserting these into the equations yields new estimates of the parameters. First iteration:
      x1(1) = 2.571 - 0.429(0) - 0.143(0) = 2.571
      x2(1) = -1.333 + 0.222(0) + 0.444(0) = -1.333
      x3(1) = 0.500 - 0.083(0) + 0.333(0) = 0.500
    The estimated results after each iteration:
      Iteration   x1         x2         x3
      1           2.57143    -1.33333    0.50000
      2           3.07143    -0.53968   -0.15873
      3           2.82540    -0.72134    0.06415
      4           2.87141    -0.67695    0.02410
      5           2.85811    -0.68453    0.03506
      6           2.85979    -0.68261    0.03365
  • Sequential Code
      for (i = 0; i < n; i++)
          x[i] = b[i];
      for (iter = 0; iter < limit; iter++) {
          for (i = 0; i < n; i++) {
              sum = 0;
              for (j = 0; j < n; j++)
                  if (i != j)
                      sum = sum + a[i][j] * x[j];
              newx[i] = (b[i] - sum) / a[i][i];
          }
          for (i = 0; i < n; i++)
              x[i] = newx[i];
      }
    Worked example (the same system, starting from x1(0) = x2(0) = x3(0) = 0):
      Iteration 1: newx[0] = (18 - 0)/7 = 2.571
                   newx[1] = -(12 - 0)/9 = -1.333
                   newx[2] = (6 - 0)/12 = 0.500
      Iteration 2: newx[0] = 2.571 + 0.500357 = 3.071
                   newx[1] = -1.333 + 0.792762 = -0.540
                   newx[2] = 0.500 - 0.657282 = -0.158
  • Parallel Code
    Suppose we have a process Pi for each unknown xi; the code for process Pi may be:
      x[i] = b[i];
      for (iter = 0; iter < limit; iter++) {
          sum = -a[i][i] * x[i];
          for (j = 0; j < n; j++)
              sum = sum + a[i][j] * x[j];
          new_x[i] = (b[i] - sum) / a[i][i];
          broadcast_receive(&new_x[i]);
          global_barrier();
      }
    broadcast_receive() is used here (1) to send the newly computed value of x[i] from process Pi to every other process and (2) to collect the values broadcast from the other processes to process Pi.
    (Diagram: at each iteration every process sends its new value and receives the others', e.g. x1 = 2.571, x2 = -1.333, x3 = 0.5 after iteration 1.)
  • A New Message-Passing Operation: Allgather
    • Broadcast and gather values in one composite construction.
    (Diagram: each process contributes its element and receives the complete vector.)
    • Note: MPI_Allgather() also acts as a global barrier, so you do not need to add global_barrier().
    A sketch of the Jacobi loop rewritten with MPI_Allgather follows below.
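    A sketch of the per-process Jacobi loop with MPI_Allgather replacing broadcast_receive() plus global_barrier(). It assumes one unknown per rank (n equal to the number of ranks) and that a, b, and limit are set up identically on every rank; these assumptions are mine, for illustration:

        #include <mpi.h>

        void jacobi(const double *a /* n*n, row-major */, const double *b,
                    double *x, int n, int limit, MPI_Comm comm) {
            int i;
            MPI_Comm_rank(comm, &i);      /* process Pi owns unknown x[i] */
            for (int j = 0; j < n; j++)   /* initial guess: x = b */
                x[j] = b[j];

            for (int iter = 0; iter < limit; iter++) {
                double sum = -a[i * n + i] * x[i];
                for (int j = 0; j < n; j++)
                    sum += a[i * n + j] * x[j];
                double new_xi = (b[i] - sum) / a[i * n + i];

                /* Each rank contributes one value and receives the full
                   vector; the collective also synchronizes the iteration. */
                MPI_Allgather(&new_xi, 1, MPI_DOUBLE,
                              x, 1, MPI_DOUBLE, comm);
            }
        }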
  • Solution by Iteration
    • Iterative methods:
      ▪ applicable when direct methods would require excessive computation
      ▪ have the advantage of small memory requirements
      ▪ may not always converge/terminate
    • An iterative method begins with an initial guess for the unknowns, e.g. xi = bi.
    • Iterations are continued until sufficiently accurate values are obtained for the unknowns.
  • Deadlock
    A set of processes or threads is deadlocked when each process or thread is waiting for a resource to be freed that is controlled by another process. A simple deadlock situation:
    (Diagram: process P1 and resource R1, process P2 and resource R2, with arrows from process to resource and vice versa forming a cycle. R = resource, P = process.)
  • Deadlock
    When a pair of processes each send to and receive from each other, deadlock may occur:
      Process P1    Process P2
      send()        send()
      .             .
      recv()        recv()
  • Deadlock
    Solutions:
      ▪ Remove the mutual exclusion condition.
      ▪ Or remove the "hold and wait" condition by requiring processes to request all the resources they will need before starting up.
      ▪ Employ timeouts to recover from deadlock.
  • Sorting Algorithms
    ▪ Sorting numbers, that is, rearranging a list of numbers into increasing (or decreasing) order, is a fundamental operation that appears in many applications.
    ▪ Sorting is also applicable to non-numerical values; for example, rearranging strings into alphabetical order.
    ▪ Sorting is also often done because it makes searches and other operations easier.
    ▪ Many parallel sorting algorithms and parallel implementations of sequential sorting algorithms are synchronous algorithms.
    Here we select one sequential algorithm for conversion to a parallel implementation: Quicksort.
  • Quicksort
    (Diagram: the list 4 2 7 8 5 1 3 6 with pivot 4 splits into 3 2 1 and 5 7 8 6, handled by P0 and P4; further splits assign sublists to P0, P2, P4, P6 and then P0, P1, P6, P7, yielding the sorted list 1 2 3 4 5 6 7 8. Process allocation follows the recursion.)
    1. Select the pivot.
    2. Split the list.
    3. Repeat the procedure on the sublists.
    • Recursive algorithm: each recursive call can be passed to a different process. A small sequential sketch follows below.
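    A small sequential sketch following steps 1 to 3; this is a guess at the shape of qsort_seq.c's core, which the slides do not show, and it picks the last element as pivot rather than the diagram's first:

        #include <stdio.h>

        static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

        void quicksort(int *a, int lo, int hi) {
            if (lo >= hi) return;
            int pivot = a[hi];               /* 1. select the pivot */
            int i = lo;
            for (int j = lo; j < hi; j++)    /* 2. split around the pivot */
                if (a[j] < pivot) swap(&a[i++], &a[j]);
            swap(&a[i], &a[hi]);
            quicksort(a, lo, i - 1);         /* 3. repeat on the sublists */
            quicksort(a, i + 1, hi);
        }

        int main(void) {
            int a[] = {4, 2, 7, 8, 5, 1, 3, 6};  /* list from the diagram */
            quicksort(a, 0, 7);
            for (int i = 0; i < 8; i++) printf("%d ", a[i]);
            printf("\n");
            return 0;
        }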
  • Quicksort using OpenMP
    Demo
  • Quicksort using OpenMP
    ▪ Quicksort sequential implementation: qsort_seq.c
      This program takes one integer parameter, num_elems, which specifies the size of the array to be sorted.
      $ ./qsort_seq 5000000
    ▪ Quicksort parallel implementation using OpenMP: qsort_task.c
      This program takes two integer parameters, num_elems and low_limit. The num_elems parameter specifies the size of the array to be sorted. The quick_sort function is called recursively, generating tasks until the low_limit threshold is reached.
      $ ./qsort_task 5000000 100
  • Quicksort using OpenMP
    Set the OpenMP environment variables:
    ▪ OMP_NUM_THREADS=2: sets the number of threads to use for parallel regions.
    ▪ OMP_WAIT_POLICY=ACTIVE: provides a hint to the OpenMP implementation about the desired behavior of waiting threads. The ACTIVE value specifies that waiting threads should mostly be active, i.e., consume processor cycles, while waiting.
    ▪ OMP_DYNAMIC=FALSE: controls dynamic adjustment of the number of threads used for executing parallel regions; setting it to false disables the dynamic adjustment.
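    A typical invocation might look like this, assuming a POSIX shell (the slides do not show the exact demo environment):

        $ export OMP_NUM_THREADS=2
        $ export OMP_WAIT_POLICY=ACTIVE
        $ export OMP_DYNAMIC=FALSE
        $ ./qsort_task 5000000 100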
  • Sequential vs Parallel
  • Results
    Execution time (seconds) by number of elements:
      Elements               1,000,000   2,000,000   5,000,000   10,000,000
      Sequential Quicksort   0.234601    0.495793    1.307625    2.707997
      Parallel Quicksort     0.124090    0.259852    0.687688    1.432589
    (Chart: the parallel quicksort takes roughly half the sequential time at every size; the slide highlights this as "50% faster!")
  • Conclusions
    ▪ The demand for computing power and speed increases every day.
    ▪ Programs that are properly designed to take advantage of parallelism can execute faster than their sequential counterparts.
    ▪ Some algorithms cannot be parallelized.
    ▪ Parallelization offers a new way to increase performance.
  • Thank you. Questions?
  • References
    OpenMP Specification: http://www.openmp.org/mp-documents/spec30.pdf
    Reap the Benefits of Multithreading without All the Work: http://msdn.microsoft.com/en-us/magazine/cc163717.aspx
    OpenMP: http://www.metz.supelec.fr/metz/personnel/vialle/course/SI-PP/notes-de-cours-specifiques/PP-02-OpenMP-6spp.pdf
    Parallel Programming: http://coitweb.uncc.edu/%7Eabw/ITCS4145F10/
    Sorting Algorithms: http://www.c.happycodings.com/Sorting_Searching/index.html
    OpenMP Exercise: https://computing.llnl.gov/tutorials/openMP/exercise.html
    OpenMP Recursive Routines: http://www.openmp.org/pipermail/omp/2005/000145.html
    OpenMP: http://en.wikipedia.org/wiki/OpenMP
    http://publib.boulder.ibm.com/infocenter/comphelp/v111v131/index.jsp?topic=/com.ibm.xlc111.aix.doc/compiler_ref/prag_omp_task.html
    The Joys of Concurrent Programming: http://www.informit.com/articles/article.aspx?p=30413&seqNum=2
    Wilkinson, Barry and Allen, Michael. Parallel Programming. Second Edition. Pearson Prentice Hall, 2005.
    Ekanayake, J. and Fox, G. High Performance Parallel Computing with Clouds and Cloud Technologies. Cloud Computing, 2010.
    Grama, A., Gupta, A., Karypis, G., and Kumar, V. Introduction to Parallel Computing. Addison-Wesley, 2003.