Parallel Programming
Recent Trends in Software Engineering
Prof. Peter Stoehr

By
Jorge Ortiz
Chirag Setty
Uday Sharma
Kristal Lucero

June 9th, 2011
Agenda
• Basic concepts and motivational considerations
   • Definition and advantages
   • The ever-growing demand for computational speed
   • Flynn's taxonomy
   • Types of parallel computers
   • Programming shared memory multiprocessors
• Main issues in parallel programming
   • Threads
   • Synchronization
   • Deadlock
• Sorting algorithms – parallel and sequential implementations
   • Quicksort
• Conclusions
Why Use Parallel Programming?

Parallel Programming (Definition)

• “Is a form of computation in which many calculations are carried out
  simultaneously”

Advantages

• It typically provides more:
   ◦ computational power
   ◦ fault tolerance
   ◦ memory
   ◦ speed-up
There Will Always Be Demand for Computational Speed!

     “Will mankind one day without the net expenditure of energy
be able to restore the sun to its full youthfulness even after it had died
                         of old age?”

          The Last Question (1956) – Isaac Asimov

Areas such as numerical modeling of scientific and engineering
problems, including the motion of astronomical bodies, simulation of
large DNA structures, and global weather forecasting, require greater
computational speed than is currently available.
The Grand Challenge Problems (1)

Modeling the Motion of Astronomical Bodies

• Gravitational forces act among N bodies, so their movement can be
  predicted by computing the total force on each body.
   ◦ For N bodies → N-1 forces to calculate per body, i.e. N² in total
   ◦ Optimized implementations → O(N log₂ N)
   ◦ Calculations are repeated once new positions are obtained.
   ◦ One galaxy contains almost 10¹¹ stars.
   ◦ Even at O(N log₂ N), each iteration takes almost a year.
Flynn's taxonomy

[Figure: Flynn's taxonomy of computer architectures, with the SPMD and
MPMD programming models highlighted.]
Parallel Computers (1)

Nowadays there are two main approaches:

• Shared Memory Multiprocessor (Considerations)
   ◦ Data sharing and synchronization issues appear.
      ▪ How is the data shared among processors at execution time?
      ▪ Larger shared memory machines do not satisfy UMA. Why?
         • Some processors are «nearer to» the memory, and those can
           access it faster.

Processors have become faster, but memory access has not kept pace.
Parallel Computers (2)

...Shared Memory Multiprocessor (Considerations)

◦ Vendors have built computers with hierarchical memory systems
◦ SMPs have some memory that is not shared
Parallel Computers (3)

• Networked Computers as a Computing Platform

Efforts to build parallel computer systems from networked computers,
as a cheaper alternative to expensive supercomputers, started in the
early 1990s.
Programming Shared Memory Multiprocessors (1)

1. Thread libraries - the programmer decomposes the program into
   individual parallel sequences (threads), each able to access shared
   variables declared outside the threads.

2. Higher-level library functions and preprocessor compiler directives
   to declare shared variables and specify parallelism.

3. Use a modified sequential programming language.
   Added syntax to declare shared variables and specify parallelism.
   E.g. UPC (Unified Parallel C) - needs a UPC compiler.
Programming Shared Memory Multiprocessors (2)

4. Use a specially designed parallel programming language
   with syntax to express parallelism. The compiler automatically
   creates executable code for each processor (not now common).

5. Use a regular sequential programming language,
   such as C, and ask a parallelizing compiler to convert it into
   parallel executable code (not now common).

A minimal sketch of approach 1 follows.
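As a hedged illustration of approach 1 (thread libraries), here is a
minimal POSIX threads sketch of my own, not taken from the slides: two
threads update a shared variable declared outside the thread function.

#include <pthread.h>
#include <stdio.h>

/* Shared variable declared outside the threads (approach 1). */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* protect the shared variable */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* prints 200000 */
    return 0;
}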
Thread

• Threads that execute independently of each other are called
  asynchronous threads.

• Problems:

   Two or more threads may share the same resource while only one of
   them can access the resource at a time.

   If a producer and a consumer share the same data in a program, the
   producer may produce data faster than it is consumed, or the
   consumer may try to retrieve and process data before it exists.
Thread

[Diagram: Start → Thread1 and Thread2 → shared variable and method]

• Java uses the keyword synchronized to synchronize threads and let
  them communicate with each other.

• Synchronization is a mechanism that allows two or more threads to
  share the available resources in a sequential manner.
Lock

• The term lock refers to the access granted to the particular thread
  that may use the shared resources.

• Java has a built-in lock that comes into play only when an object
  has synchronized method code.

• No other thread can acquire the lock until it is released by the
  first thread.

• Acquiring the lock means the thread enters the synchronized method;
  releasing the lock means it exits the synchronized method.
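The slides use Java's synchronized keyword; since the rest of the
deck's code is C, here is a hedged C/pthreads analog of my own (the
account_t structure and deposit function are invented for
illustration): a mutex held for the duration of a "synchronized"
function, so only one thread can be inside it at a time.

#include <pthread.h>

/* In Java each object has one lock; here, one mutex per structure. */
typedef struct {
    pthread_mutex_t lock;
    int balance;
} account_t;

/* Analog of a synchronized method: acquire the lock on entry and
   release it on exit, so only one thread runs the body at a time. */
void deposit(account_t *acc, int amount) {
    pthread_mutex_lock(&acc->lock);    /* "enter synchronized method" */
    acc->balance += amount;
    pthread_mutex_unlock(&acc->lock);  /* "exit synchronized method" */
}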
Important Points

• Points about synchronization and locks:

   Only methods (or blocks) can be synchronized.

   Each object has just one lock.

   Not all methods in a class need to be synchronized.

   If a thread goes to sleep, it holds any locks it has; it does not
   release them.

• Two ways to synchronize the execution of code:

   Synchronized methods

   Synchronized blocks
Synchronization - Barrier

• We could start multiple threads each time around the loop, and wait
  for them all to complete.

• This is inefficient, since we are continually spawning new
  processes.

• It is much less efficient than having n processes loop and
  implementing synchronization among them.
Synchronization - Barrier

• A barrier is a basic mechanism for synchronizing processes, inserted
  at the point in each process where it must wait.

• All processes can continue from this point only when all of them
  have reached it.

• In message-passing systems, barriers are often provided as library
  routines:

• MPI has the barrier routine MPI_Barrier().

• PVM has a similar barrier routine, pvm_barrier().
Barrier Example

[Figure: processes P0, P1, P2, ..., Pn-1 over time; each process stays
active until it reaches the barrier, then waits until every process
has arrived.]
Counter Implementation

Centralized counter implementation (sometimes called a linear barrier)

[Figure: processes P0, P1, ..., Pn-1 each call Barrier; each arriving
process increments a central counter and checks whether it has
reached n.]
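A hedged sketch of such a counter barrier with POSIX threads (my
illustration; the counter_barrier_t type is invented here, and the
generation field is added so the barrier can be reused safely):

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int count;        /* processes that have reached the barrier */
    int n;            /* total number of processes */
    int generation;   /* lets the barrier be reused across phases */
} counter_barrier_t;

void barrier_wait(counter_barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    int my_gen = b->generation;
    b->count++;                         /* "increment n" */
    if (b->count == b->n) {             /* "check for n" */
        b->generation++;                /* open the barrier */
        b->count = 0;
        pthread_cond_broadcast(&b->all_arrived);  /* release everyone */
    } else {
        while (my_gen == b->generation) /* guards spurious wakeups */
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}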
Tree Implementation

• More efficient. Suppose there are eight processes, P0, P1, P2, P3,
  P4, P5, P6, and P7:

• First stage: P1 sends a message to P0;
• P3 sends a message to P2;
• P5 sends a message to P4;
• P7 sends a message to P6;

• Second stage: P2 sends a message to P0;
• P6 sends a message to P4;

• Third stage: P4 sends a message to P0;

• P0 terminates the arrival phase;
Tree Implementation

[Figure: arrival at the barrier as a tree of synchronization messages
from P7..P1 down to P0, followed by departure from the barrier as the
same tree traversed in reverse.]
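A hedged MPI sketch of the arrival and departure phases for n a power
of two (my reconstruction of the scheme described above, not code from
the slides):

#include <mpi.h>

/* Tree barrier for n = 2^k processes. */
void tree_barrier(int rank, int n) {
    int stage, token = 0;

    /* Arrival phase: at each stage the "odd" partner signals the
       "even" partner, exactly as in the eight-process example. */
    for (stage = 1; stage < n; stage *= 2) {
        if (rank % (2 * stage) == stage)
            MPI_Send(&token, 1, MPI_INT, rank - stage, 0,
                     MPI_COMM_WORLD);
        else if (rank % (2 * stage) == 0)
            MPI_Recv(&token, 1, MPI_INT, rank + stage, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Departure phase: traverse the tree in reverse to release all
       processes once P0 has seen every arrival. */
    for (stage = n / 2; stage >= 1; stage /= 2) {
        if (rank % (2 * stage) == 0)
            MPI_Send(&token, 1, MPI_INT, rank + stage, 0,
                     MPI_COMM_WORLD);
        else if (rank % (2 * stage) == stage)
            MPI_Recv(&token, 1, MPI_INT, rank - stage, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}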
Data parallel computations

• Synchronization is required.

• The same operation is performed on different data elements.

• Data parallel programming is more convenient because of:

   • Ease of programming

   • Easy scaling to larger problems
Data parallel computations

• Many numeric and non-numeric problems can be cast in data parallel
  form.

• Example: SIMD computers

• SIMD computers:

   • The same instruction is executed on different processors, but on
     different data.

   • Synchronization is built into the hardware.
Example

for (i = 0; i < n; i++)
    a[i] = a[i] + k;

[Figure: the single statement a[] = a[] + k is applied to all elements
at once: a[0] = a[0]+k, a[1] = a[1]+k, ..., a[n-1] = a[n-1]+k, each on
its own processing element.]
Barrier Requirement

• The data parallel technique is applied to multiprocessors and
  multicomputers.

• The whole construct must not complete before all of its instances
  do; thus a barrier is required (see the OpenMP sketch below).

• forall (i = 0; i < n; i++)
     a[i] = a[i] + k;
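The forall construct above is pseudocode; a hedged OpenMP rendering
(my example) is a parallel for loop, which carries an implicit barrier
at its end:

#include <stdio.h>

int main(void) {
    enum { N = 8 };
    int a[N] = {0, 1, 2, 3, 4, 5, 6, 7};
    int k = 10;

    /* Iterations may run on different threads; the implicit barrier
       at the end of the loop plays the role required above. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = a[i] + k;

    /* Execution only continues here once the whole loop is done. */
    for (int i = 0; i < N; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}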
Butterfly barrier

[Diagram: Process 1 sends a message to Process 2 and receives a
confirmation message back.]

- Send a message to the partner process
- Wait until a message is received from that process
Stages of Butterfly Barrier

If we have n = 2^k processes, we build a barrier in k stages 1, 2, ..., k.

At stage s, each process synchronizes with a partner that is 2^(s-1)
steps away.

The stages are interleaved so that no process can pass through all
stages of the barrier until every process has reached it.

If n is not a power of 2 we can use the next largest 2^k, but this is
not efficient and the system is no longer symmetric.
Working model of Butterfly Barrier

[Figure: eight processes 0..7; in round 0 each synchronizes with the
partner 1 step away, in round 1 with the partner 2 steps away, and in
round 2 with the partner 4 steps away.]
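A hedged MPI sketch of the butterfly pattern for n = 2^k processes,
where the stage-s partner is found by flipping bit s-1 of the rank (my
reconstruction, not code from the slides):

#include <mpi.h>

/* Butterfly barrier for n = 2^k processes. */
void butterfly_barrier(int rank, int n) {
    int out = 0, in;
    /* Stage s pairs each process with the partner 2^(s-1) steps away,
       i.e. rank XOR 2^(s-1). */
    for (int step = 1; step < n; step *= 2) {
        int partner = rank ^ step;
        /* Exchange a message with the partner; MPI_Sendrecv avoids
           the deadlock of two blocking sends facing each other. */
        MPI_Sendrecv(&out, 1, MPI_INT, partner, 0,
                     &in,  1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}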
Virtual process in Butterfly Barrier

What if the number of threads is not a power of 2?

[Figure: five real processes 0..4 plus virtual processes (2), (1),
(0) filling the ranks up to the next power of 2; in each round, a
process whose partner is virtual synchronizes with the real process
that stands in for it.]
Local Synchronization

Useful when calculations take varying amounts of time and delays can
occur randomly at any processor or task.

Process Pi-1       Process Pi        Process Pi+1

recv(Pi);          send(Pi-1);       recv(Pi);
send(Pi);          send(Pi+1);       send(Pi);
                   recv(Pi-1);
                   recv(Pi+1);

Note: not a perfect three-process barrier; Pi-1 only synchronizes with
Pi and continues as soon as Pi allows.
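A hedged MPI rendering of this local synchronization (my example, not
slide code): each process exchanges an empty message with each of its
neighbors, using MPI_Sendrecv so that the matching sends and receives
cannot deadlock.

#include <mpi.h>

/* Local synchronization of process `rank` with its two neighbors. */
void local_sync(int rank, int n) {
    int out = 0, in;
    if (rank > 0)        /* synchronize with P(i-1) */
        MPI_Sendrecv(&out, 1, MPI_INT, rank - 1, 0,
                     &in,  1, MPI_INT, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rank < n - 1)    /* synchronize with P(i+1) */
        MPI_Sendrecv(&out, 1, MPI_INT, rank + 1, 0,
                     &in,  1, MPI_INT, rank + 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}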
Synchronous Iteration (Synchronous Parallelism)

• Synchronous iteration: this term describes a situation where a
  problem is solved by iteration and
• each iteration step is composed of several processes that start
  together at the beginning of the iteration step, and
• the next iteration step cannot begin until all processes have
  finished the current one.

[Diagram: iterations 1, 2, 3, ..., n, each consisting of steps 0 to
n-1 executed by the processes in parallel.]
Example 1: Synchronous Iteration

Equation: (4 + 6) - (2 * 3)

[Figure: two expression trees over the operands 2, 3, 4, 6.
Sequential solution: solve the equation linearly, evaluating operators
by priority (2*3 = 6, then 4+6 = 10, then 10-6 = 4).
Parallel solution: evaluate both subtrees 4+6 = 10 and 2*3 = 6
simultaneously, then combine them: 10-6 = 4.]
Example 2: Synchronous Iteration

• Solving a general system of linear equations by iteration.

• Suppose the equations are of a general form with n equations and n
  unknowns:

   a(i,0)x0 + a(i,1)x1 + ... + a(i,n-1)x(n-1) = b(i)

  where the unknowns are x0, x1, x2, ..., x(n-1) (0 <= i < n).

• Rearranging the i-th equation to isolate x(i) gives the iteration
  formula used in the code below:

   x(i) = (b(i) - Σ(j != i) a(i,j) x(j)) / a(i,i)
Illustrating Jacobi Iteration

System:
   7x1 + 3x2 +   x3 = 18
   2x1 - 9x2 +  4x3 = 12
    x1 - 4x2 + 12x3 = 6

Rearranged:
   x1 =  18/7 - 3/7 x2 - 1/7 x3
   x2 = -12/9 + 2/9 x1 + 4/9 x3
   x3 =  6/12 - 1/12 x1 + 4/12 x2

Use as the initial estimates x1(0) = x2(0) = x3(0) = 0. Inserting
these estimates into the equations yields new estimates of the
parameters:

   x1(1) =  2.571 - 0.429(0) - 0.143(0) =  2.571
   x2(1) = -1.333 + 0.222(0) + 0.444(0) = -1.333
   x3(1) =  0.500 - 0.083(0) + 0.333(0) =  0.500

The estimated results after each iteration:

   Iteration    x(1)        x(2)        x(3)
   1            2.57143    -1.33333     0.50000
   2            3.07143    -0.53968    -0.15873
   3            2.82540    -0.72134     0.06415
   4            2.87141    -0.67695     0.02410
   5            2.85811    -0.68453     0.03506
   6            2.85979    -0.68261     0.03365
Sequential Code

for (i = 0; i < n; i++)
    x[i] = b[i];
for (iter = 0; iter < limit; iter++) {
    for (i = 0; i < n; i++) {
        sum = 0;
        for (j = 0; j < n; j++)
            if (i != j)
                sum = sum + a[i][j] * x[j];
        newx[i] = (b[i] - sum) / a[i][i];
    }
    for (i = 0; i < n; i++)
        x[i] = newx[i];
}

For the system above, with initial estimates x1(0) = x2(0) = x3(0) = 0:

Iteration 1:
   newx[0] =  (18 - 0)/7  =  2.571
   newx[1] = -(12 - 0)/9  = -1.333
   newx[2] =  (6 - 0)/12  =  0.500

Iteration 2:
   newx[0] =  2.571 + 0.500357 =  3.071
   newx[1] = -1.333 + 0.792762 = -0.540
   newx[2] =  0.500 - 0.657282 = -0.158
Parallel Code

• Suppose we have a process Pi for each unknown xi; the code for
  process Pi may be:

x[i] = b[i];
for (iter = 0; iter < limit; iter++) {
    sum = -a[i][i] * x[i];
    for (j = 0; j < n; j++)
        sum = sum + a[i][j] * x[j];
    new_x[i] = (b[i] - sum) / a[i][i];
    broadcast_receive(&new_x[i]);
    global_barrier();
}

• broadcast_receive() is used here (1) to send the newly computed
  value of x[i] from process Pi to every other process, and (2) to
  collect the values broadcast by the other processes to process Pi.

[Figure: after each iteration the new x values are broadcast and
received by every process: 2.571, -1.333, 0.500 after iteration 1;
3.071, -0.540, -0.158 after iteration 2; 2.825, -0.721, 0.064 after
iteration 3; ...]
A New Message-Passing Operation - Allgather

• Broadcast and gather values in one composite construction.

[Figure: each process contributes its element and receives the full
vector of all elements.]

• Note: MPI_Allgather() also acts as a global barrier, so you do not
  need to add global_barrier().
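A hedged sketch of the Jacobi exchange using MPI_Allgather() (my
example, with one unknown per process as on the previous slide):

#include <mpi.h>

/* One iteration's exchange: every process contributes its newly
   computed unknown and receives the complete updated vector x[0..n-1]. */
void exchange(double my_new_x, double *x, int n) {
    MPI_Allgather(&my_new_x, 1, MPI_DOUBLE,  /* what this rank sends */
                  x, 1, MPI_DOUBLE,          /* gathered full vector */
                  MPI_COMM_WORLD);
}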
Solution By Iteration

• Iterative methods
   • Applicable when direct methods require excessive computation
   • Have the advantage of small memory requirements
   • May not always converge/terminate

• An iterative method begins with an initial guess for the unknowns,
  e.g., xi = bi.

• Iterations continue until sufficiently accurate values are obtained
  for the unknowns.
Deadlock

A set of processes or threads is deadlocked when each process or
thread is waiting for a resource to be freed that is controlled by
another. A simple deadlock situation:

[Figure: resource-allocation graph with a cycle between processes P1,
P2 and resources R1, R2; edges run from process to resource and vice
versa. R = resource, P = process.]
Deadlock

When a pair of processes each send to and receive from each other,
deadlock may occur:

Process P1    Process P2
send()        send()
.             .
.             .
.             .
recv()        recv()

If both send() calls block until the matching receive starts, neither
process ever reaches its recv().
Deadlock

Solutions

• Remove the mutual exclusion condition.

• Or remove the "hold and wait" condition by requiring processes to
  request all the resources they will need before starting.

• Employ timeouts to recover from deadlock.

For the send/send deadlock on the previous slide, reordering the
communication also works (see the sketch below).
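A hedged MPI sketch (my example) of two standard fixes for the
send/send exchange: order the operations by rank, or use the combined
MPI_Sendrecv() call.

#include <mpi.h>

/* Exchange `val` between two ranks without send/send deadlock. */
void safe_exchange(int rank, int other, int *val) {
    int incoming;

    /* Fix 1: break the symmetry; the lower rank sends first. */
    if (rank < other) {
        MPI_Send(val, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
        MPI_Recv(&incoming, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(&incoming, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(val, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
    }
    *val = incoming;

    /* Fix 2 (alternative): MPI_Sendrecv() pairs the send and the
       receive in one call, so the library avoids the deadlock. */
}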
Sorting Algorithms

• Sorting numbers – that is, rearranging a list of numbers into
  increasing (or decreasing) order – is a fundamental operation that
  appears in many applications.

• Sorting is also applicable to non-numerical values; for example,
  rearranging strings into alphabetical order.

• Sorting is often done because it makes searches and other operations
  easier.

• Many parallel sorting algorithms and parallel implementations of
  sequential sorting algorithms are synchronous algorithms.

Here we select one sequential algorithm for conversion to a parallel
implementation:

   Quicksort
Quicksort

[Figure: sorting the list with successive pivots, next to the process
allocation tree:

   4 2 7 8 5 1 3 6   (pivot 4)        P0
   3 2 1 4 5 7 8 6                    P0, P4
   2 1 3 4 5 7 8 6                    P0, P2, P4, P6
   1 2 3   ...   6 7 8                P0, P1, ..., P6, P7
   Sorted list                        Process allocation]

1. Select the pivot.
2. Split.
3. Repeat the procedure on the sublists.

Recursive algorithm: recursive calls -> different processes.
Quicksort using OpenMP

Demo
Quicksort using OpenMP

• Quicksort sequential implementation: qsort_seq.c

  This program takes one integer parameter, num_elems, which specifies
  the size of the array to be sorted.

  $ ./qsort_seq 5000000

• Quicksort parallel implementation using OpenMP: qsort_task.c

  This program takes two integer parameters: num_elems and low_limit.
  The num_elems parameter specifies the size of the array to be
  sorted. The quick_sort function is called recursively, generating
  many tasks until the low_limit threshold is reached (a sketch of
  this pattern follows).

  $ ./qsort_task 5000000 100
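The slides do not reproduce qsort_task.c; the following is a hedged
sketch of the task-based pattern it describes (the quick_sort name and
low_limit parameter follow the slide; the body, partition, and swap
helpers are my reconstruction):

#include <omp.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition around the last element as pivot. */
static int partition(int *a, int lo, int hi) {
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);
    return i;
}

void quick_sort(int *a, int lo, int hi, int low_limit) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    if (hi - lo < low_limit) {
        /* Below the threshold: recurse sequentially, no new tasks. */
        quick_sort(a, lo, p - 1, low_limit);
        quick_sort(a, p + 1, hi, low_limit);
    } else {
        /* Each recursive call becomes an OpenMP task. */
        #pragma omp task
        quick_sort(a, lo, p - 1, low_limit);
        #pragma omp task
        quick_sort(a, p + 1, hi, low_limit);
        #pragma omp taskwait
    }
}

/* Typically called from one thread inside a parallel region:
      #pragma omp parallel
      {
          #pragma omp single nowait
          quick_sort(a, 0, num_elems - 1, low_limit);
      }
   and run, for example, as: OMP_NUM_THREADS=2 ./qsort_task 5000000 100 */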
Quicksort using OpenMP

Set the OpenMP environment variables:

OMP_NUM_THREADS=2 → sets the number of threads to use for parallel
regions.

OMP_WAIT_POLICY=ACTIVE → provides a hint to an OpenMP implementation
about the desired behavior of waiting threads. The ACTIVE value
specifies that waiting threads should mostly be active, i.e., consume
processor cycles, while waiting.

OMP_DYNAMIC=FALSE → controls dynamic adjustment of the number of
threads used for executing parallel regions. If the variable is set to
false, dynamic adjustment of the number of threads is disabled.
Sequential vs Parallel
Results

Elements               1,000,000   2,000,000   5,000,000   10,000,000
Sequential Quicksort   0.234601    0.495793    1.307625    2.707997
Parallel Quicksort     0.124090    0.259852    0.687688    1.432589

[Chart: execution time in seconds vs. array size. The parallel version
cuts execution time roughly in half at every size: about a 50%
reduction.]
Conclusions

• The demand for computing power and speed increases every day.

• Programs that are properly designed to take advantage of parallelism
  can execute faster than their sequential counterparts, which is a
  clear advantage.

• Some algorithms cannot be parallelized.

• Parallelization offers a new way to increase performance.
Thank you

Questions?
References

OpenMP Specification. http://www.openmp.org/mp-documents/spec30.pdf

Reap the Benefits of Multithreading without All the Work. http://msdn.microsoft.com/en-us/magazine/cc163717.aspx

OpenMP. http://www.metz.supelec.fr/metz/personnel/vialle/course/SI-PP/notes-de-cours-specifiques/PP-02-OpenMP-6spp.pdf

Parallel Programming. http://coitweb.uncc.edu/%7Eabw/ITCS4145F10/

Sorting Algorithms. http://www.c.happycodings.com/Sorting_Searching/index.html

OpenMP Exercise. https://computing.llnl.gov/tutorials/openMP/exercise.html

OpenMP Recursive Routines. http://www.openmp.org/pipermail/omp/2005/000145.html

OpenMP. http://en.wikipedia.org/wiki/OpenMP

IBM XL C/C++ Compiler Reference: #pragma omp task. http://publib.boulder.ibm.com/infocenter/comphelp/v111v131/index.jsp?topic=/com.ibm.xlc111.aix.doc/compiler_ref/prag_omp_task.html

The Joys of Concurrent Programming. http://www.informit.com/articles/article.aspx?p=30413&seqNum=2

Wilkinson, Barry and Allen, Michael. Parallel Programming. Second Edition. Pearson Prentice Hall, 2005.

Ekanayake, J. and Fox, G. High Performance Parallel Computing with Clouds and Cloud Technologies. Cloud Computing, 2010.

Grama, A., Gupta, A., Karypis, G., and Kumar, V. Introduction to Parallel Computing. Addison-Wesley, 2003.

More Related Content

What's hot

Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platformsSyed Zaid Irshad
 
Multi processor scheduling
Multi  processor schedulingMulti  processor scheduling
Multi processor schedulingShashank Kapoor
 
Superscalar processor
Superscalar processorSuperscalar processor
Superscalar processornoor ul ain
 
multiprocessors and multicomputers
 multiprocessors and multicomputers multiprocessors and multicomputers
multiprocessors and multicomputersPankaj Kumar Jain
 
Unit 1 architecture of distributed systems
Unit 1 architecture of distributed systemsUnit 1 architecture of distributed systems
Unit 1 architecture of distributed systemskaran2190
 
Dichotomy of parallel computing platforms
Dichotomy of parallel computing platformsDichotomy of parallel computing platforms
Dichotomy of parallel computing platformsSyed Zaid Irshad
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecturePankaj Kumar Jain
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureBalaji Vignesh
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machinesSyed Zaid Irshad
 
Register transfer language
Register transfer languageRegister transfer language
Register transfer languageSanjeev Patel
 
Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updatedVajira Thambawita
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed SystemSunita Sahu
 

What's hot (20)

Array Processor
Array ProcessorArray Processor
Array Processor
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 
Multi processor scheduling
Multi  processor schedulingMulti  processor scheduling
Multi processor scheduling
 
Superscalar processor
Superscalar processorSuperscalar processor
Superscalar processor
 
multiprocessors and multicomputers
 multiprocessors and multicomputers multiprocessors and multicomputers
multiprocessors and multicomputers
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Unit 1 architecture of distributed systems
Unit 1 architecture of distributed systemsUnit 1 architecture of distributed systems
Unit 1 architecture of distributed systems
 
Multiprocessor system
Multiprocessor system Multiprocessor system
Multiprocessor system
 
Dichotomy of parallel computing platforms
Dichotomy of parallel computing platformsDichotomy of parallel computing platforms
Dichotomy of parallel computing platforms
 
program flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architectureprogram flow mechanisms, advanced computer architecture
program flow mechanisms, advanced computer architecture
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
 
Parallel computing persentation
Parallel computing persentationParallel computing persentation
Parallel computing persentation
 
Communication costs in parallel machines
Communication costs in parallel machinesCommunication costs in parallel machines
Communication costs in parallel machines
 
Register transfer language
Register transfer languageRegister transfer language
Register transfer language
 
VLIW Processors
VLIW ProcessorsVLIW Processors
VLIW Processors
 
Parallelism
ParallelismParallelism
Parallelism
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
Lecture 4 principles of parallel algorithm design updated
Lecture 4   principles of parallel algorithm design updatedLecture 4   principles of parallel algorithm design updated
Lecture 4 principles of parallel algorithm design updated
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 

Similar to Parallel Programming Quicksort

Similar to Parallel Programming Quicksort (20)

Parallel architecture &programming
Parallel architecture &programmingParallel architecture &programming
Parallel architecture &programming
 
Parallel architecture-programming
Parallel architecture-programmingParallel architecture-programming
Parallel architecture-programming
 
Modern Java Concurrency (Devoxx Nov/2011)
Modern Java Concurrency (Devoxx Nov/2011)Modern Java Concurrency (Devoxx Nov/2011)
Modern Java Concurrency (Devoxx Nov/2011)
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)Modern Java Concurrency (OSCON 2012)
Modern Java Concurrency (OSCON 2012)
 
CA UNIT IV.pptx
CA UNIT IV.pptxCA UNIT IV.pptx
CA UNIT IV.pptx
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
unit 4.pptx
unit 4.pptxunit 4.pptx
unit 4.pptx
 
unit 4.pptx
unit 4.pptxunit 4.pptx
unit 4.pptx
 
High performance computing
High performance computingHigh performance computing
High performance computing
 
PARALLELISM IN MULTICORE PROCESSORS
PARALLELISM  IN MULTICORE PROCESSORSPARALLELISM  IN MULTICORE PROCESSORS
PARALLELISM IN MULTICORE PROCESSORS
 
22CS201 COA
22CS201 COA22CS201 COA
22CS201 COA
 
Parallel & Distributed processing
Parallel & Distributed processingParallel & Distributed processing
Parallel & Distributed processing
 
Pthread
PthreadPthread
Pthread
 
4 threads
4 threads4 threads
4 threads
 
Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 
General Purpose GPU Computing
General Purpose GPU ComputingGeneral Purpose GPU Computing
General Purpose GPU Computing
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 

More from Uday Sharma

Tftp client server communication
Tftp client server communicationTftp client server communication
Tftp client server communicationUday Sharma
 
Wat question papers
Wat question papersWat question papers
Wat question papersUday Sharma
 
Fundamentals of java --- version 2
Fundamentals of java --- version 2Fundamentals of java --- version 2
Fundamentals of java --- version 2Uday Sharma
 
Exercises of java tutoring -version1
Exercises of java tutoring -version1Exercises of java tutoring -version1
Exercises of java tutoring -version1Uday Sharma
 
Java tutor oo ps introduction-version 1
Java tutor  oo ps introduction-version 1Java tutor  oo ps introduction-version 1
Java tutor oo ps introduction-version 1Uday Sharma
 
Logistics final prefinal
Logistics final prefinalLogistics final prefinal
Logistics final prefinalUday Sharma
 
Making Rules Project Management
Making Rules  Project ManagementMaking Rules  Project Management
Making Rules Project ManagementUday Sharma
 
Intelligent Weather Service
Intelligent Weather Service Intelligent Weather Service
Intelligent Weather Service Uday Sharma
 
India presentation
India presentationIndia presentation
India presentationUday Sharma
 

More from Uday Sharma (11)

Tftp client server communication
Tftp client server communicationTftp client server communication
Tftp client server communication
 
Wat question papers
Wat question papersWat question papers
Wat question papers
 
Fundamentals of java --- version 2
Fundamentals of java --- version 2Fundamentals of java --- version 2
Fundamentals of java --- version 2
 
Core java
Core javaCore java
Core java
 
Exercises of java tutoring -version1
Exercises of java tutoring -version1Exercises of java tutoring -version1
Exercises of java tutoring -version1
 
Java tutor oo ps introduction-version 1
Java tutor  oo ps introduction-version 1Java tutor  oo ps introduction-version 1
Java tutor oo ps introduction-version 1
 
Logistics final prefinal
Logistics final prefinalLogistics final prefinal
Logistics final prefinal
 
Presentation1
Presentation1Presentation1
Presentation1
 
Making Rules Project Management
Making Rules  Project ManagementMaking Rules  Project Management
Making Rules Project Management
 
Intelligent Weather Service
Intelligent Weather Service Intelligent Weather Service
Intelligent Weather Service
 
India presentation
India presentationIndia presentation
India presentation
 

Parallel Programming Quicksort

  • 6. Flynn's taxonomy. (Figure: Flynn's taxonomy diagram; the labels SPMD and MPMD appear beneath it.)
  • 7. Parallel Computers (1). Nowadays there are two main approaches. Shared memory multiprocessor (considerations): data-sharing and synchronization issues appear. How is the data shared among processors at execution time? Larger shared-memory machines do not satisfy UMA. Why? Some processors are nearer to parts of the memory, and those can access that memory faster. (Figure caption: processors are getting faster, but memory access is still not as fast.)
  • 8. Parallel Computers (2). ...Shared memory multiprocessor (considerations): vendors have built computers with hierarchical memory systems, and SMPs have some memory that is not shared.
  • 9. Parallel Computers (3). Networked computers as a computing platform: efforts to build parallel computer systems from networked computers, as a cheaper alternative to expensive supercomputers, started in the early 1990s.
  • 10. Programming Shared Memory Multiprocessors (1). 1. Thread libraries: the programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads. 2. Higher-level library functions and preprocessor/compiler directives to declare shared variables and specify parallelism. 3. A modified sequential programming language, with added syntax to declare shared variables and specify parallelism, e.g. UPC (Unified Parallel C), which needs a UPC compiler.
  • 11. Programming Shared Memory Multiprocessors (2). 4. A specially designed parallel programming language with syntax to express parallelism; the compiler automatically creates executable code for each processor (not common now). 5. A regular sequential programming language such as C, with a parallelizing compiler asked to convert it into parallel executable code (not common now).
  • 12. Thread. Threads that execute independently of each other are called asynchronous threads. Problems: two or more threads may share the same resource while only one of them can access it at a time; and if a producer and a consumer share data in a program, the producer may produce data faster than it is consumed, or the consumer may try to retrieve and process data that does not yet exist (see the sketch below).
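  The race just described can be avoided with a lock and condition variables. The following is a minimal producer/consumer sketch in C with POSIX threads, written for this transcript rather than taken from the slides; the buffer size, item counts, and names are illustrative. The consumer waits until data exists, so it can never retrieve an item that does not exist yet, and the producer waits when the buffer is full. Compile with -pthread.

      #include <pthread.h>
      #include <stdio.h>

      #define N 8
      static int buffer[N];
      static int count = 0;   /* number of items currently in the buffer */
      static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
      static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;

      static void *producer(void *arg) {
          for (int i = 0; i < 32; i++) {
              pthread_mutex_lock(&m);
              while (count == N)                 /* wait until there is room */
                  pthread_cond_wait(&not_full, &m);
              buffer[count++] = i;               /* produce one item */
              pthread_cond_signal(&not_empty);
              pthread_mutex_unlock(&m);
          }
          return NULL;
      }

      static void *consumer(void *arg) {
          for (int i = 0; i < 32; i++) {
              pthread_mutex_lock(&m);
              while (count == 0)                 /* wait until data exists */
                  pthread_cond_wait(&not_empty, &m);
              int item = buffer[--count];        /* consume one item */
              pthread_cond_signal(&not_full);
              pthread_mutex_unlock(&m);
              printf("consumed %d\n", item);
          }
          return NULL;
      }

      int main(void) {
          pthread_t p, c;
          pthread_create(&p, NULL, producer, NULL);
          pthread_create(&c, NULL, consumer, NULL);
          pthread_join(p, NULL);
          pthread_join(c, NULL);
          return 0;
      }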
  • 13. Thread. (Figure: at start, Thread1 and Thread2 both reach a shared variable and method.) Java uses the keyword synchronized to synchronize threads and let them communicate with each other: a mechanism which allows two or more threads to share the available resources in a sequential manner.
  • 14. Lock. The term lock refers to the access granted to the one thread that may use the shared resources. Java has a built-in lock that only comes into action when an object has synchronized method code; no other thread can acquire the lock until it is released by the first thread. Acquiring the lock means the thread is currently inside the synchronized method; releasing the lock means it has exited the synchronized method.
  • 15. Important Points. Points on synchronization and locks: only methods (or blocks) can be synchronized; each object has just one lock; not all methods in a class need to be synchronized; and if a thread goes to sleep, it holds any locks it has (it does not release them). Two ways to synchronize the execution of code: synchronized methods and synchronized blocks.
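  Slides 14-15 describe Java's synchronized keyword; as a rough analogue in C (the language of the deck's later listings), a POSIX mutex plays the role of the per-object lock: only the thread holding the mutex may execute the critical section, and the lock is released on exit. This is a minimal sketch with illustrative names, not the slides' own code.

      #include <pthread.h>

      static long balance = 0;
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

      /* Comparable to a Java synchronized method: the body runs only
         while the calling thread holds the lock. */
      void deposit(long amount) {
          pthread_mutex_lock(&lock);    /* "enter the synchronized method" */
          balance += amount;            /* shared state, safe to touch now */
          pthread_mutex_unlock(&lock);  /* exiting releases the lock       */
      }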
  • 16. Synchronization: Barrier. We could start multiple threads each time around the loop and wait for them all to complete. This is inefficient, since we are continually spawning new processes; it is much less efficient than keeping n processes looping and implementing synchronization among them.
  • 17. Synchronization: Barrier. A barrier is a basic mechanism for synchronizing processes, inserted at the point in each process where it must wait. All processes can continue from this point only once all of them have reached it. In message-passing systems, barriers are often provided as library routines: MPI has the barrier routine MPI_Barrier(), and PVM has a similar barrier routine, pvm_barrier().
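  A minimal usage sketch of the MPI routine named above; the surrounding work is a placeholder, not part of the slides.

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          /* ... each process performs its share of an iteration step ... */
          printf("process %d reached the barrier\n", rank);

          /* No process returns from MPI_Barrier until every process
             in the communicator has called it. */
          MPI_Barrier(MPI_COMM_WORLD);

          printf("process %d passed the barrier\n", rank);
          MPI_Finalize();
          return 0;
      }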
  • 18. Barrier Example. (Figure: processes P0, P1, P2, …, Pn-1 are active for varying times; each waits at the barrier until the last process arrives, after which all continue.)
  • 19. Counter Implementation. A centralized counter implementation (sometimes called a linear barrier). (Figure: processes P0, P1, …, Pn-1 each increment a shared counter on reaching the barrier and check for the count to reach n.)
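  For shared-memory threads, the counter barrier sketched on this slide can be written with a mutex and a condition variable. This is an illustrative implementation; the sense-reversing phase flag, which lets the barrier be reused safely, is an added detail not drawn on the slide.

      #include <pthread.h>

      typedef struct {
          pthread_mutex_t m;
          pthread_cond_t  cv;
          int count;   /* arrivals so far in the current phase */
          int n;       /* number of participating threads      */
          int phase;   /* flips each time the barrier opens    */
      } counter_barrier;

      void barrier_wait(counter_barrier *b) {
          pthread_mutex_lock(&b->m);
          int my_phase = b->phase;
          if (++b->count == b->n) {          /* last arrival: open the barrier */
              b->count = 0;
              b->phase = 1 - b->phase;
              pthread_cond_broadcast(&b->cv);
          } else {
              while (b->phase == my_phase)   /* wait for the count to reach n */
                  pthread_cond_wait(&b->cv, &b->m);
          }
          pthread_mutex_unlock(&b->m);
      }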
  • 20. Tree Implementation. More efficient. Suppose there are eight processes, P0 through P7. First stage: P1 sends a message to P0; P3 sends a message to P2; P5 sends a message to P4; P7 sends a message to P6. Second stage: P2 sends a message to P0; P6 sends a message to P4. Third stage: P4 sends a message to P0, and P0 terminates the arrival phase.
  • 21. Tree Implementation. (Figure: arrival at the barrier, with synchronization messages flowing up the tree from P0-P7, followed by departure from the barrier, with messages flowing back down.)
  • 22. Data parallel computations. Synchronization is required; the same operation is performed on different data elements. Data parallel programming is more convenient because of its ease of programming, and because it scales easily to larger problems.
  • 23. Data parallel computations. Many numeric, and some non-numeric, problems can be cast in data parallel form. Example: SIMD computers, where the same instruction is executed on different processors but on different data, and synchronization is built into the hardware.
  • 24. Example. for (i = 0; i < n; i++) a[i] = a[i] + k; in data parallel form this is a[] = a[] + k. (Figure: the operations a[0] = a[0] + k, a[1] = a[1] + k, …, a[n-1] = a[n-1] + k are applied to all elements at once.)
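  In OpenMP, which the deck uses later for quicksort, this loop can be written directly as a data parallel construct; the implicit barrier at the end of the parallel for also provides the synchronization required on the next slide. A small self-contained sketch with made-up sizes:

      #include <stdio.h>

      int main(void) {
          enum { N = 1000 };
          int a[N], k = 5;
          for (int i = 0; i < N; i++)
              a[i] = i;

          /* Each iteration updates a different element, so the loop is
             data parallel; an implicit barrier ends the construct. */
          #pragma omp parallel for
          for (int i = 0; i < N; i++)
              a[i] = a[i] + k;

          printf("a[0] = %d, a[%d] = %d\n", a[0], N - 1, a[N - 1]);
          return 0;
      }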
  • 25. Barrier Requirement. The data parallel technique is applied to multiprocessors and multicomputers. The whole construct must not complete before all of its instances do, so a barrier is required: forall (i = 0; i < n; i++) a[i] = a[i] + k;
  • 26. Butterfly barrier. (Figure: Process 1 sends a message to Process 2 and receives a confirmation message back.) Each process sends a message to its partner process, then waits until a message is received from that process.
  • 27. Stages of Butterfly Barrier. If we have n = 2^k processes, we build a barrier in k stages 1, 2, …, k. At stage s, each process synchronizes with a partner that is 2^(s-1) steps away. The stages are interleaved so that no process can pass through all stages of the barrier until all processes have reached it. If n is not a power of 2, we can use the next largest 2^k, but this is not efficient and the system is no longer symmetric.
  • 28. Working model of Butterfly Barrier. (Figure: processes 0-7 synchronize pairwise in Round 0, Round 1, and Round 2, with partners 1, 2, and 4 ranks away respectively.)
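  In code, the pairing shown in the figure amounts to partner = rank XOR 2^round. A sketch in MPI, assuming the process count is a power of two (the virtual-process trick on the next slide handles other counts); this is illustrative, not the slides' own implementation.

      #include <mpi.h>

      /* Butterfly barrier for n = 2^k processes (illustrative). */
      void butterfly_barrier(MPI_Comm comm) {
          int rank, n;
          MPI_Comm_rank(comm, &rank);
          MPI_Comm_size(comm, &n);
          for (int step = 1; step < n; step <<= 1) {
              int partner = rank ^ step;  /* 2^(s-1) ranks away in stage s */
              /* Exchange an empty message: send to the partner, then wait
                 until the partner's message arrives, as on slide 26. */
              MPI_Sendrecv(NULL, 0, MPI_BYTE, partner, 0,
                           NULL, 0, MPI_BYTE, partner, 0,
                           comm, MPI_STATUS_IGNORE);
          }
      }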
  • 29. Virtual processes in Butterfly Barrier. What if the number of threads is not a power of 2? Use virtual processes. (Figure: real processes 0-4 are padded with virtual processes (2), (1), (0) to reach the next power of 2, and the same rounds 0-2 are run.)
  • 30. Local Synchronization. Useful when calculations take varying amounts of time and delays can occur randomly at any processor or task. (Figure: process Pi exchanges messages with its neighbours. Pi-1 does recv(Pi); send(Pi). Pi does send(Pi-1); send(Pi+1); recv(Pi-1); recv(Pi+1). Pi+1 does recv(Pi); send(Pi).) Note: this is not a perfect three-process barrier; Pi-1 will only synchronize with Pi and continues as soon as Pi allows.
  • 31. Synchronous Iteration (Synchronous Parallelism). Synchronous iteration describes a situation where a problem is solved by iteration, each iteration step is composed of several processes that start together at the beginning of the step, and the next iteration step cannot begin until all processes have finished the current one. (Figure: iteration process diagram; iterations 1 through n, each consisting of steps 0 to n-1.)
  • 32. Example 1: Synchronous Iteration. Equation: (4 + 6) - (2 * 3). Sequential solution: solve the equation linearly, one operation at a time in priority order. Parallel solution: solve from both sides of the expression tree at once, computing 4 + 6 = 10 and 2 * 3 = 6 simultaneously, then 10 - 6 = 4. (Figure: the expression tree evaluated sequentially versus in parallel.)
  • 33. Example 2: Synchronous Iteration. Solving a general system of linear equations by iteration. Suppose the equations are of a general form with n equations and n unknowns, where the unknowns are x0, x1, x2, …, xn-1 (0 <= i < n); see the reconstructed form below.
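  The equation image on this slide did not survive extraction; the general form it refers to, consistent with the rearrangement used on the following slides, is

      \[ a_{i,0}\,x_0 + a_{i,1}\,x_1 + \dots + a_{i,n-1}\,x_{n-1} = b_i, \qquad 0 \le i < n, \]

  which the Jacobi iteration rearranges (as in the sequential code on slide 35) into

      \[ x_i^{(t+1)} = \frac{1}{a_{i,i}}\Bigl(b_i - \sum_{j \ne i} a_{i,j}\,x_j^{(t)}\Bigr). \]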
  • 34. Illustrating Jacobi Iteration. System:

      7x1 + 3x2 +   x3 = 18
      2x1 - 9x2 +  4x3 = 12
       x1 - 4x2 + 12x3 = 6

  Rearranged for iteration:

      x1 =  18/7 - (3/7)x2  - (1/7)x3
      x2 = -12/9 + (2/9)x1  + (4/9)x3
      x3 =  6/12 - (1/12)x1 + (4/12)x2

  Use x1(0) = x2(0) = x3(0) = 0 as the initial estimates and insert them into these equations, yielding new estimates of the parameters. The first iteration gives:

      x1(1) =  2.571 - 0.429(0) - 0.143(0) =  2.571
      x2(1) = -1.333 + 0.222(0) + 0.444(0) = -1.333
      x3(1) =  0.500 - 0.083(0) + 0.333(0) =  0.500

  The estimated results after each iteration:

      Iteration      x1         x2         x3
      1            2.57143   -1.33333    0.50000
      2            3.07143   -0.53968   -0.15873
      3            2.82540   -0.72134    0.06415
      4            2.87141   -0.67695    0.02410
      5            2.85811   -0.68453    0.03506
      6            2.85979   -0.68261    0.03365
  • 35. Sequential Code.

      for (i = 0; i < n; i++)
          x[i] = b[i];
      for (iter = 0; iter < limit; iter++) {
          for (i = 0; i < n; i++) {
              sum = 0;
              for (j = 0; j < n; j++)
                  if (i != j)
                      sum = sum + a[i][j] * x[j];
              newx[i] = (b[i] - sum) / a[i][i];
          }
          for (i = 0; i < n; i++)
              x[i] = newx[i];
      }

  Worked example (7x1 + 3x2 + x3 = 18; 2x1 - 9x2 + 4x3 = 12; x1 - 4x2 + 12x3 = 6, with initial estimates x1(0) = x2(0) = x3(0) = 0):
  Iteration 1: newx[0] = (18 - 0)/7 = 2.571; newx[1] = -(12 - 0)/9 = -1.333; newx[2] = (6 - 0)/12 = 0.500, so x1(1) = 2.571, x2(1) = -1.333, x3(1) = 0.500.
  Iteration 2: newx[0] = 2.571 + 0.500357 = 3.071; newx[1] = -1.333 + 0.792762 = -0.540; newx[2] = 0.500 - 0.657282 = -0.158.
  • 36. Parallel Code. Suppose we have a process Pi for each unknown xi; the code for process Pi may be:

      x[i] = b[i];
      for (iter = 0; iter < limit; iter++) {
          sum = -a[i][i] * x[i];
          for (j = 0; j < n; j++)
              sum = sum + a[i][j] * x[j];
          new_x[i] = (b[i] - sum) / a[i][i];
          broadcast_receive(&new_x[i]);
          global_barrier();
      }

  broadcast_receive() is used here (1) to send the newly computed value of x[i] from process Pi to every other process, and (2) to collect the data broadcast from the other processes to process Pi. (Figure: the send/receive exchange of values between processes for iterations 1-3, e.g. iteration 1: 2.571, -1.333, 0.5; iteration 2: 3.071, -0.540, -0.158; iteration 3: 2.825, -0.721, 0.064.)
  • 37. A New Message-Passing Operation: Allgather. Broadcast and gather values in one composite construction. (Figure: each process contributes its value and receives the values of all the others.) Note: MPI_Allgather() also acts as a global barrier, so you do not need to add global_barrier().
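  A sketch of one Jacobi step using the real routine MPI_Allgather, with one unknown per process as on slide 36; the array names mirror the slide's pseudocode, but the surrounding setup (how a_row, b_i, and x are distributed) is assumed.

      #include <mpi.h>

      /* One process per unknown: compute new x[i] locally, then gather
         everyone's value so each process holds the full solution vector. */
      void jacobi_step(const double *a_row, double b_i, double *x,
                       int n, int i, MPI_Comm comm) {
          double sum = 0.0;
          for (int j = 0; j < n; j++)
              if (j != i)
                  sum += a_row[j] * x[j];
          double new_xi = (b_i - sum) / a_row[i];

          /* Broadcast and gather in one composite call; no separate
             global_barrier() is needed, as the slide notes. */
          MPI_Allgather(&new_xi, 1, MPI_DOUBLE, x, 1, MPI_DOUBLE, comm);
      }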
  • 38. Solution By Iteration. Iterative methods are applicable when direct methods would require excessive computation; they have the advantage of small memory requirements, but may not always converge or terminate. An iterative method begins with an initial guess for the unknowns, e.g. xi = bi, and iterations are continued until sufficiently accurate values are obtained for the unknowns.
  • 39. Deadlock. A set of processes or threads is deadlocked when each process or thread is waiting for a resource to be freed that is controlled by another process. Simple deadlock example. (Figure: process P1 and resource R1, process P2 and resource R2, with request and allocation edges running from process to resource and vice versa; R = resource, P = process.)
  • 40. Deadlock. When a pair of processes each send to and receive from the other, deadlock may occur:

      Process P1      Process P2
      send()          send()
        .               .
      recv()          recv()
  • 41. Deadlock. Solutions: remove the mutual exclusion condition; or remove the "hold and wait" condition, by requiring processes to request all the resources they will need before starting; or employ timeouts to recover from deadlock.
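  For the send/recv deadlock of slide 40 in particular, MPI also offers a combined call that pairs the send and the receive internally, so neither process can block the other indefinitely. A minimal two-process sketch; the payload is made up.

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          int other = 1 - rank;   /* assumes exactly two processes */
          int out = rank * 100, in = -1;

          /* MPI_Sendrecv performs the send and receive together, so the
             mutual send()/send() deadlock of slide 40 cannot occur. */
          MPI_Sendrecv(&out, 1, MPI_INT, other, 0,
                       &in,  1, MPI_INT, other, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);

          printf("process %d received %d\n", rank, in);
          MPI_Finalize();
          return 0;
      }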
  • 42. Sorting Algorithms. Sorting numbers, that is, rearranging a list of numbers into increasing (or decreasing) order, is a fundamental operation that appears in many applications. Sorting is also applicable to non-numerical values; for example, rearranging strings into alphabetical order. Sorting is often done because it makes searches and other operations easier. Many parallel sorting algorithms, and parallel implementations of sequential sorting algorithms, are synchronous algorithms. Here we select one sequential algorithm for conversion to a parallel implementation: quicksort.
  • 43. Quicksort. (Figure: the list 4 2 7 8 5 1 3 6 is partitioned around pivots; P0 handles the whole list, the two sublists go to P0 and P4, the next level to P0, P2, P4, P6, and the final level to P0, P1, P6, P7, yielding the sorted list.) 1. Select the pivot. 2. Split. 3. Repeat the procedure on the sublists. Quicksort is a recursive algorithm, and its recursive calls can be handed to different processes.
  • 44. Quicksort using OpenMP: Demo.
  • 45. Quicksort using OpenMP. Quicksort sequential implementation: qsort_seq.c. This program takes one integer parameter, num_elems, which specifies the size of the array to be sorted: $ ./qsort_seq 5000000. Quicksort parallel implementation using OpenMP: qsort_task.c. This program takes two integer parameters, num_elems and low_limit. The num_elems parameter specifies the size of the array to be sorted; the quick_sort function is called recursively, causing many tasks to be generated until the low_limit threshold is reached: $ ./qsort_task 5000000 100
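  The slides name qsort_task.c without listing it. The following is a minimal sketch of the technique it describes, recursive OpenMP tasks with a low_limit cutoff, written for this transcript rather than copied from the original program (see also editor's note 5 below).

      static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

      /* Partition around the last element as pivot; return its final index. */
      static int partition(int *a, int lo, int hi) {
          int pivot = a[hi], i = lo - 1;
          for (int j = lo; j < hi; j++)
              if (a[j] < pivot)
                  swap(&a[++i], &a[j]);
          swap(&a[i + 1], &a[hi]);
          return i + 1;
      }

      static void quick_sort(int *a, int lo, int hi, int low_limit) {
          if (lo >= hi)
              return;
          int p = partition(a, lo, hi);
          if (hi - lo < low_limit) {      /* small sublist: stay sequential */
              quick_sort(a, lo, p - 1, low_limit);
              quick_sort(a, p + 1, hi, low_limit);
          } else {                        /* large sublist: one task per half */
              #pragma omp task
              quick_sort(a, lo, p - 1, low_limit);
              #pragma omp task
              quick_sort(a, p + 1, hi, low_limit);
              #pragma omp taskwait
          }
      }

      void par_quick_sort(int *a, int n, int low_limit) {
          #pragma omp parallel   /* the team of threads consumes tasks */
          #pragma omp single     /* a single thread starts the recursion */
          quick_sort(a, 0, n - 1, low_limit);
      }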
  • 46. Quicksort using OpenMP. Set the OpenMP environment variables. OMP_NUM_THREADS=2: sets the number of threads to use for parallel regions. OMP_WAIT_POLICY=ACTIVE: provides a hint to an OpenMP implementation about the desired behavior of waiting threads; the ACTIVE value specifies that waiting threads should mostly be active, i.e., consume processor cycles, while waiting. OMP_DYNAMIC=FALSE: controls dynamic adjustment of the number of threads used for executing parallel regions; if the environment variable is set to false, dynamic adjustment of the number of threads is disabled.
  • 48. Results. Execution times:

      Elements               1,000,000   2,000,000   5,000,000   10,000,000
      Sequential Quicksort    0.234601    0.495793    1.307625     2.707997
      Parallel Quicksort      0.124090    0.259852    0.687688     1.432589

  (Chart: the parallel quicksort runs at roughly half the sequential time across all input sizes; 50% faster!)
  • 49. Conclusions. The demand for computing power and speed increases every day. Programs that are properly designed to take advantage of parallelism can execute faster than their sequential counterparts, which is an advantage. Some algorithms cannot be parallelized. Parallelization offers a new way to increase performance.
  • 51. References.
  OpenMP Specification: http://www.openmp.org/mp-documents/spec30.pdf
  Reap the Benefits of Multithreading without All the Work: http://msdn.microsoft.com/en-us/magazine/cc163717.aspx
  OpenMP: http://www.metz.supelec.fr/metz/personnel/vialle/course/SI-PP/notes-de-cours-specifiques/PP-02-OpenMP-6spp.pdf
  Parallel Programming: http://coitweb.uncc.edu/%7Eabw/ITCS4145F10/
  Sorting Algorithms: http://www.c.happycodings.com/Sorting_Searching/index.html
  OpenMP Exercise: https://computing.llnl.gov/tutorials/openMP/exercise.html
  OpenMP Recursive Routines: http://www.openmp.org/pipermail/omp/2005/000145.html
  OpenMP: http://en.wikipedia.org/wiki/OpenMP
  http://publib.boulder.ibm.com/infocenter/comphelp/v111v131/index.jsp?topic=/com.ibm.xlc111.aix.doc/compiler_ref/prag_omp_task.html
  The Joys of Concurrent Programming: http://www.informit.com/articles/article.aspx?p=30413&seqNum=2
  Wilkinson, Barry and Allen, Michael. Parallel Programming. Second Edition. Pearson Prentice Hall, 2005.
  Ekanayake, J. and Fox, G. 2010. High performance parallel computing with clouds and cloud technologies. Cloud Computing.
  Grama, A., Gupta, A., Karypis, G., and Kumar, V. 2003. Introduction to Parallel Computing. Addison-Wesley.

Editor's Notes

  1. A deadlock refers to a specific condition when two or more processes are each waiting for the other to release a resource. Deadlock is a common problem in multiprocessing where many processes share a mutually exclusive resource.
  2. Deadlock will occur if both processes perform the send first using synchronous routines (or blocking routines without sufficient buffering). This is because neither will return; each will wait for a matching receive that is never reached.
  3. Sorting is a fundamental operation that appears in many applications. Sorting is also often done because it makes searches and other operations easier, but it has a processing cost. Now we'd like to show that quicksort, a well-known fast algorithm, can be improved using parallel programming to reduce that processing cost.
  4. Quicksort works by first selecting one number, called the pivot, which is compared to the other numbers in the list. If a number is less than the pivot, it is placed in one sublist; otherwise, it is placed in the other sublist. The procedure is repeated on the sublists. Quicksort is usually described by a recursive algorithm. One obvious way to parallelize quicksort is to start with one processor and pass one of the recursive calls to another processor while keeping the other recursive call to perform locally.
  5. In the quick_sort function, the original array is partitioned into two parts. Each part is handled by the quick_sort function recursively. Since the two parts of the array are manipulated independently, the work can execute concurrently using OpenMP parallel tasks. The par_quick_sort function has a parallel construct that contains a single construct. In the single construct, there is a call to the quick_sort function. Two tasks are generated in the quick_sort function. The quick_sort function is called recursively, causing many tasks to be generated until the low-limit threshold is reached. The execution model of the qsort_task program can be described as a single-producer, multiple-consumer model. The thread executing the single region generates tasks; the threads in the team execute these tasks. All generated tasks are guaranteed to complete by the time the threads exit the single region. When a thread finishes executing a task, it grabs a new task to execute. In this way, all threads can execute available tasks without barrier synchronization, thereby improving load balancing.
  6. One of the first things we can see is that with the parallel program, both CPUs are working.
  7. We ran some tests and found that the parallel quicksort performs better than the sequential quicksort.