1. Parallel Programming
Recent Trends in Software Engineering
Prof. Peter Stoehr
By
Jorge Ortiz
Chirag Setty
Uday Sharma
Kristal Lucero
June 9th 2011
2. Agenda
• Basic concepts and motivational considerations
• Definition and advantages
• The ever-present need for computational speed
• Flynn's taxonomy
• Types of parallel computers
• Programming shared memory multiprocessors
• Main issues in parallel programming
• Threads
• Synchronization
• Deadlock
• Sorting Algorithms – parallel and sequential implementations
• Quicksort
• Conclusions
3. Why Use Parallel Programming?

Parallel Programming (Definition)

• "A form of computation in which many calculations are carried out simultaneously."

Advantages

• Parallelism usually provides:
◦ more computational power
◦ fault tolerance
◦ a larger amount of memory
◦ a speed-up factor
4. There Will Always Be Demand for Computational Speed!

"Will mankind one day without the net expenditure of energy be able to restore the sun to its full youthfulness even after it had died of old age?"

The Last Question (1956) – Isaac Asimov

Areas such as numerical modeling of scientific and engineering problems (the motion of astronomical bodies, simulation of large DNA structures, global weather forecasting) require greater computational speed than is currently available.
5. The Grand Challenge Problems (1)

Modeling Motion of Astronomical Bodies

• Gravitational forces act among N bodies, so the movement can be predicted by computing the total force on each body.
◦ For N bodies: N-1 forces per body, i.e., N² calculations in total
◦ Optimized implementations: O(N log₂ N)
◦ Calculations are repeated once new positions are obtained
◦ One galaxy contains almost 10¹¹ stars
◦ Even at O(N log₂ N), one iteration takes almost a year
7. Parallel Computers (1)

Nowadays there are two main approaches:

• Shared Memory Multiprocessor (Considerations)
◦ Data-sharing and synchronization issues appear.
▪ How is data shared among processors at execution time?
▪ Larger shared-memory machines do not satisfy UMA (uniform memory access). Why?
• Some processors are "nearer to" the memory than others, and those can access it faster.

Processors keep getting faster; memory access is still not as fast.
8. Parallel Computers (2)

...Shared Memory Multiprocessor (Considerations)

◦ Vendors have built computers with hierarchical memory systems.
◦ SMPs have some memory that is not shared.
9. Parallel Computers (3)

• Networked Computers as a Computing Platform

Efforts to build parallel computer systems using networked computers, as a cheaper alternative to expensive supercomputers, started in the early 1990s.
10. Programming Shared Memory Multiprocessors (1)

1. Thread libraries: the programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads.

2. Higher-level library functions and preprocessor compiler directives to declare shared variables and specify parallelism.

3. Use a modified sequential programming language with added syntax to declare shared variables and specify parallelism, e.g., UPC (Unified Parallel C), which needs a UPC compiler.
11. Programming Shared Memory Multiprocessors (2)

4. Use a specially designed parallel programming language with syntax to express parallelism; the compiler automatically creates executable code for each processor (not now common).

5. Use a regular sequential programming language such as C and ask a parallelizing compiler to convert it into parallel executable code (not now common).
12. Thread

• Threads that execute independently of each other are called asynchronous threads.

• Problems:

Two or more threads may share the same resource while only one of them can access the resource at a time.

If a producer and a consumer share the same data in a program, the producer may produce data faster than the consumer retrieves it, or the consumer may retrieve an item of data and try to process it before it exists.
13. Thread

(Diagram: Start spawns Thread1 and Thread2, which both access a shared variable and method.)

• Java uses the keyword synchronized to synchronize threads and let them intercommunicate.

• Synchronization is a mechanism that allows two or more threads to share the available resources in a sequential manner.
14. Lock

• The term lock refers to the access granted to a particular thread so that it can use the shared resources.

• Java has a built-in lock that only comes into action when an object has synchronized method code.

• No other thread can acquire the lock until it is released by the first thread.

• "Acquiring the lock" means the thread enters the synchronized method; "releasing the lock" means it exits the synchronized method.
15. Important Points

• Points about synchronization and locks:

Only methods (or blocks) can be synchronized.

Each object has just one lock.

Not all methods in a class need to be synchronized.

If a thread goes to sleep, it holds any locks it has; it doesn't release them.

• Two ways to synchronize the execution of code:

Synchronized methods

Synchronized blocks
16. Synchronization – Barrier

• We could start multiple threads each time around the loop and wait for them all to complete.

• This is inefficient, since we are continually spawning new processes.

• It is much less efficient than keeping n processes looping and implementing synchronization among them.
17. Synchronization – Barrier

• A barrier is a basic mechanism for synchronizing processes; it is inserted at the point in each process where the process must wait.

• All processes can continue from this point once all of them have reached it.

• In message-passing systems, barriers are often provided as library routines:

• MPI has the barrier routine MPI_Barrier().

• PVM has a similar barrier routine, pvm_barrier().
18. Barrier Example

(Diagram: processes P0, P1, P2, …, Pn-1 over time. Each process is active until it reaches the barrier, then waits; all continue once the last process arrives.)
19. Counter Implementation

Centralized counter implementation (sometimes called a linear barrier).

(Diagram: processes P0, P1, …, Pn-1 each reach the barrier, increment a shared counter, and wait until the counter reaches n.)
20. Tree Implementation

• More efficient. Suppose there are eight processes, P0, P1, P2, P3, P4, P5, P6, and P7:

• First stage: P1 sends a message to P0; P3 sends a message to P2; P5 sends a message to P4; P7 sends a message to P6.

• Second stage: P2 sends a message to P0; P6 sends a message to P4.

• Third stage: P4 sends a message to P0.

• P0 then terminates the arrival phase.
22. Data parallel computations

• Synchronisation is required.

• The same operation is applied to different data elements.

• Data parallel programming is more convenient because of:

• Ease of programming

• Easy scaling to larger problems
23. Data parallel computations

• Many numeric and non-numeric problems can be cast in data parallel form.

• Example: SIMD computers

• SIMD computers:

• The same instruction is executed on different processors but on different data.

• Synchronisation is built into the hardware.
25. Barrier Requirement

• The data parallel technique is applied to multiprocessors and multicomputers.

• The whole construct should not complete before all of its instances do; thus a barrier is required.

forall (i = 0; i < n; i++)
    a[i] = a[i] + k;
26. Butterfly barrier

(Diagram: Process 1 sends a message to Process 2 and receives a confirmation message back.)

- Send a message to the partner process
- Wait until a message is received from that process
27. Stages of Butterfly Barrier

If we have n = 2^k processes, we build a barrier in k stages 1, 2, …, k.

At stage s, each process synchronizes with a partner that is 2^(s-1) steps away.

The stages are interleaved so that no process can pass through all stages of the barrier until all processes have reached it.

If n isn't a power of 2, we can use the next largest 2^k, but this isn't efficient and the system is no longer symmetric.
29. Virtual Processes in the Butterfly Barrier

What if the number of threads is not a power of 2? Use virtual processes.

(Diagram: five real processes 0–4 are padded with three virtual processes (2), (1), (0) so the butterfly pairing works across rounds 0, 1, and 2.)
30. Local Synchronization

Useful when calculations take varying amounts of time and delays can occur randomly at any processor or task.

Process Pi-1:  recv(Pi); send(Pi);
Process Pi:    send(Pi-1); send(Pi+1); recv(Pi-1); recv(Pi+1);
Process Pi+1:  recv(Pi); send(Pi);

Note: this is not a perfect three-process barrier; Pi-1 will only synchronize with Pi and continue as soon as Pi allows.
31. Synchronous Iteration (Synchronous Parallelism)

• Synchronous iteration: this term describes a situation where a problem is solved by iteration and

• each iteration step is composed of several processes that start together at the beginning of the iteration step, and

• the next iteration step cannot begin until all processes have finished the current iteration step.

(Iteration process diagram: iterations 1, 2, 3, …, n, each containing steps 0 to n-1.)
32. Example 1: Synchronous Iteration

Equation: (4 + 6) - (2 * 3)

(Diagram: two expression trees. Sequential solution: solve the equation linearly, operation by operation in priority order: 2*3 = 6, then 4+6 = 10, then 10-6 = 4. Parallel solution: evaluate both sides of the tree at the same time: 4+6 = 10 and 2*3 = 6 in parallel, then 10-6 = 4.)
33. Example 2: Synchronous Iteration

• Solving a general system of linear equations by iteration.

• Suppose the equations are of a general form with n equations and n unknowns, where the unknowns are x0, x1, x2, …, xn-1 (0 <= i < n).

• The i-th equation can be rearranged to solve for xi:

xi = (bi - Σ_{j≠i} aij xj) / aii
35. Sequential Code

for (i = 0; i < n; i++)
    x[i] = b[i];
for (iter = 0; iter < limit; iter++) {
    for (i = 0; i < n; i++) {
        sum = 0;
        for (j = 0; j < n; j++)
            if (i != j)
                sum = sum + a[i][j] * x[j];
        new_x[i] = (b[i] - sum) / a[i][i];
    }
    for (i = 0; i < n; i++)
        x[i] = new_x[i];
}

Worked example:

7x1 + 3x2 + x3 = 18
2x1 - 9x2 + 4x3 = 12
x1 - 4x2 + 12x3 = 6

Use x1(0) = x2(0) = x3(0) = 0 as the initial estimates; insert these estimates into the equations to yield new estimates of the parameters.

Iteration 1:
new_x[0] = (18 - 0)/7 = 2.571
new_x[1] = -(12 - 0)/9 = -1.333
new_x[2] = (6 - 0)/12 = 0.500
x1(1) = 2.571, x2(1) = -1.333, x3(1) = 0.500

Iteration 2:
new_x[0] = 2.571 + 0.500 = 3.071
new_x[1] = -1.333 + 0.793 = -0.540
new_x[2] = 0.500 - 0.658 = -0.158
36. Parallel Code

• Suppose we have a process Pi for each unknown xi; the code for process Pi may be:

x[i] = b[i];
for (iter = 0; iter < limit; iter++) {
    sum = -a[i][i] * x[i];
    for (j = 0; j < n; j++)
        sum = sum + a[i][j] * x[j];
    new_x[i] = (b[i] - sum) / a[i][i];
    broadcast_receive(&new_x[i]);
    global_barrier();
}

• broadcast_receive() is used here (1) to send the newly computed value of x[i] from process Pi to every other process, and (2) to collect the values broadcast from the other processes to process Pi.

(Diagram: in every iteration each process broadcasts its new value and receives the others'. Iteration 1: x = (2.571, -1.333, 0.500); Iteration 2: x = (3.071, -0.540, -0.158); Iteration 3: x = (2.825, -0.721, 0.064).)
37. A New Message-Passing Operation – Allgather

• Broadcast and gather values in one composite construction.

(Diagram: each process contributes its element; the gathered array is delivered to every process.)

• Note: MPI_Allgather() also acts as a global barrier, so you do not need to add global_barrier().
38. Solution By Iteration

• Iterative methods
• Applicable when direct methods require excessive computation
• Have the advantage of small memory requirements
• May not always converge/terminate

• An iterative method begins with an initial guess for the unknowns
• e.g., xi = bi

• Iterations are continued until sufficiently accurate values are obtained for the unknowns
39. Deadlock

A set of processes or threads is deadlocked when each process or thread is waiting for a resource to be freed that is controlled by another process. A simple deadlock situation:

(Diagram: P1 holds R2 and requests R1; P2 holds R1 and requests R2. Edges run from process to resource and vice versa. R = resource, P = process.)
40. Deadlock

When a pair of processes each send to and receive from each other, deadlock may occur.

Process P1      Process P2
send()          send()
.               .
.               .
recv()          recv()
41. Deadlock

Solutions:

Remove the mutual exclusion condition.

Or remove the "hold and wait" condition by requiring processes to request all the resources they will need before starting up.

Employ timeouts to recover from deadlock.
42. Sorting Algorithms

Sorting numbers, that is, rearranging a list of numbers into increasing (or decreasing) order, is a fundamental operation that appears in many applications.

Sorting is also applicable to non-numerical values; for example, rearranging strings into alphabetical order.

Sorting is also often done because it makes searches and other operations easier.

Many parallel sorting algorithms and parallel implementations of sequential sorting algorithms are synchronous algorithms.

Here we select one sequential algorithm for conversion to a parallel implementation: Quicksort.
45. Quicksort using OpenMP

Quicksort sequential implementation: qsort_seq.c

This program takes one integer parameter, num_elems, which specifies the size of the array to be sorted:

$ ./qsort_seq 5000000

Quicksort parallel implementation using OpenMP: qsort_task.c

This program takes two integer parameters, num_elems and low_limit. The num_elems parameter specifies the size of the array to be sorted. The quick_sort function is called recursively, causing many tasks to be generated until the low_limit threshold is reached:

$ ./qsort_task 5000000 100
46. Quicksort using OpenMP

Set the OpenMP environment variables:

OMP_NUM_THREADS=2 sets the number of threads to use for parallel regions.

OMP_WAIT_POLICY=ACTIVE provides a hint to the OpenMP implementation about the desired behavior of waiting threads. The ACTIVE value specifies that waiting threads should mostly be active, i.e., consume processor cycles, while waiting.

OMP_DYNAMIC=FALSE controls dynamic adjustment of the number of threads used for executing parallel regions. If the variable is set to false, dynamic adjustment of the number of threads is disabled.
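A typical shell setup with these settings might look like the following (a config sketch; the binary name and arguments come from the previous slide):

```shell
export OMP_NUM_THREADS=2        # threads for parallel regions
export OMP_WAIT_POLICY=ACTIVE   # waiting threads spin instead of sleeping
export OMP_DYNAMIC=FALSE        # keep the thread count fixed
# then run: ./qsort_task 5000000 100
```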
49. Conclusions

The demand for computing power and speed increases every day.

Programs that are properly designed to take advantage of parallelism can execute faster than their sequential counterparts.

Some algorithms cannot be parallelized.

Parallelization offers a new way to increase performance.
Editor's Notes
A deadlock refers to a specific condition when two or more processes are each waiting for the other to release a resource. Deadlock is a common problem in multiprocessing where many processes share a mutually exclusive resource.
Deadlock will occur if both processes perform the send, using synchronous routines first (or blocking routines without sufficient buffering). This is because neither will return; they will wait for matching receives that are never reached.
Sorting is a fundamental operation that appears in many applications. Sorting is also often done because it makes searches and other operations easier, but it has a processing cost. Now we'd like to show that Quicksort, a well-known fast algorithm, can be improved using parallel programming, reducing the processing cost.
Quicksort works by first selecting one number, called the pivot, and comparing it to the other numbers in the list. If a number is less than the pivot, it is placed in one sublist; otherwise, it is placed in the other sublist. The procedure is repeated on the sublists. Quicksort is usually described by a recursive algorithm. One obvious way to parallelize quicksort is to start with one processor and pass one of the recursive calls to another processor while keeping the other recursive call to perform.
In the quick_sort function, the original array is partitioned into two parts. Each part is handled by the quick_sort function recursively. Since the two parts of the array are manipulated independently, work can execute concurrently by using OpenMP parallel tasks. The par_quick_sort function has a parallel construct that contains a single construct. In the single construct, there is a call to the quick_sort function. Two tasks are generated in the quick_sort function. The quick_sort function is called recursively, causing many tasks to be generated until the low_limit threshold is reached. The execution model of the qsort_task program can be described as a single-producer, multiple-consumer model. The thread executing the single region generates tasks; the threads in the team execute these tasks. All the tasks generated are guaranteed to complete by the time the threads exit the single region. When a thread finishes executing a task, it grabs a new task to execute. In this way, all threads can execute available tasks without barrier synchronization, thereby improving load balancing.
One of the first things we can see is that with the parallel program, both CPUs are working.
We ran some tests and found that the parallel quicksort performs better than the sequential quicksort.