These slides cover parallel algorithms from the perspective of Design and Analysis of Algorithms; I compiled the material from many sources.
I am Danish Javed, a BSCS (Hons.) student at ITU (Information Technology University), Lahore, Punjab, Pakistan.
2. Group Members
Arsalan Ali Daim (BSCS14068)
Danish Javed (BSCS14028)
Muhammad Hamza (BSCS14062)
PARALLEL ALGORITHM (DESIGN AND ANALYSIS OF ALGORITHMS)
3. Outline
1. What is a Parallel Algorithm?
2. Its Abilities
3. Why Parallel Computing?
4. Parallel Algorithms
5. Limitations of Parallel Algorithms
4. What is a Parallel Algorithm?
A parallel algorithm is an algorithm written specifically for execution on a computer with two or more processors.
It can, however, also run on a computer with a single processor that offers internal parallelism (multiple functional units, pipelined functional units, or a pipelined memory system).
5. What makes Parallel Algorithms better?
• Throughput: the number of operations completed per unit of time.
• Latency: the time needed to complete one operation.
6. Why Parallel Computing?
The Real World is Massively Parallel:
• In the natural world, many complex, interrelated events are happening at the same time, yet within a
temporal sequence.
• Compared to serial computing, parallel computing is much better suited for modeling, simulating and
understanding complex, real world phenomena.
7. Why Parallel Computing?
SOLVE LARGER / MORE COMPLEX PROBLEMS:
• Many problems are so large and/or complex that it is impractical or impossible to solve them on a
single computer, especially given limited computer memory.
• Example: Web search engines/databases processing millions of transactions per second
TAKE ADVANTAGE OF NON-LOCAL RESOURCES:
• Using computer resources on a wide area network, or even the Internet when local computer
resources are scarce or insufficient.
8. Hardware implementation for Parallel Algorithms (the PRAM model)
In the PRAM (Parallel Random Access Machine) model, processors communicate by reading from and writing to shared memory locations.
9. Classification of the PRAM model
PRAM variants are classified by whether processors may access a shared memory cell concurrently (C) or only exclusively (E), for both reads (R) and writes (W):
1. Concurrent access: CR (Concurrent Read), CW (Concurrent Write)
2. Exclusive access: ER (Exclusive Read), EW (Exclusive Write)
Combining these gives the four standard variants: EREW, CREW, ERCW, and CRCW.
10. Parallel Algorithms
1. Odd – Even Transposition Sort
2. Parallel Merge Sort
3. Computing Sum of a Sequence with parallelism
There are many more…
11. Odd–Even Transposition Sort
A variation of bubble sort that operates in two alternating phases, an even phase and an odd phase.
Even phase: even-numbered processes compare and exchange values with their right neighbors.
Odd phase: odd-numbered processes compare and exchange values with their right neighbors.
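The two alternating phases can be sketched in Python. The sketch below (function name my own) simulates the compare-exchanges of each phase sequentially, with one list element standing in for each process's value; in a real implementation, all exchanges within a phase would run on separate processors at once.

```python
def odd_even_transposition_sort(a):
    """Sort a list by simulating the alternating even/odd phases."""
    a = list(a)
    n = len(a)
    for phase in range(n):  # n phases suffice to sort n elements
        # even phase pairs (0,1),(2,3),...; odd phase pairs (1,2),(3,4),...
        start = 0 if phase % 2 == 0 else 1
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:  # exchange with right neighbor if out of order
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 2, 8, 1, 9, 4]))  # [1, 2, 4, 5, 8, 9]
```

Because the exchanges within one phase touch disjoint pairs, they are independent and can safely run in parallel; only the phases themselves must run in sequence.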
12. Odd – Even Transposition Sort – Example
Parallel time complexity: Tpar = O(n) (for P=n)
14. Odd–Even Transposition Sort
Assuming the array of n elements to sort is very large, we work with many virtual processors on the same physical processor to emulate one process per element.
15. Merge Sort
An example of a divide-and-conquer algorithm.
To sort a vector, first divide it into two parts, then apply the same method to each part.
Once both parts are sorted, with m and n elements respectively, merge them to produce the fully sorted vector.
The average complexity is T(n) = O(n log n).
16. Parallel Merge Sort
Divided into two tasks:
1.Divide the list
2.Conquer the list
17. Parallel Merge Sort
Divide the list across different processors, following a simple tree structure.
18. Parallel Merge Sort
Merge the sorted pieces as they come back together, again following a simple tree structure (in reverse).
19. Parallel Merge Sort – Algorithm
ALGORITHM: mergesort(A)
1  if (|A| = 1) then return A
2  else
3    in parallel do
4      L := mergesort(A[0 .. |A|/2))
5      R := mergesort(A[|A|/2 .. |A|))
6    return merge(L, R)
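The "in parallel do" step can be sketched in Python by running one of the two recursive calls on a separate thread (all names below are my own illustration). Note that in CPython the GIL prevents true CPU parallelism for this workload, so the sketch shows the structure of the algorithm rather than a real speedup.

```python
import threading

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def mergesort(a):
    """Parallel merge sort: the two recursive calls run concurrently."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    result = {}
    def run(key, part):
        result[key] = mergesort(part)
    t = threading.Thread(target=run, args=("L", a[:mid]))
    t.start()                 # sort the left half on a new thread...
    run("R", a[mid:])         # ...and the right half on the current one
    t.join()                  # wait for the left half before merging
    return merge(result["L"], result["R"])

print(mergesort([7, 3, 9, 1, 6, 2]))  # [1, 2, 3, 6, 7, 9]
```

Spawning a thread per recursive call is expensive; practical implementations stop creating threads below a size cutoff and fall back to sequential sorting.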
20. Parallel Merge Sort – Complexity
Sequential merge sort: O(n log n).
In parallel, with n processors:
O(log n) time to divide the sequence,
O(log n) time to merge the sequence (assuming each merge step can itself be performed in parallel),
log n + log n = 2 log n,
so T(n) = O(log n).
21. Computing the Sum of a Sequence
Consider a sequence A of n elements. We want an algorithm that performs many operations in parallel.
In parallel, each element of A with an even index is paired and summed with the next element: A[0] with A[1], A[2] with A[3], and so on.
The result is a new sequence of ⌈n/2⌉ numbers.
This pairing-and-summing step is repeated until, after ⌈log2 n⌉ steps, a sequence consisting of a single value remains, and this value equals the final sum.
Sequentially the time complexity is O(n); using this parallel technique it is reduced to O(log2 n).
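The pairing-and-summing steps can be sketched as follows (my own illustration); each pass of the loop corresponds to one parallel round, and all the pair-sums within a round are independent of one another.

```python
def parallel_sum(a):
    """Pairwise-summation reduction.

    Each round sums adjacent pairs A[0]+A[1], A[2]+A[3], ...;
    on a real parallel machine the pairs of one round are summed
    simultaneously, so only the rounds run in sequence.
    """
    a = list(a)
    rounds = 0
    while len(a) > 1:
        # sum each even-indexed element with its right neighbor;
        # an odd leftover element carries over to the next round
        a = [a[i] + a[i + 1] for i in range(0, len(a) - 1, 2)] + \
            (a[-1:] if len(a) % 2 else [])
        rounds += 1
    return a[0], rounds

total, rounds = parallel_sum([3, 1, 4, 1, 5, 9, 2, 6])
print(total, rounds)  # 31 3  (ceil(log2(8)) = 3 rounds)
```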
22. The Limitations and Problems
• Data Dependency
• Race Condition
• Resource Requirements
• Scalability
• Parallel Slowdown
23. Data Dependency
Results from multiple uses of the same storage location(s) by different tasks.
e.g.
for (int i = 1; i < 100; i++)
    array[i] = array[i-1] * 20;
(the loop starts at i = 1 so that array[i-1] is always in bounds; each iteration depends on the result of the previous one, so the iterations cannot run in parallel)
On shared-memory architectures, read/write operations between tasks must be synchronized.
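A small Python sketch (my own, not from the slides) of why such a loop resists naive parallelization: if all iterations ran at once, each would read the original value of its left neighbor instead of the freshly computed one, producing a different result from the sequential loop.

```python
def sequential(a):
    """The loop as written: iteration i reads the value iteration i-1 just wrote."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] * 20
    return a

def naive_parallel(a):
    """All iterations run 'at once': each reads the ORIGINAL neighbor value."""
    old = list(a)
    return [old[0]] + [old[i - 1] * 20 for i in range(1, len(old))]

a = [1, 2, 3, 4]
print(sequential(a))      # [1, 20, 400, 8000]
print(naive_parallel(a))  # [1, 20, 40, 60]  -- the dependency was violated
```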
24. Race Condition
Consider two threads that each increment a shared variable V:
Thread A: 1A: read V, 2A: add 1, 3A: write back to V
Thread B: 1B: read V, 2B: add 1, 3B: write back to V
If instruction 1B is executed between 1A and 3A, or if instruction 1A is executed between 1B and 3B, the program will produce incorrect data. This is known as a race condition.
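The lost-update interleaving can be demonstrated deterministically by simulating the two threads' instruction steps by hand (variable names are my own):

```python
# Two "threads" each increment a shared variable V in three steps:
#   1: read V    2: add 1    3: write V back
# Simulate the bad interleaving 1A, 1B, 2A, 2B, 3A, 3B.

V = 0

a_local = V          # 1A: thread A reads 0
b_local = V          # 1B: thread B reads 0  <- executed between 1A and 3A
a_local += 1         # 2A
b_local += 1         # 2B
V = a_local          # 3A: V = 1
V = b_local          # 3B: V = 1  -> thread A's increment is lost

print(V)  # 1, although two increments ran; the correct result is 2
```

In a real program the interleaving is chosen by the scheduler, so the bug appears only sometimes; locks or atomic operations around the read-modify-write sequence prevent it.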
25. Resource Requirements
• The primary intent of parallel programming is to decrease execution wall-clock time; however, to accomplish this, more CPU time is required. For example, a parallel code that runs in 1 hour on 8 processors actually uses 8 hours of CPU time.
• The amount of memory required can be greater for parallel codes than serial codes, due to
the need to replicate data and for overheads associated with parallel support libraries and
subsystems.
26. Scalability
Two types of scaling based on time to solution:
◦ Strong scaling: The total problem size stays fixed as more processors are added.
◦ Weak scaling: The problem size per processor stays fixed as more processors are added.
Hardware factors play a significant role in scalability. Examples:
◦ Memory-CPU bus bandwidth
◦ Amount of memory available on any given machine or set of machines
◦ Processor clock speed
27. Parallel Slowdown
• Not all parallelization results in speed-up.
• Once a task is split into multiple threads, those threads may spend a large amount of time communicating with each other, degrading overall system performance.
• This is known as parallel slowdown.
28. Parallel Slowdown – Example
I have observed a few such attempts where parallel code used the Threading Building Blocks (TBB) library. Much to the experimenters' astonishment, not only did their simple parallel programs sometimes show no reasonable speedup, they could even be slower than their sequential counterparts!
Conclusion: when developing programs with TBB, take into account that using TBB classes and functions may impede compiler optimizations, which has an especially bad impact on simple algorithms with a small amount of work per iteration. Proper use of local variables helps optimization and improves parallel speedup.
For further info: https://software.intel.com/en-us/blogs/2008/03/04/why-a-simple-test-can-get-parallel-slowdown