# Parallel Algorithms

### Parallel Algorithms

1. Algorithms: Parallel Algorithms
2. An overview: • A simple parallel algorithm for computing parallel prefix. • A parallel merging algorithm.
3. Definition of prefix computation: • We are given an ordered set $A = \{a_0, a_1, a_2, \ldots, a_{n-1}\}$ of $n$ elements and a binary associative operator $\oplus$. • We have to compute the ordered set $\{a_0,\ a_0 \oplus a_1,\ \ldots,\ a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1}\}$.
4. An example of prefix computation: • For example, if $\oplus$ is + and the input is the ordered set {5, 3, -6, 2, 7, 10, -2, 8}, then the output is {5, 8, 2, 4, 11, 21, 19, 27}. • Prefix sums can be computed in O(n) time sequentially.
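The O(n) sequential computation mentioned on this slide can be sketched as follows. This is a minimal illustration, not part of the original slides; the function name `sequential_prefix` is hypothetical, and it relies on `itertools.accumulate` from the Python standard library.

```python
from itertools import accumulate
import operator


def sequential_prefix(values, op=operator.add):
    """Return all prefix results a0, a0 op a1, ... in one left-to-right pass.

    `op` is assumed to be a binary associative operator (hypothetical
    parameter name); the pass performs n - 1 applications of `op`.
    """
    return list(accumulate(values, op))


# The example from the slide, with + as the operator.
print(sequential_prefix([5, 3, -6, 2, 7, 10, -2, 8]))
```

Passing `max` as `op` gives running maxima, which matches the later remark that the operator only has to be associative.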
5. Using a binary tree, First Pass: • For every internal node v of the tree, compute the sum of all the leaves in its subtree in a bottom-up fashion: `sum[v] := sum[L[v]] + sum[R[v]]`.
6. Parallel prefix computation (first pass): the outer loop is `for d = 0 to log n - 1 do`; nested inside it is `for i = 0 to n - 1 by 2^(d+1) do in parallel`, with body `a[i + 2^(d+1) - 1] := a[i + 2^d - 1] + a[i + 2^(d+1) - 1]`. • In our example, n = 8, hence the outer loop iterates 3 times, d = 0, 1, 2.
7. When d = 0: • In this case, i advances in steps of 2^(d+1) = 2 elements. • For i = 0, `a[0 + 2^(0+1) - 1] := a[0 + 2^0 - 1] + a[0 + 2^(0+1) - 1]`, or `a[1] := a[0] + a[1]`.
8. When d = 1: • In this case, i advances in steps of 2^(d+1) = 4 elements. • For i = 0, `a[0 + 2^(1+1) - 1] := a[0 + 2^1 - 1] + a[0 + 2^(1+1) - 1]`, or `a[3] := a[1] + a[3]`. • For i = 4, `a[4 + 2^(1+1) - 1] := a[4 + 2^1 - 1] + a[4 + 2^(1+1) - 1]`, or `a[7] := a[5] + a[7]`.
9. The First Pass (figure legend): • blue: no change from the last iteration. • magenta: changed in the current iteration.
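The first pass described on slides 6 to 8 can be simulated sequentially as below. This is a sketch under stated assumptions: the function name `first_pass` is hypothetical, n is assumed to be a power of two, and the inner loop (which the slides run in parallel) is executed one iteration at a time. That is safe because within one value of d each iteration writes a different array cell and reads only cells that no iteration writes.

```python
def first_pass(values):
    """Bottom-up pass: a[i + 2^(d+1) - 1] accumulates the sum of its subtree."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    levels = n.bit_length() - 1          # log2(n) for a power of two
    for d in range(levels):              # d = 0, 1, ..., log n - 1
        step = 2 ** (d + 1)
        for i in range(0, n, step):      # independent iterations ("in parallel")
            a[i + step - 1] = a[i + step // 2 - 1] + a[i + step - 1]
    return a


# After the pass, the last cell holds the sum of all the elements.
print(first_pass([5, 3, -6, 2, 7, 10, -2, 8]))
```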
10. The Second Pass: • The idea in the second pass is to do a top-down computation to generate all the prefix sums. • We use the notation pre[v] to denote the prefix sum at every node.
11. Computation in the second phase: • `pre[root] := 0`, the identity element for the operation, since we are considering the + operation. • If the operation is max, the identity element will be $-\infty$.
12. Second phase (continued): `pre[L[v]] := pre[v]` and `pre[R[v]] := sum[L[v]] + pre[v]`.
13. Example of second phase (figure): `pre[L[v]] := pre[v]` and `pre[R[v]] := sum[L[v]] + pre[v]`.
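14. Parallel prefix computation (second pass): before the loops start, `a[n - 1]` (a[7] in our example) is set to 0, the identity for +. The outer loop is `for d = (log n - 1) downto 0 do`; nested inside it is `for i = 0 to n - 1 by 2^(d+1) do in parallel`, with body `temp := a[i + 2^d - 1]`; `a[i + 2^d - 1] := a[i + 2^(d+1) - 1]` (left child); `a[i + 2^(d+1) - 1] := temp + a[i + 2^(d+1) - 1]` (right child).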
15. Parallel prefix computation: • We consider the case d = 2 and i = 0. `temp := a[0 + 2^2 - 1]`, or `temp := a[3]`. `a[0 + 2^2 - 1] := a[0 + 2^(2+1) - 1]`, or `a[3] := a[7]`. `a[0 + 2^(2+1) - 1] := temp + a[0 + 2^(2+1) - 1]`, or `a[7] := temp + a[7]`, where temp holds the old value of a[3].
16. Parallel prefix computation (figure legend): • blue: no change from the last iteration. • magenta: left child. • brown: right child.
17. Parallel prefix computation: • All the prefix sums except the last one are now in the leaves of the tree from left to right. • The prefix sums have to be shifted one position to the left. Also, the last prefix sum (the sum of all the elements) should be inserted at the last leaf. • The complexity is O(log n) time and O(n) processors. Exercise: Reduce the processor complexity to O(n / log n).
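Putting the two passes and the final shift together gives the following sequential simulation. It is a minimal sketch, not the authors' code: the name `two_pass_prefix_sums` is hypothetical, the operator is fixed to +, n is assumed to be a power of two, and each "in parallel" loop is run one iteration at a time (the iterations at a given level touch disjoint cells, so the order does not matter).

```python
def two_pass_prefix_sums(values):
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    levels = n.bit_length() - 1                # log2(n)

    # First pass (bottom-up): subtree sums.
    for d in range(levels):
        step = 2 ** (d + 1)
        for i in range(0, n, step):
            a[i + step - 1] += a[i + step // 2 - 1]

    total = a[n - 1]                           # sum of all the elements
    a[n - 1] = 0                               # pre[root] := identity of +

    # Second pass (top-down): push prefixes to the children.
    for d in range(levels - 1, -1, -1):
        step = 2 ** (d + 1)
        for i in range(0, n, step):
            left, right = i + step // 2 - 1, i + step - 1
            temp = a[left]                     # sum of the left subtree
            a[left] = a[right]                 # pre[L[v]] := pre[v]
            a[right] = temp + a[right]         # pre[R[v]] := sum[L[v]] + pre[v]

    # Shift one position to the left and put the total at the last leaf.
    return a[1:] + [total]


print(two_pass_prefix_sums([5, 3, -6, 2, 7, 10, -2, 8]))
```

Each pass has log n levels, and on a PRAM every level takes constant time with one processor per loop iteration, which is where the O(log n) time bound on the slide comes from.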
18. Parallel merging through partitioning: The partitioning strategy consists of: • Breaking up the given problem into many independent subproblems of equal size. • Solving the subproblems in parallel. This is similar to the divide-and-conquer strategy in sequential computing.
19. Partitioning and Merging: Given a set S with a relation $\le$, S is linearly ordered if for every pair $a, b \in S$, either $a \le b$ or $b \le a$. The merging problem is the following:
20. Partitioning and Merging: Input: Two sorted arrays $A = (a_1, a_2, \ldots, a_m)$ and $B = (b_1, b_2, \ldots, b_n)$ whose elements are drawn from a linearly ordered set. Output: A merged sorted sequence $C = (c_1, c_2, \ldots, c_{m+n})$.
21. Merging: For example, if A = (2, 8, 11, 13, 17, 20) and B = (3, 6, 10, 15, 16, 73), the merged sequence is C = (2, 3, 6, 8, 10, 11, 13, 15, 16, 17, 20, 73).
22. Merging, a sequential algorithm: • Simultaneously move two pointers along the two arrays. • Write the items in sorted order in another array.
23. Partitioning and Merging: • The complexity of the sequential algorithm is O(m + n). • We will use the partitioning strategy for solving this problem in parallel.
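The two-pointer sequential algorithm from slide 22 can be sketched as follows; the function name `sequential_merge` is hypothetical. Every loop iteration appends one element, so the running time is O(m + n) as stated.

```python
def sequential_merge(a, b):
    """Merge two sorted lists by advancing one pointer per array."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of the two arrays still has elements left.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


print(sequential_merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))
```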
24. Partitioning and Merging, definitions: rank($a_i$ : A) is the number of elements in A less than or equal to $a_i \in A$. rank($b_i$ : A) is the number of elements in A less than or equal to $b_i \in B$.
25. Merging: For example, consider the arrays A = (2, 8, 11, 13, 17, 20) and B = (3, 6, 10, 15, 16, 73). Then rank(11 : A) = 3 and rank(11 : B) = 3.
26. Merging: • The position of an element $a_i \in A$ in the sorted array C is rank($a_i$ : A) + rank($a_i$ : B). For example, the position of 11 in the sorted array C is rank(11 : A) + rank(11 : B) = 3 + 3 = 6.
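The rank definition and the position formula can be checked with a short sketch. The names `rank` and `merge_by_ranking` are hypothetical, `bisect_right` from the standard library counts the elements less than or equal to x in a sorted list, and the sketch assumes all elements of A and B are distinct so that no two elements compete for the same position.

```python
from bisect import bisect_right


def rank(x, seq):
    """Number of elements of the sorted list `seq` that are <= x."""
    return bisect_right(seq, x)


def merge_by_ranking(a, b):
    """Place every element directly at position rank(x : A) + rank(x : B).

    Positions are 1-based as on the slides, hence the `- 1` when indexing.
    Each placement is independent of the others, which is what makes the
    ranking view attractive for a parallel algorithm.
    """
    c = [None] * (len(a) + len(b))
    for x in a + b:
        c[rank(x, a) + rank(x, b) - 1] = x
    return c


A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(rank(11, A), rank(11, B))   # the two ranks from slide 25
print(merge_by_ranking(A, B))
```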
27. Parallel Merging: • The idea is to decompose the overall merging problem into many smaller merging problems. • When the problem size is sufficiently small, we will use the sequential algorithm.
28. Merging: • The main task is to generate smaller merging problems such that each sequence in such a smaller problem has O(log m) or O(log n) elements. • Then we can use the sequential algorithm, since the time complexity will be O(log m + log n).
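29. Parallel Merging, Step 1: Divide the array B into blocks such that each block has log m elements. Hence there are m/log m blocks. The last element of the i-th block is $b_{i \log m}$, for $1 \le i \le m/\log m$. (From this step on, the block counts and the complexity analysis treat B as the array with m elements and A as the array with n elements.)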
30. Parallel Merging, Step 2: We allocate one processor for each last element in B. • For a last element $b_{i \log m}$, this processor does a binary search in the array A to determine two elements $a_k, a_{k+1}$ such that $a_k \le b_{i \log m} \le a_{k+1}$. • All the m/log m binary searches are done in parallel; each takes time logarithmic in the length of A.
31. Parallel Merging: • After the binary searches are over, the array A is divided into m/log m blocks. • There is a one-to-one correspondence between the blocks in A and B. We call a pair of such blocks matching blocks.
32. Parallel Merging: • Each block in A is determined in the following way. • Consider the two elements $b_{i \log m}$ and $b_{(i+1) \log m}$. These delimit the (i + 1)-th block of B. • The two elements of A that determine rank($b_{i \log m}$ : A) and rank($b_{(i+1) \log m}$ : A) define the matching block in A.
33. Parallel Merging: • These two matching blocks determine a smaller merging problem. • Every element inside a matching block has to be ranked inside the other matching block. • Hence, the problem of merging a pair of matching blocks is an independent subproblem which does not affect any other block.
34. Parallel Merging: • If the size of each block in A is O(log m), we can directly run the sequential algorithm on every pair of matching blocks from A and B. • Some blocks in A may be larger than O(log m), and hence we have to do some more work to break them into smaller blocks.
35. Parallel Merging: If a block $A_i$ of A is larger than O(log m) and the matching block of $A_i$ is $B_j$, we do the following: • We divide $A_i$ into blocks of size O(log m). • Then we apply the same algorithm to rank the boundary elements of each block of $A_i$ in $B_j$. • Now each block in A is of size O(log m). • This takes O(log log m) time.
36. Parallel Merging, Step 3: • We now take every pair of matching blocks from A and B and run the sequential merging algorithm. • One processor is allocated for every matching pair, and this processor merges the pair in O(log m) time. We have to analyse the time and processor complexities of each of the steps to get the overall complexities.
37. Parallel Merging, complexity of Step 1: • The task in Step 1 is to partition B into blocks of size log m. • We allocate m/log m processors. • Since B is an array, processor $P_i$, $1 \le i \le m/\log m$, can find the element $b_{i \log m}$ in O(1) time.
38. Parallel Merging, complexity of Step 2: • In Step 2, m/log m processors do binary search in array A in O(log n) time each. • Hence the time complexity is O(log n) and the work done is $(m \log n)/\log m \le (m \log(m + n))/\log m \le (m + n)$ for $n, m \ge 4$. Hence the total work is O(m + n).
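The second inequality in the work bound is stated without justification. One way to see it, offered here as a short sketch rather than as the authors' argument, uses the fact that $x \mapsto (\log x)/x$ is decreasing for $x \ge e$. Since $m + n \ge m \ge 4 > e$:

$$
\frac{\log(m+n)}{m+n} \le \frac{\log m}{m}
\quad\Longrightarrow\quad
\frac{m \log(m+n)}{\log m} \le m + n .
$$

The first inequality, $\log n \le \log(m+n)$, holds because the logarithm is increasing.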
39. Parallel Merging, complexity of Step 3: • In Step 3, we use m/log m processors. • Each processor merges a pair $A_i$, $B_i$ in O(log m) time. Hence the total work done is O(m). Theorem: Let A and B be two sorted sequences, each of length n. A and B can be merged in O(log n) time using O(n) operations in the CREW PRAM.
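Steps 1 to 3 can be simulated sequentially as in the sketch below. Several things here are assumptions rather than part of the slides: the name `partition_merge` is hypothetical, the block size is taken as the integer part of log2 of the length of B, the refinement of oversized blocks of A (slide 35) is left out, and `heapq.merge` from the standard library stands in for the sequential merging algorithm. Each loop iteration handles one pair of matching blocks and is independent of the others, so on a PRAM the iterations could each be given to their own processor.

```python
from bisect import bisect_right
from heapq import merge
from math import log2


def partition_merge(a, b):
    """Merge sorted lists a and b by splitting them into matching blocks."""
    if not b:
        return list(a)
    block = max(1, int(log2(len(b))))          # assumed block size, about log m

    # Step 1: end index (exclusive) of every block of B.
    b_ends = list(range(block, len(b), block)) + [len(b)]

    # Step 2: rank the last element of each block of B in A (binary search).
    a_ends = [bisect_right(a, b[end - 1]) for end in b_ends]
    a_ends[-1] = len(a)                        # leftover tail of A joins the last block

    # Step 3: merge every pair of matching blocks independently.
    result = []
    a_lo = b_lo = 0
    for a_hi, b_hi in zip(a_ends, b_ends):
        result.extend(merge(a[a_lo:a_hi], b[b_lo:b_hi]))
        a_lo, b_lo = a_hi, b_hi
    return result


print(partition_merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))
```

Concatenating the merged pairs in block order gives a sorted result because every element in a matching pair is greater than the last element of the preceding block of B and no greater than the last element of its own block of B (except for the tail of A, which is greater than everything in B).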