Parallel Algorithms

Published in: Education, Technology, Business

  1. Algorithms: Parallel Algorithms
  2. An overview • A parallel merging algorithm • Accelerated cascading and parallel list ranking
  3. Parallel merging through partitioning. The partitioning strategy consists of: • Breaking up the given problem into many independent subproblems of equal size • Solving the subproblems in parallel. This is similar to the divide-and-conquer strategy in sequential computing.
  4. Partitioning and Merging. Given a set S with a relation ≤, S is linearly ordered if for every pair a, b ∈ S, either a ≤ b or b ≤ a. The merging problem is the following:
  5. Partitioning and Merging. Input: Two sorted arrays A = (a1, a2, ..., am) and B = (b1, b2, ..., bn) whose elements are drawn from a linearly ordered set. Output: A merged sorted sequence C = (c1, c2, ..., cm+n).
  6. Merging. For example, if A = (2,8,11,13,17,20) and B = (3,6,10,15,16,73), the merged sequence is C = (2,3,6,8,10,11,13,15,16,17,20,73).
  7. Merging. A sequential algorithm: • Simultaneously move two pointers along the two arrays • Write the items in sorted order into another array
  8. Partitioning and Merging • The complexity of the sequential algorithm is O(m + n). • We will use the partitioning strategy to solve this problem in parallel.
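The two-pointer sequential merge described above can be sketched as follows. This is a minimal illustration, not code from the slides; the function name `merge_sorted` is an assumption chosen here.

```python
def merge_sorted(a, b):
    """Merge two sorted lists in O(len(a) + len(b)) time."""
    i = j = 0          # one pointer per input array
    c = []             # the output array
    while i < len(a) and j < len(b):
        # Copy the smaller front element and advance its pointer.
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # One input is exhausted; the rest of the other is already sorted.
    c.extend(a[i:])
    c.extend(b[j:])
    return c


A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(merge_sorted(A, B))
```

Each loop iteration writes exactly one output element, which is where the O(m + n) bound comes from.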
  9. Partitioning and Merging. Definitions: rank(ai : A) is the number of elements in A less than or equal to ai ∈ A. rank(bi : A) is the number of elements in A less than or equal to bi ∈ B.
  10. Merging. For example, consider the arrays A = (2,8,11,13,17,20) and B = (3,6,10,15,16,73). Then rank(11 : A) = 3 and rank(11 : B) = 3.
  11. Merging • The position of an element ai ∈ A in the sorted array C is rank(ai : A) + rank(ai : B). For example, the position of 11 in the sorted array C is rank(11 : A) + rank(11 : B) = 3 + 3 = 6.
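The rank definition and the position formula can be checked with a short sketch. It assumes all elements of A and B are distinct, and it uses the standard library's `bisect_right` to count elements less than or equal to a value. The names `rank` and `merge_by_ranks` are chosen here for illustration.

```python
from bisect import bisect_right


def rank(x, arr):
    """Number of elements in the sorted list arr that are <= x."""
    return bisect_right(arr, x)


def merge_by_ranks(a, b):
    """Place every element directly at position rank(x : a) + rank(x : b).

    Assumes all elements of a and b are distinct. Each placement is
    independent of the others, which is what makes the idea parallel.
    """
    c = [None] * (len(a) + len(b))
    for x in a + b:
        c[rank(x, a) + rank(x, b) - 1] = x   # positions are 1-based
    return c


A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(rank(11, A), rank(11, B))   # the two ranks from the example
print(merge_by_ranks(A, B))
```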
  12. Parallel Merging • The idea is to decompose the overall merging problem into many smaller merging problems. • When the problem size is sufficiently small, we use the sequential algorithm.
  13. Merging • The main task is to generate smaller merging problems such that each sequence in a smaller problem has O(log m) or O(log n) elements. • Then we can use the sequential algorithm, since its time complexity will be O(log m + log n).
  14. Parallel Merging. Step 1. Divide the array B into blocks such that each block has log m elements. From here on, m denotes the length of B and n the length of A, as in the complexity analysis. Hence there are m/log m blocks. The last element of the i-th block is b_{i log m}, 1 ≤ i ≤ m/log m.
  15. Parallel Merging. Step 2. We allocate one processor for each last element in B. • For a last element b_{i log m}, this processor does a binary search in the array A to determine two elements ak, ak+1 such that ak ≤ b_{i log m} ≤ ak+1. • All the m/log m binary searches are done in parallel, and each takes time logarithmic in the length of A.
  16. Parallel Merging • After the binary searches are over, the array A is divided into m/log m blocks. • There is a one-to-one correspondence between the blocks in A and B. We call a pair of such blocks matching blocks.
  17. Parallel Merging • Each block in A is determined in the following way. • Consider the two elements b_{i log m} and b_{(i + 1) log m}. They bound the (i + 1)-th block of B. • The two elements of A that determine rank(b_{i log m} : A) and rank(b_{(i + 1) log m} : A) define the matching block in A.
  18. Parallel Merging • These two matching blocks determine a smaller merging problem. • Every element inside a matching block has to be ranked inside the other matching block. • Hence, merging a pair of matching blocks is an independent subproblem which does not affect any other block.
  19. Parallel Merging • If the size of each block in A is O(log m), we can directly run the sequential algorithm on every pair of matching blocks from A and B. • Some blocks in A may have more than O(log m) elements, so we have to do some more work to break them into smaller blocks.
  20. Parallel Merging. If a block Ai of A has more than O(log m) elements and the matching block of Ai is Bj, we do the following: • We divide Ai into blocks of size O(log m). • Then we apply the same algorithm to rank the boundary elements of each block of Ai in Bj. • Now each block in A is of size O(log m). • This takes O(log log m) time.
  21. Parallel Merging. Step 3. • We now take every pair of matching blocks from A and B and run the sequential merging algorithm. • One processor is allocated to every matching pair, and this processor merges the pair in O(log m) time. We have to analyse the time and processor complexities of each step to get the overall complexities.
  22. Parallel Merging. Complexity of Step 1 • The task in Step 1 is to partition B into blocks of size log m. • We allocate m/log m processors. • Since B is an array, processor Pi, 1 ≤ i ≤ m/log m, can find the element b_{i log m} in O(1) time.
  23. Parallel Merging. Complexity of Step 2 • In Step 2, m/log m processors do binary search in array A in O(log n) time each. • Hence the time complexity is O(log n) and the work done is $(m \log n)/\log m \le (m \log(m + n))/\log m \le m + n$ for $n, m \ge 4$. Hence the total work is O(m + n).
  24. Parallel Merging. Complexity of Step 3 • In Step 3, we use m/log m processors. • Each processor merges a pair Ai, Bi in O(log m) time. Hence the total work done is O(m). Theorem: Let A and B be two sorted sequences, each of length n. A and B can be merged in O(log n) time using O(n) operations on the CREW PRAM.
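Steps 1 to 3 can be simulated sequentially as in the sketch below. Several things here are assumptions made for illustration: the function name `partition_merge`, the convention that the second argument is the array split into blocks of about log m elements, the use of the standard library's `heapq.merge` as a stand-in for the sequential merge, and the simplification that oversized blocks of A are not subdivided further. Elements are assumed to be distinct.

```python
from bisect import bisect_right
from heapq import merge
from math import ceil, log2


def partition_merge(a, b):
    """Merge sorted lists a and b by partitioning b into blocks.

    Every loop below stands for work that one PRAM processor per block
    would do in parallel; here it is run sequentially.
    """
    if not b:
        return list(a)
    m = len(b)
    block = max(1, ceil(log2(m)))            # block size, about log m

    # Step 1: end index (exclusive) of every block of b.
    b_cuts = list(range(block, m, block)) + [m]

    # Step 2: rank the last element of each block in a by binary search.
    a_cuts = [bisect_right(a, b[cut - 1]) for cut in b_cuts]
    a_cuts[-1] = len(a)                      # last block takes the rest of a

    # Step 3: merge each pair of matching blocks independently.
    out = []
    a_lo = b_lo = 0
    for a_hi, b_hi in zip(a_cuts, b_cuts):
        out.extend(merge(a[a_lo:a_hi], b[b_lo:b_hi]))
        a_lo, b_lo = a_hi, b_hi
    return out


A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(partition_merge(A, B))
```

Because every element of one block pair is no larger than the block's boundary element, and every element of the next pair is larger, the merged pairs can simply be concatenated.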
  25. Accelerated Cascading and Parallel List Ranking • We will first discuss a technique called accelerated cascading for designing very fast parallel algorithms. • We will then study a very important technique for ranking the elements of a list in parallel.
  26. Fast computation of maximum. Input: An array A holding p elements from a linearly ordered universe S. We assume that all the elements in A are distinct. Output: The maximum element of the array A. We use a boolean array M such that M(k) = 1 if and only if A(k) is the maximum element in A. Initialization: We allocate p processors to set each entry in M to 1.
  27. Fast computation of maximum. Step 1: Assign p processors to each element in A, p² processors overall. • Consider the p processors allocated to A(j). We name these processors P1, P2, ..., Pi, ..., Pp. • Pi compares A(j) with A(i): if A(i) > A(j) then M(j) := 0, else it does nothing.
  28. Fast computation of maximum. Step 2: At the end of Step 1, M(k), 1 ≤ k ≤ p, will be 1 if and only if A(k) is the maximum element. • We allocate p processors, one for each entry in M. • If the entry is 0, the processor does nothing. • If the entry is 1, it outputs the index k of the maximum element.
  29. Fast computation of maximum. Complexity: The processor requirement is p² and the time complexity is O(1). • We need the concurrent write facility and hence the Common CRCW PRAM model.
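The constant-time maximum algorithm can be simulated with two nested loops, one iteration per processor. This is a sequential sketch under the slides' assumption of distinct elements; the name `crcw_max_index` and the 0-based indexing are choices made here.

```python
def crcw_max_index(a):
    """Return the 0-based index of the maximum of a non-empty list a.

    The double loop plays the role of the p*p processors: processor
    (j, i) clears m[j] when a[i] beats a[j]. All writers write the same
    value 0, which is what the Common CRCW model allows.
    """
    p = len(a)
    m = [1] * p                      # initialization: every entry is 1
    for j in range(p):               # Step 1: all p*p comparisons
        for i in range(p):
            if a[i] > a[j]:
                m[j] = 0
    # Step 2: the single surviving entry marks the maximum.
    return next(k for k in range(p) if m[k] == 1)


print(crcw_max_index([13, 2, 20, 8, 17, 11]))
```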
  30. Optimal computation of maximum • This is the same balanced binary tree algorithm which we used for adding n numbers.
  31. Optimal computation of maximum • This algorithm takes O(n) processors and O(log n) time. • We can reduce the processor complexity to O(n / log n). Hence the algorithm does optimal O(n) work.
  32. An O(log log n) time algorithm • Instead of a binary tree, we use a more complex tree. Assume that $n = 2^{2^k}$. • The root of the tree has $2^{2^{k-1}} = \sqrt{n}$ children. • Each node at the i-th level has $2^{2^{k-i-1}}$ children for $0 \le i \le k-1$. • Each node at level k has two children.
  33. An O(log log n) time algorithm. Some properties • The internal nodes occupy levels 0, ..., k and the leaves are at level k + 1, so the depth of the tree is k + 1. Since $n = 2^{2^k}$, we have $k = \log\log n$ and the depth is O(log log n). • The number of nodes at the i-th level is $2^{2^k - 2^{k-i}}$ for $0 \le i \le k$.
  34. An O(log log n) time algorithm. The algorithm • The algorithm proceeds level by level, starting from the leaves. • At every level, we compute the maximum of all the children of an internal node by the O(1) time algorithm. • The time complexity is O(log log n) since the depth of the tree is O(log log n).
  35. An O(log log n) time algorithm. Total work: • Recall that the O(1) time algorithm needs O(p²) work for p elements. • Each node at the i-th level has $2^{2^{k-i-1}}$ children. • So the total work for each node at the i-th level is $O\big((2^{2^{k-i-1}})^2\big) = O(2^{2^{k-i}})$.
  36. An O(log log n) time algorithm. Total work: • There are $2^{2^k - 2^{k-i}}$ nodes at the i-th level. Hence the total work for the i-th level is $O(2^{2^{k-i}}) \cdot 2^{2^k - 2^{k-i}} = O(2^{2^k}) = O(n)$. • For O(log log n) levels, the total work is O(n log log n). This is suboptimal.
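The level-by-level evaluation of this tree can be sketched as below. The sketch assumes the input length is exactly $2^{2^k}$, and Python's built-in `max` stands in for the O(1) time CRCW algorithm applied to the children of one node. The name `doubly_log_max` is chosen here for illustration.

```python
def doubly_log_max(values):
    """Maximum via the doubly logarithmic tree; len(values) must be 2**(2**k).

    Returns (maximum, rounds), where rounds is the number of tree levels
    evaluated, namely k + 1.
    """
    n = len(values)
    k = 0
    while 2 ** (2 ** k) < n:
        k += 1
    if 2 ** (2 ** k) != n:
        raise ValueError("length must be of the form 2**(2**k)")

    # Fan-outs from the leaves upward: nodes at level k have 2 children,
    # nodes at level i < k have 2**(2**(k - i - 1)) children.
    fanouts = [2] + [2 ** (2 ** (k - i - 1)) for i in range(k - 1, -1, -1)]

    current = list(values)
    rounds = 0
    for f in fanouts:
        # Each group of f siblings is reduced by one node; on the PRAM all
        # groups of a level are handled at the same time in O(1).
        current = [max(current[s:s + f]) for s in range(0, len(current), f)]
        rounds += 1
    return current[0], rounds


print(doubly_log_max(list(range(16))))
```

For n = 16 (k = 2) the fan-outs are 2, 2 and 4, so the candidates shrink as 16, 8, 4, 1 in three rounds.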
  37. Accelerated cascading • The first algorithm, which is based on a binary tree, is optimal but slow. • The second algorithm is suboptimal but very fast. • We combine these two algorithms through the accelerated cascading strategy.
  38. Accelerated cascading • We start with the optimal algorithm and run it until the size of the problem is reduced to a certain value. • Then we use the suboptimal but very fast algorithm.
  39. Accelerated cascading. Phase 1. • We apply the binary tree algorithm, starting from the leaves and going up log log log n levels. • The number of candidates reduces to $n' = n / 2^{\log\log\log n} = n / \log\log n$. • The total work done so far is O(n) and the total time is O(log log log n).
  40. Accelerated cascading. Phase 2. • In this phase, we use the fast algorithm on the $n' = n / \log\log n$ remaining candidates. • The total work is $O(n' \log\log n') = O(n)$. • The total time is $O(\log\log n') = O(\log\log n)$. • Theorem: The maximum of n elements can be computed in O(log log n) time and O(n) work on the Common CRCW PRAM.
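The two phases can be illustrated with a small sequential sketch. It is only meant to show how many candidates survive Phase 1. The name `cascaded_max` is an assumption, the number of levels is rounded down to an integer, and the built-in `max` stands in for the fast Phase 2 algorithm.

```python
from math import log2


def cascaded_max(values):
    """Return (maximum, number of candidates left after Phase 1)."""
    n = len(values)
    # Phase 1 runs for about log log log n levels of the binary tree.
    # For n < 4 the triple logarithm is undefined, so no levels are run.
    levels = int(log2(log2(log2(n)))) if n >= 4 else 0

    current = list(values)
    for _ in range(levels):
        # One level of the binary tree: pairwise maxima.
        current = [max(current[s:s + 2]) for s in range(0, len(current), 2)]

    # Phase 2: stand-in for the O(log log n) time algorithm.
    return max(current), len(current)


# n = 2**16: log log n = 4, so Phase 1 runs 2 levels and leaves n / 4 candidates.
print(cascaded_max(list(range(2 ** 16))))
```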
  41. Two parallel list ranking algorithms • An O(log n) time and O(n log n) work list ranking algorithm. • An O(log n log log n) time and O(n) work list ranking algorithm.
  42. List ranking. Input: A linked list L of n elements. L is given in an array S such that the entry S(i) contains the index of the node which is the successor of node i in L. Output: The distance of each node i from the end of the list.
  43. List ranking. List ranking can be solved in O(n) time sequentially for a list of length n. • Hence, a work-optimal parallel algorithm should do only O(n) work.
  44. A simple list ranking algorithm. Output: For each 1 ≤ i ≤ n, the distance R(i) of node i from the end of the list. Here S(i) = 0 marks the last node of the list.
        begin
          for 1 ≤ i ≤ n do in parallel
            if S(i) ≠ 0 then R(i) := 1 else R(i) := 0
          endfor
          for 1 ≤ i ≤ n do in parallel
            while S(i) ≠ 0 and S(S(i)) ≠ 0 do
              Set R(i) := R(i) + R(S(i))
              Set S(i) := S(S(i))
            endwhile
          endfor
        end
  45. A simple list ranking algorithm • At the start of an iteration of the while loop, R(i) counts the nodes in a sublist starting at i (a subset of nodes which are adjacent in the list).
  46. A simple list ranking algorithm • After the iteration, R(i) counts the nodes in a sublist of double the size. • When the while loop terminates, R(i) counts all the nodes from i to the end of the list.
  47. Complexity and model • The algorithm terminates after O(log n) iterations of the while loop. • The work complexity is O(n log n), since we allocate one processor for each node and each processor runs for O(log n) iterations. • We need the CREW PRAM model since several nodes may try to read the same successor (S) values.
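The pointer jumping algorithm can be simulated round by round as in the sketch below. Two conventions differ from the slides and are assumptions made here: nodes are indexed from 0, and `None` (instead of 0) marks the last node. Each round reads the old arrays and writes new ones, which mimics the synchronous PRAM step.

```python
def list_rank(succ):
    """Rank a linked list by pointer jumping.

    succ[i] is the index of the successor of node i, or None for the
    last node. Returns (ranks, rounds).
    """
    n = len(succ)
    s = list(succ)
    r = [0 if s[i] is None else 1 for i in range(n)]
    rounds = 0
    while any(s[i] is not None and s[s[i]] is not None for i in range(n)):
        new_r, new_s = list(r), list(s)
        for i in range(n):               # one processor per node
            if s[i] is not None and s[s[i]] is not None:
                new_r[i] = r[i] + r[s[i]]    # add the successor's count
                new_s[i] = s[s[i]]           # jump over the successor
        r, s = new_r, new_s
        rounds += 1
    return r, rounds


# The list 2 -> 0 -> 3 -> 1, stored as a successor array.
print(list_rank([3, None, 0, 1]))
```

The number of rounds grows like log n because the distance each pointer spans doubles in every round.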
  48. Complexity and model. Exercise: Modify the algorithm to run on the EREW PRAM with the same time and processor complexities.
  49. The strategy for an optimal algorithm • Our aim is to modify the simple algorithm so that it does optimal O(n) work. • The best algorithm would be one which does O(n) work and takes O(log n) time. • There is an algorithm meeting these criteria; however, the algorithm and its analysis are very involved.
  50. The strategy for an optimal algorithm • We will study an algorithm which does O(n) work and takes O(log n log log n) time. • However, in future we will use the optimal algorithm for designing other algorithms.
  51. The strategy for an optimal algorithm. 1. Shrink the initial list L by removing some of the nodes. The modified list should have O(n / log n) nodes. 2. Apply the pointer jumping technique (the suboptimal algorithm) to the list with O(n / log n) nodes.
  52. The strategy for an optimal algorithm. 3. Restore the original list and rank all the nodes removed in Step 1. The important step is Step 1. We need to choose a subset of nodes for removal.
  53. Independent sets. Definition: A set I of nodes is independent if whenever i ∈ I, S(i) ∉ I. (Figure: a linked list in which the blue nodes form an independent set.)
  54. Independent sets • The main task is to pick an independent set correctly. • We pick an independent set by first coloring the nodes of the list with two colors.
  55. 2-coloring the nodes of a list. Definition: A k-coloring of a graph G is a mapping c : V → {0, 1, …, k − 1} such that c(i) ≠ c(j) if (i, j) ∈ E. • It is very easy to design an O(n) time sequential algorithm for 2-coloring the nodes of a linked list.
  56. 2-coloring the nodes of a list • We will assume the following result. Theorem: A linked list with n nodes can be 2-colored in O(log n) time and O(n) work.
  57. Independent sets • When we 2-color the nodes of a list, alternate nodes get the same color. • Hence, we can remove all the nodes of one color to reduce the size of the original list from n to n/2.
  58. Independent sets • However, we need a list of size $n / \log n$ to run our pointer jumping algorithm for list ranking. • If we repeat the process log log n times, we will reduce the size of the list to $n / 2^{\log\log n}$, i.e., to $n / \log n$.
  59. Preserving the information • When we reduce the size of the list to $n / \log n$, we have lost a lot of information because the removed nodes are no longer present in the list. • Hence, we have to put the removed nodes back in their original positions to correctly compute the ranks of all the nodes in the list.
  60. Preserving the information • Note that we have removed the nodes in O(log log n) iterations. • So we also have to put the nodes back in O(log log n) iterations.
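The shrink, rank and restore strategy can be sketched sequentially as below. Several details are assumptions made for illustration and are not taken from the slides: each kept node carries a weight w giving its distance to its current successor, alternate nodes are found by walking the list (a stand-in for the parallel 2-coloring), the reduced list is ranked by a simple backward walk (a stand-in for pointer jumping), `None` marks the last node, and the names `rank_by_shrinking` and `rounds` are chosen here.

```python
def rank_by_shrinking(succ, rounds):
    """Rank a non-empty linked list by removing alternate nodes `rounds` times."""
    n = len(succ)
    s = list(succ)
    w = [0 if s[i] is None else 1 for i in range(n)]   # distance to successor
    head = (set(range(n)) - {x for x in s if x is not None}).pop()

    removed = []                      # one batch of removed nodes per round
    for _ in range(rounds):
        batch = []
        i = head
        while i is not None and s[i] is not None:
            j = s[i]                  # every second node is spliced out
            batch.append((j, s[j], w[j]))
            w[i] += w[j]              # the kept node absorbs the distance
            s[i] = s[j]
            i = s[i]
        removed.append(batch)

    # Rank the reduced list (stand-in for pointer jumping).
    order = []
    i = head
    while i is not None:
        order.append(i)
        i = s[i]
    rank = [None] * n
    for i in reversed(order):
        rank[i] = w[i] + (rank[s[i]] if s[i] is not None else 0)

    # Restore the removed nodes, last removed batch first.
    for batch in reversed(removed):
        for j, sj, wj in batch:
            rank[j] = wj + (rank[sj] if sj is not None else 0)
    return rank


print(rank_by_shrinking([1, 2, 3, 4, 5, 6, 7, None], rounds=2))
```

The nodes removed in one round are never adjacent, so each of them can be ranked from its recorded successor alone. This is why restoring takes as many iterations as removing.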
