Matrix Chain Scheduling Algorithm

2,395 views
2,304 views

Published on

Processor Allocation and Task Scheduling of Matrix Chain Products
on Parallel Systems(Parallelizing matrix chain products)

Heejo Lee, Jong Kim, Sungje Hong, Sunggu Lee

Dept of Computer Science and Engineering Pohang University of
Science and Technology, Korea

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
2,395
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
32
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Matrix Chain Scheduling Algorithm

  1. 1. Matrix Chain Scheduling Algorithm yen3 March 4, 2009 yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 1 / 30
  2. 2. Outline 1 Introduction About the paper The Problem Notation Problem Description 2 Processor Allocation for Matrix Products Processor Allocation DPA Algorithm 3 Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm 4 Conclusion yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 2 / 30
  3. 3. Introduction About the paper About the paper Processor Allocation and Task Scheduling of Matrix Chain Products on Parallel Systems(Parallelizing matrix chain products) Heejo Lee, Jong Kim, Sungje Hong, Sunggu Lee Dept of Computer Science and Engineering Pohang University of Science and Technology, Korea Technical Report CS-HPC-97-003 April, 2003(December, 1997) yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 3 / 30
  4. 4. Introduction The Problem MCOP: the Matrix Chain Ordering Problem MCSP: the Matrix Chain Scheduling Problem The paper introduce a processor scheduling algorithm for MCSP which attempts to minimize the evaluations time of a chain of matrix products on a parallel computer, even at the expense of a slight increase in the total number of operations. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 4 / 30
  5. 5. Introduction Notation A lot of symbol...Orz P: the number of processors in a parallel system. M: a matrix chain product with n matrices, i.e, M = M1 × M2 × · · · × Mn . Mi : an mi × mi+1 matrix (mi ≥ 1, 1 ≤ i ≤ n). L: a product sequence subtree of L for a matrix chain M. Li,j : the sequence subtree of L for (Mi × · · · × Mj ). yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 5 / 30
  6. 6. Introduction Notation A lot of symbol (cont’d) C : the minimum amount of computation for evaluating M. ∆C : the amount of increased computation by modifying the current sequence tree. pi,j : the number of processor assigned for evaluating (Mi × · · · × Mj ) (mi , mj , mk ): single matrix product for multiplying an mi × mj matrix by an mj × mk matrix. Φ(mi , mj , mk , p): the execution time of single matrix product (mi , mj , mk ) when p processors are allocated. D(x): the set of divisors of x, i.e., D(x) = {d|d divides x} yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 6 / 30
  7. 7. Introduction Notation A lot of symbol - 2003 new! LD(x, y ): the largest divisor in D(x) that is not larger than y . SD(x, y ): if x > y , then SD(x, y ) is the smallest divisor in D(x) that is larger then y. Otherwise, SD(x, y ) = x m: the largest dimension among all of the matrices, i.e., m = max1≤i≤n+1 (mi ) yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 7 / 30
  8. 8. Introduction Problem Description The Problem The problemis finding the optimal schedule with minimum evaluation time of M = M1 × M2 × · · · × Mn 1 on a P processor parallel system. The matrix multiplication parallel algorithm is based that multiplying A by B with p processors, the executions time Φ(mi , mj , mk , p)2 can be approx imated as follows. mi mj mk p if 1 ≤ p ≤ mi mk Φ(mi , mj , mk , p) ≈ mi mj mk p log( mip k ) if m mi mk < p ≤ mi mj mk 1 M: a matrix chain product with n matrices, i.e, M = M1 × M2 × · · · × Mn . 2 Φ(mi , mj , mk , p): the execution time of single matrix product (mi , mj , mk ) when p processors are allocated. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 8 / 30
  9. 9. Introduction Problem Description The Problem - cont’d The matrix products can divide to two parts. The Time is 3 Ti,j ≈ min (Ti,k (pi,j ) + Tk+1,j (pi,j ) + Φ(mi , mk+1 , mj+1 , pi,j ), i≤k<j max(Ti,k (pi,k ), Tk+1,j (pk+1,j )) + Φ(mi , mk+1 , mj+1 , pi,j )) So, The paper define MCSP: find the product sequence for evaluating a chain of matrix chain of matrix products and the processor schedule for the sequence such that the evaluation time is minimized an a parallel system. 3 pi,j : the number of processor assigned for evaluating (Mi × · · · × Mj ) yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 9 / 30
  10. 10. Introduction Problem Description MCSP Complexity The paper assume that each matrix multiplication (mi × mj ) × (mj × mk ) has log(mj ) time complexity mi mj mk processors. The problem can be presented mini≤k<j max(Ti,k , Tk+1,j ) + log(mk+1 ) 1≤i <j ≤n Ti,j = 0 i = j, 1 ≤ i ≤ n but, MCSP is NP-hard. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 10 / 30
  11. 11. Processor Allocation for Matrix Products Processor Allocation the Concurrent Computations of Multiple Matrix Products proportional allocation: the algorithm allocates processors proportional to the computation amount of each task. Trying to minimize the completion of all tasks by balancing the execution time of each task. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 11 / 30
  12. 12. Processor Allocation for Matrix Products Processor Allocation the Concurrent Computations of Multiple Matrix Products proportional allocation: the algorithm allocates processors proportional to the computation amount of each task. Trying to minimize the completion of all tasks by balancing the execution time of each task. For example We have two matrix multiplication (2, 3, 8), (4, 4, 3) and 20 processor If we allocate 10 processors to each matrix multiplication, we will DA (2, 3, 8, 10) = D(2, 3, 8, 8) = 6 unit time DB (4, 4, 3, 10) = D(4, 4, 3, 6) = 8 unit time D(DA , DB ) = 8 unit time If we allocate 8 processors to A, 12 processors to B, we will DA (2, 3, 8, 8) = D(2, 3, 8, 8) = 6 unit time DB (4, 4, 3, 12) = D(4, 4, 3, 12) = 4 unit time D(DA , DB ) = 6 unit time yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 11 / 30
  13. 13. Processor Allocation for Matrix Products DPA Algorithm DPA for Two Matrix Products- O(P) time Discrete Processor Allocation for Two Matrix Products(DPA) Input: Two Matrix products X = (mx , mx+1 , mx+2 ) and Y = (my , my +1 , my +2 ) and a set of P processors. Output: The number of processors allocated to the matrix products X and Y , denoted as Px and Py , which satisfy 1 ≤ Px , Py ≤ P and Px + Py ≤ P yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 12 / 30
  14. 14. Processor Allocation for Matrix Products DPA Algorithm DPA for Two Matrix Products- O(P) time Discrete Processor Allocation for Two Matrix Products(DPA) mx mx+1 mx+2 1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
  15. 15. Processor Allocation for Matrix Products DPA Algorithm DPA for Two Matrix Products- O(P) time Discrete Processor Allocation for Two Matrix Products(DPA) mx mx+1 mx+2 1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P 2 Find dx,i in D(mx , mx+2 ) which satisfies dx,i ≤ Pprop yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
  16. 16. Processor Allocation for Matrix Products DPA Algorithm DPA for Two Matrix Products- O(P) time Discrete Processor Allocation for Two Matrix Products(DPA) mx mx+1 mx+2 1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P 2 Find dx,i in D(mx , mx+2 ) which satisfies dx,i ≤ Pprop 3 Find dy ,j in D(my , my +2 ) which satisfies dy ,j ≤ P − Pprop yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
  17. 17. Processor Allocation for Matrix Products DPA Algorithm DPA for Two Matrix Products- O(P) time Discrete Processor Allocation for Two Matrix Products(DPA) mx mx+1 mx+2 1 Set Pprop = mx mx+1 mx+2 +my my +1 my +2 P 2 Find dx,i in D(mx , mx+2 ) which satisfies dx,i ≤ Pprop 3 Find dy ,j in D(my , my +2 ) which satisfies dy ,j ≤ P − Pprop 4 If Φ(mx , mx+1 , mx+2 , dx,i ) < Φ(my , my +1 , my +2 , dy ,j ), then Px = dx,i , Py = P − dx,i . Otherwise Px = P − dy ,j , Py = dy ,j yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 13 / 30
  18. 18. Processor Allocation for Matrix Products DPA Algorithm DPA description in 2003 mx mx+1 mx+2 1 Pprop = P × mx mx+1 mx+2 +my my +1 my +2 2 di = LD(mx mx+2 , Pprop ) 3 di+1 = SD(mx mx+2 , Pprop ) 4 dj = LD(mj mj+2 , P − Pprop ) 5 dj+1 = SD(mj mj+2 , P − Pprop ) 6 if Φ(X , di ) ≥ Φ(Y , dj ) then 1 if max(Φ(X , di+1 ), Φ(Y , LD(P − di+1 ))) Px = di+1 , Py = LD(my my +2 , P − di+1 ) 2 else Px = di , Py = LD(my my +2 , P − di + 1) 3 elseif max(Φ(X , LD(P − dj+1 )), Φ(Y , dj+1 )) < Φ(Y , dj ) then Px = LD(mx mx+2 , P − dj+1 ), Py = dj+1 4 else Px = LD(mx mx + 2, P − dj ), Py = dj yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 14 / 30
  19. 19. Processor Allocation for Matrix Products DPA Algorithm DPA for Two Matrix Products - cont’d the na¨ algorithm time complexity is O(P) , P is the number of ıve processors.(The major reason is step 2,3) The completion time of two matrix products when processors are allocated by the DPA algorithm is shorter then or equal to that by the proportional allocation. DPA guarantees the minimum completion time for two matrix products. DPA can easily extended for k independent matrix products. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 15 / 30
  20. 20. Processor Allocation for Matrix Products DPA Algorithm DPA for k Matrix Products DPA-k guarantees the minimum completion time of k independent matrix products. Discrete Processor Allocation for k Matrix Products (DPA-k) Input: k matrix produets Xi = (mi,1 , mi,2 , mi,3 ) for 1 ≤ i ≤ k are given on P processos (k ≤ P) Output: The number of allocated processors Pi for each matrix Xi which satisfies k Pi ≤ P i=1 yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 16 / 30
  21. 21. Processor Allocation for Matrix Products DPA Algorithm DPA for k Matrix Products - cont’d Discrete Processor Allocation for k Matrix Products (DPA-k) 1 For i = 1 to k do m m m 1 Pprop,i = Pk i,1 i,2 i,3 P j=1 mj,1 mj,2 mj,3 2 Find the maximum di,j in D(mi,1 , mi,3 ) which satisfies di,j ≤ Pprop, i 3 Let Pi = di,j yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 17 / 30
  22. 22. Processor Allocation for Matrix Products DPA Algorithm DPA for k Matrix Products - cont’d Discrete Processor Allocation for k Matrix Products (DPA-k) 1 For i = 1 to k do m m m 1 Pprop,i = Pk i,1 i,2 i,3 P j=1 mj,1 mj,2 mj,3 2 Find the maximum di,j in D(mi,1 , mi,3 ) which satisfies di,j ≤ Pprop, i 3 Let Pi = di,j k 2 While P − i=1 Pi > 0 do 1 Find the product Xi with the maimum Φ(mi,1 , mi,2 , mi,3 , Pi ) 2 If (Pi < mi,1 mi,3 ) then 1 Find the minimum di,j in D(mi,1 , mi,3 ) which satisfies di,j > Pi If (P − k Pi − (di,j − Pi ) > 0) then P 2 i=1 Pi = di,j Otherwise, stop the algorithm. else stop the algorithm yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 17 / 30
  23. 23. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm - Introduction Matrix Chain Scheduling Algorithm The algorithm has three stages. 1 Find optimal product sequence by MCOP yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
  24. 24. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm - Introduction Matrix Chain Scheduling Algorithm The algorithm has three stages. 1 Find optimal product sequence by MCOP 2 Top-Down Processor Assignment processors are partitioned and assigned to each subtree to balance the evaluation time of both matrix product chains. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
  25. 25. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm - Introduction Matrix Chain Scheduling Algorithm The algorithm has three stages. 1 Find optimal product sequence by MCOP 2 Top-Down Processor Assignment processors are partitioned and assigned to each subtree to balance the evaluation time of both matrix product chains. 3 Bottom-Up Concurrent Execution The steps executes products independently from the leaf and tries to modify the product sequence to enhance concurrency so as to reduce the evaluation time of Ma . yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
  26. 26. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm - Introduction Matrix Chain Scheduling Algorithm The algorithm has three stages. 1 Find optimal product sequence by MCOP 2 Top-Down Processor Assignment processors are partitioned and assigned to each subtree to balance the evaluation time of both matrix product chains. 3 Bottom-Up Concurrent Execution The steps executes products independently from the leaf and tries to modify the product sequence to enhance concurrency so as to reduce the evaluation time of Ma . a M: a matrix chain product with n matrices, i.e, M = M1 × M2 × · · · × Mn . The Algorithm Time Complexity is O(n2 + nP) yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 18 / 30
  27. 27. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm First Step - Find Optimal Product Sequence by MCOP The optimal product sequence can be found of in O(n log(n)) time using a sequential algorithm. Many parallel algorithms have been studied which run in ploylog time on P processor system. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 19 / 30
  28. 28. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm First Step - Find Optimal Product Sequence by MCOP The optimal product sequence can be found of in O(n log(n)) time using a sequential algorithm. Many parallel algorithms have been studied which run in ploylog time on P processor system. notation W [i][j]: the minimum number for operations for evaluating Li,j a . S[i][j]: the matrix index for partitioning the matrix chain (Mi × · · · × Mj ) a Li,j : the sequence subtree of L for (Mi × · · · × Mj ). yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 19 / 30
  29. 29. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Second Step - Top-Down Processor Assigment pi,j 4 processors were assigned to Li,j 5 W [i][S[i][j]] pi,j × W [i][S[i][j]]+W [S[i][j]+1][j] processors are assigned to the subtree Li,S[i][j] . W [S[i][j]+1][j] pi,j × W [i][S[i][j]]+W [S[i][j]+1][j] processors are assigned to the subtree LS[i][j]+1,j It define the tree recursively. 4 pi,j : the number of processor assigned for evaluating (Mi × · · · × Mj ) 5 Li,j : the sequence subtree of L for (Mi × · · · × Mj ). yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 20 / 30
  30. 30. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Second Step - For Example For example, given a chain of 8 matrices with dimensions {3, 8, 9, 5, 3, 3, 3, 4} on 64 processors parallel system. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 21 / 30
  31. 31. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Third Step - Bottom-Up Concurrent Execution There are idle processors in the execution of Li,j , the step try to modify the product sequence to use these idle processors. There are two condition. y > x + 1 and the node associated with My +1 has the left child node associated with My in the sequence tree. y < x − 1 and the node associated with My has the right child node associated with My +1 in the sequence tree. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 22 / 30
  32. 32. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Sequence modification yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 23 / 30
  33. 33. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm If y > z then ∆C = my +1 my +2 (my − mz ) + mz my (my +2 − my +1 )6 If y < z then ∆C = my +1 my +2 (my − mz+1 ) + mz+1 my (my +2 − my +1 ) Lemma If a leaf product (Mx , Mx+1 ) has a candidate product (My My +1 ) and the DPA algorithm will allocate px and py processors to the two matrix products, respectively, then evaluation using the modified sequence reduces the evaluation time when ∆C < min(Φ(mx , mx+1 , mx+2 , mx mx+2 )×(px +py −mx mx+2 ), my my +1 my +2 ) 6 ∆C : the amount of increased computation by modifying the current sequence tree. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 24 / 30
  34. 34. Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm Candidate Products yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 25 / 30
  35. 35. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm Stage-1 MCOP Two-Pass Matrix Chain Scheduling Algorithm - Stage 1 Stage-1 MCOP 1 Find the optimal product sequence for the MCOP by using a parallel algorithm. 2 Generate the sequence tree L. The Dynamic Programing Algorhtm - O(n2 ) to O(n3 ) The sequential algorithm - O(n log(n)) The Parallel Algorithm - O(log3 n) yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 26 / 30
  36. 36. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm Stage-2 Top-Down Processor Assignment Two-Pass Matrix Chain Scheduling Algorithm - Stage 2 Stage-2 Top-Down Processor Assignment 1 Intialize i = 1, j = n, pi,j = P. W [i][S[i][j]] 2 If i is not S[i][j], then allocate pi,j × W [i][S[i][j]]+W [S[i][j]+1][j] processors to Li,S[i][j] .ab W [S[i][j]+1][j] 3 If j is not S[i][j] + 1, then allocate pi,j × W [i][S[i][j]]+W [S[i][j]+1][j] processors to LS[i][j]+1,j . 4 If i is j + 1 or j, then finish this stage; otherwise, call this algorithm recursively, once with i = i, j = S[i][j] and once i = S[i][j] + 1, j = j. a W [i][j]: the minimum number for operations for evaluating Li,j . b S[i][j]: the matrix index for partitioning the matrix chain (Mi × · · · × Mj ) yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 27 / 30
  37. 37. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm Stage-3 Bottom-Up Concurrent Execution Two-Pass Matrix Chain Scheduling Algorithm - Stage 3 Stage-3 Bottom-Up Concurrent Execution 1. Let (Mk , Mk+1 ) be a leaf product and pk,k+1 be the number of processors allocated to the leaf product. If pk,k+1 < mk mk+2 , then go to 5. 2. Find a candidate product by tracing ancestors of the leaf product using postorder traversal. If there is no such candidate product, go to 5. 3. Let the product (Ml Ml+1 ) be a candidate product found by tracing ancestors of the leaf product (Mk Mk+1 ). Check whether the candidate product satisfies Lemma 2. If not, go to 2. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 28 / 30
  38. 38. Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm Stage-3 Bottom-Up Concurrent Execution Two-Pass Matrix Chain Scheduling Algorithm - Stage 3 Stage-3 Bottom-Up Concurrent Execution 4. Modify the sequence tree such that the candidate product (Ml Ml+1 ) can be executed concurrently with (Mk , Mk+1 ). Relocate pk pk+1 processors using the DPA algorithm and go to 1 for each leaf product of two split subtrees. 5. Schedule the leaf product on min(pi,j , mk mk+2 ) processors. Set the parent of the leaf procuctas a new leaf product. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 29 / 30
  39. 39. Conclusion Conclusion We discuss the Matrix Chain Products Problem, DPA Algorithm, Matrix Chain Scheduling Algorithm. The paper emphasizes the optimal matrix chain products tree modification on parallel system. I will try to reduce my slide XD. yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 30 / 30

×