An updated version of the TSQR talk I gave at Cornell.
Direct tall-and-skinny QR factorizations in MapReduce architectures

  1. Tall-and-Skinny QR Factorizations in MapReduce. Paul G. Constantine, Austin Benson, Joe Nichols (Stanford University); David F. Gleich (Purdue University, Computer Science Department); James Demmel (UC Berkeley); Joe Ruthruff, Jeremy Templeton (Sandia).
  2. Questions? Most recent code at http://github.com/arbenson/mrtsqr
  3. Quick review of QR. QR factorization: let A be a real m-by-n matrix. Then A = QR, where Q is orthogonal (Q^T Q = I) and R is upper triangular. Using QR for regression: the least-squares solution of min ||Ax - b|| is given by solving Rx = Q^T b. QR is block normalization: "normalize a vector" usually generalizes to "compute Q in the QR factorization." (A numpy sketch follows below.)
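
A minimal numpy sketch of the regression use reviewed on slide 3; the sizes and the random A and b are made up for illustration, and this is not the talk's code.

    # Least-squares via QR, as reviewed on slide 3.  Sizes and data are illustrative.
    import numpy as np

    m, n = 10000, 50                        # tall and skinny: m >> n
    A = np.random.randn(m, n)
    b = np.random.randn(m)

    Q, R = np.linalg.qr(A)                  # A = QR: Q orthogonal, R upper triangular
    x = np.linalg.solve(R, Q.T @ b)         # solve R x = Q^T b

    # agrees with the built-in least-squares solver, without forming A^T A
    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
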
  4. Tall-and-Skinny matrices A (m ≫ n).
  5. Tall-and-Skinny matrices (m ≫ n) arise in: regression with many samples; block iterative methods; panel factorizations; model reduction problems; general linear models with many samples; tall-and-skinny SVD/PCA. (Image from the tinyimages collection.) All of these applications need a QR factorization of a tall-and-skinny matrix; some only need R.
  6. The Database. Input: parameters of a simulation, s; output: a time history, f (~100 GB per simulation): s1 -> f1, s2 -> f2, ..., sk -> fk. A single simulation as a vector: f(s) = [q(x1,t1,s), ..., q(xn,t1,s), q(x1,t2,s), ..., q(xn,t2,s), ..., q(xn,tk,s)]^T, where q(xi,tj,s) is the simulation value at mesh point xi and time step tj. The database as a very tall-and-skinny matrix: X = [f(s1) f(s2) ... f(sp)].
  7. Dynamic Mode Decomposition. One simulation, ~10 TB of data; compute the SVD of a space-by-time matrix. (DMD video.)
  8. MapReduce. It's a computational model and a framework.
  9. MapReduce.
  10. The MapReduce Framework. Originated at Google for indexing web pages and computing PageRank. Express algorithms in data-local operations. Implement one type of communication: shuffle. Shuffle moves all data with the same key to the same reducer. (Figure: a data-scalable pipeline of map tasks, a shuffle, and reduce tasks.) Fault tolerance by design: input is stored in triplicate, reduce input/output is on disk, and map output is persisted to disk before the shuffle.
  11. Computing variance in MapReduce. (Figure: three runs, each with time steps T=1, T=2, T=3.)
  12. Mesh point variance in MapReduce. (Figure: three runs, each with time steps T=1, T=2, T=3, feeding mappers and reducers.) 1. Each mapper outputs the mesh points with the same key. 2. The shuffle moves all values from the same mesh point to the same reducer. 3. Reducers just compute a numerical variance. (A sketch of this pattern follows below.)
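
A hedged Python sketch of the mesh-point variance pattern on slides 11 and 12; the record layout (mesh point id, value) and the function names are assumptions for illustration, not code from the talk.

    # Hypothetical mapper/reducer for per-mesh-point variance (slides 11-12).
    import numpy as np

    def mapper(key, record):
        # record: (mesh_point_id, value) from one run and one time step (assumed layout)
        mesh_point_id, value = record
        yield mesh_point_id, value          # the shuffle groups values by mesh point

    def reducer(mesh_point_id, values):
        # all values for this mesh point arrive at one reducer
        yield mesh_point_id, float(np.var(list(values)))
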
  13. MapReduce vs. Hadoop. MapReduce: a computation model with map (a local data transform), shuffle (a grouping function), and reduce (an aggregation). Hadoop: an implementation of MapReduce using the HDFS parallel file system. Others: Phoenix++, Twisted, Google MapReduce, Spark, ...
  14. Current state of the art for MapReduce QR. MapReduce is often used to compute the principal components of large datasets. These approaches all form the normal equations A^T A and work with that matrix.
  15. MapReduce is great for TSQR! You don't need A^T A. Data: a tall-and-skinny (TS) matrix stored by rows. Input: a 500,000,000-by-50 matrix; each record is a 1-by-50 row; HDFS size 183.6 GB. Time to read A: 253 sec.; to write A: 848 sec. Time to compute R in qr(A): 526 sec.; with Q = AR^-1: 1618 sec. Time to compute Q in qr(A) (numerically stable): 3090 sec.
  16. Tall-and-Skinny QR.
  17. Communication-avoiding QR. Communication-avoiding TSQR (Demmel et al. 2008): first, do QR factorizations of each local block of A; second, compute a QR factorization of the new "R" matrix formed by stacking the local R factors. (Demmel et al. 2008, Communication-avoiding parallel and sequential QR. An in-memory sketch follows below.)
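
An in-memory numpy sketch of the communication-avoiding idea on slide 17: QR each local block, stack the small R factors, and factor them once more. The block count and sizes are illustrative.

    # Two-stage TSQR, serial and in memory; sizes are made up.
    import numpy as np

    A = np.random.randn(4000, 50)
    blocks = np.array_split(A, 4)                        # four "local" pieces

    Rs = [np.linalg.qr(Ai, mode='r') for Ai in blocks]   # first stage: local QRs
    R = np.linalg.qr(np.vstack(Rs), mode='r')            # second stage: QR of the stacked R's

    # R matches the R from a direct QR of A, up to the signs of its rows
    assert np.allclose(np.abs(R), np.abs(np.linalg.qr(A, mode='r')))
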
  18. Serial QR factorizations: fully serial TSQR (Demmel et al. 2008). Compute the QR of the first block, read the next block, update the QR, and so on. (Demmel et al. 2008, Communication-avoiding parallel and sequential QR.)
  19. Tall-and-skinny matrix storage in MapReduce. The key is an arbitrary row id; the value is the array for a row. Each submatrix Ai is an input split. You can also store multiple rows together; it goes a little faster.
  20. Algorithm. Data: rows of a matrix. Map: QR factorization of rows. Reduce: QR factorization of rows. (Diagram: Mapper 1 runs serial TSQR on blocks A1-A4 and emits R4; Mapper 2 runs serial TSQR on blocks A5-A8 and emits R8; Reducer 1 runs serial TSQR on R4 and R8 and emits the final Q, R.)
  21. Key limitation: this computes only R and not Q. You can get Q via Q = AR^-1 with another MapReduce iteration, but its numerical stability is dubious: ||Q^T Q - I|| is large, although iterative refinement helps.
  22. Achieving numerical stability. (Plot: norm(Q^T Q - I) versus condition number, roughly 10^5 to 10^20, for AR^-1, AR^-1 with iterative refinement, and Direct TSQR. A numpy illustration follows below.)
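
A numpy illustration of the orthogonality loss behind the plot on slide 22, using a synthetic ill-conditioned matrix; the construction of A is an assumption made only for the demonstration.

    # Loss of orthogonality in Q = A R^{-1} versus Householder QR (cf. slide 22).
    import numpy as np

    m, n = 5000, 20
    U, _ = np.linalg.qr(np.random.randn(m, n))
    V, _ = np.linalg.qr(np.random.randn(n, n))
    A = U @ np.diag(np.logspace(0, -10, n)) @ V.T        # synthetic matrix, cond(A) ~ 1e10

    R = np.linalg.qr(A, mode='r')
    Q_ar = A @ np.linalg.inv(R)                          # the AR^{-1} construction
    Q_hh, _ = np.linalg.qr(A)                            # Householder-based QR

    print(np.linalg.norm(Q_ar.T @ Q_ar - np.eye(n)))     # roughly cond(A) * machine eps
    print(np.linalg.norm(Q_hh.T @ Q_hh - np.eye(n)))     # roughly machine eps
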
  23. Why MapReduce?
  24. In hadoopy. Full code in hadoopy:

    import random, numpy, hadoopy

    class SerialTSQR:
        def __init__(self, blocksize, isreducer):
            self.bsize = blocksize
            self.data = []
            if isreducer:
                self.__call__ = self.reducer
            else:
                self.__call__ = self.mapper

        def compress(self):
            R = numpy.linalg.qr(numpy.array(self.data), 'r')
            # reset data and re-initialize to R
            self.data = []
            for row in R:
                self.data.append([float(v) for v in row])

        def collect(self, key, value):
            self.data.append(value)
            if len(self.data) > self.bsize * len(self.data[0]):
                self.compress()

        def close(self):
            self.compress()
            for row in self.data:
                key = random.randint(0, 2000000000)
                yield key, row

        def mapper(self, key, value):
            self.collect(key, value)

        def reducer(self, key, values):
            for value in values:
                self.mapper(key, value)

    if __name__ == '__main__':
        mapper = SerialTSQR(blocksize=3, isreducer=False)
        reducer = SerialTSQR(blocksize=3, isreducer=True)
        hadoopy.run(mapper, reducer)
  25. Fault injection. (Plot: time to completion in seconds, roughly 100 to 200, versus 1/Prob(failure), the mean number of successes per failure, from 10 to 1000; curves with and without faults for 200M-by-200 and 800M-by-10 matrices.) With 1/5 of tasks failing, the job only takes twice as long.
  26. How to get Q?
  27. Idea 1 (unstable). (Diagram: Mapper 1 runs TSQR on A1-A4 to get R; R^-1 is distributed; each block computes Qi = Ai R^-1.)
  28. Idea 2 (better). There's a famous quote, attributed to Parlett, that "two iterations of iterative refinement are enough." (Diagram: Mapper 1 computes Qi = Ai R^-1 as in Idea 1; Mapper 2 runs TSQR on the resulting Q to get T, distributes T^-1, and each block computes Qi T^-1.)
  29. Communication-avoiding QR (recap). Communication-avoiding TSQR (Demmel et al. 2008): first, do QR factorizations of each local block; second, compute a QR factorization of the new "R" matrix. (Demmel et al. 2008, Communication-avoiding parallel and sequential QR.)
  30. Idea 3 (best!). 1. Each mapper outputs its local Q and R in separate files. 2. Collect the R factors on one node (Task 2), compute the QR of the stacked R's, and get a small Q piece (Q11, Q21, Q31, Q41) for each block. 3. Distribute the pieces of the small Q and form the true Q block by block (Mapper 3). (A serial sketch follows below.)
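
A serial, in-memory numpy stand-in for the three steps of Idea 3 on slide 30; in the MapReduce version these steps are spread over two map stages and a small single-node task, and the sizes here are illustrative.

    # Direct TSQR that recovers Q (Idea 3), serial stand-in with made-up sizes.
    import numpy as np

    A = np.random.randn(4000, 50)
    blocks = np.array_split(A, 4)

    # 1. Local QRs; keep each local Q and R (the "separate files" on the slide).
    QRs = [np.linalg.qr(Ai) for Ai in blocks]

    # 2. Collect the R factors on one node and factor the stack.
    Q2, R = np.linalg.qr(np.vstack([Ri for _, Ri in QRs]))
    Q2_blocks = np.array_split(Q2, 4)                    # one small piece per block

    # 3. Distribute the pieces of the small Q and form the true Q block by block.
    Q = np.vstack([Qi @ Q2i for (Qi, _), Q2i in zip(QRs, Q2_blocks)])

    assert np.allclose(Q @ R, A)
    assert np.allclose(Q.T @ Q, np.eye(50), atol=1e-10)
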
  31. The price is right! (Plot: runtime in seconds, roughly 500 to 2500.) Full TSQR is faster than refinement for few columns, and not any slower for many columns.
  32. What can we do now?
  33. PCA of 80,000,000 images. A is 80,000,000 images by 1,000 pixels. Zero-mean the rows, run TSQR in MapReduce to get R, then in post-processing compute the SVD of R to get the principal components: the top 100 singular values and the first 16 columns of V, shown as images. (A small-scale sketch follows below.)
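
A small-scale numpy sketch of the pipeline on slide 33: zero-mean the rows, take R from a (here serial) TSQR, and let a small SVD of R supply the singular values and principal components. The sizes are shrunk so it runs in memory; this is not the talk's MapReduce code.

    # PCA via R from TSQR plus a small SVD (cf. slide 33), shrunk to in-memory sizes.
    import numpy as np

    A = np.random.randn(20000, 1000)                     # stand-in for 80M x 1000
    X = A - A.mean(axis=1, keepdims=True)                # zero-mean each row, as on the slide

    R = np.linalg.qr(X, mode='r')                        # the MapReduce TSQR step, done serially
    _, S, Vt = np.linalg.svd(R)                          # small 1000 x 1000 problem

    top_singular_values = S[:100]                        # "top 100 singular values"
    first_components = Vt[:16]                           # "first 16 columns of V" (rows of V^T)
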
  34. A Large-Scale Example. Nonlinear heat transfer model: 80k nodes, 300 time steps, 104 basis runs; SVD of a 24M-by-104 data matrix. 500x reduction in wall-clock time (100x including the SVD).
  35. What's next? Investigate randomized algorithms for computing SVDs for fatter matrices. Algorithm: Randomized PCA (Halko, Martinsson, Tropp, SIREV 2011). Given an m x n matrix A, the number k of principal components, and an exponent q, this procedure computes an approximate rank-2k factorization U Sigma V*; the columns of U estimate the first 2k principal components of A. Stage A: 1. Generate an n x 2k Gaussian test matrix Omega. 2. Form Y = (A A*)^q A Omega by multiplying alternately with A and A*. 3. Construct a matrix Q whose columns form an orthonormal basis for the range of Y. Stage B: 1. Form B = Q* A. 2. Compute an SVD of the small matrix: B = U~ Sigma V*. 3. Set U = Q U~. (A numpy transcription follows below.)
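
A serial numpy transcription of the randomized PCA steps quoted on slide 35; in the MapReduce setting the products with A and A* would become map/reduce passes, but this sketch just follows the algorithm as stated.

    # Randomized PCA (Halko, Martinsson, Tropp, SIREV 2011), serial transcription.
    import numpy as np

    def randomized_pca(A, k, q=1):
        m, n = A.shape
        Omega = np.random.randn(n, 2 * k)                # Stage A, step 1: Gaussian test matrix
        Y = A @ Omega
        for _ in range(q):                               # step 2: Y = (A A^T)^q A Omega
            Y = A @ (A.T @ Y)
        Q, _ = np.linalg.qr(Y)                           # step 3: orthonormal basis for range(Y)
        B = Q.T @ A                                      # Stage B, step 1
        U_hat, S, Vt = np.linalg.svd(B, full_matrices=False)   # step 2: SVD of the small matrix
        return Q @ U_hat, S, Vt                          # step 3: U = Q U_hat

    U, S, Vt = randomized_pca(np.random.randn(5000, 300), k=10)
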
  36. Questions? Most recent code at http://github.com/arbenson/mrtsqr