Tall-and-Skinny QR Factorizations in MapReduce

Paul G. Constantine, Austin Benson, Joe Nichols (Stanford University)
David F. Gleich (Purdue University, Computer Science Department)
James Demmel (UC Berkeley)
Joe Ruthruff, Jeremy Templeton (Sandia)

Questions?

Most recent code at http://github.com/arbenson/mrtsqr

Quick review of QR

QR factorization. Let A be m-by-n, real, with m ≥ n. Then A = QR, where Q is m-by-n and orthogonal (QᵀQ = I) and R is n-by-n and upper triangular.

Using QR for regression. The least-squares solution of min ‖Ax − b‖ is given by solving Rx = Qᵀb; a tiny illustration follows.

QR is block normalization. "Normalize a vector" usually generalizes to "compute R in the QR factorization."

[diagram: A = Q [R; 0]]

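A minimal numpy illustration of the regression use (a sketch added for this writeup, not from the deck):

    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)

    Q, R = np.linalg.qr(A)            # A = QR, Q orthogonal, R upper triangular
    x = np.linalg.solve(R, Q.T @ b)   # least-squares solution of min ||Ax - b||
    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
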
Tall-and-Skinny matrices (m ≫ n)

[figure: a tall, narrow matrix A]

Tall-and-Skinny matrices (m ≫ n) arise in

  regression with many samples
  block iterative methods
  panel factorizations
  model reduction problems
  general linear models with many samples
  tall-and-skinny SVD/PCA

All of these applications need a QR factorization of a tall-and-skinny matrix; some only need R.

[image: samples from the tinyimages collection]

The Database

Input parameters s map to the time history of a simulation, f: s1 -> f1, s2 -> f2, ..., sk -> fk, roughly 100 GB in all.

The simulation as a vector, where q(x_i, t_j, s) is the state at mesh point x_i and time t_j (each block over x_1, ..., x_n is a single simulation at one time step):

    f(s) = [ q(x_1, t_1, s), ..., q(x_n, t_1, s),
             q(x_1, t_2, s), ..., q(x_n, t_2, s),
             ...,
             q(x_n, t_k, s) ]ᵀ

    X = [ f(s_1)  f(s_2)  ...  f(s_p) ]

The database as a very tall-and-skinny matrix.

Dynamic Mode Decomposition

One simulation, ~10 TB of data: compute the SVD of a space-by-time matrix.

[DMD video]

MapReduce

It's a computational model and a framework.

The MapReduce Framework

Originated at Google for indexing web pages and computing PageRank.

Express algorithms in data-local operations.
Implement one type of communication: shuffle.
Shuffle moves all data with the same key to the same reducer.

Data scalable: mappers process input splits in parallel and the shuffle routes each key to a reducer.

Fault-tolerance by design: input is stored in triplicate, map output is persisted to disk before the shuffle, and reduce input/output lives on disk.

[diagram: mappers M feeding a shuffle into reducers R]

Computing variance in MapReduce

[diagram: three runs, each with snapshots at T=1, T=2, T=3]

Mesh point variance in MapReduce

1. Each mapper outputs the mesh points with the same key.
2. Shuffle moves all values from the same mesh point to the same reducer.
3. Reducers just compute a numerical variance.

A sketch of this mapper/reducer pair follows.

[diagram: three runs of snapshots feed mappers M; reducers R collect the values for each mesh point]

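A minimal sketch, assuming each record is a (mesh_point_id, value) pair and plain Python generators (the hadoopy harness shown later would wrap functions like these; the names are illustrative):

    import numpy

    def mapper(key, record):
        # record: (mesh_point_id, value) from one run at one time step
        mesh_point_id, value = record
        yield mesh_point_id, value          # key on the mesh point

    def reducer(mesh_point_id, values):
        # the shuffle delivers every value for this mesh point here
        yield mesh_point_id, float(numpy.var(list(values)))
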
MapReduce vs. Hadoop

MapReduce: a computation model with
  Map, a local data transform;
  Shuffle, a grouping function;
  Reduce, an aggregation.

Hadoop: an implementation of MapReduce using the HDFS parallel file system.

Others: Phoenix++, Twister, Google MapReduce, Spark, ...

Current state of the art for MapReduce QR

MapReduce is often used to compute the principal components of large datasets.

These approaches all form the normal equations AᵀA and work with that product (a sketch of this route follows).

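For contrast, a minimal sketch of the normal-equations route (not the method advocated in this deck): it is cheap, but cond(AᵀA) = cond(A)², which is where the accuracy goes.

    import numpy as np

    def r_via_normal_equations(A):
        # Gram matrix plus Cholesky: A^T A = R^T R
        G = A.T @ A
        return np.linalg.cholesky(G).T    # upper-triangular R
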
MapReduce is great for TSQR!
You don't need AᵀA.

Data: a tall-and-skinny (TS) matrix, stored by rows.

Input: 500,000,000-by-50 matrix
Each record: one 1-by-50 row
HDFS size: 183.6 GB

Time to read A: 253 sec.; to write A: 848 sec.
Time to compute R in qr(A): 526 sec.; with Q = AR⁻¹: 1618 sec.
Time to compute Q in qr(A) numerically stably: 3090 sec.

Tall-and-Skinny QR

Communication-avoiding TSQR (Demmel et al. 2008)

First, do QR factorizations of each local matrix Aᵢ. Second, compute a QR factorization of the new "R" built from the local R factors.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.

Fully serial TSQR (Demmel et al. 2008)

Compute QR of A₁; read A₂ and update the QR; and so on down the matrix.

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.

Tall-and-skinny matrix storage in MapReduce

Key: an arbitrary row-id.
Value: the array of entries for a row.
Each submatrix Aᵢ is an input split.

You can also store multiple rows together; it goes a little faster. (A sketch of the record layout follows.)

[diagram: A partitioned by rows into blocks A1, A2, A3, A4]

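A small sketch of that record layout, assuming rows are emitted as key/value pairs (the function name is illustrative):

    import numpy as np

    def matrix_to_records(A, rows_per_record=1):
        # key: an arbitrary row-id; value: the row (or a small block of rows)
        for i in range(0, A.shape[0], rows_per_record):
            yield i, A[i:i + rows_per_record].tolist()
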
Algorithm

Data: rows of a matrix.
Map: QR factorization of its rows.
Reduce: QR factorization of its rows.

Mapper 1 (serial TSQR): factor A1 stacked on A2 to get R2; stack R2 on A3 and factor to get R3; stack R3 on A4, factor, and emit R4.
Mapper 2 (serial TSQR): the same over A5 through A8, emitting R8.
Reducer 1 (serial TSQR): stack R4 on R8, factor, and emit the final R.

The sketch below checks the map/reduce tree on a small example.

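A minimal in-memory numpy check of the tree (illustration only; the MapReduce version streams rows and never holds A):

    import numpy as np

    np.random.seed(0)
    A = np.random.randn(1000, 8)                  # tall and skinny
    blocks = np.array_split(A, 4)                 # rows split across mappers

    # Map: local QR of each block; keep only the R factors
    Rs = [np.linalg.qr(B, mode='r') for B in blocks]

    # Reduce: stack the small R factors and factor once more
    R = np.linalg.qr(np.vstack(Rs), mode='r')

    # Same R, up to row signs, as a direct QR of A
    R_direct = np.linalg.qr(A, mode='r')
    print(np.allclose(np.abs(R), np.abs(R_direct)))   # True
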
Key Limitation

Computes only R and not Q.

Can get Q via Q = AR⁻¹ with another MapReduce iteration.
Numerical stability: dubious. ‖QᵀQ − I‖ is large, although iterative refinement helps.

Achieving numerical stability

[plot: ‖QᵀQ − I‖ against condition numbers from 10⁵ to 10²⁰; AR⁻¹ degrades quickly, AR⁻¹ with iterative refinement does better, and Direct TSQR stays flat]

Why MapReduce?

In hadoopy

Full code in hadoopy:

    import random, numpy, hadoopy

    class SerialTSQR:
        def __init__(self, blocksize, isreducer):
            self.bsize = blocksize
            self.data = []
            if isreducer: self.__call__ = self.reducer
            else: self.__call__ = self.mapper

        def compress(self):
            # factor the buffered rows, keep only R
            R = numpy.linalg.qr(numpy.array(self.data), 'r')
            # reset data and re-initialize to R
            self.data = []
            for row in R:
                self.data.append([float(v) for v in row])

        def collect(self, key, value):
            # buffer rows; factor whenever the buffer gets tall enough
            self.data.append(value)
            if len(self.data) > self.bsize * len(self.data[0]):
                self.compress()

        def close(self):
            # emit the final R, one row per record, under random keys
            self.compress()
            for row in self.data:
                key = random.randint(0, 2000000000)
                yield key, row

        def mapper(self, key, value):
            self.collect(key, value)

        def reducer(self, key, values):
            for value in values:
                self.mapper(key, value)

    if __name__ == '__main__':
        mapper = SerialTSQR(blocksize=3, isreducer=False)
        reducer = SerialTSQR(blocksize=3, isreducer=True)
        hadoopy.run(mapper, reducer)

Fault injection

With 1/5 of tasks failing, the job only takes twice as long.

[plot: time to completion (sec) versus 1/Prob(failure), the mean number of successes per failure, from 10 to 1000; curves with and without faults for a 200M-by-200 and an 800M-by-10 matrix]

How to get Q?

Idea 1 (unstable)

TSQR produces R; distribute R to the mappers, and each mapper forms its piece of Q as Qᵢ = AᵢR⁻¹. (A sketch of the per-block step follows.)

[diagram: TSQR yields R; R⁻¹ is broadcast; Mapper 1 turns each Aᵢ into Qᵢ]

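A sketch of the per-block step, assuming scipy is available; the triangular solve avoids forming R⁻¹ explicitly, but the instability for ill-conditioned A remains.

    import numpy as np
    from scipy.linalg import solve_triangular

    def q_block_via_r(A_i, R):
        # Q_i = A_i R^{-1}, computed as a triangular solve against R^T
        return solve_triangular(R, A_i.T, trans='T').T
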
Idea 2 (better)

There's a famous quote, attributed to Parlett, that "two iterations of iterative refinement are enough."

First pass: TSQR gives R, and each mapper forms Qᵢ = AᵢR⁻¹.
Second pass: run TSQR on the computed Q to get a correction factor T; distribute T, and each mapper forms Qᵢ := QᵢT⁻¹. (A sketch follows.)

[diagram: Mapper 1 applies R⁻¹ to each Aᵢ; Mapper 2 applies T⁻¹ to each Qᵢ]

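Continuing the sketch above, one in-memory refinement pass (illustration only; in the deck both passes are MapReduce jobs):

    import numpy as np
    from scipy.linalg import solve_triangular

    def refine(Q_blocks):
        # TSQR of the computed Q gives the correction factor T ...
        T = np.linalg.qr(np.vstack(Q_blocks), mode='r')
        # ... which each mapper then applies: Q_i := Q_i T^{-1}
        return [solve_triangular(T, Q_i.T, trans='T').T for Q_i in Q_blocks]
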
Communication-avoiding TSQR (Demmel et al. 2008), recapped

First, do QR factorizations of each local matrix Aᵢ. Second, compute a QR factorization of the new "R".

Demmel et al. 2008. Communication-avoiding parallel and sequential QR.

Idea 3 (best!)

1. Each mapper outputs its local Q and R factors in separate files.
2. Collect the R factors (R1, ..., R4) on one node and compute a Q for each piece: the QR of the stacked R factors yields the final R along with the blocks Q11, Q21, Q31, Q41.
3. Distribute the pieces of Q*1 and form the true Q: each mapper computes Qᵢ := QᵢQᵢ₁.

A numpy sketch of the three steps follows.

[diagram: Mapper 1 writes Q1..Q4 and R1..R4; Task 2 factors the stacked R factors; Mapper 3 multiplies the pieces back together into the Q output]

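A minimal in-memory numpy sketch of those three steps, assuming every block has at least n rows (illustration only; in the deck each step is a MapReduce task):

    import numpy as np

    def direct_tsqr(blocks):
        # Step 1: local QR; keep each Q_i and R_i ("separate files")
        QRs = [np.linalg.qr(B) for B in blocks]
        n = QRs[0][1].shape[1]

        # Step 2: factor the stacked R factors on one node; slices of
        # the resulting Q are the gluing pieces Q_i1
        Qstack, R = np.linalg.qr(np.vstack([R_i for _, R_i in QRs]))
        Q1s = [Qstack[i * n:(i + 1) * n, :] for i in range(len(QRs))]

        # Step 3: distribute the pieces and form the true Q_i = Q_i Q_i1
        Q = np.vstack([Q_i @ Q_i1 for (Q_i, _), Q_i1 in zip(QRs, Q1s)])
        return Q, R
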
The price is right!

Full TSQR is faster than refinement for few columns, and not any slower for many columns.

[plot: running times, roughly 500 to 2500 seconds]

What can we do now?

PCA of 80,000,000 images

A is 80,000,000 images by 1000 pixels, with rows shifted to zero mean. In MapReduce, TSQR produces R; in post-processing, the SVD of R gives V, the principal components. Shown: the first 16 columns of V as images, and the top 100 singular values. (A sketch of the post-processing follows.)

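The post-processing step in a few lines of numpy, assuming R has already come out of the TSQR job over the zero-mean data (illustrative only):

    import numpy as np

    def pca_from_tsqr_r(R, num_components=16, num_values=100):
        # A and its R factor share singular values and right singular vectors
        _, S, Vt = np.linalg.svd(R)
        return Vt.T[:, :num_components], S[:num_values]
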
A Large Scale Example

Nonlinear heat transfer model: 80k nodes, 300 time-steps, 104 basis runs.
SVD of a 24M-by-104 data matrix.
500x reduction in wall-clock time (100x including the SVD).

What's next?

Investigate randomized algorithms for computing SVDs of fatter matrices.

Algorithm: Randomized PCA (Halko, Martinsson, Tropp. SIREV 2011)

Given an m × n matrix A, the number k of principal components, and an exponent q, this procedure computes an approximate rank-2k factorization UΣV*. The columns of U estimate the first 2k principal components of A.

Stage A:
1. Generate an n × 2k Gaussian test matrix Ω.
2. Form Y = (AA*)^q AΩ by multiplying alternately with A and A*.
3. Construct a matrix Q whose columns form an orthonormal basis for the range of Y.

Stage B:
1. Form B = Q*A.
2. Compute an SVD of the small matrix: B = ŨΣV*.
3. Set U = QŨ.

A direct numpy transcription follows.

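A direct numpy transcription of the procedure, as a sketch (at this scale the products with A would themselves be MapReduce jobs, and one would re-orthogonalize Y between the q multiplications for numerical stability):

    import numpy as np

    def randomized_pca(A, k, q):
        m, n = A.shape
        # Stage A: find an orthonormal basis for an approximate range of A
        Omega = np.random.randn(n, 2 * k)      # Gaussian test matrix
        Y = A @ Omega
        for _ in range(q):                     # Y = (A A^*)^q A Omega
            Y = A @ (A.T @ Y)
        Q, _ = np.linalg.qr(Y)
        # Stage B: SVD of the small matrix B = Q^* A
        B = Q.T @ A
        Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
        return Q @ Ub, S, Vt                   # U = Q Ub
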
Questions?

Most recent code at http://github.com/arbenson/mrtsqr