What you can do with a Tall-and-Skinny
QR Factorization on Hadoop:
Large regressions, Principal Components

Slides  bit.ly/16LS8Vk
Code    github.com/dgleich/mrtsqr

@dgleich · dgleich@purdue.edu

DAVID F. GLEICH
ASSISTANT PROFESSOR
COMPUTER SCIENCE, PURDUE UNIVERSITY





Why you should stay …
you like advanced machine learning techniques

you want to understand how to compute the
singular values and vectors of a huge matrix
(that’s tall and skinny)

you want to learn about large-scale regression,
and principal components from a matrix
perspective




What I’m going to assume
you know 


MapReduce

Python

Some simple matrix manipulation




Tall-and-Skinny matrices
(m ≫ n)

Many rows (like a billion)
A few columns (under 10,000)

Used in
  regression and general linear models with many samples
  block iterative methods
  panel factorizations
  approximate kernel k-means
  big-data SVD/PCA

[Figure: the matrix A; image from the tinyimages collection]



If you have tons of small
records, then there is probably
a tall-and-skinny matrix
somewhere




Tall-and-skinny matrices are
common in BigData

A : m × n, m ≫ n

Key is an arbitrary row-id.
Value is the 1 × n array for a row.

Each submatrix Ai is the input to a map task.

[Figure: A split into row blocks A1, A2, A3, A4]




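Concretely, one such record might look like the following sketch (a hypothetical example with made-up numbers, just to pin down the layout):

# One record per row of A (n = 4 here): the key is an arbitrary
# row-id, the value is the 1-by-n row itself.
key = 90210
value = [0.80, 0.13, 0.41, 0.23]
# a map task receives many such records, i.e. a submatrix Ai of A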
PCA of 80,000,000 images

[Figure: A is 80,000,000 images by 1000 pixels. Left: a plot of the
fraction of variance captured against the number of principal
components (20 to 100). Right: the first 16 columns of V shown as
images; the paper’s caption reads “Figure 5: The 16 most important
principal component basis functions (by row…)”.]
  Constantine & Gleich, MapReduce 2011.
Regression with 80,000,000 images

The goal was to approximate how much red there was in a picture from the
grayscale pixels only: predict the sum of red-pixel values in each image
as a linear combination of the gray values in each image. Formally, if
$r_i$ is the sum of the red components in all pixels of image $i$, and
$G_{i,j}$ is the gray value of the $j$th pixel in image $i$, then we
wanted to find $\min_s \sum_i (r_i - \sum_j G_{i,j} s_j)^2$. There is no
particular importance to this regression problem; we use it merely as a
demonstration.

The coefficients $s_j$, displayed as an image, give a measure of how much
“redness” each pixel contributes to the whole. They reveal regions of the
image that are not as important in determining the overall red component
of an image. The color scale varies from light blue (strongly negative)
to red (strongly positive). The computation took 30 minutes using the
Dumbo framework and a two-iteration job with 250 intermediate reducers.

We also solved a principal component problem to find a principal
component basis for each image. Let $G$ be the matrix of $G_{i,j}$’s from
the regression and let $u_i$ be the mean of the $i$th …

[Figure: A is 80,000,000 images by 1000 pixels; the coefficients $s_j$
are displayed as an image at the right.]
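To make the setup concrete, here is a toy-scale version of that regression in numpy (the data is a random stand-in, not the tinyimages data):

import numpy as np

# Hypothetical stand-in data: G holds gray values (images x pixels),
# r holds the per-image sums of red components.
G = np.random.rand(10000, 100)
r = G @ np.random.randn(100) + 0.01 * np.random.randn(10000)

# min over s of sum_i (r_i - sum_j G_ij s_j)^2
s = np.linalg.lstsq(G, r, rcond=None)[0]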
Let’s talk about QR!




QR Factorization and the
Gram-Schmidt process

Consider a set of vectors v1 to vn. Set u1 to be v1.

Create a new vector u2 by removing any “component” of u1 from v2.

Create a new vector u3 by removing any “component” of u1 and u2 from v3.

…
“Gram-Schmidt process” figure from Wikipedia
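As a concrete illustration, here is a minimal numpy sketch of the process just described (classical Gram-Schmidt with normalization; fine for intuition, though not the numerically robust way to compute a QR):

import numpy as np

def gram_schmidt(V):
    """Orthogonalize the columns of V, left to right."""
    U = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        u = V[:, j].astype(float)
        for k in range(j):
            # remove the "component" of u_k from v_j
            u -= (U[:, k] @ V[:, j]) * U[:, k]
        U[:, j] = u / np.linalg.norm(u)
    return U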
QR Factorization and the
Gram-Schmidt process

v1 = a1 u1
v2 = b1 u1 + b2 u2
v3 = c1 u1 + c2 u2 + c3 u3

$$
\begin{bmatrix} v_1 & v_2 & v_3 & \cdots \end{bmatrix}
=
\begin{bmatrix} u_1 & u_2 & u_3 & \cdots \end{bmatrix}
\begin{bmatrix}
a_1 & b_1 & c_1 & \cdots \\
0   & b_2 & c_2 & \cdots \\
0   & 0   & c_3 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$




QR Factorization and the
Gram-Schmidt process

v1 = a1 u1
v2 = b1 u1 + b2 u2
v3 = c1 u1 + c2 u2 + c3 u3

For this problem:                         V = UR
What it’s usually written as by others:   A = QR

All vectors in U are at right angles, i.e. they are decoupled.




QR Factorization and the
Gram-Schmidt process

v1 = a1 u1
v2 = b1 u1 + b2 u2
v3 = c1 u1 + c2 u2 + c3 u3

[Figure: A = QR drawn block-wise; Q is tall-and-skinny like A, and R
is small and upper triangular.]

All vectors in U are at right angles, i.e. they are decoupled.




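A quick numpy check of these properties, on a small random stand-in matrix:

import numpy as np

A = np.random.randn(6, 3)        # a tiny tall-and-skinny matrix
Q, R = np.linalg.qr(A)           # Q: 6x3, R: 3x3 upper triangular

print(np.allclose(A, Q @ R))             # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: columns of Q are decoupled
print(np.allclose(R, np.triu(R)))        # True: R is upper triangular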
PCA of 80,000,000 images

[Pipeline: A (80,000,000 images by 1000 pixels) → zero-mean the rows to
get X → TSQR of X produces R (the MapReduce stage) → the SVD of R gives
the top 100 singular values and V, the principal components (the
post-processing stage); the first 16 columns of V are shown as images.]




  Constantine & Gleich, MapReduce 2010.
Input                          500,000,000-by-100 matrix
Each record                    1-by-100 row
HDFS size                      423.3 GB
Time to compute colsum(A)      161 sec.
Time to compute R in qr(A)     387 sec.





The rest of the talk:
Full TSQR code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values: self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)




Communication avoiding QR (Demmel et al. 2008)
on MapReduce (Constantine and Gleich, 2010)

Algorithm
  Data     Rows of a matrix
  Map      QR factorization of rows
  Reduce   QR factorization of rows

[Figure: Mapper 1 (Serial TSQR) streams in A1 through A4: qr of
[A1; A2] gives Q2, R2; qr of [R2; A3] gives Q3, R3; qr of [R3; A4]
gives Q4, R4; emit R4. Mapper 2 (Serial TSQR) does the same for A5
through A8 and emits R8. Reducer 1 (Serial TSQR) computes qr of
[R4; R8] and emits the final Q, R.]




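The identity behind the diagram is that QR-ing the stacked R factors reproduces the R of the full matrix. A small serial numpy sanity check of that idea, with random stand-in data:

import numpy as np

A = np.random.randn(1000, 8)
blocks = np.array_split(A, 4)                     # row blocks, as in the map tasks

Rs = [np.linalg.qr(B, mode='r') for B in blocks]  # "map": QR of each block
R = np.linalg.qr(np.vstack(Rs), mode='r')         # "reduce": QR of the stacked R's

R_direct = np.linalg.qr(A, mode='r')
print(np.allclose(np.abs(R), np.abs(R_direct)))   # True (R is unique up to row signs)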
The rest of the talk:
Full TSQR code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values: self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)




Too many maps cause too
much data to one reducer!

Each image is 5k.
Each HDFS block has 12,800 images.
6,250 total blocks.
Each map outputs a 1000-by-1000 matrix.
One reducer gets a 6.25M-by-1000 matrix (50GB).




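The arithmetic behind those numbers (assuming 64 MB HDFS blocks and 8-byte doubles):

images = 80_000_000
image_bytes = 5 * 1000                           # each image is ~5k
block_bytes = 64 * 1000 * 1000                   # one 64 MB HDFS block

images_per_block = block_bytes // image_bytes    # 12,800
blocks = images // images_per_block              # 6,250 map tasks

# each map emits a 1000-by-1000 R; a single reducer sees all of them
rows = blocks * 1000                             # 6,250,000
gigabytes = rows * 1000 * 8 / 1e9                # 50.0
print(images_per_block, blocks, rows, gigabytes)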
[Figure: the two-iteration TSQR. Iteration 1: Mappers 1-1 through 1-4
(each a Serial TSQR) read A1 through A4 and emit R1 through R4; after a
shuffle, Reducers 1-1 through 1-3 (each a Serial TSQR) emit R2,1, R2,2,
R2,3. Iteration 2: an identity map and a shuffle send everything to
Reducer 2-1 (Serial TSQR), which emits the final R.]
Input                          500,000,000-by-100 matrix
Each record                    1-by-100 row
HDFS size                      423.3 GB
Time to compute colsum(A)      161 sec.
Time to compute R in qr(A)     387 sec.





Hadoop streaming isn’t
always slow!
Synthetic data test on 100,000,000-by-500 matrix (~500GB)
Codes implemented in MapReduce streaming
Matrix stored as TypedBytes lists of doubles
Python frameworks use Numpy+ATLAS matrix.
Custom C++ TypedBytes reader/writer with ATLAS matrix.

           Iter 1          Iter 2          Overall
           Total (secs.)   Total (secs.)   Total (secs.)
Dumbo       960             217             1177
Hadoopy     612             118              730
C++         350              37              387
Java        436              66              502




Use multiple iterations for
problems with many columns

 Cols.   Iters.   Split (MB)   Maps   Secs.
   50      1          64       8000    388
    –      –         256       2000    184
    –      –         512       1000    149
    –      2          64       8000    425
    –      –         256       2000    220
    –      –         512       1000    191
 1000      1         512       1000    666
    –      2          64       6000    590
    –      –         256       2000    432
    –      –         512       1000    337

Increasing split size improves performance (it accounts for Hadoop data
movement).

Increasing iterations helps for problems with many columns. (1000
columns with a 64-MB split size overloaded the single reducer.)




More about how to
compute a regression

$$\min \|Ax - b\|^2 = \min \sum_i \Big( \sum_j A_{ij} x_j - b_i \Big)^2$$

[Figure: the Serial TSQR mapper, now carrying the right-hand side too:
qr of [A1; A2] gives Q2 and R2, and the mapper keeps b2 = Q2^T b1
alongside R2 as it streams in A3, A4, … from A and b.]




TSQR code in hadoopy for
regressions

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        […]

    def compress(self):
        Q, R = numpy.linalg.qr(numpy.array(self.data), 'full')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])
        self.rhs = list(numpy.dot(Q.T, numpy.array(self.rhs)))

    def collect(self, key, valuerhs):
        self.data.append(valuerhs[0])
        self.rhs.append(valuerhs[1])
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for i, row in enumerate(self.data):
            key = random.randint(0, 2000000000)
            yield key, (row, self.rhs[i])

    def mapper(self, key, value):
        self.collect(key, unpack(value))

    def reducer(self, key, values):
        for value in values: self.mapper(key, unpack(value))

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)




More about how to
compute a regression

$$\min \|Ax - b\|^2 = \min \|QRx - b\|^2$$

Orthogonal or “right angle” matrices don’t change vector magnitude, so

$$\min \|QRx - b\|^2 = \min \|Q^T QRx - Q^T b\|^2 = \min \|Rx - Q^T b\|^2$$

This is a tiny linear system!

def compute_x(output):
    R, y = load_from_hdfs(output)
    x = numpy.linalg.solve(R, y)
    write_output(x, output + '-x')

[Figure: the QR-for-regression pipeline maps A to R and b to Q^T b.]




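A tiny numpy check that solving with R and Q^T b reproduces the least-squares solution (random stand-in data; load_from_hdfs and write_output above are the talk's helpers and are not shown here):

import numpy as np

A = np.random.randn(500, 5)
b = np.random.randn(500)

Q, R = np.linalg.qr(A)
x = np.linalg.solve(R, Q.T @ b)          # the tiny n-by-n solve

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))             # True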
We do a similar step for the
PCA and compute the 1000-
by-1000 SVD on one machine




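In the same spirit, a serial sketch of that step: once TSQR has reduced A to the small factor R, the SVD of R gives A's singular values and right singular vectors. The sizes here are shrunk stand-ins so it runs anywhere:

import numpy as np

m, n = 100000, 50                       # stand-in for 80,000,000 x 1000
A = np.random.randn(m, n)
A = A - A.mean(axis=0)                  # center the data, as in the zero-mean step

R = np.linalg.qr(A, mode='r')           # in the talk, R comes from MapReduce TSQR
_, svals, Vt = np.linalg.svd(R)         # an n-by-n SVD: cheap on one machine

# the singular values (and V) of R match those of A
print(np.allclose(svals, np.linalg.svd(A, compute_uv=False)))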
Getting the matrix Q is tricky!




What about the matrix Q?

We want Q to be numerically orthogonal.

A condition number measures problem sensitivity.

Prior methods all failed without any warning.

[Figure: norm(Q^T Q – I) versus condition number, from 10^5 to 10^20.
Prior work, AR^-1 (Constantine & Gleich, MapReduce 2011), loses
orthogonality as the condition number grows; AR^-1 + iterative
refinement and Direct TSQR (Benson, Gleich, Demmel, Submitted) stay
numerically orthogonal.]




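A small numpy illustration of why this matters (my own example, not from the talk): forming Q as AR^-1 loses orthogonality in proportion to the condition number, while Householder QR does not.

import numpy as np

n = 20
U, _ = np.linalg.qr(np.random.randn(500, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
A = (U * np.logspace(0, -8, n)) @ V.T        # condition number ~1e8

R = np.linalg.qr(A, mode='r')                # R as TSQR would produce it
Q = A @ np.linalg.inv(R)                     # the AR^-1 construction
print(np.linalg.norm(Q.T @ Q - np.eye(n)))   # grows with cond(A): ~1e-8 here

Qh, _ = np.linalg.qr(A)                      # Householder QR, for comparison
print(np.linalg.norm(Qh.T @ Qh - np.eye(n))) # near machine precision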
Taking care of business by
keeping track of Q

1. Output the local Q and R in separate files
   (Mapper 1: each Ai gives a local Qi and Ri; R output and Q output).

2. Collect R on one node and compute the Qs for each piece
   (Task 2: qr of [R1; R2; R3; R4] gives the final R and the pieces
   Q11, Q21, Q31, Q41).

3. Distribute the pieces of Q*1 and form the true Q
   (Mapper 3: block i of the true Q is Qi times Qi1).




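A serial numpy sketch of those three steps (my own condensation for illustration; the distributed version is the Direct TSQR in the repository below):

import numpy as np

A = np.random.randn(1000, 10)
blocks = np.array_split(A, 4)

# 1. local QR of each block; keep both factors
local = [np.linalg.qr(B) for B in blocks]

# 2. collect the R's, QR the stack, and slice the small Q into
#    one piece per block (Q11, Q21, ...)
Q_small, R = np.linalg.qr(np.vstack([Ri for _, Ri in local]))
pieces = np.split(Q_small, 4)

# 3. distribute: block i of the true Q is (local Qi) @ (piece i)
Q = np.vstack([Qi @ Pi for (Qi, _), Pi in zip(local, pieces)])

print(np.allclose(Q @ R, A))                          # A = QR
print(np.linalg.norm(Q.T @ Q - np.eye(10)) < 1e-12)   # numerically orthogonal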
Code available from
github.com/arbenson/mrtsqr
…
it isn’t too bad.




Future work … more columns!

With ~3000 columns, one 64MB chunk is a local
QR computation. 

Could “iterate in blocks of 3000” columns to
continue … maybe “efficient” for 10,000 columns

Need different ideas for 100,000 columns
(randomized methods?)




Questions?

www.cs.purdue.edu/~dgleich
@dgleich
dgleich@purdue.edu







Editor's Notes

  • #8 I think this took 30 minutes using our slowest codes. Our fastest codes should take it down to about 3-4 minutes. You’ll probably wait longer to get your job scheduled.
  • #14 I think this took 30 minutes using our slowest codes. Our fastest codes should take it down to about 3-4 minutes. You’ll probably wait longer to get your job scheduled.