Implementing Randomized Matrix Algorithms in Spark
XLDB 2015
Jiyan Yang¹, Jey Kottalam², Mohitdeep Singh³, Oliver Rübel⁴, Curt Fischer⁴, Benjamin P. Bowen⁴, Michael W. Mahoney², Prabhat⁴
¹ Stanford University  ² University of California at Berkeley  ³ Georgia Institute of Technology  ⁴ Lawrence Berkeley National Laboratory
RandNLA for Least-Squares Problems
Randomized Numerical Linear Algebra (RandNLA) algorithms can be used to
solve large-scale least-squares problems. They consist of two steps. First, com-
pute a randomized sketch for the design matrix. A sketch can be viewed as a
compressed representation of the linear system. In this work, we evaluate the
following six different kinds of sketch matrices:
• Via random projection with 4 different underlying projection methods (Sparse,
Gaussian, Rademacher, SRDHT).
• Via random sampling according to approximate leverage scores or uniform
distribution.
Second, one can then use the sketch in one of the following two ways to obtain
either low-precision or high-precision approximate solutions to the original
problem.
• Low-precision solvers: solve the subproblem induced by the sketch.
• High-precision solvers: use the sketch to construct a preconditioner and then
invoke iterative algorithms such as LSQR.
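To make the first step concrete, the following NumPy sketch builds two of the six sketch types above: a Gaussian random projection and leverage-score row sampling. The sizes are illustrative, and the exact leverage scores from a thin QR are a simplification (the paper approximates them); this is a standalone sketch, not the Spark implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 2000, 20, 200           # tall design matrix, sketch size s
A = rng.standard_normal((n, d))

# Sketch via Gaussian random projection: S A with S ~ N(0, 1/s).
S = rng.standard_normal((s, n)) / np.sqrt(s)
SA_proj = S @ A                   # s x d compressed representation

# Sketch via row sampling proportional to leverage scores.
# Here we use exact scores from a thin QR; the paper approximates them.
Q, _ = np.linalg.qr(A)
lev = np.sum(Q**2, axis=1)        # leverage score of row i is ||Q[i, :]||^2
probs = lev / lev.sum()           # scores sum to d for a full-rank A
idx = rng.choice(n, size=s, replace=True, p=probs)
SA_samp = A[idx] / np.sqrt(s * probs[idx, None])  # rescale for unbiasedness

print(SA_proj.shape, SA_samp.shape)   # both (200, 20)
```

Either sketch stands in for A in the sense that ‖SAx‖₂ ≈ ‖Ax‖₂ for all x, which is what the solvers below exploit.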
In this work, we implement these algorithms in Spark on datasets with sizes up
to terabytes. All the experiments are performed on a cluster with 16 nodes,
each of which has 8 CPU cores at 2.5 GHz and 25 GB RAM. Two reasons make
Spark favorable for this task. First, the algorithms for computing the sketch
are essentially embarrassingly parallel, so they are straightforward to
implement in distributed environments. Second, Spark's ability to cache data
in memory makes iterative algorithms fast to execute.
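The parallelism claim can be illustrated without Spark itself: a Gaussian projection of a tall matrix decomposes into independent per-partition contributions that are summed at the end, which is the shape of a Spark map over partitions followed by a treeAggregate. The sketch below simulates that pattern in pure NumPy; the partitioning, seeding scheme, and sizes are hypothetical stand-ins for the distributed version.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, s = 1000, 10, 100
A = rng.standard_normal((n, d))

# S @ A = sum over row blocks of S[:, block] @ A[block, :], so each
# partition can compute a partial s x d sketch and the driver sums them.
# In Spark this is one pass over the partitions; here np.array_split
# plays the role of the partitioning.
def partial_sketch(block_rows, block_ids, seed=42):
    # Regenerate the needed columns of S from a shared seed, so the
    # full s x n projection matrix is never shuffled between workers.
    g = np.random.default_rng(seed)
    S_full = g.standard_normal((s, n)) / np.sqrt(s)
    return S_full[:, block_ids] @ block_rows

blocks = np.array_split(np.arange(n), 8)          # 8 "partitions"
sketch = sum(partial_sketch(A[b], b) for b in blocks)

# Same result as forming S explicitly on one machine.
S = np.random.default_rng(42).standard_normal((s, n)) / np.sqrt(s)
assert np.allclose(sketch, S @ A)
```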
Overall Performance of Low-precision Solvers
Here, we evaluate the low-precision solvers on the following two types of datasets:
• UB (matrices with uniform leverage scores and bad condition number);
• NB (matrices with nonuniform leverage scores and bad condition number).
Recall that low-precision solvers obtain a solution by solving the subproblem
induced by the sketch.
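A minimal sketch-and-solve version of this, assuming a Gaussian sketch and NumPy's lstsq for the small subproblem (sizes and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, s = 5000, 30, 600
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# Exact least-squares solution, for reference.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# Low-precision solver: sketch both A and b, then solve the small
# s x d subproblem  min_x ||S A x - S b||_2.
S = rng.standard_normal((s, n)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

rel_err = np.linalg.norm(x_sketch - x_star) / np.linalg.norm(x_star)
print(f"relative error: {rel_err:.2e}")
```

The accuracy is controlled by the sketch size s: larger sketches give better solutions at higher cost, which is the trade-off swept in Figure 1.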
[Figure: six panels. Panels (a)–(c), for a 1e7 × 1000 UB matrix, and panels (d)–(f), for a 1e7 × 1000 NB matrix, plot (a, d) ‖x − x∗‖₂/‖x∗‖₂, (b, e) |f − f∗|/|f∗|, and (c, f) running time (sec) against sketch size for PROJ CW, PROJ GAUSSIAN, PROJ RADEMACHER, PROJ SRDHT, SAMP APPR, and SAMP UNIF.]
Figure 1: Evaluation of low-precision solvers on 2 different types of matrices of size 1e7 by 1000.
Performance of Low-precision Solvers When n or d Changes
We evaluate the performance of the low-precision solvers on NB matrices with
changing n or d. A matrix of size n by d is generated by stacking an NB matrix
of size 2.5e5 by d vertically REPNUM = n/2.5e5 times. In this way, the
condition number remains the same, and the coherence of the matrix is
1/REPNUM.
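This stacking construction is easy to check numerically. The sketch below uses small illustrative sizes and a synthetic ill-conditioned block standing in for an NB matrix; it verifies that vertical stacking leaves the condition number unchanged while splitting each base row's leverage score, and hence the coherence, by REPNUM.

```python
import numpy as np

rng = np.random.default_rng(3)
base_n, d, REPNUM = 200, 10, 5

# Synthetic ill-conditioned base block (condition number 1e4).
U, _ = np.linalg.qr(rng.standard_normal((base_n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
B = U @ np.diag(np.logspace(0, 4, d)) @ V.T

A = np.vstack([B] * REPNUM)        # n = REPNUM * base_n rows

# Stacking scales every singular value by sqrt(REPNUM), so the
# condition number is unchanged...
print(np.linalg.cond(B), np.linalg.cond(A))

# ...while each base row's leverage score is split over REPNUM copies.
QA, _ = np.linalg.qr(A)
QB, _ = np.linalg.qr(B)
print((QA**2).sum(axis=1).max(), (QB**2).sum(axis=1).max() / REPNUM)
```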
[Figure: six panels plotting (a, d) ‖x − x∗‖₂/‖x∗‖₂, (b, e) |f − f∗|/|f∗|, and (c, f) running time (sec) for PROJ CW, PROJ GAUSSIAN, PROJ RADEMACHER, PROJ SRDHT, and SAMP APPR; panels (a)–(c) vary n ∈ [2.5e5, 1e8] with d = 1000 and s = 5e4, panels (d)–(f) vary d ∈ [10, 2000] with n = 1e7 and s = 5e4.]
Figure 2: Performance of low-precision solvers on NB matrices with varying n and d. For each method, the sketch size is fixed to be 5e4.
Performance of High-precision Solvers
Recall that, alternatively, one can use the sketch to construct a
preconditioner and then invoke an iterative algorithm such as LSQR to obtain
high-precision solutions. Here, we evaluate the performance of high-precision
solvers with several underlying randomized sketches.
[Figure: six panels plotting (a, d) ‖x − x∗‖₂/‖x∗‖₂, (b, e) |f − f∗|/|f∗|, and (c, f) running time (sec) against the number of iterations (2–12) for NOCO, PROJ CW, PROJ GAUSSIAN, and SAMP APPR; panels (a)–(c) use a small sketch size, panels (d)–(f) a large one.]
Figure 3: Evaluation of LSQR with a randomized preconditioner on an NB matrix of size 1e8 by 1000 and condition number 1e6. By small sketch size, we mean 5e3 for all the methods; by large sketch size, we mean 3e5 for PROJ CW, 1e4 for PROJ GAUSSIAN, and 5e4 for SAMP APPR.
Quality of Randomized Preconditioners
After computing a sketch ΠA, R⁻¹ acts as a preconditioner for A, where R is
the factor from the QR decomposition of ΠA. Here we evaluate κ(AR⁻¹).
c      PROJ CW   PROJ GAUSSIAN   PROJ RADEMACHER   PROJ SRDHT   SAMP APPR
5e2    1.08e8    2.17e3          1.42e3            1.19e2       1.21e2
1e3    1.1e6     5.7366          5.6006            7.1958       75.0290
5e3    5.5e5     1.9059          1.9017            1.9857       25.8725
1e4    5.1e5     1.5733          1.5656            1.6167       17.0679
5e4    1.8e5     1.2214          1.2197            1.2293       6.9109
1e5    1.1376    1.1505          1.1502            1.1502       4.7573
Table 1: Quality of preconditioning, κ(AR⁻¹), on an NB matrix of size 1e6 by 500 using several kinds of embeddings, as a function of the sketch size c.
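The trend in Table 1 is easy to reproduce at small scale. The sketch below (illustrative sizes, Gaussian projection only) computes κ(AR⁻¹) for a few sketch sizes c and shows it shrinking toward 1 as c grows.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 5000, 50
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
A = U @ np.diag(np.logspace(0, 4, d))     # kappa(A) = 1e4

kappas = []
for c in (100, 200, 800):
    S = rng.standard_normal((c, n)) / np.sqrt(c)
    _, R = np.linalg.qr(S @ A)
    kappas.append(np.linalg.cond(A @ np.linalg.inv(R)))

print([round(k, 2) for k in kappas])      # decreases toward 1 as c grows
```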
RandNLA for CX Decomposition
Given an n × d matrix A, CX decomposition decomposes A into two matrices
C and X, where C is an n × c matrix that consists of c actual columns of A,
and X is a c × d matrix such that the residual error ‖A − CX‖F is as small
as possible.
The algorithm is as follows. Given A and a rank parameter k, first use
RandNLA algorithms to approximate the leverage scores associated with rank k;
then construct C by sampling c columns from A according to these leverage
scores; finally, construct X based on C.
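A compact NumPy version of these steps, under illustrative assumptions: the rank-k column leverage scores are computed exactly from an SVD (where the implementation approximates them with RandNLA), and X is fitted by least squares as X = C⁺A.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, k, c = 300, 120, 5, 20

# Low-rank-plus-noise test matrix.
A = (rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
     + 0.01 * rng.standard_normal((n, d)))

# Rank-k column leverage scores from the top-k right singular vectors.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
lev = np.sum(Vt[:k] ** 2, axis=0)
probs = lev / k                          # the k scores sum to k

cols = rng.choice(d, size=c, replace=False, p=probs)
C = A[:, cols]                           # c actual columns of A
X = np.linalg.pinv(C) @ A                # least-squares fit: X = C^+ A

rel_err = np.linalg.norm(A - C @ X) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the sampled columns are actual columns of A, they remain interpretable, which is what makes CX useful for the imaging analysis below.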
Applications to Mass Spectrometry Imaging Analysis
Next, we show results on a real dataset. The data for this analysis is provided
by Norman Lewis's group at Washington State University and is derived from
a MALDI-IMS-MSI analysis of a rhizome of the drug-producing green shrub
Podophyllum hexandrum.
The size of the dataset is approximately 2.5e5 by 1.6e7 (511-by-507 spatial pixels
by 460342 m/z channels by 200 ion mobility channels) which presents challenges
for analysis and interpretation. We invoked CX decomposition with RandNLA
to identify informative ions and pixels and investigate the reconstruction error
based on the selected elements in Spark using the same cluster.
[Figure: (a) ion-intensity visualizations at 4 important m/z and τ peaks: 453.1028(74), 615.1427(87), 342.175(62), 381.083(58); the intensities are integrated over a small range of m/z and τ values, and the value shown in each image is m/z(τ). (b) Ion-intensity visualizations at m/z = 303.0215 and two different τ peaks: 303.0215(52) and 303.0215(59). (c) Reconstruction error ‖A − CX‖F/‖A‖F of CX decomposition against the number of ions selected (0–100), for greedy rank-c, randomized rank-c, greedy rank-15, and best rank-15.]
Figure 4: Results of CX decomposition with RandNLA on the real MSI dataset.