Low-rank matrix approximations in Python by Christian Thurau PyData 2014

Low-rank matrix approximations with Python
Christian Thurau

Table of Contents
1 Intro
2 The Basics
3 Matrix approximation
4 Some methods
5 Matrix Factorization with Python
6 Example & Conclusion

For Starters...
Observations
• Data matrix factorization has become an important tool in
information retrieval, data mining, and pattern recognition
• Nowadays, typical data matrices are HUGE
• Examples include:
  • Gene expression data and microarrays
  • Digital images
  • Term by document matrices
  • User ratings for movies, products, ...
  • Graph adjacency matrices

Matrix Factorization
• Given a matrix V,
• determine matrices W and H
• such that V = WH or V ≈ WH
• Characteristics such as the entries, shape, and rank of V, W, and H
depend on the application context

The Basics
matrix factorization allows for:
• solving linear equations
• transforming data
• compressing data
matrix factorization facilitates subsequent processing in:
• information retrieval
• pattern recognition
• data mining

Low-rank Matrix Approximations
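• Approximate V:
$$V \approx WH$$
• where
$$V \in \mathbb{R}^{m \times n}, \quad W \in \mathbb{R}^{m \times k}, \quad H \in \mathbb{R}^{k \times n}$$
• and
$$\operatorname{rank}(W) \ll \operatorname{rank}(V), \quad k \ll \min(m, n)$$
[Figure: V = W H]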
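
To make the shapes concrete, here is a minimal NumPy sketch that is not part of the original slides. The sizes m, n, and k are arbitrary illustrative values, and W and H are random rather than fitted; the point is only that storing the two factors takes far fewer numbers than storing V.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 1000, 10          # illustrative sizes with k << min(m, n)

W = rng.random((m, k))           # m x k factor
H = rng.random((k, n))           # k x n factor
V = W @ H                        # m x n product, rank at most k

rank_V = np.linalg.matrix_rank(V)
dense_entries = m * n            # numbers needed to store V directly
factor_entries = m * k + k * n   # numbers needed to store W and H instead
```

With these sizes the factors need 11,000 numbers instead of 100,000.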

Matrix Approximation
• If
$$V = WH$$
• then
$$v_{i,j} = w_{i,*}\, h_{*,j} = \sum_{x=1}^{k} w_{i,x}\, h_{x,j}$$
[Figure: V = W H]

Matrix Approximation
• More importantly:
$$v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x}\, h_{x,j}$$
• therefore
W ↔ "basis" matrix
H ↔ coefficient matrix
[Figure: V = W H; a column of V shown as a sum of scaled columns of W]
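
The entry-wise and column-wise identities can be checked numerically. The following is a small sketch with random W and H (illustrative sizes and indices, not taken from the slides): one entry of V is rebuilt as an explicit sum over the k inner indices, and one column of V is rebuilt as a linear combination of the columns of W.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 4, 5, 2                # small illustrative sizes
W = rng.random((m, k))           # "basis" matrix
H = rng.random((k, n))           # coefficient matrix
V = W @ H

i, j = 2, 3                      # arbitrary row and column to inspect

# v_{i,j} = sum_x w_{i,x} h_{x,j}
entry = sum(W[i, x] * H[x, j] for x in range(k))

# v_{*,j} = sum_x w_{*,x} h_{x,j}: column j of V is a combination of W's columns
column = sum(W[:, x] * H[x, j] for x in range(k))
```

Column j of H therefore holds the coefficients that mix the basis vectors in W into column j of V.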

On Matrix Factorization Methods
• matrix factorization ↔ data transformation
• matrix rank reduction ↔ data compression
• Common form: V = WH
• Broad range of methods:
  • K-means clustering
  • SVD/PCA
  • Non-negative Matrix Factorization
  • Archetypal Analysis
  • Binary matrix factorization
  • CUR decomposition
  • ...
• Each method yields a unique view on data . . .
• . . . and is suited for different tasks

K-means Clustering [1]
• Baseline clustering method
• Constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad h_{k,i} \in \{0, 1\}, \quad \sum_{k} h_{k,i} = 1$$
• Find W, H using expectation maximization
• Finding the optimal k-means partitioning is NP-hard
• Goal: group similar data points
• Interesting: K-means clustering is matrix factorization
[1] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Berkeley Symposium on Mathematical Statistics and Probability, 1967.
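
The alternating scheme can be sketched in plain NumPy. This is a generic Lloyd-style iteration written for illustration, not PyMF's implementation; the function name `kmeans_factorize` and its parameters are hypothetical. Data points are the columns of V, W holds the cluster means, and H holds one-hot assignments so that each column of H sums to 1.

```python
import numpy as np

def kmeans_factorize(V, k, niter=10, seed=0):
    """Alternate between one-hot assignments (H) and cluster means (W)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Initialise the basis with k distinct data columns.
    W = V[:, rng.choice(n, size=k, replace=False)].astype(float)
    for _ in range(niter):
        # Squared distance from every basis vector to every data column: (k, n).
        d = ((W[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
        labels = d.argmin(axis=0)
        # Assignment step: exactly one 1 per column of H.
        H = np.zeros((k, n))
        H[labels, np.arange(n)] = 1.0
        # Update step: each basis vector becomes the mean of its members.
        for c in range(k):
            members = H[c] > 0
            if members.any():
                W[:, c] = V[:, members].mean(axis=1)
    return W, H

# Two well-separated groups of two points each (columns are data points).
V = np.array([[0.0, 0.1, 5.0, 5.1],
              [0.0, 0.2, 5.0, 4.9]])
W, H = kmeans_factorize(V, k=2)
V_approx = W @ H                 # every column replaced by its cluster mean
```

Because H is one-hot, the product WH simply copies the matching cluster mean into each column.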

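K-means Clustering is Matrix Factorization!
$$
X = \begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\
x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\
x_{3,1} & x_{3,2} & x_{3,3} & \cdots & x_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m,1} & x_{m,2} & x_{m,3} & \cdots & x_{m,n}
\end{bmatrix}, \quad
B = \begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,1} & b_{3,2} & b_{3,3} \\
\vdots & \vdots & \vdots \\
b_{n,1} & b_{n,2} & b_{n,3}
\end{bmatrix}, \quad
A = \begin{bmatrix}
0 & 1 & 1 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
$$
• i.e. for $X \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times 3}$, and $A \in \mathbb{R}^{3 \times n}$ as above, the product
$$XBA = MA$$
realizes an assignment
$$x_i \to m_j, \quad \text{where } m_j = X b_j$$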
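
A small numeric instance of the product XBA = MA, given as a sketch. The data, the cluster labels, and the choice of averaging weights in B are assumptions made for illustration: the slide does not spell out the entries of B, and here column j of B is taken to hold 1/|cluster j| for the members of cluster j so that m_j = X b_j is the cluster mean.

```python
import numpy as np

# Five data points as columns (m = 2, n = 5), grouped into three clusters.
X = np.array([[1.0, 2.0, 9.0, 10.0, 20.0],
              [1.0, 3.0, 9.0, 11.0, 20.0]])
labels = np.array([0, 0, 1, 1, 2])   # assumed cluster index of each column

n, k = X.shape[1], 3
A = np.zeros((k, n))
A[labels, np.arange(n)] = 1.0        # one-hot assignment matrix, 3 x n

B = A.T / A.sum(axis=1)              # n x 3, column j averages cluster j
M = X @ B                            # cluster means, m_j = X b_j
X_approx = M @ A                     # each x_i mapped onto its mean m_j
```

Here M contains the three cluster means as columns, and MA repeats the matching mean for every data point.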

Example: K-means
[Figure: an image approximated as a weighted sum of basis images with coefficients 0.0, 0.0, ..., 1.0, ..., 0.0]
• Similar images are grouped into k groups
• Approximate the data by mapping each data point onto the mean of its cluster

Python Matrix Factorization Toolbox (PyMF) [2]
• Started in 2010 at Fraunhofer IAIS/University of Bonn
• Vast number of different methods!
• Supports hdf5/h5py and sparse matrices
How to factorize a data matrix V:
>>> import pymf
>>> import numpy as np
>>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
>>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
>>> mdl.factorize(niter=10)  # optimize W and H
>>> V_approx = np.dot(mdl.W, mdl.H)  # V ≈ WH
[2] http://github.com/cthurau/pymf

Python Matrix Factorization Toolbox (PyMF) [2]
• Restarted development a few weeks back ;)
• Looking for contributors!
How to map data onto W:
>>> import pymf
>>> import numpy as np
>>> test_data = np.array([[1.0], [0.3]])
>>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
>>> mdl_test.W = mdl.W  # reuse the existing basis W from mdl
>>> mdl_test.factorize(compute_w=False)
>>> test_data_approx = np.dot(mdl.W, mdl_test.H)

PCA
Principal Component Analysis (PCA) [3]
• SVD/PCA are baseline matrix factorization methods
• Optimize:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W^T W = I$$
• Restrict W to singular vectors of V (orthogonal matrix)
• Can (and usually does) violate non-negativity
• Goal: best possible matrix approximation for a given k
• Great for compression or filtering out noise!
[3] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine, 1901.
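
The constrained problem above can be illustrated with a truncated SVD in NumPy. This is a sketch on random data, independent of PyMF, and it assumes no mean-centering (a full PCA would usually center the data first). W is taken as the first k left singular vectors, so W^T W = I holds by construction, and the Frobenius error equals the norm of the discarded singular values (the Eckart-Young result).

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 8))           # illustrative data matrix
k = 2                            # target rank

U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :k]                     # orthonormal basis: W^T W = I
H = s[:k, None] * Vt[:k]         # coefficients, identical to W^T V
V_approx = W @ H                 # best rank-k approximation of V

err = np.linalg.norm(V - V_approx)       # Frobenius norm of the residual
tail = np.sqrt((s[k:] ** 2).sum())       # norm of the discarded singular values
```

Note that W and H generally contain negative entries even though V is non-negative.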

Example PCA
>>> from pymf.pca import PCA
>>> mdl = PCA(data, num_bases=2)
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Usage for data analysis questionable
• Basis vectors usually not interpretable
[Figure: V and its approximation V_approx, alongside the basis vectors W]

Non-negative Matrix Factorization [4]
• For V ≥ 0, a constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W \geq 0, \quad H \geq 0$$
• A globally optimal solution provably exists, but algorithms guaranteed to find it remain elusive; exact NMF is NP-hard
• Often W converges to partial representations
• Active area of research
• Goal: reconstruct data by independent parts
[4] D.D. Lee and H.S. Seung, "Learning the Parts of Objects by Non-Negative Matrix Factorization", Nature, 401(6755), 1999.
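
One common heuristic for this problem is the pair of multiplicative update rules for the squared-error objective associated with Lee and Seung. The sketch below is a plain NumPy version for illustration; it is not claimed to be what PyMF runs internally, and the function name `nmf_multiplicative` and the small constant `eps` (which guards against division by zero) are assumptions. The updates keep W and H non-negative and do not increase the reconstruction error, but they only reach a local optimum.

```python
import numpy as np

def nmf_multiplicative(V, k, niter=200, seed=0, eps=1e-9):
    """Multiplicative updates for min ||V - WH||^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps     # strictly positive random start
    H = rng.random((k, n)) + eps
    for _ in range(niter):
        # Element-wise rescaling; non-negative factors stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))      # non-negative, rank 2

W0, H0 = nmf_multiplicative(V, k=2, niter=0)     # the random starting point
W, H = nmf_multiplicative(V, k=2, niter=200)     # after 200 updates

err_init = np.linalg.norm(V - W0 @ H0)
err_final = np.linalg.norm(V - W @ H)
```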

Example NMF
>>> from pymf.nmf import NMF
>>> mdl = NMF(data, num_bases=2, iter=50)
>>> mdl.factorize()
• Additive combination of parts
• Interesting options for data analysis
[Figure: V and its approximation V_approx, alongside the basis vectors W]

Archetypal Analysis [5]
• Convexity-constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - VWH \rVert^2 \quad \text{s.t.} \quad w_{l,i} \geq 0, \; \sum_{l} w_{l,i} = 1, \quad h_{k,i} \geq 0, \; \sum_{k} h_{k,i} = 1$$
• Reconstruct data by its archetypes, i.e. convex combinations of polar opposites
• Yields novel and intuitive insights into data
• Great for interpretable data representations!
• O(n^2), but efficient approximations for large data exist
[5] A. Cutler and L. Breiman, "Archetypal Analysis", Technometrics 36(4), 1994.
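
The convexity constraints can be illustrated without solving the optimization. The sketch below only builds a feasible pair (W, H) at random, so it says nothing about the quality of the fit; the helper name `random_column_stochastic` and the sizes are hypothetical. It shows that when every column of W and H is non-negative and sums to 1, the archetypes VW and the reconstruction VWH are convex combinations of data points and so cannot leave the range spanned by the data.

```python
import numpy as np

def random_column_stochastic(rows, cols, rng):
    """Random non-negative matrix whose columns each sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0)

rng = np.random.default_rng(0)
m, n, k = 2, 50, 3                       # illustrative sizes
V = rng.random((m, n))                   # data points are the columns of V

W = random_column_stochastic(n, k, rng)  # n x k: mixes data points into archetypes
H = random_column_stochastic(k, n, rng)  # k x n: mixes archetypes into data points

archetypes = V @ W                       # each archetype is a convex combination of data
V_approx = archetypes @ H                # each reconstruction is a convex combination of archetypes

# Convex combinations stay within the per-feature range of the data.
within_range = bool(
    (V_approx.min(axis=1) >= V.min(axis=1) - 1e-12).all()
    and (V_approx.max(axis=1) <= V.max(axis=1) + 1e-12).all()
)
```

An actual solver would search over such feasible pairs for the one minimizing the reconstruction error.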

Example Archetypal Analysis
>>> from pymf.aa import AA
>>> mdl = AA(data, num_bases=2, iter=50)
>>> mdl.factorize()
• Existing data points as basis vectors
• Convex combination allows a probabilistic interpretation
[Figure: V and its approximation V_approx, alongside the basis vectors W]

Method Summary
• Common form: V = WH (or V = VWH)
Method    W constraint              H constraint                    Outcome
PCA       -                         -                               compressed V
K-means   -                         H ∈ {0, 1}, Σ_k h_{k,i} = 1     groups
NMF       W ≥ 0                     H ≥ 0                           parts
AA        W ≥ 0, Σ_l w_{l,i} = 1    H ≥ 0, Σ_k h_{k,i} = 1          opposites
• Doesn’t only work for images ;)
• More complex constraints usually result in more complex solvers
• Active area of research deals with approximations for large data

Large matrices: PyMF and h5py
>>> import h5py
>>> import numpy as np
>>> from pymf.sivm import SIVM # uses [6]
>>> file = h5py.File('myfile.hdf5', 'w')
>>> file['dataset'] = np.random.random((100,1000))
>>> file['W'] = np.random.random((100,10))
>>> file['H'] = np.random.random((10,1000))
>>> sivm_mdl = SIVM(file['dataset'], num_bases=10)
>>> sivm_mdl.W = file['W']
>>> sivm_mdl.H = file['H']
>>> sivm_mdl.factorize()
[6] Thurau, Kersting, and Bauckhage, "Simplex volume maximization for descriptive web scale matrix factorization", CIKM 2010

Take Home Message
• Most clustering and data analysis methods are matrix approximations
• Imposed constraints shape the factorization
• Imposed constraints yield different views on data
• One of the most effective and versatile tools for data exploration!
• Python implementation → http://github.com/cthurau/pymf

Thank you for your attention!
christian.thurau@unbelievable-machine.com
