Low-rank matrix approximations with Python
Christian Thurau
Table of Contents
1 Intro
2 The Basics
3 Matrix approximation
4 Some methods
5 Matrix Factorization with Python
6 Example & Conclusion
2
For Starters...
Observations
• Data matrix factorization has become an important tool in
information retrieval, data mining, and pattern recognition
• Nowadays, typical data matrices are HUGE
• Examples include:
• Gene expression data and microarrays
• Digital images
• Term by document matrices
• User ratings for movies, products, ...
• Graph adjacency matrices
3
Matrix Factorization
• given a matrix
V
• determine matrices
W and H
• such that
V = WH or V ≈ WH
• characteristics such as entries, shape, and rank of V, W, and H will depend on the application context
4
The Basics
matrix factorization allows for:
• solving linear equations
• transforming data
• compressing data
matrix factorization facilitates subsequent processing in:
• information retrieval
• pattern recognition
• data mining
5
Low-rank Matrix Approximations
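• Approximate V
$$V \approx WH$$
• where
$$V \in \mathbb{R}^{m \times n}, \quad W \in \mathbb{R}^{m \times k}, \quad H \in \mathbb{R}^{k \times n}$$
• and
$$\operatorname{rank}(W) \ll \operatorname{rank}(V), \qquad k \ll \min(m, n)$$
[Figure: V = W H]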
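The shape and rank conditions above can be checked numerically. The following is a minimal NumPy sketch, not taken from the talk: the sizes `m`, `n`, `k` and the noise level are arbitrary assumptions chosen for illustration. It builds a full-rank V that is well approximated by a rank-k product and compares the storage cost of the factors with that of V.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 50, 80, 3                      # illustrative sizes (assumption)

W = rng.random((m, k))                   # m x k
H = rng.random((k, n))                   # k x n
# Low-rank signal plus a little noise, so V itself has full rank.
V = W @ H + 1e-3 * rng.standard_normal((m, n))

rank_V = np.linalg.matrix_rank(V)        # full rank because of the noise
rank_WH = np.linalg.matrix_rank(W @ H)   # at most k
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
storage_ratio = (W.size + H.size) / V.size

print(rank_V, rank_WH, rel_error, storage_ratio)
```

Storing W and H takes k(m + n) numbers instead of mn, which is where the compression comes from when k is much smaller than min(m, n).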
6
Matrix Approximation
• If
$$V = WH$$
• then
$$v_{i,j} = w_{i,*}\, h_{*,j} = \sum_{x=1}^{k} w_{i,x}\, h_{x,j}$$
[Figure: V = W H]
7
Matrix Approximation
• More importantly:
$$v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x}\, h_{x,j}$$
• therefore
W ↔ "basis" matrix
H ↔ coefficient matrix
[Figure: V = W H, with a column of V written as a sum of weighted columns of W]
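Both identities, the entry-wise sum and the column-wise linear combination, can be verified on a tiny example. This is a sketch with made-up numbers; the matrices `W` and `H` below are illustrative and not from the slides.

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])               # m = 3, k = 2 ("basis" matrix)
H = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.5, 0.0, 1.0, 1.0]])     # k = 2, n = 4 (coefficient matrix)
V = W @ H
k = W.shape[1]

i, j = 2, 1
# Entry-wise: v_ij is the sum over x of w_ix * h_xj.
entry = sum(W[i, x] * H[x, j] for x in range(k))
# Column-wise: column j of V is a combination of the columns of W,
# weighted by the coefficients in column j of H.
column = sum(H[x, j] * W[:, x] for x in range(k))

print(entry, V[i, j])
print(column, V[:, j])
```

Reading the product column by column is what justifies calling W the basis and H the coefficients.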
8
On Matrix Factorization Methods
• matrix factorization ↔ data transformation
• matrix rank reduction ↔ data compression
• Common form: V = WH
• Broad range of methods:
• K-means clustering
• SVD/PCA
• Non-negative Matrix Factorization
• Archetypal Analysis
• Binary matrix factorization
• CUR decomposition
• ...
• Each method yields a unique view on data . . .
• . . . and is suited for different tasks
9
K-means Clustering [1]
• Baseline clustering method
• Constrained quadratic optimization problem:
$$\min_{W,H} \|V - WH\|^2 \quad \text{s.t.} \quad h_{k,i} \in \{0, 1\}, \quad \sum_{k} h_{k,i} = 1$$
• Find W, H using expectation maximization
• Optimal k-means partitioning is NP-hard
• Goal: group similar data points
• Interesting: K-means clustering is matrix factorization
[1] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Berkeley Symposium on Mathematical Statistics and Probability, 1967
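To make the factorization view concrete, here is a minimal sketch of the alternating scheme (Lloyd's algorithm) written so that it returns W and H directly. It is an illustration under stated assumptions, not PyMF's implementation: the function name `kmeans_factorize`, the random initialization from data columns, and the toy data are all hypothetical.

```python
import numpy as np

def kmeans_factorize(V, k, niter=20, seed=0):
    """Lloyd's algorithm on the columns of V.

    Returns W (m x k, cluster means) and H (k x n, one-hot assignments).
    """
    rng = np.random.default_rng(seed)
    n = V.shape[1]
    # Initialize the means with k distinct data points.
    W = V[:, rng.choice(n, size=k, replace=False)].copy()
    for _ in range(niter):
        # Squared distance from every mean to every data point: k x n.
        dist = ((W[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
        labels = dist.argmin(axis=0)          # assignment step
        for c in range(k):                    # update step
            members = V[:, labels == c]
            if members.shape[1] > 0:          # keep the old mean if a cluster is empty
                W[:, c] = members.mean(axis=1)
    H = np.zeros((k, n))
    H[labels, np.arange(n)] = 1.0             # exactly one 1 per column
    return W, H

V = np.array([[0.0, 0.1, 5.0, 5.1],
              [0.0, 0.1, 5.0, 5.1]])          # two obvious groups, points as columns
W, H = kmeans_factorize(V, k=2)
error = np.linalg.norm(V - W @ H)
print(H)
print(error)
```

Each column of H contains a single 1, so every column of WH is simply the mean of the cluster that the data point was assigned to.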
10
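K-means Clustering is Matrix Factorization!
$$
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & \dots & x_{1,n} \\
x_{2,1} & x_{2,2} & x_{2,3} & \dots & x_{2,n} \\
x_{3,1} & x_{3,2} & x_{3,3} & \dots & x_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m,1} & x_{m,2} & x_{m,3} & \dots & x_{m,n}
\end{bmatrix}
\begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,1} & b_{3,2} & b_{3,3} \\
\vdots & \vdots & \vdots \\
b_{n,1} & b_{n,2} & b_{n,3}
\end{bmatrix}
\begin{bmatrix}
0 & 1 & 1 & \dots & 0 \\
1 & 0 & 0 & \dots & 0 \\
0 & 0 & 0 & \dots & 1
\end{bmatrix}
$$
• i.e. for $X \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times 3}$, and $A \in \mathbb{R}^{3 \times n}$ as above, the product
$$XBA = MA$$
realizes an assignment
$$x_i \to m_j, \quad \text{where } m_j = X b_j$$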
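The product XBA can be spelled out in NumPy for a fixed clustering. This is a small sketch with made-up data and hand-picked labels (both are assumptions for illustration): A holds the one-hot assignments, and B is built so that X b_j is the mean of cluster j.

```python
import numpy as np

X = np.array([[0.0, 0.2, 4.0, 4.2, 9.0],
              [1.0, 1.2, 5.0, 5.2, 8.0]])   # m = 2, n = 5, data points as columns
labels = np.array([0, 0, 1, 1, 2])           # hand-picked cluster of each point
n, k = X.shape[1], 3

A = np.zeros((k, n))
A[labels, np.arange(n)] = 1.0                # 3 x n one-hot assignment matrix
B = A.T / A.sum(axis=1)                      # n x 3, column j averages cluster j
M = X @ B                                    # m x 3 matrix of cluster means
X_approx = X @ B @ A                         # every point replaced by its mean

print(M)
print(X_approx)
```

Here M plays the role of W and A the role of H in the common form V ≈ WH.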
11
Example: K-means
[Figure: an image approximated with coefficients 0.0, 0.0, ..., 1.0, ..., 0.0 over the k cluster means]
• Similar images are grouped into k groups
• Approximate data by mapping each data point onto the mean of its cluster region
12
Python Matrix Factorization Toolbox (PyMF) [2]
• Started in 2010 at Fraunhofer IAIS/University of Bonn
• Vast number of different methods!
• Supports hdf5/h5py and sparse matrices
How to factorize a data matrix V:
>>> import pymf
>>> import numpy as np
>>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
>>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
>>> mdl.factorize(niter=10)  # optimize W and H
>>> V_approx = np.dot(mdl.W, mdl.H)  # V ≈ WH
[2] http://github.com/cthurau/pymf
13
Python Matrix Factorization Toolbox (PyMF) [2]
• Restarted development a few weeks back ;)
• Looking for contributors!
How to map data onto W:
>>> import pymf
>>> import numpy as np
>>> test_data = np.array([[1.0], [0.3]])
>>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
>>> mdl_test.W = mdl.W  # mdl.W -> existing basis W
>>> mdl_test.factorize(compute_w=False)  # keep W fixed, only compute H
>>> test_data_approx = np.dot(mdl.W, mdl_test.H)
14
PCA
Principal Component Analysis (PCA) [3]
• SVD/PCA are baseline matrix factorization methods
• Optimize:
$$\min_{W,H} \|V - WH\|^2 \quad \text{s.t.} \quad W^T W = I$$
• Restrict W to singular vectors of V (orthogonal matrix)
• Can (usually does) violate non-negativity
• Goal: best possible matrix approximation for a given k
• Great for compression or filtering out noise!
[3] K. Pearson, On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine, 1901.
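The constrained problem above is solved by a truncated SVD. The sketch below uses plain NumPy and makes some assumptions of its own: the helper name `pca_factorize` is hypothetical, the data is random, and no mean-centering is applied (PCA proper usually centers the data first, and PyMF's own PCA class may behave differently).

```python
import numpy as np

def pca_factorize(V, k):
    """Rank-k factorization V ~ W H with orthonormal columns in W (no centering)."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    W = U[:, :k]                         # leading left singular vectors
    H = s[:k, None] * Vt[:k, :]          # coefficients of each column of V
    return W, H

rng = np.random.default_rng(1)
V = rng.standard_normal((6, 40))
W, H = pca_factorize(V, k=2)

err_sq = np.linalg.norm(V - W @ H) ** 2
s_all = np.linalg.svd(V, compute_uv=False)
discarded = (s_all[2:] ** 2).sum()       # energy in the dropped singular values

print(np.round(W.T @ W, 6))
print(err_sq, discarded)
print((W < 0).any())
```

The squared reconstruction error equals the sum of the discarded squared singular values, which is why no other rank-k factorization can do better in this norm. Negative entries in W show up even though nothing in the data requires them.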
15
Example PCA
>>> from pymf.pca import PCA
>>> import numpy as np
>>> mdl = PCA(data, num_bases=2)  # data: the data matrix V
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Usage for data analysis questionable
• Basis vectors usually not interpretable
[Figure: V ≈ V_approx, with basis vectors W = ...]
16
Non-negative Matrix Factorization [4]
• For V ≥ 0, a constrained quadratic optimization problem:
$$\min_{W,H} \|V - WH\|^2 \quad \text{s.t.} \quad W \ge 0, \quad H \ge 0$$
• a globally optimal solution provably exists; algorithms guaranteed to find it remain elusive; exact NMF is NP-hard
• Often W converges to partial representations
• Active area of research
• Goal: reconstruct data by independent parts
[4] D.D. Lee and H.S. Seung, Learning the Parts of Objects by Non-Negative Matrix Factorization, Nature, 401(6755), 1999
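One common way to attack this problem is with multiplicative updates, a scheme usually attributed to Lee and Seung. The sketch below is an assumption-laden illustration, not PyMF's solver: the function name `nmf_multiplicative`, the small `eps` guard against division by zero, and the random test matrix are all hypothetical choices.

```python
import numpy as np

def nmf_multiplicative(V, k, niter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF for the squared-error objective."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(niter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 12))     # non-negative, rank 3

W, H = nmf_multiplicative(V, k=3)
W1, H1 = nmf_multiplicative(V, k=3, niter=1)     # same start, one iteration only
err = np.linalg.norm(V - W @ H)
err1 = np.linalg.norm(V - W1 @ H1)

print(err1, err)
```

The updates only find a local optimum, so different seeds can give different W and H, which matches the remark above that a guaranteed global solver is not available.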
17
Example NMF
>>> from pymf.nmf import NMF
>>> import numpy as np
>>> mdl = NMF(data, num_bases=2, iter=50)
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Additive combination of parts
• Interesting options for data analysis
[Figure: V ≈ V_approx, with basis vectors W = ...]
18
Archetypal Analysis [5]
• Convexity constrained quadratic optimization problem:
$$\min_{W,H} \|V - VWH\|^2 \quad \text{s.t.} \quad w_{l,i} \ge 0, \ \sum_{l} w_{l,i} = 1, \quad h_{k,i} \ge 0, \ \sum_{k} h_{k,i} = 1$$
• Reconstruct data by its archetypes, i.e. convex combinations of polar opposites
• Yields novel and intuitive insights into data
• Great for interpretable data representations!
• $O(n^2)$, but: efficient approximations for large data exist
[5] A. Cutler and L. Breiman, Archetypal Analysis, in Technometrics 36(4), 1994
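The two sum-to-one constraints are easier to read once the shapes are written out. The sketch below does not solve the optimization problem; it only draws random matrices that satisfy the constraints (an assumption made purely for illustration) and shows what they imply: archetypes VW are convex combinations of data points, and reconstructions VWH never leave the range spanned by the data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2, 30, 3
V = rng.standard_normal((m, n))            # data points as columns

# W is n x k with non-negative columns that sum to 1:
# each archetype is a convex combination of data points.
W = rng.dirichlet(np.ones(n), size=k).T
# H is k x n with non-negative columns that sum to 1:
# each data point is approximated by a convex combination of archetypes.
H = rng.dirichlet(np.ones(k), size=n).T

Z = V @ W                                  # archetypes, m x k
V_approx = Z @ H                           # same as V @ W @ H

lo = V.min(axis=1, keepdims=True)
hi = V.max(axis=1, keepdims=True)
inside_box = bool(((V_approx >= lo) & (V_approx <= hi)).all())

print(W.sum(axis=0), H.sum(axis=0)[:3])
print(inside_box)
```

A real solver would choose W and H to minimize the reconstruction error, which pushes the archetypes toward extreme points of the data.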
19
Example Archetypal Analysis
>>> from pymf.aa import AA
>>> import numpy as np
>>> mdl = AA(data, num_bases=2, iter=50)
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Existent data points as basis vectors
• Convex combination allows a probabilistic interpretation
[Figure: V ≈ V_approx, with basis vectors W = ...]
20
Method Summary
• Common form: V = WH (or V = VWH)
Method    W constraint              H constraint                      Outcome
PCA       -                         -                                 compressed V
K-means   -                         H ∈ {0, 1}, Σ_k h_{k,i} = 1       groups
NMF       W ≥ 0                     H ≥ 0                             parts
AA        W ≥ 0, Σ_l w_{l,i} = 1    H ≥ 0, Σ_k h_{k,i} = 1            opposites
• Doesn’t only work for images ;)
• More complex constraints usually result in more complex solvers
• Active area of research deals with approximations for large data
21
Large matrices: PyMF and h5py
>>> import h5py
>>> import numpy as np
>>> from pymf.sivm import SIVM  # uses [6]
>>> file = h5py.File('myfile.hdf5', 'w')
>>> file['dataset'] = np.random.random((100, 1000))
>>> file['W'] = np.random.random((100, 10))
>>> file['H'] = np.random.random((10, 1000))
>>> sivm_mdl = SIVM(file['dataset'], num_bases=10)
>>> sivm_mdl.W = file['W']  # W and H are HDF5 datasets stored in the file
>>> sivm_mdl.H = file['H']
>>> sivm_mdl.factorize()
[6] Thurau, Kersting, and Bauckhage, "Simplex volume maximization for descriptive web scale matrix factorization", CIKM 2010
22
[7] Science, 2010: Vol. 330
Take Home Message
• Most clustering and data analysis methods are matrix approximations
• Imposed constraints shape the factorization
• Imposed constraints yield different views on data
• One of the most effective and versatile tools for data exploration!
• Python implementation → http://github.com/cthurau/pymf
24
Thank you for your attention!
christian.thurau@unbelievable-machine.com

Low-rank matrix approximations in Python by Christian Thurau, PyData 2014

Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Mostafa G. M. Mostafa
 
5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdf5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdfRahul926331
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx36rajneekant
 
machine learning.pptx
machine learning.pptxmachine learning.pptx
machine learning.pptxAbdusSadik
 
Dimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptxDimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptxRohanBorgalli
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...butest
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagationDong Guo
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleHakka Labs
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modeljins0618
 
Advanced matlab codigos matematicos
Advanced matlab codigos matematicosAdvanced matlab codigos matematicos
Advanced matlab codigos matematicosKmilo Bolaños
 
2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approach2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approachnozomuhamada
 
Training machine learning k means 2017
Training machine learning k means 2017Training machine learning k means 2017
Training machine learning k means 2017Iwan Sofana
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive ModellingRajiv Advani
 

Similar to Low-rank matrix approximations in Python by Christian Thurau PyData 2014 (20)

Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdf5 DimensionalityReduction.pdf
5 DimensionalityReduction.pdf
 
DimensionalityReduction.pptx
DimensionalityReduction.pptxDimensionalityReduction.pptx
DimensionalityReduction.pptx
 
machine learning.pptx
machine learning.pptxmachine learning.pptx
machine learning.pptx
 
Dimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptxDimension Reduction Introduction & PCA.pptx
Dimension Reduction Introduction & PCA.pptx
 
. An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic .... An introduction to machine learning and probabilistic ...
. An introduction to machine learning and probabilistic ...
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
09Evaluation_Clustering.pdf
09Evaluation_Clustering.pdf09Evaluation_Clustering.pdf
09Evaluation_Clustering.pdf
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagation
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
 
Clustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture modelClustering:k-means, expect-maximization and gaussian mixture model
Clustering:k-means, expect-maximization and gaussian mixture model
 
Advanced matlab codigos matematicos
Advanced matlab codigos matematicosAdvanced matlab codigos matematicos
Advanced matlab codigos matematicos
 
MLE.pdf
MLE.pdfMLE.pdf
MLE.pdf
 
2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approach2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approach
 
Training machine learning k means 2017
Training machine learning k means 2017Training machine learning k means 2017
Training machine learning k means 2017
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
Dynamic programming
Dynamic programmingDynamic programming
Dynamic programming
 

More from PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Low-rank matrix approximations in Python by Christian Thurau PyData 2014

  • 1. Low-rank matrix approximations with Python Christian Thurau
  • 2. Table of Contents: 1 Intro, 2 The Basics, 3 Matrix approximation, 4 Some methods, 5 Matrix Factorization with Python, 6 Example & Conclusion
  • 3. For Starters... Observations • Data matrix factorization has become an important tool in information retrieval, data mining, and pattern recognition • Nowadays, typical data matrices are HUGE • Examples include: gene expression data and microarrays; digital images; term-by-document matrices; user ratings for movies, products, ...; graph adjacency matrices
  • 4. Matrix Factorization • Given a matrix $V$ • determine matrices $W$ and $H$ • such that $V = WH$ or $V \approx WH$ • Characteristics such as entries, shape, and rank of $V$, $W$, and $H$ depend on the application context
  • 5. The Basics • Matrix factorization allows for: solving linear equations; transforming data; compressing data • Matrix factorization facilitates subsequent processing in: information retrieval; pattern recognition; data mining
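  • 6. Low-rank Matrix Approximations • Approximate $V$: $V \approx WH$ • where $V \in \mathbb{R}^{m \times n}$, $W \in \mathbb{R}^{m \times k}$, $H \in \mathbb{R}^{k \times n}$ • and $\operatorname{rank}(W) \ll \operatorname{rank}(V)$, $k \ll \min(m, n)$ • [Figure: $V$ drawn as the product of the block matrices $W$ and $H$]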
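To make the condition $k \ll \min(m, n)$ concrete, the following minimal sketch counts how many numbers must be stored for $V$ versus for the pair $(W, H)$. The sizes `m`, `n`, `k` and the random factors are hypothetical values chosen for illustration; they do not come from the slides.

```python
import numpy as np

# Hypothetical sizes with k << min(m, n); not taken from the slides.
m, n, k = 1000, 500, 10

rng = np.random.default_rng(0)
W = rng.random((m, k))  # m x k factor
H = rng.random((k, n))  # k x n factor
V = W @ H               # m x n matrix whose rank is at most k

full_entries = m * n            # numbers needed to store V directly
factored_entries = k * (m + n)  # numbers needed to store W and H instead

print(V.shape, np.linalg.matrix_rank(V))
print(full_entries, factored_entries)
```

With these sizes, storing $W$ and $H$ takes 15,000 numbers instead of 500,000, which is the compression aspect of rank reduction.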
  • 7. Matrix Approximation • If $V = WH$ • then $v_{i,j} = w_{i,*} h_{*,j} = \sum_{x=1}^{k} w_{i,x} h_{x,j}$ • [Figure: $V$ drawn as the product of the block matrices $W$ and $H$]
  • 8. Matrix Approximation • More importantly: $v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x} h_{x,j}$ • therefore $W$ ↔ "basis" matrix, $H$ ↔ coefficient matrix • [Figure: a column of $V = WH$ drawn as a sum of weighted columns of $W$]
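The column view above can be checked numerically. This is a small sketch with randomly generated matrices (the shapes and the column index `j` are arbitrary choices for illustration): column $j$ of $V$ equals $W$ applied to column $j$ of $H$, which is the same as a weighted sum of the columns of $W$.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((4, 2))  # "basis" matrix: k = 2 basis vectors in R^4
H = rng.random((2, 3))  # coefficient matrix: one column of weights per data point
V = W @ H

j = 1  # arbitrary column index
column_from_product = W @ H[:, j]
# Explicit sum over the k basis vectors, weighted by the coefficients h_{x,j}.
column_from_sum = sum(H[x, j] * W[:, x] for x in range(W.shape[1]))

print(np.allclose(V[:, j], column_from_product))
print(np.allclose(V[:, j], column_from_sum))
```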
  • 9. On Matrix Factorization Methods • matrix factorization ↔ data transformation • matrix rank reduction ↔ data compression • Common form: $V = WH$ • Broad range of methods: K-means clustering; SVD/PCA; Non-negative Matrix Factorization; Archetypal Analysis; Binary matrix factorization; CUR decomposition; ... • Each method yields a unique view on data ... • ... and is suited for different tasks
  • 10. K-means Clustering [1] • Baseline clustering method • Constrained quadratic optimization problem: $\min_{W,H} \|V - WH\|^2$ s.t. $h_{k,i} \in \{0, 1\}$, $\sum_k h_{k,i} = 1$ • Find $W$, $H$ using expectation maximization • Optimal k-means partitioning is NP-hard • Goal: group similar data points • Interesting: K-means clustering is matrix factorization • [1] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Berkeley Symposium on Mathematical Statistics and Probability, 1967
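  • 11. K-means Clustering is Matrix Factorization!
    $$
    \begin{pmatrix}
    x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\
    x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\
    x_{3,1} & x_{3,2} & x_{3,3} & \cdots & x_{3,n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{m,1} & x_{m,2} & x_{m,3} & \cdots & x_{m,n}
    \end{pmatrix}
    \begin{pmatrix}
    b_{1,1} & b_{1,2} & b_{1,3} \\
    b_{2,1} & b_{2,2} & b_{2,3} \\
    b_{3,1} & b_{3,2} & b_{3,3} \\
    \vdots & \vdots & \vdots \\
    b_{n,1} & b_{n,2} & b_{n,3}
    \end{pmatrix}
    \begin{pmatrix}
    0 & 1 & 1 & \cdots & 0 \\
    1 & 0 & 0 & \cdots & 0 \\
    0 & 0 & 0 & \cdots & 1
    \end{pmatrix}
    $$
    • i.e. for $X \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times 3}$, and $A \in \mathbb{R}^{3 \times n}$ as above, the product $XBA = MA$ realizes an assignment $x_i \to m_j$, where $m_j = X b_j$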
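The factorization view of k-means can be sketched with plain NumPy. The function below is an illustrative implementation of Lloyd's algorithm, not PyMF's code; the name `kmeans_factorize`, its parameters, and the toy data are assumptions made for this example. Columns of `V` are data points, columns of `W` are cluster means, and `H` is the one-hot assignment matrix with exactly one 1 per column.

```python
import numpy as np

def kmeans_factorize(V, k, n_iter=20, seed=0):
    """Lloyd's algorithm written so that V is approximated by W @ H.

    Hypothetical helper for illustration; columns of V are data points.
    """
    rng = np.random.default_rng(seed)
    n = V.shape[1]
    # Initialise the centroids (columns of W) with k distinct data points.
    W = V[:, rng.choice(n, size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distances between every centroid and every point: shape (k, n).
        dists = ((W[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
        labels = dists.argmin(axis=0)
        # One-hot assignment matrix: h_{c,i} = 1 iff point i belongs to cluster c.
        H = np.zeros((k, n))
        H[labels, np.arange(n)] = 1.0
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = V[:, labels == c]
            if members.shape[1] > 0:
                W[:, c] = members.mean(axis=1)
    return W, H

# Toy data: two well-separated groups of three points each (columns).
V = np.array([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
              [0.0, 0.1, 0.0, 5.0, 5.2, 5.1]])
W, H = kmeans_factorize(V, k=2)
V_approx = W @ H  # every column is replaced by the mean of its cluster
print(H)
print(V_approx)
```

Because each column of `H` contains a single 1, the product `W @ H` simply copies the matching cluster mean into each column, which is the assignment $x_i \to m_j$ described above.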
  • 12. Example: K-means • [Figure: an image approximated by cluster means with coefficients 0.0, 0.0, ..., 1.0, ..., 0.0] • Similar images are grouped into $k$ groups • Approximate data by mapping each data point onto the mean of its cluster
  • 13. Python Matrix Factorization Toolbox (PyMF) [2] • Started in 2010 at Fraunhofer IAIS/University of Bonn • Vast number of different methods! • Supports hdf5/h5py and sparse matrices • How to factorize a data matrix $V$:
        >>> import pymf
        >>> import numpy as np
        >>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
        >>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
        >>> mdl.factorize(niter=10)  # optimize for W and H
        >>> V_approx = np.dot(mdl.W, mdl.H)  # V ≈ WH
    [2] http://github.com/cthurau/pymf
  • 14. Python Matrix Factorization Toolbox (PyMF) [2] • Restarted development a few weeks back ;) • Looking for contributors! • How to map data onto $W$:
        >>> import pymf
        >>> import numpy as np
        >>> test_data = np.array([[1.0], [0.3]])
        >>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
        >>> mdl_test.W = mdl.W  # mdl.W -> existing basis W
        >>> mdl_test.factorize(compute_w=False)  # keep W fixed, fit H only
        >>> test_data_approx = np.dot(mdl.W, mdl_test.H)
    [2] http://github.com/cthurau/pymf
  • 15. Principal Component Analysis (PCA) [3] • SVD/PCA are baseline matrix factorization methods • Optimize: $\min_{W,H} \|V - WH\|^2$ s.t. $W^T W = I$ • Restrict $W$ to singular vectors of $V$ (orthogonal matrix) • Can (and usually does) violate non-negativity • Goal: best possible matrix approximation for a given $k$ • Great for compression or filtering out noise! • [3] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine, 1901
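The constraint $W^T W = I$ and the "best approximation for a given $k$" claim can be illustrated with a truncated SVD in NumPy. This sketch uses `numpy.linalg.svd` on an uncentered random matrix, which is an assumption made for brevity (PCA would normally center the data first); it is not PyMF's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 8))  # arbitrary example data
k = 2                   # target rank

# Thin SVD: V = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(V, full_matrices=False)

W = U[:, :k]                  # first k left singular vectors, so W.T @ W = I
H = s[:k, None] * Vt[:k, :]   # coefficients: scaled right singular vectors
V_approx = W @ H

err = np.linalg.norm(V - V_approx)  # Frobenius norm of the residual
# Eckart-Young: the error equals the energy in the discarded singular values.
print(np.isclose(err, np.sqrt((s[k:] ** 2).sum())))
print(np.allclose(W.T @ W, np.eye(k)))
```

Note that `W` and `H` generally contain negative entries even though `V` is non-negative here, which matches the remark about violating non-negativity.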
  • 16. Example PCA
        >>> from pymf.pca import PCA
        >>> import numpy as np
        >>> mdl = PCA(data, num_bases=2)
        >>> mdl.factorize()
        >>> V_approx = np.dot(mdl.W, mdl.H)
    • Usage for data analysis questionable • Basis vectors usually not interpretable • [Figure: $V \approx V_{approx}$ and the basis vectors in $W$]
  • 17. Non-negative Matrix Factorization [4] • For $V \geq 0$, constrained quadratic optimization problem: $\min_{W,H} \|V - WH\|^2$ s.t. $W \geq 0$, $H \geq 0$ • A globally optimal solution provably exists; algorithms guaranteed to find it remain elusive; exact NMF is NP-hard • Often $W$ converges to partial representations • Active area of research • Goal: reconstruct data by independent parts • [4] D. D. Lee and H. S. Seung, "Learning the Parts of Objects by Non-Negative Matrix Factorization", Nature, 401(6755), 1999
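One common way to attack this problem is with multiplicative update rules, usually attributed to Lee and Seung. The sketch below is a plain NumPy illustration of that technique under stated assumptions: the function name `nmf_multiplicative`, its parameters, and the small constant `eps` (added to avoid division by zero) are choices made for this example, and no claim is made that PyMF uses exactly this scheme. The updates keep $W$ and $H$ non-negative but only reach a local optimum.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-9):
    """Illustrative NMF via multiplicative updates for the squared error.

    Hypothetical helper; V must be non-negative.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))  # non-negative random initialisation
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((6, 8))  # arbitrary non-negative example data
W, H = nmf_multiplicative(V, k=2)
V_approx = W @ H
print(W.min() >= 0, H.min() >= 0)
print(np.linalg.norm(V - V_approx))
```

Different seeds can give different factors, which is consistent with the slide's point that a guaranteed route to the global optimum is not available.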
  • 18. Example NMF
        >>> from pymf.nmf import NMF
        >>> import numpy as np
        >>> mdl = NMF(data, num_bases=2, iter=50)
        >>> mdl.factorize()
        >>> V_approx = np.dot(mdl.W, mdl.H)
    • Additive combination of parts • Interesting options for data analysis • [Figure: $V \approx V_{approx}$ and the basis vectors in $W$]
  • 19. Archetypal Analysis [5] • Convexity-constrained quadratic optimization problem: $\min_{W,H} \|V - VWH\|^2$ s.t. $w_{l,i} \geq 0$, $\sum_l w_{l,i} = 1$, $h_{k,i} \geq 0$, $\sum_k h_{k,i} = 1$ • Reconstruct data by its archetypes, i.e. convex combinations of polar opposites • Yields novel and intuitive insights into data • Great for interpretable data representations! • $O(n^2)$, but efficient approximations for large data exist • [5] A. Cutler and L. Breiman, "Archetypal Analysis", Technometrics 36(4), 1994
  • 20. Example Archetypal Analysis
        >>> from pymf.aa import AA
        >>> import numpy as np
        >>> mdl = AA(data, num_bases=2, iter=50)
        >>> mdl.factorize()
        >>> V_approx = np.dot(mdl.W, mdl.H)
    • Existing data points as basis vectors • Convex combination allows a probabilistic interpretation • [Figure: $V \approx V_{approx}$ and the basis vectors in $W$]
  • 21. Method Summary • Common form: $V = WH$ (or $V = VWH$)
        Method    W constraint                        H constraint                                   Outcome
        PCA       -                                   -                                              compressed V
        K-means   -                                   $H = [0; 1]$, $\sum_k h_{k,i} = 1$             groups
        NMF       $W \geq 0$                          $H \geq 0$                                     parts
        AA        $W \geq 0$, $\sum_l w_{l,i} = 1$    $H \geq 0$, $\sum_k h_{k,i} = 1$               opposites
    • Doesn't only work for images ;) • More complex constraints usually result in more complex solvers • Active area of research deals with approximations for large data
  • 22. Large matrices: PyMF and h5py
        >>> import h5py
        >>> import numpy as np
        >>> from pymf.sivm import SIVM  # uses [6]
        >>> file = h5py.File('myfile.hdf5', 'w')
        >>> file['dataset'] = np.random.random((100, 1000))
        >>> file['W'] = np.random.random((100, 10))
        >>> file['H'] = np.random.random((10, 1000))
        >>> sivm_mdl = SIVM(file['dataset'], num_bases=10)
        >>> sivm_mdl.W = file['W']  # factors live in the HDF5 file, not in memory
        >>> sivm_mdl.H = file['H']
        >>> sivm_mdl.factorize()
    [6] Thurau, Kersting, and Bauckhage, "Simplex volume maximization for descriptive web scale matrix factorization", CIKM 2010
  • 24. Take Home Message • Most clustering and data analysis methods are matrix approximations • Imposed constraints shape the factorization • Imposed constraints yield different views on data • One of the most effective and versatile tools for data exploration! • Python implementation → http://github.com/cthurau/pymf
  • 25. Thank you for your attention! christian.thurau@unbelievable-machine.com