SlideShare a Scribd company logo
Dictionary Learning for
Massive Matrix Factorization
Arthur Mensch, Julien Mairal,
Ga¨el Varoquaux, Bertrand Thirion
Inria/CEA Parietal, Inria Thoth
June 20, 2016
Matrix factorization
X
p
n
D
p
k
=
A
n
k
1X ∈ Rp×n = DA ∈ Rp×k × Rk×n
Flexible tool for unsupervised data analysis
Dataset has lower underlying complexity than appearing size
How to scale it to very large datasets ? (Brain imaging, 2TB)
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 1 / 19
Matrix factorization
X
p
n
D
p
k
=
A
n
k
1
Low rank factorization : k < p
X
p
n
D
p
k
=
A
n
k
1
...with optional sparse factors
→ interpretable data (fMRI, genetics, topic modeling)
X
p
n
D
p
k
=
A
n
k
1Overcomplete dictionary learning k p - sparse A
[Olshausen and Field, 1997]
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 2 / 19
Formalism and methods
Non-convex formulation
min
D∈C,A∈Rk×n
X − DA 2
2 + λΩ(A)
Constraints on D
Penalty on A ( 1, 2)
Naive resolution
Alternated minimization: use full X at each iteration
Very slow : single iteration in O(p n)
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 3 / 19
Online matrix factorization
Stream (xt), update D at each t [Mairal et al., 2010]
Single iteration in O(p), a few epochs
xt
p
n
D
p
k
=
αt
n
k
streaming
1
Large n, regular p, eg image patches:
p = 256 n ≈ 106
1GB
Both (sparse) low-rank factorization / sparse coding
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 4 / 19
Scaling-up for massive matrices
Functional MRI (HCP dataset)
Brain “movies” : space × time
Extract k sparse networks
p = 2 · 105 n = 2 · 106 2 TB
Way larger than vision problems
Unusual setting: data is large in
both directions
Also useful in collaborative filtering
X
Voxels
Time
=
D A
k spatial maps Time
x
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 5 / 19
Scaling-up for massive matrices
Out-of-the-box online algorithm ?
xt
p
n
D
p
k
=
αt
n
k
1Limited time budget ?
Need to accomodate large p
235 h run time
1 full epoch
10 h run time
1
24
epoch
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 6 / 19
Scaling-up in both directions
X
p
n
Batch → online
xt
n
Steaming
Handle large n
1
xt
p
n
Streaming
Mtxt
n
Streaming
Subsampling
Handle large p
Online → double online
1
Online learning + partial random access to samples
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 7 / 19
Scaling-up in both directions
xt
p
n
Streaming
Mtxt
n
Streaming
Subsampling
Handle large p
Online → double online
1
Low-distorsion lemma [Johnson and Lindenstrauss, 1984]
Random linear alebra [Halko et al., 2009]
Sketching for data reduction [Pilanci and Wainwright, 2014]
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 8 / 19
Algorithm design
Online dictionary learning [Mairal et al., 2010]
1 Compute code – O(p)
αt = argmin
α∈Rk
xt − Dt−1α 2
2 + λΩ(αt)
2 Update surrogate – O(p)
gt =
1
t
t
i=1
xi − Dαi
2
2
3 Minimize surrogate – O(p)
Dt = argmin
D∈C
gt(D) = argmin
D∈C
Tr (D DAt − D Bt)
xt access → O(p) algorithm (complexity dependency in p)
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 9 / 19
Introducing subsampling
Iteration cost in O(p): can we reduce it?
xt → Mtxt, p → rk Mt = s
Use only Mtxt in algorithm
computation: complexity in O(s)
Mtxt
p
n
Streaming
Subsampling
1Our contribution
Adapt the 3 parts of the algorith to obtain O(s) complexity
1 Code
computation
2 Surrogate
update
3 Surrogate
minimization
[Szab´o et al., 2011]: dictionary learning with missing value – O(p)
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 10 / 19
1. Code computation
Linear regression with random sampling
αt = argmin
α∈Rk
Mt(xt − Dt−1αt) 2
2 + λ
rk Mt
p
Ω(α)
approximative solution of
αt = argmin
α∈Rk
xt − Dt−1αt
2
2 + λΩ(α)
validity in high dimension, with incoherent features:
D MtD ≈
s
p
D D D Mtxt ≈
s
p
D xt
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 11 / 19
2. Surrogate update
Original algorithm: At and Bt used in dictionary update
At = 1
t
t
i=1 αi αi same as in online algorithm
Bt = (1 − 1
t )Bt−1 + 1
t xtαt = 1
t
t
i=1 xi αi Forbidden
Partial update of Bt at each iteration
Bt =
1
t
i=1 Mi
t
i=1
Mi xi αi
Only MtB is updated
Behaves like Ex[xα] for large t
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 12 / 19
3. Surrogate minimization
Original algorithm : block coordinate descent with projection on C
min
D∈C
gt(D) Dj ← p⊥
Cj
(Dj −
1
Aj,j
(D(At)j − (Bt)j ))
Forbidden update of full D at iteration t
Cautious update
Leave dictionary unchanged for unseen features (I − Mt)
min
D∈C
(I−Mt )D=(I−Mt )Dt−1
gt(D)
O(s) update in block coordinate descent
Dj ← p⊥
Cr
j
(Dj −
1
(At)j,j
(Mt(D(At)j − (Bt)j )))
1 ball Cr
j = {D ∈ C, MtD 1 ≤ MtDt−1 1}
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 13 / 19
Resting-state fMRI
HCP dataset
One brain image per second
200 sessions n = 2 · 106 p = 2 · 105
D AX
Voxels
Time
=
k spatial maps Time
x
Sparse decomposition k = 70 C = Bk
1 Ω = · 2
2
Validation
Increase reduction factor p
s
Objective function on test set vs CPU time
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 14 / 19
Resting-state fMRI
Online dictionary learning
235 h run time
1 full epoch
10 h run time
1
24 epoch
Proposed method
10 h run time
1
2 epoch, reduction r=12
Qualitatively, usable maps are obtained 10× faster
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 15 / 19
Resting-state fMRI
.1 h 1 h 10 h 100 h
CPU
time
2.20
2.25
2.30
2.35
2.40
Objectivevalueontestset ×108
Original online algorithm
Reduction factor r 4 8 12
No reduction
Speed-up close to reduction factor p
s
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 16 / 19
Resting-state fMRI
100 1000 Epoch 4000Records
2.16
2.17
2.18
2.19
2.20
2.21
2.22
2.23
2.24
Objectivevalueontestset ×108
λ = 10−4
No reduction
(original alg.)
r = 4
r = 8
r = 12
Convergence speed / number of seen records
Information is acquired faster
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 17 / 19
Collaborative filtering
Mtxt movie ratings from user t
vs. coordinate descent for MMMF loss (no hyperparameters)
100 s 1000 s
0.93
0.94
0.95
0.96
0.97
0.98
0.99 Netflix (140M)
Coordinate descent
Proposed
(full projection)
Proposed
(partial projection)
Dataset Test RMSE Speed
CD MODL -up
ML 1M 0.872 0.866 ×0.75
ML 10M 0.802 0.799 ×3.7
NF (140M) 0.938 0.934 ×6.8
Outperform coordinate descent beyond 10M ratings
Same prediction performance
Speed-up 6.8× on Netflix
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 18 / 19
Conclusion
Take-home message
Loading stochastic subsets of sample
streams can drastically accelerates online
matrix factorization
Mtxt
p
n
Streaming
Subsampling
1
Reduce CPU (+IO) load at each iteration
cf Gradient Descent vs SGD
An order of magnitude speed-up on two different problems
Python package http://github.com/arthurmensch/modl
Heuristic at contribution time
A follow-up algorithm has convergence guarantees
Questions ? (Poster # 41 this afternoon)
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 19 / 19
Bibliography I
[Halko et al., 2009] Halko, N., Martinsson, P.-G., and Tropp, J. A. (2009).
Finding structure with randomness: Probabilistic algorithms for constructing
approximate matrix decompositions.
arXiv:0909.4061 [math].
[Johnson and Lindenstrauss, 1984] Johnson, W. B. and Lindenstrauss, J.
(1984).
Extensions of Lipschitz mappings into a Hilbert space.
Contemporary mathematics, 26(189-206):1.
[Mairal et al., 2010] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010).
Online learning for matrix factorization and sparse coding.
The Journal of Machine Learning Research, 11:19–60.
[Olshausen and Field, 1997] Olshausen, B. A. and Field, D. J. (1997).
Sparse coding with an overcomplete basis set: A strategy employed by V1?
Vision Research, 37(23):3311–3325.
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 20 / 19
Bibliography II
[Pilanci and Wainwright, 2014] Pilanci, M. and Wainwright, M. J. (2014).
Iterative Hessian sketch: Fast and accurate solution approximation for
constrained least-squares.
arXiv:1411.0347 [cs, math, stat].
[Szab´o et al., 2011] Szab´o, Z., P´oczos, B., and Lorincz, A. (2011).
Online group-structured dictionary learning.
In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 2865–2872. IEEE.
[Yu et al., 2012] Yu, H.-F., Hsieh, C.-J., and Dhillon, I. (2012).
Scalable coordinate descent approaches to parallel matrix factorization for
recommender systems.
In Proceedings of the International Conference on Data Mining, pages
765–774. IEEE.
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 21 / 19
Appendix
Collaborative filtering
Streaming uncomplete data
Mt is imposed by user t
Data stream : Mtxt movies ranked by user t
Proposed by [Szab´o et al., 2011]), with O(p) complexity
Validation: Test RMSE (rating prediction) vs CPU time
Baseline: Coordinate descent solver [Yu et al., 2012] solving
related loss
n
i=1
( Mt(Xt − Dαt) 2
2 + λ αt
2
2) + λ D 2
2
Fastest solver available apart from SGD – no hyperparameters
Our method is not sensitive to hyperparameters
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 22 / 19
Algorithm
Our algorithm
1 Code computation
αt = argmin
α∈Rk
Mt (xt − Dt−1α) 2
2
+ λ
rk Mt
p
Ω(αt )
2 Surrogate aggregation
At =
1
t
t
i=1
αi αi
Bt = Bt−1 +
1
t
i=1 Mi
(Mt xt αt − Mt Bt−1)
3 Surrogate minimization
Mt Dj ← p⊥
Cj
(Mt Dj −
1
(At )j,j
Mt (D(At )j −(Bt )j ))
Original online MF
1 Code computation
αt = argmin
α∈Rk
xt − Dt−1α 2
2
+ λΩ(αt )
2 Surrogate aggregation
At =
1
t
t
i=1
αi αi
Bt = Bt−1 +
1
t
(xt αt − Bt−1)
3 Surrogate minimization
Dj ← p⊥
Cr
j
(Dj −
1
(At )j,j
(D(At )j −(Bt )j ))
Arthur Mensch Dictionary Learning for Massive Matrix Factorization 23 / 19

More Related Content

What's hot

Probabilistic Approaches to Shadow Maps Filtering
Probabilistic Approaches to Shadow Maps FilteringProbabilistic Approaches to Shadow Maps Filtering
Probabilistic Approaches to Shadow Maps Filtering
Marco Salvi
 
Deep-Learning Based Stereo Super-Resolution
Deep-Learning Based Stereo Super-ResolutionDeep-Learning Based Stereo Super-Resolution
Deep-Learning Based Stereo Super-Resolution
NAVER Engineering
 
CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...
CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...
CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...
Betagroup
 
3 d image processsing operations
3 d image processsing operations3 d image processsing operations
3 d image processsing operations
MUTHUKUMAR MANIVANNAN
 
Matlab Image Restoration Techniques
Matlab Image Restoration TechniquesMatlab Image Restoration Techniques
Matlab Image Restoration Techniques
DataminingTools Inc
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing Basics
Nam Le
 
Enhancement in frequency domain
Enhancement in frequency domainEnhancement in frequency domain
Enhancement in frequency domainAshish Kumar
 
브이월(VWALL) 사업계획서(초안)
브이월(VWALL) 사업계획서(초안)브이월(VWALL) 사업계획서(초안)
브이월(VWALL) 사업계획서(초안)
무현 윤
 
You only look once
You only look onceYou only look once
You only look once
Gin Kyeng Lee
 
전형규, Vertex Post-Processing Framework, NDC2011
전형규, Vertex Post-Processing Framework, NDC2011전형규, Vertex Post-Processing Framework, NDC2011
전형규, Vertex Post-Processing Framework, NDC2011devCAT Studio, NEXON
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and Gradient
Ahmed Gad
 
The single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimationThe single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimation
AVVENIRE TECHNOLOGIES
 
Spatial domain and filtering
Spatial domain and filteringSpatial domain and filtering
Spatial domain and filtering
University of Potsdam
 
인공지능 방법론 - Deep Learning 쉽게 이해하기
인공지능 방법론 - Deep Learning 쉽게 이해하기인공지능 방법론 - Deep Learning 쉽게 이해하기
인공지능 방법론 - Deep Learning 쉽게 이해하기
Byoung-Hee Kim
 
물리기반렌더링 알레고리드믹 한국어 번역
물리기반렌더링 알레고리드믹 한국어 번역물리기반렌더링 알레고리드믹 한국어 번역
물리기반렌더링 알레고리드믹 한국어 번역
Dae Hyek KIM
 
Image processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filtersImage processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filters
Kuppusamy P
 
Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...
Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...
Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...
KwanyoungKim7
 
Image denoising
Image denoisingImage denoising
Image denoising
Abdur Rehman
 
Image segmentation
Image segmentationImage segmentation
Image segmentationDeepak Kumar
 
A version of watershed algorithm for color image segmentation
A version of watershed algorithm for color image segmentationA version of watershed algorithm for color image segmentation
A version of watershed algorithm for color image segmentation
Habibur Rahman
 

What's hot (20)

Probabilistic Approaches to Shadow Maps Filtering
Probabilistic Approaches to Shadow Maps FilteringProbabilistic Approaches to Shadow Maps Filtering
Probabilistic Approaches to Shadow Maps Filtering
 
Deep-Learning Based Stereo Super-Resolution
Deep-Learning Based Stereo Super-ResolutionDeep-Learning Based Stereo Super-Resolution
Deep-Learning Based Stereo Super-Resolution
 
CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...
CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...
CitizenLab pitch-A citizensourcing platform for cities by Co-Founder, Aline M...
 
3 d image processsing operations
3 d image processsing operations3 d image processsing operations
3 d image processsing operations
 
Matlab Image Restoration Techniques
Matlab Image Restoration TechniquesMatlab Image Restoration Techniques
Matlab Image Restoration Techniques
 
Image Processing Basics
Image Processing BasicsImage Processing Basics
Image Processing Basics
 
Enhancement in frequency domain
Enhancement in frequency domainEnhancement in frequency domain
Enhancement in frequency domain
 
브이월(VWALL) 사업계획서(초안)
브이월(VWALL) 사업계획서(초안)브이월(VWALL) 사업계획서(초안)
브이월(VWALL) 사업계획서(초안)
 
You only look once
You only look onceYou only look once
You only look once
 
전형규, Vertex Post-Processing Framework, NDC2011
전형규, Vertex Post-Processing Framework, NDC2011전형규, Vertex Post-Processing Framework, NDC2011
전형규, Vertex Post-Processing Framework, NDC2011
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and Gradient
 
The single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimationThe single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimation
 
Spatial domain and filtering
Spatial domain and filteringSpatial domain and filtering
Spatial domain and filtering
 
인공지능 방법론 - Deep Learning 쉽게 이해하기
인공지능 방법론 - Deep Learning 쉽게 이해하기인공지능 방법론 - Deep Learning 쉽게 이해하기
인공지능 방법론 - Deep Learning 쉽게 이해하기
 
물리기반렌더링 알레고리드믹 한국어 번역
물리기반렌더링 알레고리드믹 한국어 번역물리기반렌더링 알레고리드믹 한국어 번역
물리기반렌더링 알레고리드믹 한국어 번역
 
Image processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filtersImage processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filters
 
Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...
Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...
Noise2Score: Tweedie’s Approach to Self-Supervised Image Denoising without Cl...
 
Image denoising
Image denoisingImage denoising
Image denoising
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
A version of watershed algorithm for color image segmentation
A version of watershed algorithm for color image segmentationA version of watershed algorithm for color image segmentation
A version of watershed algorithm for color image segmentation
 

Viewers also liked

IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEEBEBTECHSTUDENTPROJECTS
 
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...zukun
 
Blind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary LearningBlind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary Learning
Davide Nardone
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014
Manchor Ko
 
Introduction to Compressive Sensing (Compressed Sensing)
Introduction to Compressive Sensing (Compressed Sensing)Introduction to Compressive Sensing (Compressed Sensing)
Introduction to Compressive Sensing (Compressed Sensing)
Hamid Adldoost
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
recsysfr
 
Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in Python
Gael Varoquaux
 

Viewers also liked (7)

IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
 
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
CVPR2010: Sparse Coding and Dictionary Learning for Image Analysis: Part 1: S...
 
Blind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary LearningBlind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary Learning
 
Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014Dictionary Learning in Games - GDC 2014
Dictionary Learning in Games - GDC 2014
 
Introduction to Compressive Sensing (Compressed Sensing)
Introduction to Compressive Sensing (Compressed Sensing)Introduction to Compressive Sensing (Compressed Sensing)
Introduction to Compressive Sensing (Compressed Sensing)
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in Python
 

Similar to Dictionary Learning for Massive Matrix Factorization

Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filtering
Arthur Mensch
 
Learning to Reconstruct
Learning to ReconstructLearning to Reconstruct
Learning to Reconstruct
Jonas Adler
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesis
Valentin De Bortoli
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
André Panisson
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
Frank Nielsen
 
presentation
presentationpresentation
presentationjie ren
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
tthonet
 
Big Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBig Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina Morik
BigMine
 
Talk icml
Talk icmlTalk icml
Talk icmlBo Li
 
Matrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsMatrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender Systems
Dmitriy Selivanov
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Jack Clark
 
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...
multimediaeval
 
Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4Xenia miscouridou wi mlds 4
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
Gael Varoquaux
 
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
The Statistical and Applied Mathematical Sciences Institute
 
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...
The Statistical and Applied Mathematical Sciences Institute
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
The Statistical and Applied Mathematical Sciences Institute
 
Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...
Cigdem Aslay
 
Benelearn2016
Benelearn2016Benelearn2016
Benelearn2016
Antonio Sutera
 

Similar to Dictionary Learning for Massive Matrix Factorization (20)

Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filtering
 
Learning to Reconstruct
Learning to ReconstructLearning to Reconstruct
Learning to Reconstruct
 
Macrocanonical models for texture synthesis
Macrocanonical models for texture synthesisMacrocanonical models for texture synthesis
Macrocanonical models for texture synthesis
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
presentation
presentationpresentation
presentation
 
ENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-MeansENBIS 2018 presentation on Deep k-Means
ENBIS 2018 presentation on Deep k-Means
 
Big Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBig Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina Morik
 
Talk icml
Talk icmlTalk icml
Talk icml
 
Matrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsMatrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender Systems
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...
MediaEval 2014: THU-HCSIL Approach to Emotion in Music Task using Multi-level...
 
Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
 
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
QMC: Operator Splitting Workshop, Incremental Learning-to-Learn with Statisti...
 
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
 
Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...Workload-aware materialization for efficient variable elimination on Bayesian...
Workload-aware materialization for efficient variable elimination on Bayesian...
 
Benelearn2016
Benelearn2016Benelearn2016
Benelearn2016
 

Recently uploaded

Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
IvanMallco1
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
Sérgio Sacani
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 

Recently uploaded (20)

Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 

Dictionary Learning for Massive Matrix Factorization

  • 1. Dictionary Learning for Massive Matrix Factorization Arthur Mensch, Julien Mairal, Ga¨el Varoquaux, Bertrand Thirion Inria/CEA Parietal, Inria Thoth June 20, 2016
  • 2. Matrix factorization X p n D p k = A n k 1X ∈ Rp×n = DA ∈ Rp×k × Rk×n Flexible tool for unsupervised data analysis Dataset has lower underlying complexity than appearing size How to scale it to very large datasets ? (Brain imaging, 2TB) Arthur Mensch Dictionary Learning for Massive Matrix Factorization 1 / 19
  • 3. Matrix factorization X p n D p k = A n k 1 Low rank factorization : k < p X p n D p k = A n k 1 ...with optional sparse factors → interpretable data (fMRI, genetics, topic modeling) X p n D p k = A n k 1Overcomplete dictionary learning k p - sparse A [Olshausen and Field, 1997] Arthur Mensch Dictionary Learning for Massive Matrix Factorization 2 / 19
  • 4. Formalism and methods Non-convex formulation min D∈C,A∈Rk×n X − DA 2 2 + λΩ(A) Constraints on D Penalty on A ( 1, 2) Naive resolution Alternated minimization: use full X at each iteration Very slow : single iteration in O(p n) Arthur Mensch Dictionary Learning for Massive Matrix Factorization 3 / 19
  • 5. Online matrix factorization Stream (xt), update D at each t [Mairal et al., 2010] Single iteration in O(p), a few epochs xt p n D p k = αt n k streaming 1 Large n, regular p, eg image patches: p = 256 n ≈ 106 1GB Both (sparse) low-rank factorization / sparse coding Arthur Mensch Dictionary Learning for Massive Matrix Factorization 4 / 19
  • 6. Scaling-up for massive matrices Functional MRI (HCP dataset) Brain “movies” : space × time Extract k sparse networks p = 2 · 105 n = 2 · 106 2 TB Way larger than vision problems Unusual setting: data is large in both directions Also useful in collaborative filtering X Voxels Time = D A k spatial maps Time x Arthur Mensch Dictionary Learning for Massive Matrix Factorization 5 / 19
  • 7. Scaling-up for massive matrices Out-of-the-box online algorithm ? xt p n D p k = αt n k 1Limited time budget ? Need to accomodate large p 235 h run time 1 full epoch 10 h run time 1 24 epoch Arthur Mensch Dictionary Learning for Massive Matrix Factorization 6 / 19
  • 8. Scaling-up in both directions X p n Batch → online xt n Steaming Handle large n 1 xt p n Streaming Mtxt n Streaming Subsampling Handle large p Online → double online 1 Online learning + partial random access to samples Arthur Mensch Dictionary Learning for Massive Matrix Factorization 7 / 19
  • 9. Scaling-up in both directions xt p n Streaming Mtxt n Streaming Subsampling Handle large p Online → double online 1 Low-distorsion lemma [Johnson and Lindenstrauss, 1984] Random linear alebra [Halko et al., 2009] Sketching for data reduction [Pilanci and Wainwright, 2014] Arthur Mensch Dictionary Learning for Massive Matrix Factorization 8 / 19
  • 10. Algorithm design Online dictionary learning [Mairal et al., 2010] 1 Compute code – O(p) αt = argmin α∈Rk xt − Dt−1α 2 2 + λΩ(αt) 2 Update surrogate – O(p) gt = 1 t t i=1 xi − Dαi 2 2 3 Minimize surrogate – O(p) Dt = argmin D∈C gt(D) = argmin D∈C Tr (D DAt − D Bt) xt access → O(p) algorithm (complexity dependency in p) Arthur Mensch Dictionary Learning for Massive Matrix Factorization 9 / 19
  • 11. Introducing subsampling Iteration cost in O(p): can we reduce it? xt → Mtxt, p → rk Mt = s Use only Mtxt in algorithm computation: complexity in O(s) Mtxt p n Streaming Subsampling 1Our contribution Adapt the 3 parts of the algorith to obtain O(s) complexity 1 Code computation 2 Surrogate update 3 Surrogate minimization [Szab´o et al., 2011]: dictionary learning with missing value – O(p) Arthur Mensch Dictionary Learning for Massive Matrix Factorization 10 / 19
  • 12. 1. Code computation Linear regression with random sampling αt = argmin α∈Rk Mt(xt − Dt−1αt) 2 2 + λ rk Mt p Ω(α) approximative solution of αt = argmin α∈Rk xt − Dt−1αt 2 2 + λΩ(α) validity in high dimension, with incoherent features: D MtD ≈ s p D D D Mtxt ≈ s p D xt Arthur Mensch Dictionary Learning for Massive Matrix Factorization 11 / 19
  • 13. 2. Surrogate update Original algorithm: At and Bt used in dictionary update At = 1 t t i=1 αi αi same as in online algorithm Bt = (1 − 1 t )Bt−1 + 1 t xtαt = 1 t t i=1 xi αi Forbidden Partial update of Bt at each iteration Bt = 1 t i=1 Mi t i=1 Mi xi αi Only MtB is updated Behaves like Ex[xα] for large t Arthur Mensch Dictionary Learning for Massive Matrix Factorization 12 / 19
  • 14. 3. Surrogate minimization Original algorithm : block coordinate descent with projection on C min D∈C gt(D) Dj ← p⊥ Cj (Dj − 1 Aj,j (D(At)j − (Bt)j )) Forbidden update of full D at iteration t Cautious update Leave dictionary unchanged for unseen features (I − Mt) min D∈C (I−Mt )D=(I−Mt )Dt−1 gt(D) O(s) update in block coordinate descent Dj ← p⊥ Cr j (Dj − 1 (At)j,j (Mt(D(At)j − (Bt)j ))) 1 ball Cr j = {D ∈ C, MtD 1 ≤ MtDt−1 1} Arthur Mensch Dictionary Learning for Massive Matrix Factorization 13 / 19
  • 15. Resting-state fMRI HCP dataset One brain image per second 200 sessions n = 2 · 106 p = 2 · 105 D AX Voxels Time = k spatial maps Time x Sparse decomposition k = 70 C = Bk 1 Ω = · 2 2 Validation Increase reduction factor p s Objective function on test set vs CPU time Arthur Mensch Dictionary Learning for Massive Matrix Factorization 14 / 19
  • 16. Resting-state fMRI Online dictionary learning 235 h run time 1 full epoch 10 h run time 1 24 epoch Proposed method 10 h run time 1 2 epoch, reduction r=12 Qualitatively, usable maps are obtained 10× faster Arthur Mensch Dictionary Learning for Massive Matrix Factorization 15 / 19
  • 17. Resting-state fMRI .1 h 1 h 10 h 100 h CPU time 2.20 2.25 2.30 2.35 2.40 Objectivevalueontestset ×108 Original online algorithm Reduction factor r 4 8 12 No reduction Speed-up close to reduction factor p s Arthur Mensch Dictionary Learning for Massive Matrix Factorization 16 / 19
  • 18. Resting-state fMRI 100 1000 Epoch 4000Records 2.16 2.17 2.18 2.19 2.20 2.21 2.22 2.23 2.24 Objectivevalueontestset ×108 λ = 10−4 No reduction (original alg.) r = 4 r = 8 r = 12 Convergence speed / number of seen records Information is acquired faster Arthur Mensch Dictionary Learning for Massive Matrix Factorization 17 / 19
  • 19. Collaborative filtering Mtxt movie ratings from user t vs. coordinate descent for MMMF loss (no hyperparameters) 100 s 1000 s 0.93 0.94 0.95 0.96 0.97 0.98 0.99 Netflix (140M) Coordinate descent Proposed (full projection) Proposed (partial projection) Dataset Test RMSE Speed CD MODL -up ML 1M 0.872 0.866 ×0.75 ML 10M 0.802 0.799 ×3.7 NF (140M) 0.938 0.934 ×6.8 Outperform coordinate descent beyond 10M ratings Same prediction performance Speed-up 6.8× on Netflix Arthur Mensch Dictionary Learning for Massive Matrix Factorization 18 / 19
  • 20. Conclusion Take-home message Loading stochastic subsets of sample streams can drastically accelerates online matrix factorization Mtxt p n Streaming Subsampling 1 Reduce CPU (+IO) load at each iteration cf Gradient Descent vs SGD An order of magnitude speed-up on two different problems Python package http://github.com/arthurmensch/modl Heuristic at contribution time A follow-up algorithm has convergence guarantees Questions ? (Poster # 41 this afternoon) Arthur Mensch Dictionary Learning for Massive Matrix Factorization 19 / 19
  • 21. Bibliography I [Halko et al., 2009] Halko, N., Martinsson, P.-G., and Tropp, J. A. (2009). Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061 [math]. [Johnson and Lindenstrauss, 1984] Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary mathematics, 26(189-206):1. [Mairal et al., 2010] Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11:19–60. [Olshausen and Field, 1997] Olshausen, B. A. and Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325. Arthur Mensch Dictionary Learning for Massive Matrix Factorization 20 / 19
  • 22. Bibliography II [Pilanci and Wainwright, 2014] Pilanci, M. and Wainwright, M. J. (2014). Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares. arXiv:1411.0347 [cs, math, stat]. [Szab´o et al., 2011] Szab´o, Z., P´oczos, B., and Lorincz, A. (2011). Online group-structured dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2865–2872. IEEE. [Yu et al., 2012] Yu, H.-F., Hsieh, C.-J., and Dhillon, I. (2012). Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In Proceedings of the International Conference on Data Mining, pages 765–774. IEEE. Arthur Mensch Dictionary Learning for Massive Matrix Factorization 21 / 19
  • 24. Collaborative filtering Streaming uncomplete data Mt is imposed by user t Data stream : Mtxt movies ranked by user t Proposed by [Szab´o et al., 2011]), with O(p) complexity Validation: Test RMSE (rating prediction) vs CPU time Baseline: Coordinate descent solver [Yu et al., 2012] solving related loss n i=1 ( Mt(Xt − Dαt) 2 2 + λ αt 2 2) + λ D 2 2 Fastest solver available apart from SGD – no hyperparameters Our method is not sensitive to hyperparameters Arthur Mensch Dictionary Learning for Massive Matrix Factorization 22 / 19
  • 25. Algorithm Our algorithm 1 Code computation αt = argmin α∈Rk Mt (xt − Dt−1α) 2 2 + λ rk Mt p Ω(αt ) 2 Surrogate aggregation At = 1 t t i=1 αi αi Bt = Bt−1 + 1 t i=1 Mi (Mt xt αt − Mt Bt−1) 3 Surrogate minimization Mt Dj ← p⊥ Cj (Mt Dj − 1 (At )j,j Mt (D(At )j −(Bt )j )) Original online MF 1 Code computation αt = argmin α∈Rk xt − Dt−1α 2 2 + λΩ(αt ) 2 Surrogate aggregation At = 1 t t i=1 αi αi Bt = Bt−1 + 1 t (xt αt − Bt−1) 3 Surrogate minimization Dj ← p⊥ Cr j (Dj − 1 (At )j,j (D(At )j −(Bt )j )) Arthur Mensch Dictionary Learning for Massive Matrix Factorization 23 / 19