Breaking the Nonsmooth Barrier: A Scalable
Parallel Method for Composite Optimization
Fabian Pedregosa Rémi Leblond Simon Lacoste–Julien
Motivation
• Since 2005, the speed of processors has stagnated.
• The number of cores has increased.
• Development of parallel asynchronous variants of stochastic gradient algorithms:
  SGD → Hogwild (Niu et al. 2011).
  SVRG → Kromagnon (Reddi et al. 2015; Mania et al. 2017).
  SAGA → ASAGA (Leblond, Pedregosa, and Lacoste-Julien 2017).
Composite objective
• These methods assume the objective function is smooth.
  They cannot be applied to Lasso, Group Lasso, box constraints, etc.
Objective: minimize the composite objective function
$$\min_{x} \; f(x) + h(x), \quad \text{with } f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$
where each $f_i$ is smooth and $h$ is a block-separable (i.e., $h(x) = \sum_B h([x]_B)$) convex function for which we have access to its proximal operator.
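As a minimal sketch of what "access to the proximal operator" can look like in practice, the Python functions below implement proximal operators for the three examples named on this slide (Lasso, Group Lasso, box constraints). The function names, the `step` and `lam` parameters, and the representation of groups as lists of indices are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def prox_l1(z, step, lam):
    """Prox of step * lam * ||.||_1 (Lasso penalty): elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

def prox_group_l2(z, step, lam, groups):
    """Prox of step * lam * sum_B ||z_B||_2 (Group Lasso): block soft-thresholding.

    `groups` is a list of index lists, one per block B (hypothetical layout).
    """
    out = z.copy()
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        # Shrink the whole block towards zero; a block with small norm is zeroed.
        scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
        out[idx] = scale * z[idx]
    return out

def prox_box(z, lower, upper):
    """Prox of the indicator of the box [lower, upper]: projection (step-independent)."""
    return np.clip(z, lower, upper)
```

All three penalties are separable across coordinates or blocks, which is the block-separability assumed for $h$ above.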
Sparse Proximal SAGA
Contribution 1: Sparse Proximal SAGA. Variant of SAGA (Defazio, Bach, and Lacoste-Julien 2014), particularly efficient when the $\nabla f_i$ are sparse.
Like SAGA, it relies on an unbiased gradient estimate and a proximal step:
$$v_i = \nabla f_i(x) - \alpha_i + D_i \bar{\alpha}; \qquad x^+ = \mathrm{prox}_{\gamma \varphi_i}(x - \gamma v_i); \qquad \alpha_i^+ = \nabla f_i(x)$$
Unlike SAGA, $D_i$ and $\varphi_i$ are designed to give sparse updates while verifying unbiasedness conditions.
Convergence: same linear convergence rate as SAGA, with cheaper updates in the presence of sparsity.
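The update above can be sketched in Python under several assumptions that are not stated on the slides: the $f_i$ are least-squares losses $\frac{1}{2}(a_i^\top x - b_i)^2$, $h = \lambda \|x\|_1$ with single coordinates as blocks, and $\bar{\alpha}$ is the average of the memory terms $\alpha_i$. For $D_i$ and $\varphi_i$, the sketch uses one choice that satisfies the unbiasedness conditions: restrict to the support of $a_i$ and reweight coordinate $j$ by $d_j = n / n_j$, where $n_j$ counts the samples whose gradient touches $j$. The step size $1/(5L)$ is a conservative value assumed here, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_proximal_saga(A, b, lam, n_epochs=200, seed=0):
    """Sketch for (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2 + lam * ||x||_1."""
    n, p = A.shape
    rng = np.random.default_rng(seed)
    supports = [np.flatnonzero(A[i]) for i in range(n)]
    # d_j = n / n_j, with n_j the number of rows whose support contains j.
    counts = np.count_nonzero(A, axis=0)
    d = np.where(counts > 0, n / np.maximum(counts, 1), 0.0)
    # L = max_i ||a_i||^2 bounds the Lipschitz constant of each grad f_i.
    step = 1.0 / (5.0 * np.max(np.sum(A ** 2, axis=1)))
    x = np.zeros(p)
    alpha = np.zeros(n)      # memory stored as a scalar: alpha_i = alpha[i] * a_i
    alpha_bar = np.zeros(p)  # average of the memory terms
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        S = supports[i]
        a = A[i, S]
        r = a @ x[S] - b[i]                  # grad f_i(x) = r * a_i
        diff = (r - alpha[i]) * a            # grad f_i(x) - alpha_i, on the support
        v = diff + d[S] * alpha_bar[S]       # v_i restricted to the support
        # Prox of step * phi_i: thresholds reweighted by d_j, support only.
        x[S] = soft_threshold(x[S] - step * v, step * lam * d[S])
        alpha_bar[S] += diff / n             # keep the average in sync
        alpha[i] = r
    return x
```

Each iteration reads and writes only the coordinates in the support of $a_i$, so its cost scales with the number of nonzeros of one sample rather than with the dimension $p$.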
Proximal Asynchronous SAGA (ProxASAGA)
Contribution 2: Proximal Asynchronous SAGA (ProxASAGA). Each core runs Sparse Proximal SAGA asynchronously without locks and updates $x$, $\alpha$ and $\bar{\alpha}$ in shared memory.
• All read/write operations to shared memory are inconsistent, i.e., there are no performance-destroying vector-level locks while reading/writing.
Convergence: under sparsity assumptions, ProxASAGA converges at the same rate as the sequential algorithm ⟹ theoretical linear speedup with respect to the number of cores.
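The lock-free structure described above can be illustrated with Python threads that share NumPy arrays and take no locks. This is only a structural sketch, not the authors' implementation: it assumes least-squares losses with an $\ell_1$ penalty, the reweighting $d_j = n / n_j$, and a step size of $1/(5L)$, and all names are hypothetical. In standard CPython builds the global interpreter lock prevents any real speedup, and the in-place additions below are not atomic, so the result can differ slightly between runs.

```python
import threading
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def worker(A, b, lam, step, d, x, alpha, alpha_bar, n_iter, seed):
    """Run sparse proximal SAGA steps on the shared arrays without locks."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        i = rng.integers(n)
        S = np.flatnonzero(A[i])
        a = A[i, S]
        x_hat = x[S]                     # copy; may mix old and new coordinates
        r = a @ x_hat - b[i]
        diff = (r - alpha[i]) * a
        v = diff + d[S] * alpha_bar[S]
        delta = soft_threshold(x_hat - step * v, step * lam * d[S]) - x_hat
        x[S] += delta                    # additive update, no vector-level lock
        alpha_bar[S] += diff / n
        alpha[i] = r

def prox_asaga_sketch(A, b, lam, n_threads=4, n_epochs=50):
    """Launch several workers that update x, alpha and alpha_bar in shared memory."""
    n, p = A.shape
    counts = np.count_nonzero(A, axis=0)
    d = np.where(counts > 0, n / np.maximum(counts, 1), 0.0)
    step = 1.0 / (5.0 * np.max(np.sum(A ** 2, axis=1)))
    x, alpha, alpha_bar = np.zeros(p), np.zeros(n), np.zeros(p)
    n_iter = n_epochs * n // n_threads
    threads = [
        threading.Thread(
            target=worker,
            args=(A, b, lam, step, d, x, alpha, alpha_bar, n_iter, seed),
        )
        for seed in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

Because every worker touches only the support of its sampled row, two workers rarely write to the same coordinates when the data are sparse, which is the intuition behind the sparsity assumption in the convergence statement.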
Empirical results
ProxASAGA vs competing methods on 3 large-scale datasets,
ℓ1-regularized logistic regression
Dataset    n            p           density     L      Δ
KDD 2010   19,264,097   1,163,024   10^-6       28.12  0.15
KDD 2012   149,639,105  54,686,452  2 × 10^-7   1.25   0.85
Criteo     45,840,617   1,000,000   4 × 10^-5   1.25   0.89
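To connect the density column to the cost of one update, the sketch below computes the logistic loss and its gradient for a single sample, the smooth part of ℓ1-regularized logistic regression. It assumes labels in {-1, +1} and a dense NumPy vector for the sample; the function name is hypothetical.

```python
import numpy as np

def logistic_loss_grad(a_i, b_i, x):
    """Loss and gradient of f_i(x) = log(1 + exp(-b_i * a_i @ x)), b_i in {-1, +1}."""
    margin = b_i * (a_i @ x)
    loss = np.logaddexp(0.0, -margin)  # stable log(1 + exp(-margin))
    # 1 / (1 + exp(margin)) written with tanh to avoid overflow.
    weight = 0.5 * (1.0 - np.tanh(0.5 * margin))
    grad = -b_i * weight * a_i         # nonzero only where a_i is nonzero
    return loss, grad
```

The gradient is a scalar multiple of the sample, so it is exactly as sparse as the sample itself.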
[Figure: objective minus optimum versus time (in minutes), one panel per dataset (KDD10, KDD12, Criteo). Curves: ProxASAGA, AsySPCD, and FISTA, each on 1 core and on 10 cores.]
Empirical results - Speedup
$$\text{Speedup} = \frac{\text{Time to } 10^{-10} \text{ suboptimality on one core}}{\text{Time to same suboptimality on } k \text{ cores}}$$
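The speedup definition can be computed from measured run times as in the short sketch below. The function name and the dictionary layout (core count mapped to time-to-target) are assumptions, and any timings passed to it in an example are made-up placeholders rather than results from the talk.

```python
def speedup(time_to_target):
    """Return {k: time on one core / time on k cores} for each measured core count k.

    `time_to_target` maps a core count to the time needed to reach the target
    suboptimality; it must contain an entry for 1 core.
    """
    baseline = time_to_target[1]
    return {k: baseline / t for k, t in sorted(time_to_target.items())}
```

A value equal to k corresponds to the ideal (linear) speedup on k cores.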
[Figure: time speedup versus number of cores (2 to 20), one panel per dataset (KDD10, KDD12, Criteo). Curves: Ideal, ProxASAGA, AsySPCD, FISTA.]
• ProxASAGA achieves speedups between 6x and 12x on a 20-core architecture.
• As predicted by theory, there is a high correlation between degree of sparsity and speedup.
Thanks for your attention, see you at poster #159.
References
Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien (2014). “SAGA: A fast incremental gradient
method with support for non-strongly convex composite objectives”. In: Advances in Neural
Information Processing Systems.
Leblond, Rémi, Fabian Pedregosa, and Simon Lacoste-Julien (2017). “ASAGA: asynchronous parallel
SAGA”. In: Proceedings of the 20th International Conference on Artificial Intelligence and
Statistics (AISTATS 2017).
Mania, Horia et al. (2017). “Perturbed iterate analysis for asynchronous stochastic optimization”. In:
SIAM Journal on Optimization.
Niu, Feng et al. (2011). “Hogwild: A lock-free approach to parallelizing stochastic gradient descent”.
In: Advances in Neural Information Processing Systems.
Pedregosa, Fabian, Rémi Leblond, and Simon Lacoste-Julien (2017). “Breaking the Nonsmooth
Barrier: A Scalable Parallel Method for Composite Optimization”. In: Advances in Neural
Information Processing Systems 30.
Reddi, Sashank J et al. (2015). “On variance reduction in stochastic gradient descent and its
asynchronous variants”. In: Advances in Neural Information Processing Systems.

Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

  • 1. Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization. Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien
  • 3. Motivation. Since 2005, the speed of processors has stagnated, while the number of cores has increased. This has led to the development of parallel asynchronous variants of stochastic gradient algorithms: SGD → Hogwild (Niu et al. 2011); SVRG → Kromagnon (Reddi et al. 2015; Mania et al. 2017); SAGA → ASAGA (Leblond, Pedregosa, and Lacoste-Julien 2017).
  • 5. Composite objective. These methods assume that the objective function is smooth, so they cannot be applied to the Lasso, Group Lasso, box constraints, etc. Objective: $\operatorname{minimize}_x \; f(x) + h(x)$, with $f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)$, where each $f_i$ is smooth and $h$ is a block-separable (i.e., $h(x) = \sum_B h([x]_B)$) convex function for which we have access to its proximal operator.
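As an illustration of a block-separable $h$ with a cheap proximal operator, the following minimal sketch applies block soft-thresholding, the proximal operator of the Group Lasso penalty. The function name `prox_group_lasso`, the block layout and the numbers are assumptions made for this example; they do not come from the slides.

```python
import numpy as np

def prox_group_lasso(x, blocks, threshold):
    """Proximal operator of threshold * sum_B ||[x]_B||_2.

    Because the penalty is block-separable, each block is handled
    independently (block soft-thresholding). Assumes threshold >= 0.
    """
    out = np.zeros_like(x, dtype=float)
    for block in blocks:
        norm = np.linalg.norm(x[block])
        # Blocks with norm <= threshold are set to zero; others are shrunk.
        if norm > threshold:
            out[block] = (1.0 - threshold / norm) * x[block]
    return out

# Hypothetical input: two blocks of two coordinates each.
x = np.array([3.0, 4.0, 0.1, -0.1])
blocks = [[0, 1], [2, 3]]
result = prox_group_lasso(x, blocks, threshold=1.0)
# The first block (norm 5) is scaled by 0.8; the second (norm < 1) becomes zero.
print(result)
```

Only the blocks that are passed in are touched, which is the property that makes sparse updates possible when a gradient involves few blocks.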
  • 10. Sparse Proximal SAGA. Contribution 1: Sparse Proximal SAGA, a variant of SAGA (Defazio, Bach, and Lacoste-Julien 2014) that is particularly efficient when the gradients $\nabla f_i$ are sparse. Like SAGA, it relies on an unbiased gradient estimate and a proximal step: $v_i = \nabla f_i(x) - \alpha_i + D_i \bar{\alpha}$; $x^+ = \operatorname{prox}_{\gamma \varphi_i}(x - \gamma v_i)$; $\alpha_i^+ = \nabla f_i(x)$, where $\bar{\alpha}$ denotes the average of the memory terms $\alpha_i$. Unlike SAGA, $D_i$ and $\varphi_i$ are designed to give sparse updates while satisfying unbiasedness conditions. Convergence: same linear convergence rate as SAGA, with cheaper updates in the presence of sparsity.
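The three-part update above can be sketched in code. The slide does not spell out $D_i$ and $\varphi_i$, so this sketch assumes the dense special case $D_i = I$ and $\varphi_i = h$, which is ordinary proximal SAGA rather than the sparse variant. The least-squares loss, the ℓ1 penalty, the function names and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def prox_l1(x, threshold):
    """Soft-thresholding: proximal operator of threshold * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def prox_saga_l1(A, b, lam, step, n_epochs=200, seed=0):
    """Dense proximal SAGA (assumes D_i = I, phi_i = h) for
    (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||x||_1."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    x = np.zeros(p)
    alpha = np.zeros((n, p))   # memory terms alpha_i
    alpha_bar = np.zeros(p)    # running average of the alpha_i
    for _ in range(n_epochs * n):
        i = rng.integers(n)
        grad_i = (A[i] @ x - b[i]) * A[i]         # gradient of f_i at x
        v = grad_i - alpha[i] + alpha_bar         # unbiased gradient estimate
        x = prox_l1(x - step * v, step * lam)     # proximal step
        alpha_bar += (grad_i - alpha[i]) / n      # keep the average in sync
        alpha[i] = grad_i                         # memory update
    return x

# Hypothetical synthetic problem with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.01 * rng.standard_normal(50)
L = np.max(np.sum(A ** 2, axis=1))   # largest Lipschitz constant of the grad f_i
x_hat = prox_saga_l1(A, b, lam=0.1, step=1.0 / (3.0 * L))
print(x_hat)
```

The step size $1/(3L)$ is the usual choice for SAGA. In this dense form every iteration touches all $p$ coordinates through $\bar{\alpha}$ and the prox, which is the cost that the sparse choices of $D_i$ and $\varphi_i$ are meant to avoid.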
  • 11. Proximal Asynchronous SAGA (ProxASAGA). Contribution 2: each core runs Sparse Proximal SAGA asynchronously, without locks, and updates $x$, $\alpha$ and $\bar{\alpha}$ in shared memory. All read/write operations to shared memory are inconsistent, i.e., there are no performance-destroying vector-level locks while reading or writing. Convergence: under sparsity assumptions, ProxASAGA converges at the same rate as the sequential algorithm, which implies a theoretical linear speedup with respect to the number of cores.
  • 12. Empirical results. ProxASAGA vs. competing methods on 3 large-scale datasets, ℓ1-regularized logistic regression.

      Dataset    n            p           density             L      Δ
      KDD 2010   19,264,097   1,163,024   $10^{-6}$           28.12  0.15
      KDD 2012   149,639,105  54,686,452  $2 \times 10^{-7}$  1.25   0.85
      Criteo     45,840,617   1,000,000   $4 \times 10^{-5}$  1.25   0.89

    [Figure: objective minus optimum vs. time (in minutes), one panel each for the KDD10, KDD12 and Criteo datasets; curves for ProxASAGA, AsySPCD and FISTA, each on 1 core and on 10 cores.]
  • 16. Empirical results - Speedup. $\text{Speedup} = \dfrac{\text{time to } 10^{-10} \text{ suboptimality on one core}}{\text{time to same suboptimality on } k \text{ cores}}$
    [Figure: time speedup vs. number of cores (2 to 20), one panel each for the KDD10, KDD12 and Criteo datasets; curves for Ideal, ProxASAGA, AsySPCD and FISTA.]
    • ProxASAGA achieves speedups between 6x and 12x on a 20-core architecture.
    • As predicted by theory, there is a high correlation between degree of sparsity and speedup.
    Thanks for your attention, see you at poster #159.
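The speedup ratio defined on this slide can be computed from two convergence traces. The sketch below assumes each trace is a pair of lists (wall-clock times, suboptimality values) recorded during a run; the helper names and the numbers are hypothetical and are not measurements from the talk.

```python
def time_to_suboptimality(times, suboptimality, target=1e-10):
    """Return the first recorded time at which suboptimality <= target, or None."""
    for t, s in zip(times, suboptimality):
        if s <= target:
            return t
    return None

def speedup(trace_one_core, trace_k_cores, target=1e-10):
    """Time to reach `target` on one core divided by the time on k cores."""
    t_one = time_to_suboptimality(*trace_one_core, target=target)
    t_k = time_to_suboptimality(*trace_k_cores, target=target)
    if t_one is None or t_k is None:
        return None  # at least one run never reached the target
    return t_one / t_k

# Hypothetical traces: (times in minutes, objective minus optimum).
one_core = ([0, 10, 20, 30], [1.0, 1e-4, 1e-8, 1e-11])
k_cores = ([0, 2, 4, 6], [1.0, 1e-5, 1e-10, 1e-12])
ratio = speedup(one_core, k_cores)
print(ratio)  # 30 / 4 = 7.5
```

Measuring time to a fixed suboptimality, rather than time per iteration, means the ratio also reflects any loss in per-iteration progress caused by asynchrony.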
  • 17. References
    Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien (2014). “SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives”. In: Advances in Neural Information Processing Systems.
    Leblond, Rémi, Fabian Pedregosa, and Simon Lacoste-Julien (2017). “ASAGA: asynchronous parallel SAGA”. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017).
    Mania, Horia et al. (2017). “Perturbed iterate analysis for asynchronous stochastic optimization”. In: SIAM Journal on Optimization.
    Niu, Feng et al. (2011). “Hogwild: A lock-free approach to parallelizing stochastic gradient descent”. In: Advances in Neural Information Processing Systems.
    Pedregosa, Fabian, Rémi Leblond, and Simon Lacoste-Julien (2017). “Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization”. In: Advances in Neural Information Processing Systems 30.
    Reddi, Sashank J et al. (2015). “On variance reduction in stochastic gradient descent and its asynchronous variants”. In: Advances in Neural Information Processing Systems.