Breaking the Nonsmooth Barrier

Breaking the Nonsmooth Barrier: A Scalable
Parallel Method for Composite Optimization
Fabian Pedregosa Rémi Leblond Simon Lacoste–Julien

Motivation
 Since 2005, the speed of
processors has stagnated.
 The number of cores has
increased.
 Development of parallel
asynchronous variants of
stochastic gradient algorithms
1/6

Motivation
 Since 2005, the speed of
processors has stagnated.
 The number of cores has
increased.
 Development of parallel
asynchronous variants of
stochastic gradient algorithms
SGD → Hogwild (Niu et al. 2011).
SVRG → Kromagnon (Reddi et al. 2015; Mania et al. 2017).
SAGA → ASAGA (Leblond, Pedregosa, and Lacoste-Julien 2017).
1/6

Composite objective
 These methods assume objective function is smooth
Cannot be applied to Lasso, Group Lasso, box constraints, etc.
2/6

Composite objective
 These methods assume objective function is smooth
Cannot be applied to Lasso, Group Lasso, box constraints, etc.
Objective: minimize composite objective function:
minimize
x
f(x) + h(x) , with f(x) = 1
n
∑n
i=1 fi(x)
where fi is smooth and h is a block-separable (i.e., h(x) =
∑
B h([x]B))
convex function for which we have access to its proximal operator.
2/6

Sparse Proximal SAGA
Contribution 1: Sparse Proximal SAGA. Variant of SAGA (Defazio, Bach,
and Lacoste-Julien 2014), particularly efﬁcient when ∇fi are sparse.
3/6

Like SAGA, it relies on unbiased gradient estimate
vi=∇fi(x) − αi + Diα ;
3/6

Like SAGA, it relies on unbiased gradient estimate and proximal step
vi=∇fi(x) − αi + Diα ; x+
= proxγφi
(x − γvi) ; α+
i = ∇fi(x)
3/6

= proxγφi
(x − γvi) ; α+
i = ∇fi(x)
Unlike SAGA, Di and φi are designed to give sparse updates while
verifying unbiasedness conditions.
3/6

= proxγφi
(x − γvi) ; α+
i = ∇fi(x)
Unlike SAGA, Di and φi are designed to give sparse updates while
verifying unbiasedness conditions.
Convergence: same linear convergence rate as SAGA, with cheaper
updates in presence of sparsity.
3/6

Proximal Asynchronous SAGA (ProxASAGA)
Contribution 2: Proximal Asynchronous SAGA (ProxASAGA). Each core
runs Sparse Proximal SAGA asynchronously without locks and
updates x, α and α in shared memory.
 All read/write operations to shared memory are inconsistent, i.e.,
no performance destroying vector-level locks while reading/writing.
Convergence: under sparsity assumptions, ProxASAGA converges
with the same rate as the sequential algorithm =⇒ theoretical
linear speedup with respect to the number of cores.
4/6

Empirical results
ProxASAGA vs competing methods on 3 large-scale datasets,
ℓ1-regularized logistic regression
Dataset n p density L ∆
KDD 2010 19,264,097 1,163,024 10−6
28.12 0.15
KDD 2012 149,639,105 54,686,452 2 × 10−7
1.25 0.85
Criteo 45,840,617 1,000,000 4 × 10−5
1.25 0.89
0 20 40 60 80 100
Time (in minutes)
10 12
10 9
10 6
10 3
100
Objectiveminusoptimum
KDD10 dataset
0 10 20 30 40
Time (in minutes)
10 12
10 9
10 6
10 3
KDD12 dataset
0 10 20 30 40
Time (in minutes)
10 12
10 9
10 6
10 3
100 Criteo dataset
ProxASAGA (1 core)
ProxASAGA (10 cores)
AsySPCD (1 core)
AsySPCD (10 cores)
FISTA (1 core)
FISTA (10 cores)
5/6

Empirical results - Speedup
Speedup =
Time to 10−10
suboptimality on one core
Time to same suboptimality on k cores
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20
Timespeedup
KDD10 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 KDD12 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 Criteo dataset
Ideal ProxASAGA AsySPCD FISTA
6/6

Speedup =
Time to 10−10
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20
Timespeedup
KDD10 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 KDD12 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 Criteo dataset
• ProxASAGA achieves speedups between 6x and 12x on a 20 cores
architecture.
6/6

Speedup =
Time to 10−10
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20
Timespeedup
KDD10 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 KDD12 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 Criteo dataset
architecture.
• As predicted by theory, there is a high correlation between
degree of sparsity and speedup.
6/6

Speedup =
Time to 10−10
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20
Timespeedup
KDD10 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 KDD12 dataset
2 4 6 8 10 12 14 16 18 20
Number of cores
2
4
6
8
10
12
14
16
18
20 Criteo dataset
architecture.
• As predicted by theory, there is a high correlation between
degree of sparsity and speedup.
Thanks for your attention, see you at poster #159. 6/6

References
Defazio, Aaron, Francis Bach, and Simon Lacoste-Julien (2014). “SAGA: A fast incremental gradient
method with support for non-strongly convex composite objectives”. In: Advances in Neural
Information Processing Systems.
Leblond, Rémi, Fabian Pedregosa, and Simon Lacoste-Julien (2017). “ASAGA: asynchronous parallel
SAGA”. In: Proceedings of the 20th International Conference on Artiﬁcial Intelligence and
Statistics (AISTATS 2017).
Mania, Horia et al. (2017). “Perturbed iterate analysis for asynchronous stochastic optimization”. In:
SIAM Journal on Optimization.
Niu, Feng et al. (2011). “Hogwild: A lock-free approach to parallelizing stochastic gradient descent”.
In: Advances in Neural Information Processing Systems.
Pedregosa, Fabian, Rémi Leblond, and Simon Lacoste-Julien (2017). “Breaking the Nonsmooth
Barrier: A Scalable Parallel Method for Composite Optimization”. In: Advances in Neural
Information Processing Systems 30.
Reddi, Sashank J et al. (2015). “On variance reduction in stochastic gradient descent and its
asynchronous variants”. In: Advances in Neural Information Processing Systems.
6/6

Breaking the Nonsmooth Barrier

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to Breaking the Nonsmooth Barrier

Similar to Breaking the Nonsmooth Barrier (20)

More from Fabian Pedregosa

More from Fabian Pedregosa (9)

Recently uploaded

Recently uploaded (20)

Breaking the Nonsmooth Barrier