On the Convergence of Single-Call Stochastic Extra-Gradient Methods
Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos
NeurIPS, December 2019
Y-G. H., F. I., J. M., P. M. Single-Call Extra-Gradient NeurIPS, December 2019 0 / 22
Outline:
1 Variational Inequality
2 Extra-Gradient
3 Single-call Extra-Gradient [Main Focus]
4 Conclusion
Variational Inequality
Introduction: Variational Inequalities in Machine Learning
Generative adversarial network (GAN)
    min_θ max_φ  E_{x∼p_data}[log(D_φ(x))] + E_{z∼p_Z}[log(1 − D_φ(G_θ(z)))].
More min-max (saddle-point) problems: distributionally robust learning, primal-dual formulations in optimization, ...
Search for equilibria: games, multi-agent reinforcement learning, ...
Definition
Stampacchia variational inequality:
    Find x⋆ ∈ X such that ⟨V(x⋆), x − x⋆⟩ ≥ 0 for all x ∈ X.   (SVI)
Minty variational inequality:
    Find x⋆ ∈ X such that ⟨V(x), x − x⋆⟩ ≥ 0 for all x ∈ X.   (MVI)
Here X ⊆ R^d is a closed convex set and V : R^d → R^d a vector field.
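A minimal numerical sketch of the two definitions, on a hypothetical example of this format: f(x) = ½‖x − c‖² over the box X = [0, 1]², with V = ∇f, so the (SVI) solution is the projection of c onto X. All names and constants below are illustrative choices, not taken from the talk.

```python
import numpy as np

# Illustrative example: f(x) = 0.5*||x - c||^2 over the box X = [0, 1]^2,
# with V = grad f.  The (SVI) solution is the projection of c onto X.
c = np.array([2.0, -0.5])
V = lambda x: x - c                      # V = grad f
proj = lambda x: np.clip(x, 0.0, 1.0)    # Euclidean projection onto X
x_star = proj(c)                         # x* = (1, 0)

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, size=(1000, 2))   # sample points of X

# (SVI): <V(x*), x - x*> >= 0 for all sampled x in X
svi = np.all(xs @ V(x_star) - x_star @ V(x_star) >= -1e-12)
# (MVI): <V(x), x - x*> >= 0 for all sampled x in X
mvi = np.all(np.einsum('ij,ij->i', V(xs), xs - x_star) >= -1e-12)
print(bool(svi), bool(mvi))   # both hold: f is convex, so (SVI) and (MVI) agree
```

Since f is convex here, both inequalities hold at the same point, in line with the equivalence stated later for convex minimization.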
Illustration
SVI: V(x⋆) belongs to the dual cone DC(x⋆) of X at x⋆.   [ local ]
MVI: V(x) forms an acute angle with every tangent vector x − x⋆ ∈ TC(x⋆).   [ global ]
Example: Function Minimization
    min_x f(x)   subject to x ∈ X,
with f : X → R the differentiable function to minimize. Let V = ∇f.
(SVI)  ∀x ∈ X, ⟨∇f(x⋆), x − x⋆⟩ ≥ 0   [ first-order optimality ]
(MVI)  ∀x ∈ X, ⟨∇f(x), x − x⋆⟩ ≥ 0   [ x⋆ is a minimizer of f ]
If f is convex, (SVI) and (MVI) are equivalent.
Example: Saddle-Point Problem
Find x⋆ = (θ⋆, φ⋆) such that
    L(θ⋆, φ) ≤ L(θ⋆, φ⋆) ≤ L(θ, φ⋆)   for all θ ∈ Θ and all φ ∈ Φ,
with X ≡ Θ × Φ and L : X → R differentiable. Let V = (∇θL, −∇φL).
(SVI)  ∀(θ, φ) ∈ X, ⟨∇θL(x⋆), θ − θ⋆⟩ − ⟨∇φL(x⋆), φ − φ⋆⟩ ≥ 0   [ stationary point ]
(MVI)  ∀(θ, φ) ∈ X, ⟨∇θL(x), θ − θ⋆⟩ − ⟨∇φL(x), φ − φ⋆⟩ ≥ 0   [ saddle point ]
If L is convex-concave, (SVI) and (MVI) are equivalent.
Monotonicity
The solutions of (SVI) and (MVI) coincide when V is continuous and monotone, i.e.,
    ⟨V(x′) − V(x), x′ − x⟩ ≥ 0   for all x, x′ ∈ R^d.
In the two examples above, this corresponds to f being convex or L being convex-concave.
The operator analogue of strong convexity is strong monotonicity:
    ⟨V(x′) − V(x), x′ − x⟩ ≥ α ‖x′ − x‖²   for some α > 0 and all x, x′ ∈ R^d.
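For an affine field the two conditions can be checked directly: if V(x) = Ax + b, then ⟨V(x′) − V(x), x′ − x⟩ = (x′ − x)⊺A(x′ − x), so V is α-strongly monotone with α = λ_min((A + A⊺)/2) whenever that eigenvalue is positive. A small sketch (the matrix A below is an illustrative choice):

```python
import numpy as np

# For an affine field V(x) = A x + b, strong monotonicity holds with
# alpha = lambda_min((A + A^T)/2) when that value is positive.
rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0], [-1.0, 3.0]])   # (A + A^T)/2 = diag(2, 3): positive definite
b = rng.normal(size=2)
V = lambda x: A @ x + b

alpha = np.linalg.eigvalsh((A + A.T) / 2).min()   # = 2.0 here
x, xp = rng.normal(size=(2, 2))                   # two arbitrary points
lhs = (V(xp) - V(x)) @ (xp - x)                   # <V(x') - V(x), x' - x>
print(lhs >= alpha * np.sum((xp - x) ** 2) - 1e-12)   # strong monotonicity holds
```

Note that the antisymmetric part of A (the off-diagonal ±1 here) does not affect α, which is why rotation-like fields such as bilinear games can be monotone without being strongly monotone.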
Extra-Gradient
From Forward-Backward to Extra-Gradient
Forward-backward:
    X_{t+1} = Π_X(X_t − γ_t V(X_t))   (FB)
Extra-Gradient [Korpelevich 1976]:
    X_{t+1/2} = Π_X(X_t − γ_t V(X_t))
    X_{t+1}   = Π_X(X_t − γ_t V(X_{t+1/2}))   (EG)
The Extra-Gradient method anticipates the landscape of V by taking an extrapolation step to reach the leading state X_{t+1/2}.
From Forward-backward to Extra-Gradient
Forward-backward does not converge in bilinear games, while Extra-Gradient does.
    min_{θ∈R} max_{φ∈R} θφ
[Figure: iterate trajectories — left: Forward-backward; right: Extra-Gradient.]
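The claim above can be checked in a few lines on min_θ max_φ θφ, where V(θ, φ) = (φ, −θ) and X = R² (so no projection is needed); step-size and horizon below are illustrative choices.

```python
import numpy as np

# Bilinear game min_theta max_phi theta*phi: V(theta, phi) = (phi, -theta).
V = lambda x: np.array([x[1], -x[0]])
gamma, T = 0.2, 200

x_fb = x_eg = np.array([1.0, 1.0])
for _ in range(T):
    x_fb = x_fb - gamma * V(x_fb)        # forward-backward step
    x_half = x_eg - gamma * V(x_eg)      # extrapolation step
    x_eg = x_eg - gamma * V(x_half)      # update from the leading state

print(np.linalg.norm(x_fb), np.linalg.norm(x_eg))
# FB spirals outward (its norm grows), while EG converges to the solution (0, 0).
```

The reason is visible in the linear maps: each FB step multiplies the norm by √(1 + γ²) > 1, whereas each EG step multiplies it by √((1 − γ²)² + γ²) < 1 for small γ.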
Stochastic Oracle
If a stochastic oracle is involved:
    X_{t+1/2} = Π_X(X_t − γ_t V̂_t)
    X_{t+1}   = Π_X(X_t − γ_t V̂_{t+1/2})
with V̂_t = V(X_t) + Z_t satisfying (and similarly for V̂_{t+1/2}):
a) Zero mean: E[Z_t | F_t] = 0.
b) Bounded variance: E[‖Z_t‖² | F_t] ≤ σ².
Here (F_t)_{t∈N/2} is the natural filtration associated to the stochastic process (X_t)_{t∈N/2}.
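A minimal sketch of this oracle model, on the strongly monotone field V(x) = x (solution x⋆ = 0) with i.i.d. Gaussian noise; the noise level and the O(1/t) step-size schedule are illustrative choices, not the paper's tuned constants.

```python
import numpy as np

# Stochastic EG with a zero-mean, bounded-variance oracle V_hat = V(X) + Z,
# on V(x) = x (unconstrained, so the projection is the identity).
rng = np.random.default_rng(2)
sigma = 0.1
oracle = lambda x: x + sigma * rng.normal(size=x.shape)   # V_hat = V(X) + Z

x = np.ones(2)
for t in range(1, 2001):
    gamma = 1.0 / (t + 10)             # O(1/t) step-size schedule
    x_half = x - gamma * oracle(x)     # step with V_hat_t
    x = x - gamma * oracle(x_half)     # step with V_hat_{t+1/2}

print(np.linalg.norm(x))   # the last iterate approaches x* = 0
```

Despite the persistent noise, the vanishing step-size drives the last iterate toward the solution, consistent with the stochastic rates discussed later.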
Convergence Metrics
Ergodic convergence: restricted error function
    Err_R(x̂) = max_{x∈X_R} ⟨V(x), x̂ − x⟩,
where X_R ≡ X ∩ B_R(0) = {x ∈ X : ‖x‖ ≤ R}.
Last-iterate convergence: squared distance dist(x̂, X⋆)².
Lemma [Nesterov 2007]
Assume V is monotone. If x⋆ is a solution of (SVI), then Err_R(x⋆) = 0 for all sufficiently large R. Conversely, if Err_R(x̂) = 0 for large enough R > 0 and some x̂ ∈ X_R, then x̂ is a solution of (SVI).
Literature Review
We further suppose that V is β-Lipschitz.

    Reference              Convergence type            Hypothesis
    Korpelevich 1976       Last iterate, asymptotic    Pseudo-monotone
    Tseng 1995             Last iterate, geometric     Monotone + error bound (e.g., strongly monotone, affine)
    Nemirovski 2004        Ergodic, O(1/t)             Monotone
    Juditsky et al. 2011   Ergodic, O(1/√t)            Stochastic monotone
In Deep Learning
Extra-Gradient (EG) needs two oracle calls per iteration, while gradient computations can be very costly for deep models.
What if we drop one oracle call per iteration?
Single-call Extra-Gradient [Main Focus]
Algorithms
1 Past Extra-Gradient [Popov 1980]
    X_{t+1/2} = Π_X(X_t − γ_t V̂_{t−1/2})
    X_{t+1}   = Π_X(X_t − γ_t V̂_{t+1/2})   (PEG)
2 Reflected Gradient [Malitsky 2015]
    X_{t+1/2} = X_t − (X_{t−1} − X_t)
    X_{t+1}   = Π_X(X_t − γ_t V̂_{t+1/2})   (RG)
3 Optimistic Gradient [Daskalakis et al. 2018]
    X_{t+1/2} = Π_X(X_t − γ_t V̂_{t−1/2})
    X_{t+1}   = X_{t+1/2} + γ_t V̂_{t−1/2} − γ_t V̂_{t+1/2}   (OG)
A First Result
Proxy for the first oracle call:
    PEG: [Step 1] V̂_t ← V̂_{t−1/2}
    RG:  [Step 1] V̂_t ← (X_{t−1} − X_t)/γ_t; no projection
    OG:  [Step 1] V̂_t ← V̂_{t−1/2}; [Step 2] X_t ← X_{t+1/2} + γ_t V̂_{t−1/2}; no projection
Proposition
Suppose that the single-call Extra-Gradient (1-EG) methods presented above share the same initialization, X_0 = X_1 ∈ X, V̂_{1/2} = 0, and the same constant step-size (γ_t)_{t∈N} ≡ γ. If X = R^d, the generated iterates X_t coincide for all t ≥ 1.
Reference scheme (EG with oracle proxies):
    [Step 1] X_{t+1/2} = Π_X(X_t − γ_t V̂_t)
    [Step 2] X_{t+1}   = Π_X(X_t − γ_t V̂_{t+1/2})
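The proposition is easy to verify numerically. A sketch in the unconstrained case X = R², with a deterministic affine field (the bilinear game, an illustrative choice), shared start X_0 = X_1, V̂_{1/2} = 0 and constant γ:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # bilinear game: V(x) = A x
V = lambda x: A @ x
gamma, T = 0.1, 50
x0 = np.array([1.0, 0.5])

# PEG: X_{t+1/2} = X_t - g*V_hat_{t-1/2};  X_{t+1} = X_t - g*V_hat_{t+1/2}
x, v_prev, peg = x0.copy(), np.zeros(2), []
for _ in range(T):
    x_half = x - gamma * v_prev
    v_prev = V(x_half)
    x = x - gamma * v_prev
    peg.append(x.copy())

# RG: X_{t+1/2} = X_t - (X_{t-1} - X_t);  X_{t+1} = X_t - g*V(X_{t+1/2})
x_prev, x, rg = x0.copy(), x0.copy(), []          # X0 = X1
for _ in range(T):
    x_half = x - (x_prev - x)
    x_prev, x = x, x - gamma * V(x_half)
    rg.append(x.copy())

# OG: X_{t+1/2} = X_t - g*V_hat_{t-1/2};  X_{t+1} = X_{t+1/2} + g*V_hat_{t-1/2} - g*V_hat_{t+1/2}
x, v_prev, og = x0.copy(), np.zeros(2), []
for _ in range(T):
    x_half = x - gamma * v_prev
    v_new = V(x_half)
    x = x_half + gamma * v_prev - gamma * v_new
    v_prev = v_new
    og.append(x.copy())

print(np.allclose(peg, rg), np.allclose(peg, og))   # the three trajectories coincide
```

With a projection (X ≠ R^d) the three methods genuinely differ, which is why their constrained analyses are separate.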
Global Convergence Rate
Always under Lipschitz continuity; in the stochastic strongly monotone case, step-sizes in O(1/t).
New results!

                     Monotone                     Strongly monotone
                     Ergodic     Last iterate    Ergodic    Last iterate
    Deterministic    1/t         Unknown         1/t        e^{−ρt}
    Stochastic       1/√t        Unknown         1/t        1/t
Proof Ingredients
Descent Lemma [Deterministic + Monotone]
There exists (µ_t)_{t∈N} ∈ R_+^N such that for all p ∈ X,
    ‖X_{t+1} − p‖² + µ_{t+1} ≤ ‖X_t − p‖² − 2γ⟨V(X_{t+1/2}), X_{t+1/2} − p⟩ + µ_t.
Descent Lemma [Stochastic + Strongly Monotone]
Let x⋆ be the unique solution of (SVI). There exist (µ_t)_{t∈N} ∈ R_+^N and M ∈ R_+ such that
    E[‖X_{t+1} − x⋆‖²] + µ_{t+1} ≤ (1 − αγ_t)(E[‖X_t − x⋆‖²] + µ_t) + M γ_t² σ².
Regular Solution
Definition [Regular Solution]
We say that x⋆ is a regular solution of (SVI) if V is C¹-smooth in a neighborhood of x⋆ and the Jacobian Jac_V(x⋆) is positive-definite along rays emanating from x⋆, i.e.,
    z⊺ Jac_V(x⋆) z ≡ Σ_{i,j=1}^d z_i (∂V_i/∂x_j)(x⋆) z_j > 0
for all z ∈ R^d ∖ {0} that are tangent to X at x⋆.
To be compared with:
    positive definiteness of the Hessian along qualified constraints in minimization;
    differential equilibria in games.
This is a localization of strong monotonicity.
Local Convergence
Theorem [Local convergence for stochastic non-monotone operators]
Let x⋆ be a regular solution of (SVI) and fix a tolerance level δ > 0. Suppose (PEG) is run with step-sizes of the form γ_t = γ/(t + b) for large enough γ and b. Then:
(a) There are neighborhoods U and U₁ of x⋆ in X such that, if X_{1/2} ∈ U and X₁ ∈ U₁, the event
        E_∞ = {X_{t+1/2} ∈ U for all t = 1, 2, ...}
    occurs with probability at least 1 − δ.
(b) Conditioned on the above, we have E[‖X_t − x⋆‖² | E_∞] = O(1/t).
Experiments
L(θ, φ) = (ε₁/2) θ⊺A₁θ + (ε₂/4)(θ⊺A₂θ)² − (ε₁/2) φ⊺B₁φ − (ε₂/4)(φ⊺B₂φ)² + θ⊺Cφ
[Figure: ‖x − x⋆‖² versus number of oracle calls for EG and 1-EG, with several step-sizes γ.
(a) Strongly monotone (ε₁ = 1, ε₂ = 0), deterministic, last iterate; γ ∈ {0.1, 0.2, 0.3, 0.4}.
(b) Monotone (ε₁ = 0, ε₂ = 1), deterministic, ergodic; γ ∈ {0.2, 0.4, 0.6, 0.8}.
(c) Non-monotone (ε₁ = 1, ε₂ = −1), stochastic with Z_t iid ∼ N(0, σ² = 0.01), last iterate (b = 15); γ ∈ {0.3, 0.6, 0.9, 1.2}.]
Conclusion
Conclusion and Perspectives
Conclusion
Single-call rates ∼ two-call rates.
Localization of the stochastic guarantees.
Last-iterate convergence: a first step into the non-monotone world.
Some research directions: Bregman, universal, ...
Conclusion
Bibliography
Daskalakis, Constantinos et al. (2018). “Training GANs with optimism”. In: ICLR ’18: Proceedings of the
2018 International Conference on Learning Representations.
Juditsky, Anatoli, Arkadi Semen Nemirovski, and Claire Tauvel (2011). “Solving variational inequalities with
stochastic mirror-prox algorithm”. In: Stochastic Systems 1.1, pp. 17–58.
Korpelevich, G. M. (1976). “The extragradient method for finding saddle points and other problems”. In:
Èkonom. i Mat. Metody 12, pp. 747–756.
Malitsky, Yura (2015). “Projected reflected gradient methods for monotone variational inequalities”. In:
SIAM Journal on Optimization 25.1, pp. 502–520.
Nemirovski, Arkadi Semen (2004). “Prox-method with rate of convergence O(1/t) for variational inequalities
with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems”. In:
SIAM Journal on Optimization 15.1, pp. 229–251.
Nesterov, Yurii (2007). “Dual extrapolation and its applications to solving variational inequalities and related
problems”. In: Mathematical Programming 109.2, pp. 319–344.
Popov, Leonid Denisovich (1980). “A modification of the Arrow–Hurwicz method for search of saddle
points”. In: Mathematical Notes of the Academy of Sciences of the USSR 28.5, pp. 845–848.
Tseng, Paul (June 1995). “On linear convergence of iterative methods for the variational inequality problem”.
In: Journal of Computational and Applied Mathematics 60.1-2, pp. 237–252.
Orion Air Quality Monitoring Systems - CWS
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptxANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Red blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptxRed blood cells- genesis-maturation.pptx
Red blood cells- genesis-maturation.pptx
 

Variational Inequality
Example: Saddle Point Problem

Find x⋆ = (θ⋆, φ⋆) such that
L(θ⋆, φ) ≤ L(θ⋆, φ⋆) ≤ L(θ, φ⋆) for all θ ∈ Θ and all φ ∈ Φ.

With X ≡ Θ × Φ and L : X → R a differentiable function.
Let V = (∇θL, −∇φL).

(SVI) ∀(θ, φ) ∈ X, ⟨∇θL(x⋆), θ − θ⋆⟩ − ⟨∇φL(x⋆), φ − φ⋆⟩ ≥ 0 [ stationary ]
(MVI) ∀(θ, φ) ∈ X, ⟨∇θL(x), θ − θ⋆⟩ − ⟨∇φL(x), φ − φ⋆⟩ ≥ 0 [ saddle point ]

If L is convex-concave, (SVI) and (MVI) are equivalent.
Variational Inequality
Monotonicity

The solutions of (SVI) and (MVI) coincide when V is continuous and monotone, i.e.,
⟨V(x′) − V(x), x′ − x⟩ ≥ 0 for all x, x′ ∈ Rd.
In the above two examples, this corresponds to f being convex or L being convex-concave.

The operator analogue of strong convexity is strong monotonicity:
⟨V(x′) − V(x), x′ − x⟩ ≥ α∥x′ − x∥² for some α > 0 and all x, x′ ∈ Rd.
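As a quick numeric illustration (a sketch, not from the slides): for the bilinear game L(θ, φ) = θφ, the field V(θ, φ) = (φ, −θ) satisfies the monotonicity inequality with equality everywhere, so it is monotone but not strongly monotone (α = 0).

```python
import numpy as np

def V(x):
    # Field of the bilinear game L(theta, phi) = theta * phi:
    # V = (dL/dtheta, -dL/dphi) = (phi, -theta)
    theta, phi = x
    return np.array([phi, -theta])

def monotone_gap(x, xp):
    # <V(x') - V(x), x' - x>; nonnegative for a monotone field
    return float(np.dot(V(xp) - V(x), xp - x))

rng = np.random.default_rng(0)
gaps = [monotone_gap(rng.standard_normal(2), rng.standard_normal(2))
        for _ in range(100)]
# For this rotation field the gap is identically zero:
# monotone, but not strongly monotone.
```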
Extra-Gradient
Extra-Gradient
Extra-Gradient
From Forward-Backward to Extra-Gradient

Forward-backward:
Xt+1 = ΠX(Xt − γt V(Xt)) (FB)

Extra-Gradient [Korpelevich 1976]:
Xt+½ = ΠX(Xt − γt V(Xt))
Xt+1 = ΠX(Xt − γt V(Xt+½)) (EG)

The Extra-Gradient method anticipates the landscape of V by taking an extrapolation step to reach the leading state Xt+½.
Extra-Gradient
From Forward-Backward to Extra-Gradient

Forward-backward does not converge in bilinear games, while Extra-Gradient does:
min_{θ∈R} max_{φ∈R} θφ

[Figure: iterate trajectories on the bilinear game — left: Forward-backward; right: Extra-Gradient.]
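This behaviour is easy to reproduce; here is a minimal sketch on the unconstrained bilinear game (so the projection ΠX is the identity):

```python
import numpy as np

def V(x):
    # Field of L(theta, phi) = theta * phi
    return np.array([x[1], -x[0]])

gamma = 0.2
x_fb = np.array([1.0, 1.0])  # forward-backward iterate
x_eg = np.array([1.0, 1.0])  # extra-gradient iterate
for _ in range(100):
    # (FB): a single gradient step
    x_fb = x_fb - gamma * V(x_fb)
    # (EG): extrapolate to the leading state, then step from X_t
    x_half = x_eg - gamma * V(x_eg)
    x_eg = x_eg - gamma * V(x_half)
# (FB) spirals away from the solution x* = 0; (EG) spirals into it.
```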
Extra-Gradient
Stochastic Oracle

If a stochastic oracle is involved:
Xt+½ = ΠX(Xt − γt V̂t)
Xt+1 = ΠX(Xt − γt V̂t+½)

with V̂t = V(Xt) + Zt satisfying (and likewise for V̂t+½):
a) Zero-mean: E[Zt | Ft] = 0.
b) Bounded variance: E[∥Zt∥² | Ft] ≤ σ².

(Ft)t∈N/2 is the natural filtration associated to the stochastic process (Xt)t∈N/2.
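A sketch of such an oracle and of stochastic (EG), again unconstrained; the helper names (`make_oracle`, `stochastic_eg`) are hypothetical, and with σ = 0 the method reduces to deterministic (EG):

```python
import numpy as np

def V(x):
    return np.array([x[1], -x[0]])  # bilinear game, solution x* = 0

def make_oracle(sigma, seed=0):
    rng = np.random.default_rng(seed)
    def oracle(x):
        # hat V(x) = V(x) + Z with E[Z] = 0 and per-coordinate variance sigma^2
        return V(x) + sigma * rng.standard_normal(x.shape)
    return oracle

def stochastic_eg(oracle, x0, gamma, steps):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x_half = x - gamma * oracle(x)       # extrapolation with hat V_t
        x = x - gamma * oracle(x_half)       # update with hat V_{t+1/2}
    return x

x_det = stochastic_eg(make_oracle(0.0), [1.0, 1.0], gamma=0.2, steps=50)
x_noisy = stochastic_eg(make_oracle(0.1), [1.0, 1.0], gamma=0.2, steps=50)
```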
Extra-Gradient
Convergence Metrics

Ergodic convergence: restricted error function
ErrR(x̂) = max_{x∈XR} ⟨V(x), x̂ − x⟩,
where XR ≡ X ∩ BR(0) = {x ∈ X : ∥x∥ ≤ R}.

Last-iterate convergence: squared distance dist(x̂, X⋆)².

Lemma [Nesterov 2007]
Assume V is monotone. If x⋆ is a solution of (SVI), we have ErrR(x⋆) = 0 for all sufficiently large R. Conversely, if ErrR(x̂) = 0 for large enough R > 0 and some x̂ ∈ XR, then x̂ is a solution of (SVI).
Extra-Gradient
Literature Review

We further suppose that V is β-Lipschitz.

Reference             | Convergence type         | Hypothesis
Korpelevich 1976      | Last iterate, asymptotic | Pseudo-monotone
Tseng 1995            | Last iterate, geometric  | Monotone + error bound (e.g., strongly monotone, affine)
Nemirovski 2004       | Ergodic in O(1/t)        | Monotone
Juditsky et al. 2011  | Ergodic in O(1/√t)       | Stochastic monotone
Extra-Gradient
In Deep Learning

Extra-Gradient (EG) needs two oracle calls per iteration, while gradient computations can be very costly for deep models.
What if we drop one oracle call per iteration?
Single-call Extra-Gradient [Main Focus]
Single-call Extra-Gradient
Single-call Extra-Gradient [Main Focus]
Algorithms

1 Past Extra-Gradient [Popov 1980]
Xt+½ = ΠX(Xt − γt V̂t−½)
Xt+1 = ΠX(Xt − γt V̂t+½) (PEG)

2 Reflected Gradient [Malitsky 2015]
Xt+½ = Xt − (Xt−1 − Xt)
Xt+1 = ΠX(Xt − γt V̂t+½) (RG)

3 Optimistic Gradient [Daskalakis et al. 2018]
Xt+½ = ΠX(Xt − γt V̂t−½)
Xt+1 = Xt+½ + γt V̂t−½ − γt V̂t+½ (OG)
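A sketch of (PEG) in the unconstrained case (ΠX = identity): the extrapolation step reuses the stored field value V̂t−½, so each iteration makes a single oracle call.

```python
import numpy as np

def V(x):
    return np.array([x[1], -x[0]])  # bilinear game, solution x* = 0

gamma = 0.3
x = np.array([1.0, 1.0])
v_prev = np.zeros(2)   # warm-up value hat V_{1/2} = 0
calls = 0
for _ in range(200):
    x_half = x - gamma * v_prev   # extrapolate with the *past* field value
    v_prev = V(x_half)            # the only oracle call of the iteration
    calls += 1
    x = x - gamma * v_prev        # update step
```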
Single-call Extra-Gradient [Main Focus]
A First Result

Proxy:
PEG: [Step 1] V̂t ← V̂t−½
RG: [Step 1] V̂t ← (Xt−1 − Xt)/γt; no projection
OG: [Step 1] V̂t ← V̂t−½ [Step 2] Xt ← Xt+½ + γt V̂t−½; no projection

(Recall (EG): Xt+½ = ΠX(Xt − γt V̂t), Xt+1 = ΠX(Xt − γt V̂t+½).)

Proposition
Suppose that the Single-call Extra-Gradient (1-EG) methods presented above share the same initialization X0 = X1 ∈ X, V̂½ = 0, and the same constant step-size (γt)t∈N ≡ γ. If X = Rd, the generated iterates Xt coincide for all t ≥ 1.
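The proposition is easy to check numerically; a sketch comparing (PEG) and (OG) in the unconstrained case with the stated initialization:

```python
import numpy as np

def V(x):
    return np.array([x[1], -x[0]])

gamma, steps = 0.2, 50

# (PEG), no projection
x_peg = np.array([1.0, 1.0])
v_prev = np.zeros(2)                     # hat V_{1/2} = 0
for _ in range(steps):
    x_half = x_peg - gamma * v_prev
    v_prev = V(x_half)
    x_peg = x_peg - gamma * v_prev

# (OG): X_{t+1} = X_{t+1/2} + gamma*hatV_{t-1/2} - gamma*hatV_{t+1/2}
x_og = np.array([1.0, 1.0])
v_prev = np.zeros(2)
for _ in range(steps):
    x_half = x_og - gamma * v_prev
    v_new = V(x_half)
    x_og = x_half + gamma * v_prev - gamma * v_new
    v_prev = v_new

# With X = R^d the two sequences of iterates coincide.
```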
Single-call Extra-Gradient [Main Focus]
Global Convergence Rates

New results! Always with Lipschitz continuity; in the stochastic strongly monotone case, step-size in O(1/t).

               | Monotone                 | Strongly Monotone
               | Ergodic | Last Iterate   | Ergodic | Last Iterate
Deterministic  | 1/t     | Unknown        | 1/t     | e^{−ρt}
Stochastic     | 1/√t    | Unknown        | 1/t     | 1/t
Single-call Extra-Gradient [Main Focus]
Proof Ingredients

Descent Lemma [Deterministic + Monotone]
There exists (µt)t∈N ∈ R₊^N such that for all p ∈ X,
∥Xt+1 − p∥² + µt+1 ≤ ∥Xt − p∥² − 2γ⟨V(Xt+½), Xt+½ − p⟩ + µt.

Descent Lemma [Stochastic + Strongly Monotone]
Let x⋆ be the unique solution of (SVI). There exist (µt)t∈N ∈ R₊^N and M ∈ R₊ such that
E[∥Xt+1 − x⋆∥²] + µt+1 ≤ (1 − αγt)(E[∥Xt − x⋆∥²] + µt) + Mγt²σ².
Single-call Extra-Gradient [Main Focus]
Regular Solution

Definition [Regular Solution]
We say that x⋆ is a regular solution of (SVI) if V is C¹-smooth in a neighborhood of x⋆ and the Jacobian JacV(x⋆) is positive-definite along rays emanating from x⋆, i.e.,
z⊺ JacV(x⋆) z ≡ ∑_{i,j=1}^d z_i (∂V_i/∂x_j)(x⋆) z_j > 0
for all z ∈ Rd ∖ {0} that are tangent to X at x⋆.

To be compared with positive-definiteness of the Hessian along qualified constraints in minimization; differential equilibrium in games.
Localization of strong monotonicity.
Single-call Extra-Gradient [Main Focus]
Local Convergence

Theorem [Local convergence for stochastic non-monotone operators]
Let x⋆ be a regular solution of (SVI) and fix a tolerance level δ > 0. Suppose (PEG) is run with step-sizes of the form γt = γ/(t + b) for large enough γ and b. Then:
(a) There are neighborhoods U and U1 of x⋆ in X such that, if X½ ∈ U and X1 ∈ U1, the event
E∞ = {Xt+½ ∈ U for all t = 1, 2, ...}
occurs with probability at least 1 − δ.
(b) Conditioning on the above, we have E[∥Xt − x⋆∥² | E∞] = O(1/t).
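A sketch of the γt = γ/(t + b) policy for stochastic (PEG) on a toy strongly monotone field V(x) = x (solution x⋆ = 0); the constants γ0 = 2 and b = 10 are hypothetical stand-ins for the "large enough" γ and b of the theorem:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.1                  # oracle noise level
gamma0, b = 2.0, 10.0        # hypothetical "large enough" constants
x = np.array([1.0, 1.0])
v_prev = np.zeros(2)         # hat V_{1/2} = 0
for t in range(1, 2001):
    gamma = gamma0 / (t + b)                           # gamma_t = gamma/(t+b)
    x_half = x - gamma * v_prev                        # extrapolation
    v_prev = x_half + sigma * rng.standard_normal(2)   # noisy oracle of V(x)=x
    x = x - gamma * v_prev                             # update
# The last iterate approaches x* = 0 despite the persistent noise.
```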
Single-call Extra-Gradient [Main Focus]
Experiments

L(θ, φ) = (ϵ1/2) θ⊺A1θ + (ϵ2/4)(θ⊺A2θ)² − (ϵ1/2) φ⊺B1φ − (ϵ2/4)(φ⊺B2φ)² + θ⊺Cφ

[Figure: ∥x − x⋆∥² versus number of oracle calls for EG and 1-EG at several step-sizes γ —
(a) strongly monotone (ϵ1 = 1, ϵ2 = 0), deterministic, last iterate;
(b) monotone (ϵ1 = 0, ϵ2 = 1), deterministic, ergodic average;
(c) non-monotone (ϵ1 = 1, ϵ2 = −1), stochastic with Zt iid ∼ N(0, σ² = 0.01), last iterate (b = 15).]
Conclusion
Conclusion and Perspectives
Conclusion
Conclusion

Single-call rates ∼ two-call rates.
Localization of the stochastic guarantee.
Last-iterate convergence: a first step into the non-monotone world.
Some research directions: Bregman, universal, ...
Conclusion
Bibliography

Daskalakis, Constantinos et al. (2018). "Training GANs with optimism". In: ICLR '18: Proceedings of the 2018 International Conference on Learning Representations.
Juditsky, Anatoli, Arkadi Semen Nemirovski, and Claire Tauvel (2011). "Solving variational inequalities with stochastic mirror-prox algorithm". In: Stochastic Systems 1.1, pp. 17–58.
Korpelevich, G. M. (1976). "The extragradient method for finding saddle points and other problems". In: Èkonom. i Mat. Metody 12, pp. 747–756.
Malitsky, Yura (2015). "Projected reflected gradient methods for monotone variational inequalities". In: SIAM Journal on Optimization 25.1, pp. 502–520.
Nemirovski, Arkadi Semen (2004). "Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems". In: SIAM Journal on Optimization 15.1, pp. 229–251.
Nesterov, Yurii (2007). "Dual extrapolation and its applications to solving variational inequalities and related problems". In: Mathematical Programming 109.2, pp. 319–344.
Popov, Leonid Denisovich (1980). "A modification of the Arrow–Hurwicz method for search of saddle points". In: Mathematical Notes of the Academy of Sciences of the USSR 28.5, pp. 845–848.
Tseng, Paul (1995). "On linear convergence of iterative methods for the variational inequality problem". In: Journal of Computational and Applied Mathematics 60.1–2, pp. 237–252.