Quasi-Stochastic Approximation
Algorithm Design Principles with Applications to Machine Learning and Optimization
Caio Kalil Lauand (Joint work with Sean Meyn)
Department of Electrical and Computer Engineering
University of Florida
Support from ARO award W911NF2010055 and NSF awards EPCN 1935389, CCF 2306023 is gratefully acknowledged.
Agenda
1 Root-Finding Under Presence of Noise
2 Quasi-Stochastic Approximation
3 A Return to Extremum Seeking Control
4 Conclusions
5 Appendices
6 References
Root-Finding Under Presence of Noise
Optimization
We have an objective $\Gamma : \mathbb{R}^d \to \mathbb{R}_+$ that we wish to minimize.
First-order condition for optimality of $\theta^* \in \mathbb{R}^d$:
$\bar{f}(\theta^*) := -\nabla\Gamma(\theta^*) = 0$
Our first root-finding problem
Challenges we will address:
• In complex systems we might not have access to the gradient but only a noisy measurement:
$\widetilde{\nabla}\Gamma(\theta_n) = \nabla\Gamma(\theta_n) + W_n$
first-order optimization, stochastic gradient descent...
• We might only have noisy measurements of the objective:
$Y_n = \Gamma(\theta_n) + W_n$
zeroth-order optimization, gradient-free optimization, extremum seeking control...
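Gradient-Free Optimization
How would we estimate $\theta^{\mathrm{opt}} \in \arg\min_\theta \Gamma(\theta)$ if we have access to $\Gamma(\theta)$ for any $\theta$?
1SPSA: for fixed $\theta$ and small $\varepsilon > 0$, let $f(\theta, \xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta + \varepsilon\xi)$, where $\xi$ is zero-mean.
• A bit of Taylor series...
$f(\theta, \xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta) - \xi\xi^\top \nabla\Gamma(\theta) + O(\varepsilon)$
• Taking expectations of both sides yields
$\bar{f}(\theta) := E[f(\theta, \xi)] = -\mathrm{Cov}(\xi)\,\nabla\Gamma(\theta) + O(\varepsilon)$
$\bar{f}(\theta^*) = 0$, $\bar{f}(\theta^{\mathrm{opt}}) = O(\varepsilon)$
See Spall [67] and Ariyur & Krstić [3].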
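The claim that the averaged single-point estimate approximates the negative gradient can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the quadratic objective (`H`, `b`) and the helper names are hypothetical, and ξ is taken uniform on {−1, +1}^d so that Cov(ξ) is the identity. For a quadratic objective and symmetric signs the O(ε) remainder cancels, so the average should match −∇Γ(θ) up to rounding.

```python
import itertools
import numpy as np

def spsa1_estimate(gamma, theta, xi, eps):
    """Single-point estimate f(theta, xi) = -(1/eps) * xi * Gamma(theta + eps * xi)."""
    return -(1.0 / eps) * xi * gamma(theta + eps * xi)

def mean_estimate(gamma, theta, eps):
    """Average f(theta, xi) over every sign vector xi in {-1, +1}^d.

    This equals E[f(theta, xi)] when the entries of xi are i.i.d. and
    uniform on {-1, +1}, in which case Cov(xi) is the identity matrix.
    """
    d = theta.size
    signs = [np.array(s) for s in itertools.product((-1.0, 1.0), repeat=d)]
    return np.mean([spsa1_estimate(gamma, theta, xi, eps) for xi in signs], axis=0)

# Hypothetical objective: Gamma(theta) = 0.5 theta' H theta + b' theta + 3.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([-1.0, 0.25])

def gamma(theta):
    return 0.5 * theta @ H @ theta + b @ theta + 3.0

theta = np.array([0.7, -1.2])
f_bar = mean_estimate(gamma, theta, eps=0.1)
grad = H @ theta + b  # exact gradient of the quadratic
```

Comparing `f_bar` with `-grad` shows the agreement. For a non-quadratic Γ the same comparison would be expected to hold only up to the O(ε) term.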
How do we implement this algorithm?
• The vector $\theta^*$ can be estimated recursively through
$\theta_{n+1} = \theta_n - \alpha\,\frac{1}{\varepsilon}\,\xi_{n+1}\,\Gamma(\theta_n + \varepsilon\xi_{n+1})$
⋄ $\alpha > 0$ is a constant independent of $n$.
⋄ $\{\xi_n\}$ is a zero-mean sequence. Spall takes this to be i.i.d. with entries $\pm 1$.
• Deterministic counterpart, as the ODE
ESC-0: $\frac{d}{dt}\Theta_t = -\alpha\,\frac{1}{\varepsilon}\,\xi_t\,\Gamma(\Theta_t + \varepsilon\xi_t)$
This is the simplest Extremum Seeking Control (ESC) ODE.
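A minimal sketch of the stochastic recursion, assuming i.i.d. ±1 probing as attributed to Spall. The objective Γ(θ) = ∥θ∥², the step size, ε, the horizon and the seed are all illustrative choices, and `spsa1_run` is a hypothetical helper name.

```python
import numpy as np

def spsa1_run(gamma, theta0, alpha, eps, n_steps, seed=0):
    """Iterate theta <- theta - alpha * (1/eps) * xi * Gamma(theta + eps * xi)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        # Zero-mean probing vector with i.i.d. entries in {-1, +1}.
        xi = rng.choice([-1.0, 1.0], size=theta.size)
        theta = theta - alpha * (1.0 / eps) * xi * gamma(theta + eps * xi)
    return theta

def gamma(theta):
    # Hypothetical objective with minimizer at the origin.
    return float(np.sum(theta ** 2))

theta_final = spsa1_run(gamma, theta0=[1.0, -1.0], alpha=0.01, eps=0.5, n_steps=5000)
```

With a constant gain the iterate is not expected to converge; it should settle into a small neighbourhood of the minimizer whose size shrinks with α.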
Extremum Seeking Control
• Born in the 1920s, extremum seeking control (ESC) is said to be the oldest approach to gradient-free optimization.
• A typical architecture for ESC for optimization is illustrated below:
[Figure: block diagram of a typical ESC architecture.]
ESC-0
• The high-pass (HP) filter is removed entirely:
$\check{\xi}_t = \xi_t$, $\check{Y}^n_t = Y^n_t = \frac{1}{\varepsilon}\,\Gamma(\Theta_t + \varepsilon\xi_t)$
• The low-pass (LP) filter is an integrator:
$\frac{d}{dt}\Theta_t = -\alpha M \widetilde{\nabla}_t\Gamma = -\alpha\,\xi_t\,\frac{1}{\varepsilon}\,\Gamma(\Theta_t + \varepsilon\xi_t)$
Quasi-Stochastic Approximation
Zooming Out
• In quasi-stochastic approximation, $\xi$ is a smooth deterministic process,
$\theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \xi_{n+1})$
The probing signal $\xi$ is typically chosen as $\xi_t = G(\Phi_t)$, where $\Phi_t \in \mathbb{C}^K$ has entries
$\Phi^i_t = \exp(2\pi j[\omega_i t + \phi_i])$
with $\{\omega_i\}$ distinct.
• Expressed as ODEs for ease of analysis,
QSA ODE: $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$
Common choices for $\{a_t\}$ include:
⋄ Vanishing gain: $a_t = (t + 1)^{-\rho}$ with $\rho \in (1/2, 1)$
⋄ Constant gain: $a_t \equiv \alpha > 0$ for all $t$
Quasi-Stochastic Approximation
QSA ODE: $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$
• Algorithm design and analysis are based upon another ODE,
Mean Flow: $\frac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$, where $\bar{f}(\theta) := \lim_{T\to\infty} \frac{1}{T}\int_0^T f(\theta, \xi_t)\,dt$
If stable: $\vartheta_t \to \theta^*$ and $\bar{f}(\vartheta_t) \to \bar{f}(\theta^*) = 0$
• $\{\Theta_t\}$ couples with solutions of the mean flow under general conditions.
Caveat: Global Lipschitz continuity of $f$ is crucial. Always assumed.
• Perturbative mean flow #1:
$\frac{d}{dt}\Theta_t = a_t[\bar{f}(\Theta_t) + \widetilde{\Xi}_t]$, $\widetilde{\Xi}_t := f(\Theta_t, \xi_t) - \bar{f}(\Theta_t)$
Can we do any better?
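The time-average definition of the mean vector field can be evaluated numerically for a simple case. The sketch below is illustrative only: it assumes the ESC-0 field with a scalar objective Γ(θ) = θ² and a single-sinusoid probing signal ξ_t = √2 cos(2πωt), so that the average of ξ_t² is one. For a periodic signal the limit as T grows equals the average over one period, which a uniform grid computes essentially exactly. The helper names are hypothetical.

```python
import numpy as np

def esc0_field(theta, xi, eps, gamma):
    """ESC-0 vector field f(theta, xi) = -(1/eps) * xi * Gamma(theta + eps * xi)."""
    return -(1.0 / eps) * xi * gamma(theta + eps * xi)

def mean_field(theta, eps, gamma, omega=1.0, n_grid=4096):
    """Average f(theta, xi_t) over one period of xi_t = sqrt(2) cos(2 pi omega t)."""
    period = 1.0 / omega
    t = np.linspace(0.0, period, n_grid, endpoint=False)
    xi = np.sqrt(2.0) * np.cos(2.0 * np.pi * omega * t)
    return float(np.mean(esc0_field(theta, xi, eps, gamma)))

def gamma(x):
    # Hypothetical scalar objective; its gradient is 2 * x.
    return x ** 2

theta = 0.8
f_bar = mean_field(theta, eps=0.1, gamma=gamma)
```

Here `f_bar` should agree with −∇Γ(θ) = −2θ, because the first and third powers of the probing signal average to zero over a period.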
Métivier and Priouret To The Rescue!
• Perturbative mean (p-mean) flow #1:
$\frac{d}{dt}\Theta_t = a_t[\bar{f}(\Theta_t) + \widetilde{\Xi}_t]$, $\widetilde{\Xi}_t := f(\Theta_t, \xi_t) - \bar{f}(\Theta_t)$
• Representation for $\widetilde{\Xi}$ based on solutions to Poisson’s equation.
First instance, solution $\hat{f}$ with forcing function $f$:
$\frac{d}{dt}\hat{f}(\theta, \Phi_t) = -[f(\theta, \xi_t) - \bar{f}(\theta)]$, $\theta \in \mathbb{R}^d$
⟹ $\frac{d}{dt}\hat{f}(\Theta_t, \Phi_t) = -\widetilde{\Xi}_t + \partial_\theta \hat{f}(\Theta_t, \Phi_t)\,[a_t f(\Theta_t, \xi_t)]$
$\widetilde{\Xi}_t$ = zero mean + small
• Borrowed from the stochastic approximation literature: disturbance decomposition introduced by Métivier and Priouret.
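A single-frequency worked example may help make Poisson’s equation concrete. It is an assumption made for illustration, not a case taken from the slides: suppose the fluctuation of $f$ about its mean is one sinusoid with amplitude $c(\theta)$, frequency $\omega > 0$ and phase $\phi$.

```latex
f(\theta,\xi_t) - \bar{f}(\theta) = c(\theta)\cos(2\pi[\omega t + \phi])
\quad\Longrightarrow\quad
\hat{f}(\theta,\Phi_t) = -\frac{c(\theta)}{2\pi\omega}\,\sin(2\pi[\omega t + \phi]),
\\[4pt]
\frac{d}{dt}\hat{f}(\theta,\Phi_t)
  = -\frac{c(\theta)}{2\pi\omega}\,2\pi\omega\,\cos(2\pi[\omega t + \phi])
  = -[f(\theta,\xi_t) - \bar{f}(\theta)].
```

In this example $\hat{f}$ has zero time average and is a function of $\Phi_t$ through its imaginary part. The factor $1/\omega$ indicates that the solution grows as the frequency shrinks.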
Perturbative Mean Flow
The perturbative mean (p-mean) flow representation
$\frac{d}{dt}\Theta_t = a_t[\bar{f}(\Theta_t) + \widetilde{\Xi}_t]$
$\widetilde{\Xi}_t = -a_t \bar{\Upsilon}(\Theta_t) + \sum_{i=0}^{2} a_t^{2-i}\,\frac{d^i}{dt^i} W^i_t$
where $\{\bar{\Upsilon}_t, W^i_t : i = 0, 1, 2\}$ are smooth deterministic functions of $(\Theta_t, \Phi_t)$ admitting representations in terms of solutions to Poisson’s equation.
• Opens doors for analysis: transient bounds and filter design.
What is $\bar{\Upsilon}$? It appears with multiplicative noise:
$\bar{\Upsilon}(\theta) := -\lim_{T\to\infty} \frac{1}{T}\int_0^T \partial_\theta \hat{f}(\theta, \Phi_t)\, f(\theta, \xi_t)\,dt$
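Convergence and Acceleration
QSA ODE: $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$
• When $a_t = (1 + t)^{-\rho}$ with $\rho \in (1/2, 1)$,
$\Theta_t = \theta^* + a_t [A^*]^{-1} \bar{\Upsilon}^* + \text{nice}_t$ ⟹ $\|\Theta_t - \theta^*\|^2 = O(a_t^2)$
where $\bar{\Upsilon}^* = \bar{\Upsilon}(\theta^*)$ and $A^* = \partial_\theta \bar{f}(\theta^*)$.
• Convergence is accelerated through Polyak-Ruppert (PR) averaging
$\Theta^{\mathrm{PR}}_T = \frac{1}{T - \delta T}\int_{\delta T}^{T} \Theta_t\,dt$, $\delta \in (0, 1)$
• Extremely fast rates are obtained:
$\Theta^{\mathrm{PR}}_T = \theta^* + O(a_T \|\bar{\Upsilon}^*\|) + O(a_T^2)$ ⟹ $\|\Theta^{\mathrm{PR}}_T - \theta^*\|^2 = O(a_T^4)$ if $\bar{\Upsilon}^* = 0$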
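The PR average is a plain time average over the window [δT, T]. The following sketch applies it to a sampled trajectory using the trapezoidal rule. The function name `pr_average`, the sampling grid and the oscillating test trajectory are assumptions made for illustration, and the grid is assumed to start at t = 0.

```python
import numpy as np

def pr_average(t, theta, delta=0.5):
    """Approximate (1 / (T - delta*T)) * integral of Theta_t over [delta*T, T].

    t     : increasing sample times with t[0] = 0 and t[-1] = T
    theta : samples of Theta_t, shape (N,) or (N, d)
    """
    t = np.asarray(t, dtype=float)
    theta = np.asarray(theta, dtype=float).reshape(t.size, -1)
    keep = t >= delta * t[-1]              # samples inside the averaging window
    tw, xw = t[keep], theta[keep]
    widths = np.diff(tw)[:, None]
    # Trapezoidal rule, applied to each coordinate separately.
    integral = np.sum(0.5 * (xw[1:] + xw[:-1]) * widths, axis=0)
    return integral / (tw[-1] - tw[0])

# Hypothetical trajectory: oscillation of unit amplitude around theta_star.
theta_star = 2.0
t = np.linspace(0.0, 8.0, 1025)
trajectory = theta_star + np.cos(2.0 * np.pi * t)
theta_pr = pr_average(t, trajectory, delta=0.5)
```

In this toy case the window covers whole periods of the oscillation, so the average recovers `theta_star` even though individual samples are off by as much as one.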
Killing $\bar{\Upsilon}^*$
$\Phi^i_t = \exp(2\pi j[\omega_i t + \phi_i])$
Clever probing design
⋄ Design $\xi$ so that $\xi_t = G(\Phi_t)$ with $G$ analytic, and choose frequencies $\{\omega_1, \dots, \omega_K\}$ satisfying
$\omega_i = \log(a_i/b_i) > 0$, $\{a_i, b_i\}$ positive integers.
Cleverness #1: existence of solutions to Poisson’s equation.
• Solutions can be represented as sums of integrals
$\int_0^t \exp(2\pi j[\omega^\circ r + \phi^\circ])\,dr$, $\omega^\circ = n_1\omega_1 + \cdots + n_K\omega_K$.
⟹ Require bounds on $1/\omega^\circ$
Great lower bounds on $|\omega^\circ|$ from Baker’s Theorem.
Cleverness #2: $\hat{g} \perp h$ for smooth functions $g$, $h$ of the probing signal.
$\bar{\Upsilon}_i(\theta) = \sum_j \langle \hat{g}_{i,j}, h_j \rangle = 0$, with $g = \partial_\theta f$ and $h = f$.
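The frequency rule can be sketched in a few lines. The integer pairs, the phases and the particular map G (a scaled real part of each entry of Φ) are assumptions chosen for illustration; the slides only require G to be analytic.

```python
import math
import numpy as np

def probing_frequencies(pairs):
    """Return omega_i = log(a_i / b_i) for integer pairs with a_i > b_i > 0."""
    freqs = []
    for a, b in pairs:
        if not (a > b > 0):
            raise ValueError("need integers a > b > 0 so that log(a/b) > 0")
        freqs.append(math.log(a / b))
    return np.array(freqs)

def probing_signal(t, freqs, phases):
    """Assumed G: xi^i_t = sqrt(2) * Re(Phi^i_t) = sqrt(2) cos(2 pi [omega_i t + phi_i])."""
    t = np.asarray(t, dtype=float)[:, None]
    return np.sqrt(2.0) * np.cos(2.0 * np.pi * (freqs[None, :] * t + phases[None, :]))

omegas = probing_frequencies([(2, 1), (3, 2)])  # log 2 and log(3/2)
t = np.arange(0.0, 1000.0, 0.01)
xi = probing_signal(t, omegas, phases=np.array([0.0, 0.25]))
xi_mean = xi.mean(axis=0)
```

The empirical mean of each component decays like 1/(ωT), which gives a small-scale view of why the analysis needs the frequencies, and integer combinations of them, bounded away from zero.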
Noise vs Quasi-Noise
Example: Optimization of Rastrigin’s objective through
1SPSA: $f(\theta, \zeta) = -\frac{1}{\varepsilon}\,\zeta\,\Gamma(\theta + \varepsilon\zeta)$
ESC-0: $f(\theta, \xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta + \varepsilon\xi)$
where $\zeta$ is i.i.d. and $\xi$ is a mixture of sinusoids, with $\mathrm{Cov}(\xi) = \mathrm{Cov}(\zeta)$.
[Figure: plot of $\Gamma(\theta)$ against $(\theta_1, \theta_2)$; panels comparing 1SPSA and ESC-0 in dimensions 1 and 2 (40% of outliers removed); panel "PR Averaging" plotted against $T$.]
Fixed Gain Algorithms for QSA
• The QSA ODE with fixed gain is old news! (recall the averaging principle)
$\frac{d}{dt}\Theta_t = \alpha f(\Theta_t, \xi_t)$, $\alpha > 0$
See Khalil [24].
• Motivation is tracking: $\{\bar{f}_t\} \Rightarrow \{\theta^*_t\}$
• Without averaging, MSE is
$\limsup_{t\to\infty} \|\Theta_t - \theta^*\|^2 = O(\alpha^2)$
• A p-mean flow representation inspires a low-pass filter to obtain
$\limsup_{t\to\infty} \|\Theta^F_t - \theta^*\|^2 = O(\alpha^4)$
Filtering
QSA ODE: $\frac{d}{dt}\Theta_t = \alpha f(\Theta_t, \xi_t)$
Second-order filter is required
$\frac{d^2}{dt^2}\Theta^F_t + 2\gamma\zeta\,\frac{d}{dt}\Theta^F_t + \gamma^2 \Theta^F_t = \gamma^2 \Theta_t$
with $\gamma = O(\alpha)$ and $\zeta \in (0, 1)$.
Obtain from p-mean flow
$\limsup_{t\to\infty} \|\Theta^F_t - \theta^*\|^2 = O(\alpha^2 \|\bar{\Upsilon}^*\|) + O(\alpha^4)$
Recall $\bar{\Upsilon}$ can be eliminated with careful design of $\xi$.
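The second-order filter can be simulated by writing it as two first-order equations and stepping them forward. The sketch below uses an explicit Euler step for a scalar signal. The step size, the parameter values and the noisy test input are illustrative assumptions, and `second_order_filter` is a hypothetical helper name.

```python
import numpy as np

def second_order_filter(theta, dt, gamma, zeta, x0=0.0):
    """Explicit-Euler simulation of x'' + 2*gamma*zeta*x' + gamma^2 * x = gamma^2 * theta_t.

    theta : samples of the scalar input Theta_t on a grid with spacing dt
    x0    : initial filter output (the initial derivative is taken to be zero)
    """
    theta = np.asarray(theta, dtype=float)
    out = np.empty_like(theta)
    x, v = float(x0), 0.0  # filter output and its time derivative
    for k, u in enumerate(theta):
        out[k] = x
        acc = gamma ** 2 * (u - x) - 2.0 * gamma * zeta * v
        x, v = x + dt * v, v + dt * acc
    return out

# Hypothetical input: constant level 3 plus a fast oscillation of unit amplitude.
dt = 0.01
t = np.arange(0.0, 20.0, dt)
raw = 3.0 + np.sin(20.0 * t)
filtered = second_order_filter(raw, dt, gamma=1.0, zeta=0.7)
```

The filter has unit gain at zero frequency, so the output settles near the level 3, while the oscillation at 20 rad/s is attenuated by roughly γ²/20² in this example.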
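Error Attenuation in Tracking
ESC-0: $f(\Theta_t, \xi_t) = -\frac{1}{\varepsilon}\,\xi_t\,\Gamma_t(\Theta_t + \varepsilon\xi_t)$
[Figure: "Traveling Camel" objective $\Gamma(\theta - \theta^{\mathrm{opt}}_t)$; trajectories of $\Theta_t$ and $\Theta^F_t$ against $t$ alongside $\theta^{\mathrm{opt}}$, for ESC-0 with and without filtering; panels labelled "Transients: First 20%" and "Final 80%".]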
A Return to Extremum Seeking Control
Lipschitz Continuity Matters!
Recall: global Lipschitz continuity of $f$ is always assumed.
This is rarely the case in practice! (consider $\Gamma$ quadratic)
Finite escape time is typical.
State-dependent probing gain:
$f(\theta, \xi) = -\frac{1}{\epsilon(\theta)}\,\xi\,\Gamma(\theta + \epsilon(\theta)\xi)$
Examples:
$\epsilon(\theta) = \varepsilon\sqrt{1 + \Gamma(\theta)}$ [WLOG $\Gamma \ge 0$]
$\epsilon(\theta) = \varepsilon\sqrt{1 + \|\theta - \theta^{\mathrm{ctr}}\|^2/\sigma_p^2}$
• The algorithm is globally stable, subject to coercivity of $\Gamma$ and a Lipschitz gradient.
• It makes sense to explore more when $\Gamma(\theta)$ is big!
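The effect of the state-dependent probing gain on the growth of f can be seen in a small numerical comparison. The sketch assumes the scalar quadratic Γ(θ) = θ² and the first example gain ϵ(θ) = ε√(1 + Γ(θ)). The function names and test points are hypothetical, and the comparison is suggestive rather than a proof of Lipschitz continuity.

```python
import numpy as np

def esc0_constant(theta, xi, eps, gamma):
    """ESC-0 field with constant probing gain eps."""
    return -(1.0 / eps) * xi * gamma(theta + eps * xi)

def esc0_state_dependent(theta, xi, eps, gamma):
    """ESC-0 field with probing gain eps(theta) = eps * sqrt(1 + Gamma(theta))."""
    eps_theta = eps * np.sqrt(1.0 + gamma(theta))
    return -(1.0 / eps_theta) * xi * gamma(theta + eps_theta * xi)

def gamma(x):
    # Hypothetical quadratic objective.
    return x ** 2

# Growth of |f| when theta is doubled, far from the minimizer.
growth_const = abs(esc0_constant(2e3, 1.0, 0.5, gamma)) / abs(esc0_constant(1e3, 1.0, 0.5, gamma))
growth_sd = abs(esc0_state_dependent(2e3, 1.0, 0.5, gamma)) / abs(esc0_state_dependent(1e3, 1.0, 0.5, gamma))
```

Doubling θ roughly quadruples the constant-gain field but only doubles the state-dependent one, which is the linear growth one would expect of a globally Lipschitz vector field.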
Finite Escape Time for ESC
$f(\theta, \xi) = -\frac{1}{\epsilon(\theta)}\,\xi\,\Gamma(\theta + \epsilon(\theta)\xi)$
• Consider the scalar-valued objective $\Gamma(\theta) = \theta^2$
• ESC-0 was applied with $\xi_t = 2\cos(t/4)$ and $a_t = (t + 1)^{-0.8}$
a) $\epsilon = 0.5$, b) $\epsilon(\theta) = 0.5\sqrt{1 + \|\theta\|^2}$
[Figure: plots for cases a) and b).]
Vanishing vs Fixed Gain Algorithms for Optimization
a) $a_t = 0.1(1 + t)^{-0.65}$, b) $\alpha_b = 3 \times 10^{-3}$, c) $\alpha_s = 7 \times 10^{-4}$
[Figure: evolution of estimates ($\theta_1$, $\theta_2$) and evolution of cost $\Gamma(\Theta_t)$ against $t$ ($\times 10^4$). Vanishing gain $a_t = a_0(1 + t)^{-\rho}$: $\Theta_t$ and $\Theta^{\mathrm{PR}}_t$. Constant gain $\alpha_b$, $\alpha_s$: $\Theta_t$, $\Theta^{F1}_t$ and $\Theta^{F2}_t$.]
Conclusions
Steps to a Successful Design:
1) Design $\bar{f}$ so the mean flow $\dot{\vartheta} = \bar{f}(\vartheta)$ is GAS to $\theta^*$ and $\partial \bar{f}(\theta^*)$ is Hurwitz.
2) Design step-size: $\alpha_n = n^{-\rho}$ with $\frac{1}{2} \le \rho < 1$ for optimization; $\alpha_n \equiv \alpha > 0$ for tracking.
3) Use “clever” exploration when applicable.
4) Perform PR averaging or filtering with bandwidth $O(\alpha)$.
5) Repeat! Obtain histograms for $M$ runs
$\{T_N\, \theta^{\mathrm{PR}\,(m)}_N : 1 \le m \le M\}$, $T_N = (N - N_0)^{2\rho}$
with $\theta^{\mathrm{PR}\,(m)}_0$ widely dispersed.
Final Thoughts
• A simple averaging (or filtering) trick can greatly improve algorithmic performance:
⋄ Vanishing gain: MSE vanishes at rates arbitrarily close to $O(n^{-4})$.
⋄ Fixed gain: In general, MSE bounds are $O(\alpha^4)$.
• The incorporation of a state-dependent probing gain leads to ESC-0 algorithms that are globally stable (and more).
• Work in progress: two time-scale QSA comes up for ESC and elsewhere.
The p-mean flow representation should be done by today’s CDC deadline!
Thank you!
Financial support from ARO award W911NF2010055 and NSF award EPCN 1935389 is gratefully acknowledged.
Appendices
Assumptions for QSA
(A1)
VG: The process $a$ is non-negative, monotonically decreasing, and
$\lim_{t\to\infty} a_t = 0$, $\int_0^\infty a_r\,dr = \infty$.
BG: For all $t$, the gain process satisfies $a_t \equiv \alpha > 0$ for some $0 < \alpha < \alpha_0 < 1$.
(A2) The functions $\bar{f}$ and $f$ are Lipschitz continuous: for a constant $L_f < \infty$,
$\|\bar{f}(\theta') - \bar{f}(\theta)\| \le L_f \|\theta' - \theta\|$,
$\|f(\theta', \xi) - f(\theta, \xi)\| + \|f(\theta, \xi') - f(\theta, \xi)\| \le L_f [\|\theta' - \theta\| + \|\xi' - \xi\|]$,
for $\theta', \theta \in \mathbb{R}^d$ and $\xi, \xi' \in \mathbb{R}^m$.
(A3) The ODE $\frac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with unique equilibrium $\theta^*$. Moreover, one of the following conditions holds:
(a) There is a Lipschitz continuous Lyapunov function $V : \mathbb{R}^d \to \mathbb{R}_+$, a constant $\delta_0 > 0$ and a compact set $S$ such that $\nabla V(\vartheta_t) \cdot \bar{f}(\vartheta_t) \le -\delta_0 \|\vartheta_t\|$ whenever $\vartheta_t \notin S$.
(b) The scaled vector field $\bar{f}_\infty : \mathbb{R}^d \to \mathbb{R}^d$ defined by $\bar{f}_\infty(\theta) := \lim_{c\to\infty} \bar{f}(c\theta)/c$, $\theta \in \mathbb{R}^d$, exists as a continuous function. Moreover, the ODE@∞ defined by $\frac{d}{dt}x_t = \bar{f}_\infty(x_t)$ is globally asymptotically stable [48, §4.8.4].
(A4) The vector field $\bar{f}$ is differentiable, with derivative denoted $\bar{A}(\theta) = \partial_\theta \bar{f}(\theta)$.
That is, $\bar{A}(\theta)$ is a $d \times d$ matrix for each $\theta \in \mathbb{R}^d$, with $\bar{A}_{i,j}(\theta) = \frac{\partial}{\partial\theta_j}\bar{f}_i(\theta)$.
Moreover, the derivative $\bar{A}$ is Lipschitz continuous, and $\bar{A}^* = \bar{A}(\theta^*)$ is Hurwitz.
(A5) $\Phi$ is the state process for a dynamical system $\frac{d}{dt}\Phi_t = H(\Phi_t)$, $H : \Omega \to \Omega$, with unique invariant measure $\pi$. It satisfies the following ergodic theorems for the functions of interest, for each initial condition $\Phi_0 \in \Omega$:
(i) For each $\theta$ there exists a solution $\hat{f}(\theta, \cdot)$ to Poisson’s equation with forcing function $f$. That is,
$\hat{f}(\theta, \Phi_{t_0}) = \int_{t_0}^{t_1} [f(\theta, \xi_t) - \bar{f}(\theta)]\,dt + \hat{f}(\theta, \Phi_{t_1})$, $0 \le t_0 \le t_1$
and for each $\theta$, $\int_\Omega \hat{f}(\theta, z)\,\pi(dz) = 0$. Finally, $\hat{f}$ is continuously differentiable ($C^1$) on $\mathbb{R}^d \times \Omega$. Its Jacobian with respect to $\theta$ is denoted
$\hat{A}(\theta, z) := \partial_\theta \hat{f}(\theta, z)$, where $\int_\Omega \hat{A}(\theta, z)\,\pi(dz) = 0$ for each $\theta \in \mathbb{R}^d$.
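(ii) For each $\theta$, there are $C^1$ solutions to Poisson’s equation with forcing functions $\hat{f}$ and $\Upsilon$. They are denoted $\hat{\hat{f}}$ and $\hat{\Upsilon}$, respectively, satisfying
$\hat{\hat{f}}(\theta, \Phi_{t_0}) = \int_{t_0}^{t_1} \hat{f}(\theta, \Phi_t)\,dt + \hat{\hat{f}}(\theta, \Phi_{t_1})$
$\hat{\Upsilon}(\theta, \Phi_{t_0}) = \int_{t_0}^{t_1} [\Upsilon(\theta, \Phi_t) - \bar{\Upsilon}(\theta)]\,dt + \hat{\Upsilon}(\theta, \Phi_{t_1})$, $0 \le t_0 \le t_1$
with $\bar{\Upsilon}(\theta) = -\int_\Omega \hat{A}(\theta, z)\, f(\theta, G(z))\,\pi(dz)$.
Moreover, for each $\theta$,
$\int_\Omega \hat{\hat{f}}(\theta, z)\,\pi(dz) = \int_\Omega \hat{\Upsilon}(\theta, z)\,\pi(dz) = 0$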
References
References I
[1] L. Amerio and G. Prouse.
Almost-periodic functions and functional equations.
Springer Science & Business Media, 2013.
[2] K. B. Ariyur and M. Krstić.
Analysis and design of multivariable extremum seeking.
In American Control Conference, volume 4, pages 2903–2908. IEEE, 2002.
[3] K. B. Ariyur and M. Krstić.
Real Time Optimization by Extremum Seeking Control.
John Wiley & Sons, Inc., New York, NY, 2003.
[4] S. Asmussen and P. W. Glynn.
Stochastic Simulation: Algorithms and Analysis, volume 57 of Stochastic Modelling and
Applied Probability.
Springer-Verlag, New York, 2007.
[5] A. Benveniste, M. Métivier, and P. Priouret.
Adaptive algorithms and stochastic approximations, volume 22.
Springer Science & Business Media, Berlin Heidelberg, 2012.
References II
[6] D. P. Bertsekas.
Reinforcement learning and optimal control.
Athena Scientific, Belmont, MA, 2019.
[7] S. Bhatnagar, M. C. Fu, S. I. Marcus, and I.-J. Wang.
Two-timescale simultaneous perturbation stochastic approximation using deterministic
perturbation sequences.
ACM Transactions on Modeling and Computer Simulation (TOMACS), 13(2):180–209,
2003.
[8] H. Bohr.
Almost periodic functions.
Courier Dover Publications (Reprint of the 1947 edition, Chelsea Publishing Company),
2018.
[9] V. Borkar, S. Chen, A. Devraj, I. Kontoyiannis, and S. Meyn.
The ODE method for asymptotic statistics in stochastic approximation and reinforcement
learning.
arXiv e-prints:2110.14427, pages 1–50, 2021.
[10] V. S. Borkar.
Stochastic Approximation: A Dynamical Systems Viewpoint.
Hindustan Book Agency, Delhi, India, 2nd edition, 2021.
References III
[11] V. S. Borkar and S. P. Meyn.
The ODE method for convergence of stochastic approximation and reinforcement learning.
SIAM J. Control Optim., 38(2):447–469, 2000.
[12] S. Chen, A. Devraj, A. Bernstein, and S. Meyn.
Accelerating optimization and reinforcement learning with quasi stochastic approximation.
In Proc. of the American Control Conf., pages 1965–1972, May 2021.
[13] S. Chen, A. Devraj, A. Bernstein, and S. Meyn.
Revisiting the ODE method for recursive algorithms: Fast convergence using quasi
stochastic approximation.
Journal of Systems Science and Complexity, 34(5):1681–1702, 2021.
[14] X. Chen, Y. Tang, and N. Li.
Improve single-point zeroth-order optimization using high-pass and low-pass filters.
In International Conference on Machine Learning, pages 3603–3620. PMLR, 2022.
[15] A. M. Devraj, A. Bušić, and S. Meyn.
Fundamental design principles for reinforcement learning algorithms.
In K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, editors, Handbook on
Reinforcement Learning and Control, Studies in Systems, Decision and Control series
(SSDC, volume 325). Springer, 2021.
References IV
[16] J. Dippon.
Accelerated randomized stochastic optimization.
The Annals of Statistics, 31(4):1260–1281, 2003.
[17] J. Dippon and J. Renz.
Weighted means in stochastic approximation of minima.
SIAM Journal on Control and Optimization, 35(5):1811–1827, 1997.
[18] C. S. Draper and Y. T. Li.
Principles of optimalizing control systems and an application to the internal combustion
engine.
American Society of Mechanical Engineers, 1951.
[19] A. Durmus, E. Moulines, A. Naumov, S. Samsonov, K. Scaman, and H.-T. Wai.
Tight high probability bounds for linear stochastic approximation with fixed step-size.
Advances in Neural Information Processing Systems and arXiv:2106.01257,
34:30063–30074, 2021.
[20] V. Fabian.
On the choice of design in stochastic approximation methods.
The Annals of Mathematical Statistics, pages 457–465, 1968.
References V
[21] A. Fradkov and B. T. Polyak.
Adaptive and robust control in the USSR.
IFAC–PapersOnLine, 53(2):1373–1378, 2020.
21st IFAC World Congress.
[22] T. Jaakkola, M. Jordan, and S. Singh.
On the convergence of stochastic iterative dynamic programming algorithms.
Neural Computation, 6:1185–1201, 1994.
[23] C. Kalil Lauand and S. Meyn.
Approaching quartic convergence rates for quasi-stochastic approximation with application
to gradient-free optimization.
In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances
in Neural Information Processing Systems, volume 35, pages 15743–15756. Curran
Associates, Inc., 2022.
[24] H. K. Khalil.
Nonlinear systems.
Prentice-Hall, Upper Saddle River, NJ, 3rd edition, 2002.
References VI
[25] J. Kiefer and J. Wolfowitz.
Stochastic estimation of the maximum of a regression function.
Ann. Math. Statist., 23(3):462–466, September 1952.
[26] N. J. Killingsworth and M. Krstić.
PID tuning using extremum seeking: online, model-free performance optimization.
IEEE Control Systems Magazine, 26(1):70–79, 2006.
[27] P. Kokotović, H. K. Khalil, and J. O’Reilly.
Singular Perturbation Methods in Control: Analysis and Design.
Society for Industrial and Applied Mathematics, 1999.
[28] P. Kokotović, R. O’Malley, and P. Sannuti.
Singular perturbations and order reduction in control theory — an overview.
Automatica, 12(2):123–132, 1976.
[29] V. Konda.
Actor-critic algorithms.
PhD thesis, Massachusetts Institute of Technology, 2002.
[30] V. R. Konda and J. N. Tsitsiklis.
On actor-critic algorithms.
SIAM J. Control Optim., 42(4):1143–1166 (electronic), 2003.
References VII
[31] M. Krstić and H.-H. Wang.
Stability of extremum seeking feedback for general nonlinear dynamic systems.
Automatica, 36(4):595–601, 2000.
[32] H. J. Kushner and G. G. Yin.
Stochastic approximation algorithms and applications, volume 35 of Applications of
Mathematics (New York).
Springer-Verlag, New York, 1997.
[33] B. Lapeyre, G. Pagès, and K. Sab.
Sequences with low discrepancy generalisation and application to Robbins-Monro
algorithm.
Statistics, 21(2):251–272, 1990.
[34] J. Larson, M. Menickelly, and S. M. Wild.
Derivative-free optimization methods.
Acta Numerica, 28:287–404, 2019.
[35] S. Laruelle and G. Pagès.
Stochastic approximation with averaging innovation applied to finance.
Monte Carlo Methods and Applications, 18(1):1–51, 2012.
References VIII
[36] C. K. Lauand and S. Meyn.
Bias in stochastic approximation cannot be eliminated with averaging.
In Allerton Conference on Communication, Control, and Computing, pages 1–4, Sep. 2022.
[37] C. K. Lauand and S. Meyn.
Extremely fast convergence rates for extremum seeking control with Polyak-Ruppert
averaging.
arXiv 2206.00814, 2022.
[38] C. K. Lauand and S. Meyn.
Markovian foundations for quasi stochastic approximation with applications to extremum
seeking control.
arXiv 2207.06371, 2022.
[39] M. Le Blanc.
Sur l’electrification des chemins de fer au moyen de courants alternatifs de frequence elevee
[On the electrification of railways by means of alternating currents of high frequency].
Revue Generale de l’Electricite, 12(8):275–277, 1922.
[40] L. Lessard.
The analysis of optimization algorithms: A dissipativity approach.
IEEE Control Systems Magazine, 42(3):58–72, June 2022.
References IX
[41] S. Liu and M. Krstić.
Introduction to extremum seeking.
In Stochastic Averaging and Stochastic Extremum Seeking, Communications and Control
Engineering. Springer, London, 2012.
[42] L. Ljung.
Analysis of recursive stochastic algorithms.
Trans. on Automatic Control, 22(4):551–575, 1977.
[43] J. C. Luxat and L. H. Lees.
Stability of peak-holding control systems.
IEEE Transactions on Industrial Electronics and Control Instrumentation,
IECI-18(1):11–15, 1971.
[44] S. M. Meerkov.
Asymptotic methods for investigating a class of forced states in extremal systems.
Automation and Remote Control, 28(12):1916–1920, 1967.
[45] P. G. Mehta and S. P. Meyn.
Q-learning and Pontryagin’s minimum principle.
In Proc. of the Conf. on Dec. and Control, pages 3598–3605, Dec. 2009.
[46] M. Métivier and P. Priouret.
Applications of a Kushner and Clark lemma to general classes of stochastic algorithms.
Trans. on Information Theory, 30(2):140–151, March 1984.
[47] M. Métivier and P. Priouret.
Théorèmes de convergence presque sûre pour une classe d’algorithmes stochastiques à pas
décroissants.
Prob. Theory Related Fields, 74:403–428, 1987.
[48] S. Meyn.
Control Systems and Reinforcement Learning.
Cambridge University Press, Cambridge, 2022.
[49] S. P. Meyn and R. L. Tweedie.
Markov chains and stochastic stability.
Cambridge University Press, Cambridge, second edition, 2009.
Published in the Cambridge Mathematical Library. 1993 edition online.
[50] H. Mohammadi, M. Razaviyayn, and M. R. Jovanović.
Robustness of accelerated first-order algorithms for strongly convex optimization problems.
IEEE Transactions on Automatic Control, 66(6):2480–2495, 2020.
[51] W. Mou, C. Junchi Li, M. J. Wainwright, P. L. Bartlett, and M. I. Jordan.
On linear stochastic approximation: Fine-grained Polyak-Ruppert and non-asymptotic
concentration.
Conference on Learning Theory and arXiv:2004.04719, pages 2947–2997, 2020.
[52] E. Moulines and F. R. Bach.
Non-asymptotic analysis of stochastic approximation algorithms for machine learning.
In Advances in Neural Information Processing Systems 24, pages 451–459, 2011.
[53] V. Obabkov.
Theory of multichannel extremal control systems with sinusoidal probe signals.
Automation and Remote Control, 28:48–54, 1967.
[54] R. Pasupathy and S. Ghosh.
Simulation optimization: A concise overview and implementation guide.
Theory Driven by Influential Applications, pages 122–150, 2013.
[55] B. T. Polyak.
Some methods of speeding up the convergence of iteration methods.
USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
[56] B. T. Polyak.
A new method of stochastic approximation type.
Avtomatika i telemekhanika (in Russian). Translated in Automat. Remote Control, 51
(1991), pages 98–107, 1990.
[57] B. T. Polyak and A. B. Juditsky.
Acceleration of stochastic approximation by averaging.
SIAM J. Control Optim., 30(4):838–855, 1992.
[58] B. T. Polyak and A. B. Tsybakov.
Optimal order of accuracy of search algorithms in stochastic optimization.
Problemy Peredachi Informatsii (Prob. Inform. Trans.), 26(2):45–53, 1990.
[59] L. Rastrigin.
Extremum control by means of random scan.
Avtomat. i Telemekh, 21(9):1264–1271, 1960.
[60] L. A. Rastrigin.
Random search in problems of optimization, identification and training of control systems.
Journal of Cybernetics, 3(3):93–103, 1973.
[61] H. Robbins and S. Monro.
A stochastic approximation method.
Annals of Mathematical Statistics, 22:400–407, 1951.
[62] D. Ruppert.
Efficient estimators from a slowly convergent Robbins-Monro process.
Technical Report No. 781, Cornell University, School of Operations Research
and Industrial Engineering, Ithaca, NY, 1988.
[63] J. A. Sanders, F. Verhulst, and J. Murdock.
Averaging methods in nonlinear dynamical systems, volume 59.
Springer, 2007.
[64] P. J. Schweitzer.
Perturbation theory and finite Markov chains.
J. Appl. Prob., 5:401–403, 1968.
[65] S. Shirodkar and S. Meyn.
Quasi stochastic approximation.
In Proc. of the American Control Conf., pages 2429–2435, July 2011.
[66] D. R. Smith.
Singular-perturbation theory: an introduction with applications.
Cambridge University Press, 1985.
[67] J. C. Spall.
Stochastic optimization.
In Handbook of computational statistics, pages 173–201. Springer, 2012.
[68] S. Surjanovic and D. Bingham.
Virtual library of simulation experiments: Test functions and datasets.
Retrieved May 16, 2022, from http://www.sfu.ca/~ssurjano.
[69] R. Sutton and A. Barto.
Reinforcement Learning: An Introduction.
MIT Press, Cambridge, MA, 2nd edition, 2018.
[70] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour.
Policy gradient methods for reinforcement learning with function approximation.
Advances in neural information processing systems, 12, 1999.
[71] Y. Tan, W. H. Moase, C. Manzie, D. Nešić, and I. Mareels.
Extremum seeking from 1922 to 2010.
In Proc. of the 29th Chinese control conference, pages 14–26. IEEE, 2010.
[72] Y. Tan, D. Nešić, and I. Mareels.
On non-local stability properties of extremum seeking control.
Automatica, 42(6):889–903, 2006.
[73] A. Teel and D. Popovic.
Solving smooth and nonsmooth multivariable extremum seeking problems by the methods
of nonlinear programming.
In Proceedings of the American Control Conference, volume 3, pages 2394–2399,
June 2001.
[74] J. Tsitsiklis.
Asynchronous stochastic approximation and Q-learning.
Machine Learning, 16:185–202, 1994.
[75] J. N. Tsitsiklis and B. Van Roy.
An analysis of temporal-difference learning with function approximation.
IEEE Trans. Automat. Control, 42(5):674–690, 1997.
[76] Y. Z. Tsypkin and Z. J. Nikolic.
Adaptation and learning in automatic systems.
Academic Press, New York, 1971.
[77] B. Van Roy.
Learning and Value Function Approximation in Complex Decision Processes.
PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1998.
AAI0599623.
[78] M. Vidyasagar.
A new converse Lyapunov theorem for global exponential stability and applications to
stochastic approximation.
In IEEE Trans. Automat. Control, pages 2319–2321. IEEE, 2022.
Extended version on arXiv:2205.01303.
[79] M. Vidyasagar.
Convergence of stochastic approximation via martingale and converse Lyapunov methods.
Mathematics of Control, Signals, and Systems, pages 1–24, 2023.

Quasi-Stochastic Approximation: Algorithm Design Principles with Applications to Machine Learning and Optimization.

  • 1. Quasi-Stochastic Approximation: Algorithm Design Principles with Applications to Machine Learning and Optimization. Caio Kalil Lauand (joint work with Sean Meyn), Department of Electrical and Computer Engineering, University of Florida. Support from ARO award W911NF2010055 and NSF awards EPCN 1935389 and CCF 2306023 is gratefully acknowledged.
  • 2. Agenda: (1) Root-Finding Under Presence of Noise; (2) Quasi-Stochastic Approximation; (3) A Return to Extremum Seeking Control; (4) Conclusions; (5) Appendices; (6) References.
  • 6. Root-Finding Under Presence of Noise: Optimization
    We have an objective $\Gamma : \mathbb{R}^d \to \mathbb{R}_+$ that we wish to minimize.
    First-order condition for optimality of $\theta^* \in \mathbb{R}^d$:
    $\bar{f}(\theta^*) \overset{\text{def}}{=} -\nabla\Gamma(\theta^*) = 0$
    This is our first root-finding problem.
    Challenges we will address:
      • In complex systems we might not have access to the gradient but only a noisy measurement:
        $\widetilde{\nabla}\Gamma(\theta_n) = \nabla\Gamma(\theta_n) + W_n$
        (first-order optimization, stochastic gradient descent, ...)
      • We might only have noisy measurements of the objective:
        $Y_n = \Gamma(\theta_n) + W_n$
        (zeroth-order optimization, gradient-free optimization, extremum seeking control, ...)
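    The first challenge, a gradient that is only observed through additive noise $W_n$, can be illustrated with a small stochastic gradient descent loop. This is a minimal sketch and not code from the talk: the quadratic objective $\Gamma(\theta) = \tfrac12\|\theta - \theta^*\|^2$, the Gaussian noise, the step size $a_n = 1/n$, and all function names are assumptions chosen for illustration.

```python
import random


def noisy_gradient(theta, theta_star, rng, noise_std=1.0):
    """Return grad Gamma(theta) + W_n for the assumed objective
    Gamma(theta) = 0.5 * ||theta - theta_star||^2, with i.i.d. Gaussian W_n."""
    return [t - s + rng.gauss(0.0, noise_std) for t, s in zip(theta, theta_star)]


def stochastic_gradient_descent(theta0, theta_star, n_iter=20000, seed=0):
    """Iterate theta_{n+1} = theta_n - a_n * (noisy gradient), with a_n = 1/n."""
    rng = random.Random(seed)
    theta = list(theta0)
    for n in range(1, n_iter + 1):
        step = 1.0 / n  # vanishing step size (an illustrative choice)
        grad = noisy_gradient(theta, theta_star, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


theta_star = [1.0, -2.0]  # hypothetical minimizer
theta_hat = stochastic_gradient_descent([0.0, 0.0], theta_star)
error = max(abs(t - s) for t, s in zip(theta_hat, theta_star))
print(theta_hat, error)
```

    With this particular objective and step size, the error after N steps equals the sample mean of the noise, so it shrinks at the usual $1/\sqrt{N}$ Monte Carlo rate.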
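  • 8. Root-Finding Under Presence of Noise: Gradient-Free Optimization. How would we estimate $\theta^{\text{opt}} \in \arg\min_\theta \Gamma(\theta)$ if we have access to $\Gamma(\theta)$ for any $\theta$? For any fixed $\theta$ and a small $\varepsilon > 0$, let $f(\theta,\xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta + \varepsilon\xi)$, where $\xi$ is zero-mean. Then $\bar{f}(\theta) := \mathsf{E}[f(\theta,\xi)]$ approximates $-\nabla\Gamma(\theta)$.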
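  • 13. Root-Finding Under Presence of Noise: Gradient-Free Optimization. How would we estimate $\theta^{\text{opt}} \in \arg\min_\theta \Gamma(\theta)$ if we have access to $\Gamma(\theta)$ for any $\theta$? 1SPSA: $f(\theta,\xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta + \varepsilon\xi)$ with $\xi$ zero-mean. • A bit of Taylor series: $f(\theta,\xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta) - \xi\xi^{\intercal}\nabla\Gamma(\theta) + O(\varepsilon)$. • Taking expectations of both sides yields $\bar{f}(\theta) := \mathsf{E}[f(\theta,\xi)] = -\mathrm{Cov}(\xi)\nabla\Gamma(\theta) + O(\varepsilon)$, so that $\bar{f}(\theta^*) = 0$ and $\bar{f}(\theta^{\text{opt}}) = O(\varepsilon)$; see Spall [67] and Ariyur & Krstić [3].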
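The identity $\bar{f}(\theta) = -\mathrm{Cov}(\xi)\nabla\Gamma(\theta) + O(\varepsilon)$ can be checked by hand in a small case. The sketch below assumes a hypothetical quadratic objective $\Gamma(\theta) = \|\theta\|^2$ and probing vectors uniform on $\{-1,+1\}^d$ (so $\mathrm{Cov}(\xi) = I$); the names `gamma`, `f` and `mean_field` are illustrative only. For this particular objective the $O(\varepsilon)$ term happens to vanish, so the expectation equals $-2\theta$.

```python
import itertools


def gamma(theta):
    # Hypothetical test objective: Gamma(theta) = ||theta||^2, gradient 2*theta.
    return sum(x * x for x in theta)


def f(theta, xi, eps):
    # One-point estimate f(theta, xi) = -(1/eps) * xi * Gamma(theta + eps*xi).
    value = gamma([t + eps * x for t, x in zip(theta, xi)])
    return [-(x / eps) * value for x in xi]


def mean_field(theta, eps):
    # Exact expectation over xi uniform on {-1, +1}^d, by enumeration.
    d = len(theta)
    signs = list(itertools.product((-1.0, 1.0), repeat=d))
    total = [0.0] * d
    for xi in signs:
        total = [acc + v for acc, v in zip(total, f(theta, xi, eps))]
    return [acc / len(signs) for acc in total]


theta = [1.0, -2.0, 0.5]
fbar = mean_field(theta, eps=0.1)
```

Enumeration is only practical for small $d$; it is used here to avoid Monte Carlo error in the comparison.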
  • 16. Root-Finding Under Presence of Noise: How do we implement this algorithm? • The vector $\theta^*$ can be estimated recursively through $\theta_{n+1} = \theta_n - \alpha\,\frac{1}{\varepsilon}\,\xi_{n+1}\Gamma(\theta_n + \varepsilon\xi_{n+1})$. ⋄ $\alpha > 0$ is a constant independent of $n$. ⋄ $\{\xi_n\}$ is a zero-mean sequence. Spall takes this to be i.i.d. with entries $\pm 1$.
  • 18. Root-Finding Under Presence of Noise: How do we implement this algorithm? • The vector $\theta^*$ can be estimated recursively through $\theta_{n+1} = \theta_n - \alpha\,\frac{1}{\varepsilon}\,\xi_{n+1}\Gamma(\theta_n + \varepsilon\xi_{n+1})$. • Deterministic counterpart, as the ODE ESC-0: $\frac{d}{dt}\Theta_t = -\alpha\,\frac{1}{\varepsilon}\,\xi_t\,\Gamma(\Theta_t + \varepsilon\xi_t)$. This is the simplest Extremum Seeking Control (ESC) ODE.
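A minimal sketch of the discrete-time recursion, assuming a hypothetical quadratic objective $\Gamma(\theta) = \|\theta\|^2$ (minimizer at the origin) and i.i.d. $\pm 1$ probing in the style attributed to Spall on the slides. The function name `spsa1`, the step size, and the seed are illustrative choices, not values from the talk.

```python
import random


def gamma(theta):
    # Hypothetical objective with minimizer at the origin.
    return sum(x * x for x in theta)


def spsa1(theta0, alpha, eps, n_steps, seed=0):
    # theta_{n+1} = theta_n - alpha * (1/eps) * xi_{n+1} * Gamma(theta_n + eps*xi_{n+1})
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(n_steps):
        xi = [rng.choice((-1.0, 1.0)) for _ in theta]  # zero-mean, entries +/-1
        y = gamma([t + eps * x for t, x in zip(theta, xi)])  # single function evaluation
        theta = [t - alpha * (x / eps) * y for t, x in zip(theta, xi)]
    return theta


theta_final = spsa1([1.0, -1.0], alpha=1e-3, eps=0.5, n_steps=20000)
```

With a constant gain the iterates do not converge; they fluctuate in a neighborhood of the minimizer whose size shrinks with $\alpha$.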
  • 19. Root-Finding Under Presence of Noise: Extremum Seeking Control. • Dating back to the 1920s, extremum seeking control (ESC) is said to be the oldest approach to gradient-free optimization. • [Figure: a typical ESC architecture for optimization.]
  • 20. Root-Finding Under Presence of Noise: Extremum Seeking Control, ESC-0. • The high-pass (HP) filter is removed entirely: $\check{\xi}_t = \xi_t$, $\check{Y}^n_t = Y^n_t = \frac{1}{\varepsilon}\Gamma(\Theta_t + \varepsilon\xi_t)$.
  • 21. Root-Finding Under Presence of Noise: Extremum Seeking Control, ESC-0. • The low-pass (LP) filter is an integrator: $\frac{d}{dt}\Theta_t = -\alpha M\,\widetilde{\nabla}_t\Gamma = -\alpha\,\xi_t\,\frac{1}{\varepsilon}\,\Gamma(\Theta_t + \varepsilon\xi_t)$.
  • 22. Quasi-Stochastic Approximation: $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$.
  • 24. Quasi-Stochastic Approximation: Zooming Out. • In quasi-stochastic approximation, $\xi$ is a smooth deterministic process: $\theta_{n+1} = \theta_n + \alpha_{n+1} f(\theta_n, \xi_{n+1})$. The probing signal $\xi$ is typically chosen as $\xi_t = G(\Phi_t)$, where $\Phi_t \in \mathbb{C}^K$ has entries $\Phi^i_t = \exp(2\pi j[\omega_i t + \phi_i])$ with $\{\omega_i\}$ distinct. • Expressed as ODEs for ease of analysis, QSA ODE: $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$. Common choices for $\{a_t\}$ include: ⋄ Vanishing gain: $a_t = (t+1)^{-\rho}$ with $\rho \in (1/2, 1)$. ⋄ Constant gain: $a_t \equiv \alpha > 0$ for all $t$.
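A minimal sketch of the QSA ODE with vanishing gain, integrated by forward Euler. It assumes a hypothetical scalar, globally Lipschitz example $f(\theta,\xi) = -(\theta - \theta^*) + \xi$ (a noisy-gradient model with $\Gamma(\theta) = \tfrac12(\theta-\theta^*)^2$) and a single sinusoid $\xi_t = \sin(t)$, which corresponds to $\omega = 1/(2\pi)$ in the convention above. The helper `qsa_euler` and all numerical values are illustrative assumptions.

```python
import math


def qsa_euler(f, theta0, gain, probe, t_final, dt):
    # Forward-Euler discretization of d/dt Theta_t = a_t * f(Theta_t, xi_t).
    theta = theta0
    for k in range(int(round(t_final / dt))):
        t = k * dt
        theta += dt * gain(t) * f(theta, probe(t))
    return theta


theta_star = 1.0


def f(theta, xi):
    # Hypothetical vector field: exact negative gradient plus additive quasi-noise.
    return -(theta - theta_star) + xi


def gain(t):
    # Vanishing gain a_t = (1 + t)^(-rho) with rho = 0.75 in (1/2, 1).
    return (1.0 + t) ** (-0.75)


def probe(t):
    # Smooth deterministic probing signal.
    return math.sin(t)


theta_T = qsa_euler(f, theta0=0.0, gain=gain, probe=probe, t_final=200.0, dt=0.01)
```

In this linear example the residual error at time $T$ is of the order of $a_T \approx 0.019$, in line with an $O(a_t)$ error for the unaveraged estimate.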
  • 30. Quasi-Stochastic Approximation: $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$. • Algorithm design and analysis are based upon another ODE, the mean flow: $\frac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$, where $\bar{f}(\theta) := \lim_{T\to\infty}\frac{1}{T}\int_0^T f(\theta, \xi_t)\,dt$. If stable: $\vartheta_t \to \theta^*$ and $\bar{f}(\vartheta_t) \to \bar{f}(\theta^*) = 0$. • $\{\Theta_t\}$ couples with solutions of the mean flow under general conditions. Caveat: global Lipschitz continuity of $f$ is crucial, and is always assumed. • Perturbative mean flow #1: $\frac{d}{dt}\Theta_t = a_t[\bar{f}(\Theta_t) + \widetilde{\Xi}_t]$, with $\widetilde{\Xi}_t := f(\Theta_t, \xi_t) - \bar{f}(\Theta_t)$. Can we do any better?
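The time-average definition of $\bar{f}$ can be evaluated numerically. The sketch below assumes the scalar objective $\Gamma(\theta) = \theta^2$ with the one-point estimate $f(\theta,\xi) = -\frac{1}{\varepsilon}\xi\Gamma(\theta+\varepsilon\xi)$, and a single sinusoid $\xi_t = \sqrt{2}\cos(2\pi\omega t)$ normalized to unit average power. Averaging over a whole number of periods stands in for the limit $T \to \infty$; the names and parameters are illustrative.

```python
import math


def f(theta, xi, eps):
    # One-point estimate for the hypothetical objective Gamma(theta) = theta^2.
    return -(xi / eps) * (theta + eps * xi) ** 2


def probe(t, omega=1.0):
    # Unit-power sinusoid: time averages are E[xi] = 0, E[xi^2] = 1, E[xi^3] = 0.
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * omega * t)


def mean_vector_field(theta, eps, periods=50, samples_per_period=200, omega=1.0):
    # Riemann-sum approximation of (1/T) * integral_0^T f(theta, xi_t) dt,
    # with T equal to a whole number of periods of the probing signal.
    n = periods * samples_per_period
    dt = 1.0 / (omega * samples_per_period)
    total = sum(f(theta, probe(k * dt, omega), eps) for k in range(n))
    return total / n


fbar = mean_vector_field(0.7, eps=0.1)
```

Expanding $f$ gives $-\frac{1}{\varepsilon}\theta^2\xi - 2\theta\xi^2 - \varepsilon\xi^3$, so the average is $-2\theta = -\nabla\Gamma(\theta)$ for this choice of probing signal.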
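  • 34. Quasi-Stochastic Approximation: Métivier and Priouret To The Rescue! • Perturbative mean (p-mean) flow #1: $\frac{d}{dt}\Theta_t = a_t[\bar{f}(\Theta_t) + \widetilde{\Xi}_t]$, with $\widetilde{\Xi}_t := f(\Theta_t, \xi_t) - \bar{f}(\Theta_t)$. • Representation for $\widetilde{\Xi}$ based on solutions to Poisson's equation. First instance: the solution $\hat{f}$ with forcing function $f$ satisfies $\frac{d}{dt}\hat{f}(\theta, \Phi_t) = -[f(\theta, \xi_t) - \bar{f}(\theta)]$ for $\theta \in \mathbb{R}^d$. By the chain rule, $\frac{d}{dt}\hat{f}(\Theta_t, \Phi_t) = -\widetilde{\Xi}_t + \partial_\theta\hat{f}(\Theta_t, \Phi_t)\cdot\frac{d}{dt}\Theta_t = -\widetilde{\Xi}_t + \partial_\theta\hat{f}(\Theta_t, \Phi_t)[a_t f(\Theta_t, \xi_t)]$, so that $\widetilde{\Xi}_t$ = "zero mean" + "small". • Borrowed from the stochastic approximation literature: disturbance decomposition introduced by Métivier and Priouret.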
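As a worked instance of Poisson's equation, consider the hypothetical scalar case in which the forcing function is a single sinusoid, $g(\Phi_t) = \cos(2\pi\omega t)$, whose time average is zero. This is an illustration under that assumption, not the general construction:

```latex
\begin{align*}
g(\Phi_t) &= \cos(2\pi\omega t), \qquad
\bar{g} = \lim_{T\to\infty}\frac{1}{T}\int_0^T g(\Phi_t)\,dt = 0, \\
\hat{g}(\Phi_t) &= -\frac{1}{2\pi\omega}\,\sin(2\pi\omega t), \\
\frac{d}{dt}\hat{g}(\Phi_t) &= -\cos(2\pi\omega t) = -\bigl[\,g(\Phi_t) - \bar{g}\,\bigr].
\end{align*}
```

The solution is itself a zero-mean sinusoid, and its amplitude scales with $1/\omega$, which is why low frequencies make the solution large.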
  • 36. Quasi-Stochastic Approximation: Perturbative Mean Flow. The perturbative mean (p-mean) flow representation: $\frac{d}{dt}\Theta_t = a_t[\bar{f}(\Theta_t) + \widetilde{\Xi}_t]$, with $\widetilde{\Xi}_t = -a_t\bar{\Upsilon}(\Theta_t) + \sum_{i=0}^{2} a_t^{2-i}\,\frac{d^i}{dt^i}W^i_t$, where $\{\bar{\Upsilon}_t, W^i_t : i = 0, 1, 2\}$ are smooth deterministic functions of $(\Theta_t, \Phi_t)$ admitting representations in terms of solutions to Poisson's equation. • Opens doors for analysis: transient bounds and filter design. What is $\bar{\Upsilon}$? It appears with multiplicative noise: $\bar{\Upsilon}(\theta) := -\lim_{T\to\infty}\frac{1}{T}\int_0^T \partial_\theta\hat{f}(\theta, \Phi_t)\,f(\theta, \xi_t)\,dt$.
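  • 39. Quasi-Stochastic Approximation: Convergence and Acceleration. $\frac{d}{dt}\Theta_t = a_t f(\Theta_t, \xi_t)$. • When $a_t = (1+t)^{-\rho}$ with $\rho \in (1/2, 1)$: $\Theta_t = \theta^* + a_t[A^*]^{-1}\bar{\Upsilon}^* + \text{nice}_t$, which implies $\|\Theta_t - \theta^*\|^2 = O(a_t^2)$, where $\bar{\Upsilon}^* = \bar{\Upsilon}(\theta^*)$ and $A^* = \partial_\theta\bar{f}(\theta^*)$. • Convergence is accelerated through Polyak-Ruppert (PR) averaging: $\Theta^{\text{PR}}_T = \frac{1}{T - \delta T}\int_{\delta T}^{T}\Theta_t\,dt$, $\delta \in (0, 1)$. • Extremely fast rates are obtained: $\Theta^{\text{PR}}_T = \theta^* + O(a_T\|\bar{\Upsilon}^*\|) + O(a_T^2)$, so that $\|\Theta^{\text{PR}}_T - \theta^*\|^2 = O(a_T^4)$ if $\bar{\Upsilon}^* = 0$.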
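A minimal sketch of the Polyak-Ruppert average over the window $[\delta T, T]$, assuming the trajectory is available as uniformly spaced scalar samples. The function name `pr_average` and the trapezoidal rule are illustrative choices.

```python
def pr_average(thetas, dt, delta):
    # thetas[k] is a sample of Theta at time k*dt, so T = (len(thetas) - 1) * dt.
    # Returns (1 / (T - delta*T)) * integral over [delta*T, T] of Theta_t dt,
    # approximated with the trapezoidal rule; requires 0 < delta < 1.
    n = len(thetas) - 1
    start = int(round(delta * n))
    window = thetas[start:]
    integral = dt * (sum(window) - 0.5 * (window[0] + window[-1]))
    return integral / ((n - start) * dt)


dt = 0.1
thetas = [k * dt for k in range(101)]  # toy trajectory Theta_t = t on [0, 10]
avg = pr_average(thetas, dt, delta=0.5)
```

For the toy trajectory $\Theta_t = t$ on $[0, 10]$ with $\delta = 0.5$ the window is $[5, 10]$ and the average is $7.5$.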
  • 42. Quasi-Stochastic Approximation: Killing $\bar{\Upsilon}^*$. $\Phi^i_t = \exp(2\pi j[\omega_i t + \phi_i])$. Clever probing design: ⋄ Design $\xi$ so that $\xi_t = G(\Phi_t)$ with $G$ analytic, and choose frequencies $\{\omega_1, \dots, \omega_K\}$ satisfying $\omega_i = \log(a_i/b_i) > 0$, with $\{a_i, b_i\}$ positive integers. Cleverness #1: existence of solutions to Poisson's equation. • Solutions can be represented as sums of integrals $\int_0^t \exp(2\pi j[\omega_\circ\tau + \phi_\circ])\,d\tau$, with $\omega_\circ = n_1\omega_1 + \cdots + n_K\omega_K$, which requires bounds on $1/\omega_\circ$. Great lower bounds on $|\omega_\circ|$ follow from Baker's Theorem.
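To make the role of $\omega_\circ$ concrete, the sketch below brute-forces the smallest nonzero $|n_1\omega_1 + \cdots + n_K\omega_K|$ over integer vectors with bounded entries, for the hypothetical choice $\omega = (\log 2, \log 3)$. This only illustrates that the combinations stay away from zero on a finite range; it does not compute the bound given by Baker's Theorem.

```python
import itertools
import math


def smallest_combination(omegas, n_max):
    # Minimum of |n . omega| over nonzero integer vectors n with |n_i| <= n_max.
    best = math.inf
    for n in itertools.product(range(-n_max, n_max + 1), repeat=len(omegas)):
        if any(n):
            best = min(best, abs(sum(k * w for k, w in zip(n, omegas))))
    return best


# Frequencies of the form log(a_i / b_i) with positive integers a_i, b_i.
omegas = [math.log(2 / 1), math.log(3 / 1)]
gap = smallest_combination(omegas, n_max=5)
```

Here the minimum is attained at $n = \pm(-3, 2)$, giving $\log(9/8) \approx 0.118$, since $2^3 = 8$ and $3^2 = 9$ are the closest powers in this range.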
  • 45. Quasi-Stochastic Approximation: Killing $\bar{\Upsilon}^*$. Cleverness #2: $\hat{g} \perp h$ for smooth functions $g, h$ of the probing signal. Hence $\bar{\Upsilon}_i(\theta) = \sum_j \langle\hat{g}_{i,j}, h_j\rangle = 0$, with $g = \partial_\theta f$ and $h = f$.
  • 46. Quasi-Stochastic Approximation: Noise vs Quasi-Noise. Example: optimization of Rastrigin's objective through 1SPSA, $f(\theta, \zeta) = -\frac{1}{\varepsilon}\,\zeta\,\Gamma(\theta + \varepsilon\zeta)$, and ESC-0, $f(\theta, \xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta + \varepsilon\xi)$, where $\zeta$ is i.i.d. and $\xi$ is a mixture of sinusoids with $\mathrm{Cov}(\xi) = \mathrm{Cov}(\zeta)$. [Figure: Rastrigin objective $\Gamma(\theta)$ over $(\theta_1, \theta_2)$; panels for 1SPSA and ESC-0 in dimensions 1 and 2 with PR averaging (40% outliers removed).]
  • 50. Quasi-Stochastic Approximation: Fixed Gain Algorithms for QSA. • The QSA ODE with fixed gain is old news (recall the averaging principle): $\frac{d}{dt}\Theta_t = \alpha f(\Theta_t, \xi_t)$, $\alpha > 0$; see Khalil [24]. • Motivation is tracking: $\{\bar{f}_t\} \Rightarrow \{\theta^*_t\}$. • Without averaging, the MSE is $\limsup_{t\to\infty}\|\Theta_t - \theta^*\|^2 = O(\alpha^2)$. • A p-mean flow representation inspires a low-pass filter to obtain $\limsup_{t\to\infty}\|\Theta^F_t - \theta^*\|^2 = O(\alpha^4)$.
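  • 54. Quasi-Stochastic Approximation: Filtering. $\frac{d}{dt}\Theta_t = \alpha f(\Theta_t, \xi_t)$. A second-order filter is required: $\frac{d^2}{dt^2}\Theta^F_t + 2\gamma\zeta\,\frac{d}{dt}\Theta^F_t + \gamma^2\Theta^F_t = \gamma^2\Theta_t$, with $\gamma = O(\alpha)$ and $\zeta \in (0, 1)$. From the p-mean flow we obtain $\limsup_{t\to\infty}\|\Theta^F_t - \theta^*\|^2 = O(\alpha^2\|\bar{\Upsilon}^*\|) + O(\alpha^4)$. Recall that $\bar{\Upsilon}$ can be eliminated with careful design of $\xi$.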
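A minimal sketch of the second-order low-pass filter, written in state-space form and integrated by forward Euler on a scalar input. The zero initial state, the function name `second_order_filter`, and the parameter values are assumptions made for illustration; in the setting of the slides $\gamma$ would be chosen of order $\alpha$.

```python
def second_order_filter(inputs, dt, gamma, zeta):
    # State-space form of  x'' + 2*gamma*zeta*x' + gamma^2 * x = gamma^2 * u,
    # with x the filtered output and v = x' its derivative (both start at zero).
    x, v = 0.0, 0.0
    out = []
    for u in inputs:
        acc = gamma * gamma * (u - x) - 2.0 * gamma * zeta * v
        x, v = x + dt * v, v + dt * acc
        out.append(x)
    return out


dt = 1e-3
inputs = [3.0] * 20000  # constant input, 20 time units
filtered = second_order_filter(inputs, dt, gamma=1.0, zeta=0.7)
```

The filter has unit DC gain, so a constant input is reproduced at the output once the transient has decayed, while oscillatory components well above $\gamma$ are attenuated.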
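  • 55. Quasi-Stochastic Approximation: Error Attenuation in Tracking. ESC-0: $f(\Theta_t, \xi_t) = -\frac{1}{\varepsilon}\,\xi_t\,\Gamma_t(\Theta_t + \varepsilon\xi_t)$. [Figure: ESC-0 with and without filtering on the "Traveling Camel" objective $\Gamma(\theta - \theta^{\text{opt}}_t)$; $\Theta_t$, $\Theta^F_t$ and $\theta^{\text{opt}}_t$ versus $t$, shown for the transient (first 20%) and the final 80%.]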
  • 62. A Return to Extremum Seeking Control: Lipschitz Continuity Matters! Recall: global Lipschitz continuity of $f$ is always assumed. This is rarely the case in practice for $f(\theta, \xi) = -\frac{1}{\varepsilon}\,\xi\,\Gamma(\theta + \varepsilon\xi)$ (consider $\Gamma$ quadratic), and finite escape time is typical. State-dependent probing gain: $f(\theta, \xi) = -\frac{1}{\epsilon(\theta)}\,\xi\,\Gamma(\theta + \epsilon(\theta)\xi)$. Examples: $\epsilon(\theta) = \varepsilon\sqrt{1 + \Gamma(\theta)}$ [WLOG $\Gamma \ge 0$], or $\epsilon(\theta) = \varepsilon\sqrt{1 + \|\theta - \theta^{\text{ctr}}\|^2/\sigma_p^2}$. • The algorithm is globally stable subject to coercivity of $\Gamma$ and a Lipschitz gradient. • It makes sense to explore more when $\Gamma(\theta)$ is big!
  • 63. A Return to Extremum Seeking Control: Finite Escape Time for ESC. $f(\theta, \xi) = -\frac{1}{\epsilon(\theta)}\,\xi\,\Gamma(\theta + \epsilon(\theta)\xi)$. • Consider the scalar-valued objective $\Gamma(\theta) = \theta^2$. • ESC-0 was applied with $\xi_t = 2\cos(t/4)$ and $a_t = (t+1)^{-0.8}$, for a) $\epsilon = 0.5$ and b) $\epsilon(\theta) = 0.5\sqrt{1 + \|\theta\|^2}$. [Figure: trajectories of $\Theta_t$ for cases a) and b).]
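The contrast between the two probing gains can be reproduced with a forward-Euler simulation of the ESC-0 ODE for $\Gamma(\theta) = \theta^2$, $\xi_t = 2\cos(t/4)$ and $a_t = (t+1)^{-0.8}$. The initial condition $\Theta_0 = -5$, the step size, and the escape threshold are assumptions chosen for illustration, and `escape_time` is a hypothetical helper that reports the first time $|\Theta_t|$ exceeds the threshold.

```python
import math


def gamma(theta):
    return theta * theta


def escape_time(theta0, probing_gain, t_final=100.0, dt=1e-3, threshold=1e6):
    # Forward Euler for d/dt Theta = a_t * f(Theta, xi_t), with
    # f(theta, xi) = -(1/eps(theta)) * xi * Gamma(theta + eps(theta) * xi).
    # Returns the first time |Theta| exceeds the threshold, or None if it never does.
    theta = theta0
    for k in range(int(round(t_final / dt))):
        t = k * dt
        xi = 2.0 * math.cos(t / 4.0)
        a = (1.0 + t) ** (-0.8)
        eps = probing_gain(theta)
        theta += dt * a * (-(xi / eps) * gamma(theta + eps * xi))
        if abs(theta) > threshold:
            return t
    return None


def constant_gain(theta):
    # Case a): eps = 0.5, so f grows quadratically in theta.
    return 0.5


def state_dependent_gain(theta):
    # Case b): eps(theta) = 0.5 * sqrt(1 + theta^2), so f grows linearly in theta.
    return 0.5 * math.sqrt(1.0 + theta * theta)


t_const = escape_time(-5.0, constant_gain)
t_state = escape_time(-5.0, state_dependent_gain)
```

With the constant gain the ODE near $t = 0$ behaves like $\dot{\Theta} \approx -4(\Theta + 1)^2$, which blows up in roughly $1/16$ time units from $\Theta_0 = -5$; with the state-dependent gain the vector field has linear growth and the trajectory stays bounded over the horizon.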
  • 64. A Return to Extremum Seeking Control: Vanishing vs Fixed Gain Algorithms for Optimization. a) $a_t = 0.1(1+t)^{-0.65}$, b) $\alpha_b = 3 \times 10^{-3}$, c) $\alpha_s = 7 \times 10^{-4}$. [Figure: evolution of estimates $(\theta_1, \theta_2)$ and evolution of cost $\Gamma(\Theta_t)$, comparing vanishing gain $a_t = a_0(1+t)^{-\rho}$ ($\Theta_t$, $\Theta^{\text{PR}}_t$) with constant gains $\alpha_b$, $\alpha_s$ ($\Theta_t$, $\Theta^{F1}_t$, $\Theta^{F2}_t$).]
  • 65. Conclusions: Steps to a Successful Design. 1) Design $\bar{f}$ so the mean flow $\dot{\vartheta} = \bar{f}(\vartheta)$ is GAS to $\theta^*$ and $\partial\bar{f}(\theta^*)$ is Hurwitz. 2) Design step-size: $\alpha_n = n^{-\rho}$ with $\frac{1}{2} \le \rho < 1$ for optimization; $\alpha_n \equiv \alpha > 0$ for tracking. 3) Use "clever" exploration when applicable. 4) Perform PR averaging or filtering with bandwidth $O(\alpha)$. 5) Repeat! Obtain histograms for $M$ runs $\{T_N\theta^{\text{PR}}_N(m) : 1 \le m \le M\}$, $T_N = (N - N_0)^{2\rho}$, with $\theta^{\text{PR}}_0(m)$ widely dispersed.
  • 67. Conclusions: Final Thoughts. • A simple averaging (or filtering) trick can greatly improve algorithmic performance: ⋄ Vanishing gain: MSE vanishes at rates arbitrarily close to $O(n^{-4})$. ⋄ Fixed gain: in general, MSE bounds are $O(\alpha^4)$. • The incorporation of a state-dependent probing gain leads to ESC-0 algorithms that are globally stable (and more). • Work in progress: two time-scale QSA comes up for ESC and elsewhere. The p-mean flow representation should be done by today's CDC deadline! Thank you! Financial support from ARO award W911NF2010055 and NSF award EPCN 1935389 is gratefully acknowledged.
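  • 68. Appendices: Assumptions for QSA. (A1) VG: The process $a$ is non-negative, monotonically decreasing, and $\lim_{t\to\infty} a_t = 0$, $\int_0^\infty a_r\,dr = \infty$. BG: For all $t$, the gain process satisfies $a_t \equiv \alpha > 0$ for some $0 < \alpha < \alpha_0 < 1$. (A2) The functions $\bar{f}$ and $f$ are Lipschitz continuous: for a constant $L_f < \infty$, $\|\bar{f}(\theta') - \bar{f}(\theta)\| \le L_f\|\theta' - \theta\|$ and $\|f(\theta', \xi) - f(\theta, \xi)\| + \|f(\theta, \xi') - f(\theta, \xi)\| \le L_f[\|\theta' - \theta\| + \|\xi' - \xi\|]$ for $\theta', \theta \in \mathbb{R}^d$ and $\xi, \xi' \in \mathbb{R}^m$.
  • 69. Appendices: Assumptions for QSA. (A3) The ODE $\frac{d}{dt}\vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with unique equilibrium $\theta^*$. Moreover, one of the following conditions holds: (a) There is a Lipschitz continuous Lyapunov function $V : \mathbb{R}^d \to \mathbb{R}_+$, a constant $\delta_0 > 0$ and a compact set $S$ such that $\nabla V(\vartheta_t)\cdot\bar{f}(\vartheta_t) \le -\delta_0\|\vartheta_t\|$ whenever $\vartheta_t \notin S$. (b) The scaled vector field $\bar{f}_\infty : \mathbb{R}^d \to \mathbb{R}^d$ defined by $\bar{f}_\infty(\theta) := \lim_{c\to\infty}\bar{f}(c\theta)/c$, $\theta \in \mathbb{R}^d$, exists as a continuous function. Moreover, the ODE@∞ defined by $\frac{d}{dt}x_t = \bar{f}_\infty(x_t)$ is globally asymptotically stable [48, §4.8.4]. (A4) The vector field $\bar{f}$ is differentiable, with derivative denoted $\bar{A}(\theta) = \partial_\theta\bar{f}(\theta)$. That is, $\bar{A}(\theta)$ is a $d \times d$ matrix for each $\theta \in \mathbb{R}^d$, with $\bar{A}_{i,j}(\theta) = \frac{\partial}{\partial\theta_j}\bar{f}_i(\theta)$. Moreover, the derivative $\bar{A}$ is Lipschitz continuous, and $\bar{A}^* = \bar{A}(\theta^*)$ is Hurwitz.
  • 70. Appendices: Assumptions for QSA. (A5) $\Phi$ is the state process for a dynamical system $\frac{d}{dt}\Phi_t = H(\Phi_t)$, $H : \Omega \to \Omega$, with unique invariant measure $\pi$. It satisfies the following ergodic theorems for the functions of interest, for each initial condition $\Phi_0 \in \Omega$: (i) For each $\theta$ there exists a solution $\hat{f}(\theta, \cdot\,)$ to Poisson's equation with forcing function $f$. That is, $\hat{f}(\theta, \Phi_{t_0}) = \int_{t_0}^{t_1}[f(\theta, \xi_t) - \bar{f}(\theta)]\,dt + \hat{f}(\theta, \Phi_{t_1})$ for $0 \le t_0 \le t_1$, and for each $\theta$, $\int_\Omega \hat{f}(\theta, z)\,\pi(dz) = 0$. Finally, $\hat{f}$ is continuously differentiable ($C^1$) on $\mathbb{R}^d \times \Omega$. Its Jacobian with respect to $\theta$ is denoted $\widehat{A}(\theta, z) := \partial_\theta\hat{f}(\theta, z)$, where $\int_\Omega \widehat{A}(\theta, z)\,\pi(dz) = 0$ for each $\theta \in \mathbb{R}^d$.
  • 71. Appendices: Assumptions for QSA. (ii) For each $\theta$, there are $C^1$ solutions to Poisson's equation with forcing functions $\hat{f}$ and $\Upsilon$. They are denoted $\hat{\hat{f}}$ and $\widehat{\Upsilon}$, respectively, satisfying $\hat{\hat{f}}(\theta, \Phi_{t_0}) = \int_{t_0}^{t_1}\hat{f}(\theta, \Phi_t)\,dt + \hat{\hat{f}}(\theta, \Phi_{t_1})$ and $\widehat{\Upsilon}(\theta, \Phi_{t_0}) = \int_{t_0}^{t_1}[\Upsilon(\theta, \Phi_t) - \bar{\Upsilon}(\theta)]\,dt + \widehat{\Upsilon}(\theta, \Phi_{t_1})$ for $0 \le t_0 \le t_1$, with $\bar{\Upsilon}(\theta) = -\int_\Omega \widehat{A}(\theta, z)\,f(\theta, G(z))\,\pi(dz)$. Moreover, for each $\theta$, $\int_\Omega \hat{\hat{f}}(\theta, z)\,\pi(dz) = \int_\Omega \widehat{\Upsilon}(\theta, z)\,\pi(dz) = 0$.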
  • 72. References References I [1] L. Amerio and G. Prouse. Almost-periodic functions and functional equations. Springer Science & Business Media, 2013. [2] K. B. Ariyur and M. Krstić. Analysis and design of multivariable extremum seeking. In American Control Conference, volume 4, pages 2903–2908. IEEE, 2002. [3] K. B. Ariyur and M. Krstić. Real Time Optimization by Extremum Seeking Control. John Wiley & Sons, Inc., New York, NY, 2003. [4] S. Asmussen and P. W. Glynn. Stochastic Simulation: Algorithms and Analysis, volume 57 of Stochastic Modelling and Applied Probability. Springer-Verlag, New York, 2007. [5] A. Benveniste, M. Métivier, and P. Priouret. Adaptive algorithms and stochastic approximations, volume 22. Springer Science & Business Media, Berlin Heidelberg, 2012. 25 / 40
  • 73. References References II [6] D. P. Bertsekas. Reinforcement learning and optimal control. Athena Scientific, Belmont, MA, 2019. [7] S. Bhatnagar, M. C. Fu, S. I. Marcus, and I.-J. Wang. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences. ACM Transactions on Modeling and Computer Simulation (TOMACS), 13(2):180–209, 2003. [8] H. Bohr. Almost periodic functions. Courier Dover Publications (Reprint of the 1947 edition, Chelsea Publishing Company), 2018. [9] V. Borkar, S. Chen, A. Devraj, I. Kontoyiannis, and S. Meyn. The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning. arXiv e-prints:2110.14427, pages 1–50, 2021. [10] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency, Delhi, India, 2nd edition, 2021. 26 / 40
  • 74. References References III [11] V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447–469, 2000. [12] S. Chen, A. Devraj, A. Bernstein, and S. Meyn. Accelerating optimization and reinforcement learning with quasi stochastic approximation. In Proc. of the American Control Conf., pages 1965–1972, May 2021. [13] S. Chen, A. Devraj, A. Bernstein, and S. Meyn. Revisiting the ODE method for recursive algorithms: Fast convergence using quasi stochastic approximation. Journal of Systems Science and Complexity, 34(5):1681–1702, 2021. [14] X. Chen, Y. Tang, and N. Li. Improve single-point zeroth-order optimization using high-pass and low-pass filters. In International Conference on Machine Learning, pages 3603–3620. PMLR, 2022. [15] A. M. Devraj, A. Bušić, and S. Meyn. Fundamental design principles for reinforcement learning algorithms. In K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, editors, Handbook on Reinforcement Learning and Control, Studies in Systems, Decision and Control series (SSDC, volume 325). Springer, 2021. 27 / 40
  • 75. References References IV [16] J. Dippon. Accelerated randomized stochastic optimization. The Annals of Statistics, 31(4):1260–1281, 2003. [17] J. Dippon and J. Renz. Weighted means in stochastic approximation of minima. SIAM Journal on Control and Optimization, 35(5):1811–1827, 1997. [18] C. S. Draper and Y. T. Li. Principles of optimalizing control systems and an application to the internal combustion engine. American Society of Mechanical Engineers, 1951. [19] A. Durmus, E. Moulines, A. Naumov, S. Samsonov, K. Scaman, and H.-T. Wai. Tight high probability bounds for linear stochastic approximation with fixed step-size. Advances in Neural Information Processing Systems and arXiv:2106.01257, 34:30063–30074, 2021. [20] V. Fabian. On the choice of design in stochastic approximation methods. The Annals of Mathematical Statistics, pages 457–465, 1968. 28 / 40
  • 76. References V. [21] A. Fradkov and B. T. Polyak. Adaptive and robust control in the USSR. IFAC–PapersOnLine, 53(2):1373–1378, 2020. 21st IFAC World Congress. [22] T. Jaakkola, M. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185–1201, 1994. [23] C. Kalil Lauand and S. Meyn. Approaching quartic convergence rates for quasi-stochastic approximation with application to gradient-free optimization. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 15743–15756. Curran Associates, Inc., 2022. [24] H. K. Khalil. Nonlinear systems. Prentice-Hall, Upper Saddle River, NJ, 3rd edition, 2002.
  • 77. References VI. [25] J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist., 23(3):462–466, September 1952. [26] N. J. Killingsworth and M. Krstić. PID tuning using extremum seeking: online, model-free performance optimization. IEEE Control Systems Magazine, 26(1):70–79, 2006. [27] P. Kokotović, H. K. Khalil, and J. O’Reilly. Singular Perturbation Methods in Control: Analysis and Design. Society for Industrial and Applied Mathematics, 1999. [28] P. Kokotović, R. O’Malley, and P. Sannuti. Singular perturbations and order reduction in control theory: an overview. Automatica, 12(2):123–132, 1976. [29] V. Konda. Actor-critic algorithms. PhD thesis, Massachusetts Institute of Technology, 2002. [30] V. R. Konda and J. N. Tsitsiklis. On actor-critic algorithms. SIAM J. Control Optim., 42(4):1143–1166 (electronic), 2003.
  • 78. References VII. [31] M. Krstić and H.-H. Wang. Stability of extremum seeking feedback for general nonlinear dynamic systems. Automatica, 36(4):595–601, 2000. [32] H. J. Kushner and G. G. Yin. Stochastic approximation algorithms and applications, volume 35 of Applications of Mathematics (New York). Springer-Verlag, New York, 1997. [33] B. Lapeyre, G. Pagès, and K. Sab. Sequences with low discrepancy generalisation and application to Robbins-Monro algorithm. Statistics, 21(2):251–272, 1990. [34] J. Larson, M. Menickelly, and S. M. Wild. Derivative-free optimization methods. Acta Numerica, 28:287–404, 2019. [35] S. Laruelle and G. Pagès. Stochastic approximation with averaging innovation applied to finance. Monte Carlo Methods and Applications, 18(1):1–51, 2012.