Control as Inference (Reinforcement Learning and Bayesian Statistics)

Agenda
‣ Bayesian statistics: probabilistic modeling and inference
‣ Reinforcement learning basics (MDP)
‣ Control as Inference
‣ Control under partial observability (POMDP)
‣ World models and model-based RL
Probabilistic modeling
Observed data: x1, …, xN ∼ p(X)
Model the unknown data distribution p(X) with a parameterized distribution p(X ∣ θ).
Example (Bernoulli): p(X = k ∣ θ) = μθ^k (1 − μθ)^{1−k}, k ∈ {0, 1}
where μθ is the probability of k = 1 and 1 − μθ the probability of k = 0.
How should the parameter μθ of p(X ∣ θ) be determined?
1. Estimate it from the data (e.g., by maximum likelihood)
2. Fix it by hand (e.g., μθ = 0.5)
➡ Here we estimate the parameters of p(X ∣ θ) from data.
[Plate notation: an observed variable x in a plate of size N with parameter θ; a supervised model with input x, output y, and parameter θ; and a latent-variable model with latent z, observation x, and parameter θ.]
Example: regression with a DNN
Model the conditional distribution of the output Y:
p(Y ∣ X, θ) = Normal(fθ(X), Σ)
where the mean function fθ is a DNN with parameters θ.
Example: classification with a DNN
Model the class distribution with a softmax over DNN outputs:
p(Y = k ∣ X, θ) = exp(fθ(X)[k]) / ∑_{k′=1}^K exp(fθ(X)[k′])
where fθ is a DNN with parameters θ.
Example: VAE (deep latent-variable model)
Introduce a latent variable Z:
p(X, Z ∣ θ) = p(Z ∣ θ) p(X ∣ Z, θ)
Both the parameter θ and the latent Z are unobserved.
Maximum Likelihood Estimation (MLE)
θ̂ = argmax_θ ∏_{i=1}^N p(X = xi ∣ θ)
Maximum a Posteriori Estimation (MAP)
Unlike MLE, introduce a prior p(θ) over the parameter:
θ̂ = argmax_θ p(θ ∣ X = x1, …, xN) = argmax_θ p(θ) ∏_{i=1}^N p(X = xi ∣ θ)
MLE is the special case with a uniform prior p(θ) = const.
Bayesian Inference
Rather than a point estimate, use the full posterior to predict:
p(X ∣ x1, …, xN) = 𝔼_{p(θ ∣ X = x1, …, xN)}[ p(X ∣ θ) ]
[Figure: −log p(x, θ) as a function of θ; MLE/MAP return a single point, while the posterior p(θ ∣ x) is a distribution over θ.]
Notation: the quantity to compute is the posterior p(θ ∣ X = x1, …, xN).
Abbreviate the dataset x1, …, xN as x, write the joint as p(X, θ) = p(θ) p(X ∣ θ), and write p(θ ∣ X = x) as p(θ ∣ x).
The posterior p(θ ∣ x) is generally intractable. Two families of approximation: variational inference, which fits a tractable qϕ(θ) to p(θ ∣ x), and Markov chain Monte Carlo (MCMC), which draws samples from p(θ ∣ x).
Variational Inference
Approximate the posterior with a tractable qϕ(θ) by minimizing the Kullback–Leibler divergence between qϕ(θ) and p(θ ∣ x):
p(θ ∣ x) ≈ q̂ϕ(θ) = argmin_{qϕ} KL(qϕ(θ) ∥ p(θ ∣ x))
e.g., qϕ(θ) = Normal(μϕ, diag(σϕ²)) with variational parameters ϕ = {μϕ, σϕ²}.
Variational Inference (evidence lower bound)
KL(qϕ(θ) ∥ p(θ ∣ x)) = ∫ qϕ(θ) log [ qϕ(θ) / p(θ ∣ x) ] dθ
= 𝔼_{qϕ}[ log qϕ(θ) / p(x, θ) ] + log p(x)
Since log p(x) does not depend on qϕ, minimizing the KL is equivalent to maximizing
ℒϕ(x) = −𝔼_{qϕ}[ log qϕ(θ) / p(x, θ) ], which satisfies ℒϕ(x) ≤ log p(x).
Equivalently, minimize −ℒϕ(x).
Reparameterization Gradient
To optimize ℒϕ(x) with respect to ϕ, we need the gradient of an expectation taken under qϕ itself:
∇ϕ ℒϕ(x) = −∇ϕ 𝔼_{qϕ}[ log qϕ(θ) / p(x, θ) ]
Reparameterization Gradient
For qϕ(θ) = Normal(μϕ, diag(σϕ²)), rewrite the expectation over qϕ as an expectation over fixed noise:
𝔼_{qϕ}[ log qϕ(θ) / p(x, θ) ] = 𝔼_{p(ϵ)}[ log qϕ(θ) / p(x, θ) ]|_{θ = f(ϵ, ϕ)}
where p(ϵ) = Normal(0, I) and f(ϵ, ϕ) = μϕ + σϕ ⊙ ϵ.
Reparameterization Gradient
The gradient now passes inside the expectation and can be estimated by Monte Carlo:
∇ϕ 𝔼_{qϕ}[ log qϕ(θ) / p(x, θ) ] = 𝔼_{p(ϵ)}[ ∇ϕ ( log qϕ(θ) / p(x, θ) )|_{θ = f(ϵ, ϕ)} ]
≈ (1/L) ∑_{l=1}^L ∇ϕ ( log qϕ(θ) / p(x, θ) )|_{θ = f(ϵ⁽ˡ⁾, ϕ)},  ϵ⁽¹⁾, …, ϵ⁽ᴸ⁾ ∼ p(ϵ)
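As a concrete illustration, here is a minimal sketch of this estimator in PyTorch for a diagonal-Gaussian qϕ; the toy joint p(x, θ) (a standard-normal prior with a Gaussian likelihood) is an assumption for illustration, not part of the slides.

```python
# A minimal sketch of the reparameterization gradient for a diagonal-Gaussian q_phi(theta).
import torch

def log_joint(theta, x):
    # Toy model (assumed): theta ~ N(0, I), x ~ N(theta, I)
    log_prior = -0.5 * (theta ** 2).sum()
    log_lik = -0.5 * ((x - theta) ** 2).sum()
    return log_prior + log_lik

x = torch.tensor([0.5, -1.0])
mu = torch.zeros(2, requires_grad=True)        # phi = {mu, log_sigma}
log_sigma = torch.zeros(2, requires_grad=True)

L = 16  # number of Monte Carlo samples
elbo = 0.0
for _ in range(L):
    eps = torch.randn(2)                       # eps ~ N(0, I)
    theta = mu + log_sigma.exp() * eps         # theta = f(eps, phi), differentiable in phi
    log_q = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(theta).sum()
    elbo = elbo + (log_joint(theta, x) - log_q) / L

elbo.backward()   # gradients flow through f(eps, phi) into mu and log_sigma
print(mu.grad, log_sigma.grad)
```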
Reparameterization Gradient (notes)
1. Many other reparameterization tricks exist; see http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/
2. The estimator requires qϕ and f to be differentiable with respect to ϕ.
MLE/MAP as a special case of variational inference
MAP corresponds to variational inference with a point-mass posterior (all variance sent to 0):
qϕ(θ) = δ(θ − μϕ),  δ(θ − μϕ) = lim_{σ²→0} Normal(μϕ, diag(σ²))
MLE additionally assumes a uniform prior p(θ) = const.
[Figure: the Dirac delta δ(x); https://commons.wikimedia.org/wiki/File:Dirac_distribution_PDF.png]
Amortized Variational Inference
For per-datapoint latent variables z1:N, a naive variational posterior
qϕ(Z1:N) = ∏_{i=1}^N qϕi(Zi)
needs a separate parameter set ϕi for each of the N latent variables zi.
Amortized Variational Inference
Instead, share one parameter set ϕ by conditioning on each data point through a function fϕ:
qϕ(Z1:N) = ∏_{i=1}^N qϕ(Zi ∣ fϕ(xi))
The inference network fϕ maps each xi to the variational parameters of its zi.
Amortized Variational Inference (with DNNs)
qϕ(Z) = ∏_{i=1}^N Normal(μϕ(xi), diag(σϕ²(xi)))
where μϕ and σϕ² are DNNs.
Variational Autoencoder (VAE)
Parameterize both the generative model and the amortized posterior with DNNs; together they form an autoencoder:
decoder: p(X ∣ z, θ) = ∏_{i=1}^N Normal(μθ(zi), diag(σθ²(zi)))
encoder: qϕ(Z) = ∏_{i=1}^N Normal(μϕ(xi), diag(σϕ²(xi)))
where μθ, σθ² and μϕ, σϕ² are DNNs; qϕ plays the role of the encoder.
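A minimal VAE sketch under assumed architectural choices (layer sizes, a unit-variance Gaussian decoder); the slides only fix the Gaussian encoder/decoder form.

```python
# A minimal VAE sketch (architecture is an assumption for illustration).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, h=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())
        self.enc_mu = nn.Linear(h, z_dim)       # mu_phi(x)
        self.enc_logvar = nn.Linear(h, z_dim)   # log sigma^2_phi(x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU())
        self.dec_mu = nn.Linear(h, x_dim)       # mu_theta(z)

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterized sample
        x_mu = self.dec_mu(self.dec(z))
        # log p(x | z) with a unit-variance Gaussian decoder (assumption)
        log_px = -0.5 * ((x - x_mu) ** 2).sum(-1)
        # KL(q(z | x) || N(0, I)) in closed form
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(-1)
        return (log_px - kl).mean()

vae = VAE()
x = torch.rand(8, 784)
loss = -vae.elbo(x)   # maximize the ELBO = minimize its negative
loss.backward()
```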
Markov Chain Monte Carlo (MCMC)
Approximate the posterior p(θ ∣ x) with samples:
p(θ ∣ x) ≈ (1/T) ∑_{t=1}^T δ(θ − θ⁽ᵗ⁾),  θ⁽¹⁾, …, θ⁽ᵀ⁾ ∼ p(θ ∣ x)
Markov Chain Monte Carlo (MCMC)
1. Initialize θ⁽⁰⁾
2. Sample θ⁽ᵗ⁺¹⁾ ∼ p(θ′ ∣ θ = θ⁽ᵗ⁾) from a transition distribution whose stationary distribution is the posterior
3. Repeat step 2
Use the T samples {θ⁽¹⁾, …, θ⁽ᵀ⁾} as the approximation.
Langevin Dynamics
An MCMC method whose transition follows the gradient of the log joint plus Gaussian noise:
p_β(θ′ ∣ θ) = Normal(θ + η ∂/∂θ log p(x, θ), 2ηβ⁻¹ I)
As η → 0, the stationary distribution is p_β(θ ∣ x) = (p(θ ∣ x))^β; with β = 1 this is exactly the posterior p(θ ∣ x).
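A minimal (unadjusted) Langevin sampler sketch with β = 1; the double-well log joint, step size, and iteration count are assumptions for illustration.

```python
# A minimal Langevin dynamics sketch; grad_log_joint stands in (assumption)
# for d/d theta log p(x, theta) of whatever model is being fit.
import numpy as np

def grad_log_joint(theta):
    # Toy double-well example (assumed): log p(x, theta) = -(theta^2 - 1)^2
    return -4.0 * theta * (theta ** 2 - 1.0)

rng = np.random.default_rng(0)
eta, beta = 1e-3, 1.0
theta = 0.0
samples = []
for t in range(50_000):
    # theta' ~ Normal(theta + eta * grad, 2 * eta / beta * I)
    noise = rng.normal(0.0, np.sqrt(2.0 * eta / beta))
    theta = theta + eta * grad_log_joint(theta) + noise
    samples.append(theta)
# For small eta, the samples approximate p(theta | x).
```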
Langevin Dynamics
[Animation: first passage in a double-well potential of −log p(x, θ) under Langevin dynamics; https://upload.wikimedia.org/wikipedia/commons/0/0d/First_passage_time_in_double_well_potential_under_langevin_dynamics.gif]
Relation to MLE/MAP: as β → ∞ the noise vanishes and the transition becomes deterministic gradient ascent on log p(x, θ):
lim_{β→∞} p_β(θ′ ∣ θ) = δ(θ′ − (θ + η ∂/∂θ log p(x, θ)))
This is the MAP update; with a uniform prior p(θ) = const it is the MLE update.
MCMC vs variational inference:
• Variational inference is fast to optimize but limited by the chosen family qϕ.
• MCMC is asymptotically exact but can be slow to converge.
Reinforcement Learning
Setting (MDP): at each step the agent in state st selects an action at according to a policy π; the environment returns the next state st+1 and a reward r(st, at).
Goal: find the policy π that maximizes the cumulative reward ∑_{t=1}^∞ r(st, at).
※ Discount factors are omitted from the formulas here.
Action-Value Function (Q-function)
The expected cumulative reward from taking action at in state st and following π thereafter:
Q^π(st, at) = r(st, at) + 𝔼_π[ ∑_{k=1}^∞ r(st+k, at+k) ]
Optimal Action-Value Function (Optimal Q-function)
The cumulative reward from taking at in st and acting optimally thereafter:
Q*(st, at) = r(st, at) + max_{a_{t+1:∞}} ∑_{k=1}^∞ r(st+k, at+k) = max_π Q^π(st, at)
(State) Value Function
The expected cumulative reward from state st under policy π:
V^π(st) = 𝔼_π[ ∑_{k=0}^∞ r(st+k, at+k) ] = 𝔼_π[ Q^π(st, at) ]
Optimal (State) Value Function
The cumulative reward from st under optimal behavior:
V*(st) = max_{a_{t:∞}} ∑_{k=0}^∞ r(st+k, at+k) = max_π V^π(st) = max_a Q*(st, at)
Bellman Equation
Q^π(st, at) = r(st, at) + V^π(st+1)
V^π(st) = 𝔼_π[ r(st, at) ] + V^π(st+1)
※ Written here for deterministic transitions; in general the next-state value appears inside an expectation over st+1.
Bellman Optimality Equation
Q*(st, at) = r(st, at) + V*(st+1)
V*(st) = max_a [ r(st, a) + V*(st+1) ]
Q-learning
Learn Q* by repeatedly applying the Bellman optimality backup to estimates built from experience, then act greedily (sketched below):
Q(st, at) ← Q(st, at) + η[ r(st, at) + max_a Q(st+1, a) − Q(st, at) ]
π(s) = argmax_a Q(s, a)  (greedy policy)
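A tabular Q-learning sketch; the gym-style `env` interface and ε-greedy exploration are assumptions not spelled out on the slide.

```python
# A tabular Q-learning sketch implementing the update rule above.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500, eta=0.1, eps=0.1):
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False   # assumed env API: reset() -> state
        while not done:
            # epsilon-greedy exploration (assumption; the slide shows only the update)
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s_next, r, done = env.step(a)   # assumed env API: step(a) -> (s', r, done)
            target = r + (0 if done else Q[s_next].max())
            Q[s, a] += eta * (target - Q[s, a])   # the Q-learning update
            s = s_next
    policy = Q.argmax(axis=1)   # greedy policy pi(s) = argmax_a Q(s, a)
    return Q, policy
```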
Q-learning + Function Approximation
With large or continuous state spaces, represent Q with a function approximator (e.g., a DNN, as in DQN) and minimize the squared TD error:
θ ← θ − η ∇θ 𝔼[ ( r(st, at) + max_a Qθ(st+1, a) − Qθ(st, at) )² ]
where Qθ is the approximate Q-function.
Policy Gradient (REINFORCE)
Represent the policy itself with a DNN, e.g. a Gaussian policy
πϕ(a ∣ s) = Normal(μϕ(s), diag(σϕ²(s)))
where μϕ and σϕ² are DNNs.
Policy Gradient (REINFORCE)
Instead of a value function, directly ascend the expected return:
ϕ ← ϕ + η ∇ϕ 𝔼_{πϕ}[ ∑_{t=1}^T r(st, at) ]
∇ϕ 𝔼_{πϕ}[ ∑_{t=1}^T r(st, at) ] = 𝔼_{πϕ}[ ( ∑_{t=1}^T r(st, at) ) ( ∑_{t=1}^T ∇ϕ log πϕ(at ∣ st) ) ]
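A REINFORCE sketch with a Gaussian policy; the 4-dimensional state, 1-dimensional action, and network sizes are assumptions for illustration.

```python
# A minimal REINFORCE update: (sum_t r_t) * sum_t grad log pi(a_t | s_t).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # outputs mu, log_sigma
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def update(states, actions, rewards):
    """states: (T, 4), actions: (T, 1), rewards: (T,) from one rollout of pi_phi."""
    out = policy(states)
    mu, log_sigma = out[:, :1], out[:, 1:]
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    log_probs = dist.log_prob(actions).sum(-1)
    # Surrogate loss whose gradient is the REINFORCE estimator above
    loss = -(rewards.sum().detach() * log_probs.sum())
    opt.zero_grad(); loss.backward(); opt.step()
```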
Actor-Critic
Combine both: a critic Qθ^{πϕ} evaluates the actor πϕ, and the actor ascends the critic's estimate:
actor:  ϕ ← ϕ + ηϕ ∇ϕ 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) ]
critic: θ ← θ − ηθ ∇θ 𝔼[ ( r(st, at) + Vθ^{πϕ}(st+1) − Qθ^{πϕ}(st, at) )² ]
where Vθ^{πϕ}(s) = 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) ]
Whether to learn only a Q-function, only a policy, or both (actor-critic) depends on the problem; with continuous actions, even pure Q-learning needs some way to maximize Q over actions (e.g., QT-Opt maximizes Q by sampling).
On-policy vs Off-policy
On-policy: the update requires data collected by the current policy (e.g., policy gradient, actor-critic).
Off-policy: the update can reuse data collected by other policies (e.g., Q-learning).
Maximum Entropy Reinforcement Learning (MERL)
Add an entropy bonus to the objective, encouraging stochastic policies:
∑_{t=1}^∞ [ r(st, at) + ℋ(π(at ∣ st)) ]
Soft Actor-Critic
Actor-critic for the maximum-entropy objective: the entropy term −log πϕ(a ∣ s) enters both the actor objective and the value estimate:
actor:  ϕ ← ϕ + ηϕ ∇ϕ 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) − log πϕ(a ∣ s) ]
critic: θ ← θ − ηθ ∇θ 𝔼[ ( r(st, at) + Vθ^{πϕ}(st+1) − Qθ^{πϕ}(st, at) )² ]
where Vθ^{πϕ}(s) = 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) − log πϕ(a ∣ s) ]
※ https://arxiv.org/abs/1801.01290
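A sketch of these updates written as losses; it uses a single critic and omits target networks and replay details (also omitted on the slide), and `pi.rsample_with_log_prob` is an assumed helper returning a reparameterized action and its log-probability.

```python
# A minimal sketch of the SAC actor/critic losses above.
import torch

def soft_value(Q, pi, s):
    # V(s) = E_pi[ Q(s, a) - log pi(a | s) ], one-sample estimate
    a, log_pi = pi.rsample_with_log_prob(s)   # assumed helper on the policy
    return Q(s, a) - log_pi

def critic_loss(Q, pi, batch):
    # ( r + V(s_{t+1}) - Q(s_t, a_t) )^2, with the target held fixed
    s, a, r, s_next = batch
    target = (r + soft_value(Q, pi, s_next)).detach()
    return ((target - Q(s, a)) ** 2).mean()

def actor_loss(Q, pi, s):
    # Maximize E_pi[ Q(s, a) - log pi(a | s) ] via a reparameterized sample
    a, log_pi = pi.rsample_with_log_prob(s)
    return (log_pi - Q(s, a)).mean()
```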
Soft Actor-Critic
Ordinary actor-critic is on-policy: the actor objective 𝔼_{πϕ}[ Qθ^{πϕ}(st, at) ] must be estimated with fresh samples from πϕ itself.
Soft Actor-Critic
SAC's actor update can instead be read as minimizing a KL divergence, which is what makes SAC off-policy: the objective 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) − log πϕ(a ∣ s) ] fits πϕ to π̂(a ∣ s) ∝ exp(Qθ^{πϕ}(s, a)):
KL(πϕ ∥ π̂) = −𝔼_{πϕ}[ Qθ^{πϕ}(s, a) − log πϕ(a ∣ s) ] + log ∫ exp(Qθ^{πϕ}(s, a)) da
Summary:
• Value-based methods learn a Q-function and act greedily.
• Policy-gradient methods optimize the policy directly.
• Actor-critic combines the two; SAC adds an entropy term and is off-policy.
Control as Inference
Markov Decision Process (MDP)
[Graphical model: a chain of states st → st+1 with actions at and rewards rt at each step.]
Markov Decision Process (MDP) + Optimality Variables
[Graphical model: the same chain with an optimality variable ot attached to each (st, at).]
Optimality Variable
‣ A binary variable indicating whether the pair (s, a) at each step is optimal: O = 1 if optimal, O = 0 otherwise.
Tie O to the reward r by
p(O = 1 ∣ s, a) ∝ exp(r(s, a))
With optimality variables, control becomes two inference problems:
1. Infer optimal trajectories: p(s1:T, a1:T ∣ O1:T = 1), a posterior over s1:T, a1:T
2. Infer the optimal policy: p(at ∣ st, O≥t = 1)
➡ Reinforcement learning and control reduce to computing these posteriors.
➡ Below we compute p(s1:T, a1:T ∣ O1:T = 1) and p(at ∣ st, O≥t = 1).
Notation: abbreviate Ot = 1 as ot. By Bayes' rule,
p(at ∣ st, o≥t) ∝ p(at ∣ st) p(o≥t ∣ st, at)
Assuming a uniform action prior p(at ∣ st),
p(at ∣ st, o≥t) ∝ p(o≥t ∣ st, at)
Define the optimal value functions as log-probabilities of optimality:
Q*(st, at) = log p(o≥t ∣ st, at),  V*(st) = log p(o≥t ∣ st)
Then
Q*(st, at) = log p(ot ∣ st, at) + log p(o≥t+1 ∣ st, at)
= r(st, at) + log ∫ p(st+1 ∣ st, at) p(o≥t+1 ∣ st+1) dst+1
= r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
i.e., a soft Bellman backup:
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
For deterministic dynamics p(st+1 ∣ st, at) = δ(st+1 − f(st, at)) this reduces to Q*(st, at) = r(st, at) + V*(st+1).
The state value is a soft maximum over actions:
V*(s) = log ∫ exp(Q*(s, a)) da ≠ max_a Q*(s, a)
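For intuition, a tabular "soft value iteration" sketch that alternates these two backups; the finite MDP arrays P and r are assumptions, and the backup assumes r ≤ 0 (or a finite horizon), since the slides omit discounting.

```python
# Tabular soft value iteration: Q* = r + log E[exp V*], V* = logsumexp_a Q*.
import numpy as np
from scipy.special import logsumexp

def soft_value_iteration(P, r, iters=200):
    """P: (S, A, S) transition probabilities, r: (S, A) rewards (assumed r <= 0)."""
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q*(s, a) = r(s, a) + log E_{s'}[ exp V*(s') ]
        Q = r + logsumexp(np.log(P + 1e-30) + V[None, None, :], axis=-1)
        # V*(s) = log sum_a exp Q*(s, a): a "soft" max, not a hard max
        V = logsumexp(Q, axis=-1)
    return Q, V
```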
Recap: from p(s1:T, a1:T ∣ o1:T) and p(at ∣ st, o≥t) we obtained
Q*(st, at) = log p(o≥t ∣ st, at),  V*(st) = log p(o≥t ∣ st)
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
The two inference problems again:
1. Optimal trajectories: p(s1:T, a1:T ∣ o1:T) over s1:T, a1:T
2. Optimal policy: p(at ∣ st, o≥t)
➡ First, problem 1.
The trajectory posterior factorizes as
p(s1:T, a1:T ∣ o1:T) ∝ p(s1) ∏_{t=1}^T p(st+1 ∣ st, at) exp(r(st, at))
Approximate it with a variational distribution that keeps the true dynamics and replaces the action factors by a policy:
qϕ(s1:T, a1:T) = p(s1) ∏_{t=1}^T p(st+1 ∣ st, at) πϕ(at ∣ st)
with πϕ(a ∣ s) = Normal(μϕ(s), diag(σϕ²(s))), where μϕ and σϕ² are DNNs of s with parameters ϕ.
Minimize the KL divergence between qϕ(s1:T, a1:T) and p(s1:T, a1:T ∣ o1:T):
KL(qϕ(s1:T, a1:T) ∥ p(s1:T, a1:T ∣ o1:T))
= 𝔼_{qϕ}[ log qϕ(s1:T, a1:T) / p(s1:T, a1:T ∣ o1:T) ]
= 𝔼_{qϕ}[ ∑_{t=1}^T ( log πϕ(at ∣ st) − r(st, at) ) ] + log p(o1:T)
The resulting objective 𝔼_{qϕ}[ ∑_{t=1}^T ( log πϕ(at ∣ st) − r(st, at) ) ] is a maximum-entropy version of the policy-gradient objective, and its gradient has the familiar REINFORCE form:
∇ϕ 𝔼_{qϕ}[ ∑_{t=1}^T r(st, at) ] = 𝔼_{qϕ}[ ( ∑_{t=1}^T r(st, at) ) ∇ϕ log qϕ(s1:T, a1:T) ]
= 𝔼_{qϕ}[ ( ∑_{t=1}^T r(st, at) ) ( ∑_{t=1}^T ∇ϕ log πϕ(at ∣ st) ) ]
➡ Minimizing the trajectory KL recovers (entropy-regularized) policy gradient.
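A sketch of the corresponding surrogate loss; since 𝔼_{πϕ}[∇ϕ log πϕ] = 0, the plain score-function surrogate on the entropy-augmented return is unbiased for this objective.

```python
# Maximum-entropy REINFORCE: treat r_t - log pi(a_t | s_t) as the per-step reward.
import torch

def maxent_pg_loss(log_probs, rewards):
    """log_probs: (T,) log pi(a_t | s_t) for one trajectory; rewards: (T,)."""
    pseudo_return = (rewards - log_probs).sum().detach()  # sum_t (r_t - log pi)
    return -pseudo_return * log_probs.sum()               # score-function surrogate
```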
Now problem 2, the optimal policy p(at ∣ st, o≥t).
Since p(at ∣ st, o≥t) ∝ exp(Q*(st, at)), the policy is a softmax over actions given Q*:
discrete actions:  p(at ∣ st, o≥t) = exp(Q*(st, at)) / ∑_{a∈A} exp(Q*(st, a))
➡ continuous actions:  p(at ∣ st, o≥t) = exp(Q*(st, at)) / ∫ exp(Q*(st, a)) da
Approximate p(at ∣ st, o≥t) with a tractable policy πϕ(at ∣ st):
πϕ(a ∣ s) = Normal(μϕ(s), diag(σϕ²(s)))
where μϕ and σϕ² are DNNs of s with parameters ϕ.
Minimize the KL divergence between πϕ(at ∣ st) and p(at ∣ st, o≥t):
KL(πϕ(at ∣ st) ∥ p(at ∣ st, o≥t))
= 𝔼_{πϕ}[ log πϕ(at ∣ st) / p(at ∣ st, o≥t) ]
= 𝔼_{πϕ}[ log πϕ(at ∣ st) − Q*(st, at) ] + V*(st)
KL(πϕ(at ∣ st) ∥ p(at ∣ st, o≥t)) = 𝔼_{πϕ}[ log πϕ(at ∣ st) − Q*(st, at) ] + V*(st)
This requires Q*(st, at), which satisfies
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
V*(s) = log ∫ exp(Q*(s, a)) da
※ The integral defining V* is generally intractable; two workarounds follow.
Approach 1: Soft Q-learning
Estimate V* by importance sampling with the current policy:
V*(s) = log 𝔼_{πϕ}[ exp(Q*(s, a)) / πϕ(a ∣ s) ]
≈ log (1/L) ∑_{l=1}^L exp(Q*(s, a⁽ˡ⁾)) / πϕ(a⁽ˡ⁾ ∣ s),  a⁽¹⁾, …, a⁽ᴸ⁾ ∼ πϕ(a ∣ s)
As L → ∞ this converges to V*(s).
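A sketch of this estimator; `Q_star`, `sample_actions`, and `log_pi` are stand-ins (assumptions) for the learned soft Q-function and the sampling policy.

```python
# Importance-sampled estimate of V*(s) used by soft Q-learning.
import numpy as np
from scipy.special import logsumexp

def soft_v_estimate(s, Q_star, sample_actions, log_pi, L=64):
    """V*(s) ~= log (1/L) sum_l exp(Q*(s, a_l)) / pi(a_l | s), a_l ~ pi(. | s)."""
    actions = sample_actions(s, L)                       # a^(1..L) ~ pi(a | s)
    log_w = np.array([Q_star(s, a) - log_pi(s, a) for a in actions])
    return logsumexp(log_w) - np.log(L)                  # numerically stable log-mean-exp
```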
Approach 2: bound the optimal values by on-policy values
Q*(st, at) = r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V*(st+1)) ]
≥ r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V^{πϕ}(st+1)) ] = Q^{πϕ}(st, at)
Approach 2 (continued)
V*(s) = log 𝔼_{πϕ}[ exp(Q*(s, a)) / πϕ(a ∣ s) ]
≥ 𝔼_{πϕ}[ Q*(s, a) − log πϕ(a ∣ s) ]  (Jensen's inequality)
≥ 𝔼_{πϕ}[ Q^{πϕ}(s, a) − log πϕ(a ∣ s) ] = V^{πϕ}(s)
Approach 2 ➡ Soft Actor-Critic
Replace Q*, V* in the KL objective with the on-policy soft values Q^{πϕ}, V^{πϕ}; as πϕ approaches p(at ∣ st, o≥t), Q^{πϕ}, V^{πϕ} approach Q*, V*.
Learn Q^{πϕ} with a function approximator Qθ^{πϕ}:
θ ← θ − ηθ ∇θ 𝔼[ ( r(st, at) + Vθ^{πϕ}(st+1) − Qθ^{πϕ}(st, at) )² ]
Vθ^{πϕ}(s) = 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) − log πϕ(a ∣ s) ]
Vθ^{πϕ} is evaluated with actions sampled from πϕ.
Soft Actor-Critic (actor update)
Fit πϕ(at ∣ st) to π̂(a ∣ s) ∝ exp(Qθ^{πϕ}(s, a)):
KL(πϕ(at ∣ st) ∥ π̂(at ∣ st))
= 𝔼_{πϕ}[ log πϕ(at ∣ st) − Qθ^{πϕ}(st, at) ] + log ∫ exp(Qθ^{πϕ}(s, a)) da
Soft Actor-Critic (SAC) is off-policy
The actor and critic updates only need transitions (st, at, rt, st+1), which can come from a replay buffer:
➡ On-policy methods must gather fresh data after every update.
➡ Off-policy methods reuse stored transitions, which is far more sample-efficient.
Control under Partial Observability (POMDP)
MDPs assume the state is fully observed
Example: CartPole from a single image. One frame shows positions but not velocities, so it is not a Markov state: the same image can demand different actions.
DQN works around this by stacking the last 4 frames into a pseudo-state.
➡ The general formulation: Partially Observable Markov Decision Process (POMDP).
Partially Observable Markov Decision Process (POMDP)
[Graphical model: latent states st → st+1 with actions at and rewards rt; the agent only sees observations xt emitted from st.]
POMDP + Optimality Variables
[Graphical model: the same chain with optimality variables ot attached to each (st, at).]
Control as inference in a POMDP
We still want the optimal policy p(at ∣ st, o≥t), but st is latent: it must be inferred from observations x via p(st ∣ xt, st−1, at−1).
The joint posterior over states and the current action factorizes as
p(s≤t, at ∣ x≤t, a<t, o≥t) = p(at ∣ st, o≥t) p(s1 ∣ x1) ∏_{τ=1}^{t−1} p(sτ+1 ∣ xτ+1, sτ, aτ)
Approximate it with
qϕ(s≤t, at ∣ x≤t, a<t) = πϕ(at ∣ st) qϕ(s1 ∣ x1) ∏_{τ=1}^{t−1} qϕ(sτ+1 ∣ xτ+1, sτ, aτ)
Minimize the KL divergence:
KL(qϕ(s≤t, at ∣ x≤t, a<t) ∥ p(s≤t, at ∣ x≤t, a<t, o≥t))
= 𝔼_{qϕ}[ log qϕ(s≤t, at ∣ x≤t, a<t) / p(s≤t, at ∣ x≤t, a<t, o≥t) ]
= 𝔼_{qϕ}[ log πϕ(at ∣ st) + log qϕ(s1 ∣ x1) / p(x1, s1) + ∑_{τ=1}^{t−1} log qϕ(sτ+1 ∣ xτ+1, sτ, aτ) / p(xτ+1, sτ+1 ∣ sτ, aτ) − Q*(st, at) ] + log p(x≤t ∣ a<t) + V*(st)
The expectation term is the negative of a lower bound: call it −ℒϕ(x≤t, a<t, o≥t).
Now also learn the model: replace the true p with a learned pψ:
KL(qϕ(s≤t, at ∣ x≤t, a<t) ∥ pψ(s≤t, at ∣ x≤t, a<t, o≥t))
= 𝔼_{qϕ}[ log qϕ(s≤t, at ∣ x≤t, a<t) / pψ(s≤t, at ∣ x≤t, a<t, o≥t) ]
= 𝔼_{qϕ}[ log πϕ(at ∣ st) + log qϕ(s1 ∣ x1) / pψ(x1, s1) + ∑_{τ=1}^{t−1} log qϕ(sτ+1 ∣ xτ+1, sτ, aτ) / pψ(xτ+1, sτ+1 ∣ sτ, aτ) − Q*(st, at) ] + log pψ(x≤t ∣ a<t) + V*(st)
➡ Define the bound ℒϕ,ψ(x≤t, a<t, o≥t) as the negative of the expectation term.
Since the KL is nonnegative,
log pψ(x≤t ∣ a<t) + V*(st) ≥ ℒϕ,ψ(x≤t, a<t, o≥t)
with equality iff qϕ(s≤t, at ∣ x≤t, a<t) = pψ(s≤t, at ∣ x≤t, a<t, o≥t). At equality,
argmax_ψ ℒϕ,ψ(x≤t, a<t, o≥t) = argmax_ψ pψ(x≤t ∣ a<t)
so maximizing ℒϕ,ψ over ψ trains the model.
As in SAC, bound the optimal values by learned on-policy values:
Q*(st, at) ≥ r(st, at) + log 𝔼_{p(st+1 ∣ st, at)}[ exp(V^{πϕ}(st+1)) ] = Q^{πϕ}(st, at) ≈ Qθ^{πϕ}(st, at)
V*(s) ≥ 𝔼_{πϕ}[ Q^{πϕ}(s, a) − log πϕ(a ∣ s) ] = V^{πϕ}(s) ≈ Vθ^{πϕ}(s)
Stochastic Latent Actor-Critic (SLAC)
θ̂ = argmin_θ 𝔼[ ( r(st, at) + Vθ^{πϕ}(st+1) − Qθ^{πϕ}(st, at) )² ]
ϕ̂, ψ̂ = argmax_{ϕ,ψ} ℒϕ,ψ(x≤t, a<t, o≥t)
Stochastic Latent Actor-Critic (SLAC) extends SAC to the POMDP setting: it jointly performs
policy inference p(at ∣ st, o≥t),
state inference p(st ∣ xt, st−1, at−1),
and model learning pψ(xt+1, st+1 ∣ st, at).
➡ Control as Inference unifies control with probabilistic inference (and relates to Bayesian RL).
In a POMDP, state inference and model learning fall out of the same variational objective
➡ which leads naturally to world models.
World Models (Model-Based RL)
In the POMDP formulation, learning pψ(xt+1, st+1 ∣ st, at) ≒ learning a world model (plus a reward model).
Model-based RL loop:
1. Collect data with the current policy π into a dataset D = {x1, a1, r1, …, xT, aT, rT}
2. Learn the model pψ(x1:T, r1:T ∣ a1:T) from D
3. Improve π using the model
Repeat 1 ~ 3. (Figure after https://arxiv.org/abs/1903.00374)
Why model-based RL:
1. Far fewer environment interactions (sample efficiency)
2. The learned model can be reused for other tasks
➡ The following slides cover step 2 (model learning) and step 3 (policy improvement).
Step 2 of the loop: learn the model pψ(x1:T, r1:T ∣ a1:T) from the dataset D.
Partially Observable Markov Decision Process
[Graphical model: latent states st with observations xt, rewards rt, and actions at.]
log pψ(x1:T, r1:T ∣ a1:T)
= log ∫ p(s1) ∏_{t=1}^T pψ(st+1 ∣ st, at) pψ(rt ∣ st, at) pψ(xt ∣ st) ds1:T
= log 𝔼_{qϕ}[ pψ(s1) / qϕ(s1 ∣ x1) ∏_{t=1}^T pψ(st+1 ∣ st, at) pψ(rt ∣ st, at) pψ(xt ∣ st) / qϕ(st+1 ∣ xt+1, rt, st, at) ]
≥ 𝔼_{qϕ}[ log pψ(s1) / qϕ(s1 ∣ x1) + ∑_{t=1}^T log pψ(st+1 ∣ st, at) pψ(rt ∣ st, at) pψ(xt ∣ st) / qϕ(st+1 ∣ xt+1, rt, st, at) ]  (Jensen)
= ℒϕ,ψ(x1:T, r1:T, a1:T)
qϕ (s1:T ∣ x1:T, r1:T, a1:T) = pψ (s1:T ∣ x1:T, r1:T, a1:T)
qϕ (s1:T ∣ x1:T, r1:T, a1:T) = pψ (s1:T ∣ x1:T, r1:T, a1:T)
argmax
ψ
ℒϕ,ψ (x1:T, r1:T, a1:T) = argmax
ψ
pψ (x1:T, r1:T ∣ a1:T)
ψ ℒϕ,ψ (x1:T, r1:T, a1:T)
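A sketch of computing this bound for one sequence; all module interfaces (sample/log_prob methods on the prior, dynamics, reward, observation, and posterior networks) are assumptions standing in for the Gaussian DNNs above.

```python
# Single-sample estimate of the sequence ELBO L_{phi,psi}(x_{1:T}, r_{1:T}, a_{1:T}).
def sequence_elbo(x, r, a, q_post, p_s1, p_dyn, p_rew, p_obs):
    """x: observations x_1..x_{T+1}; r: rewards r_1..r_T; a: actions a_1..a_T."""
    s, log_q1 = q_post.sample_initial(x[0])       # s_1 ~ q(s_1 | x_1)  (assumed interface)
    elbo = p_s1.log_prob(s) - log_q1              # log p_psi(s_1) / q_phi(s_1 | x_1)
    for t in range(len(a)):
        # s_{t+1} ~ q(s_{t+1} | x_{t+1}, r_t, s_t, a_t)
        s_next, log_q = q_post.sample_step(x[t + 1], r[t], s, a[t])
        elbo = elbo + (p_dyn.log_prob(s_next, s, a[t])   # log p_psi(s_{t+1} | s_t, a_t)
                       + p_rew.log_prob(r[t], s, a[t])   # log p_psi(r_t | s_t, a_t)
                       + p_obs.log_prob(x[t], s)         # log p_psi(x_t | s_t)
                       - log_q)
        s = s_next
    return elbo
```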
Step 3 of the loop: improve the policy π using the learned model pψ. Three options follow: model predictive control, policy search through the model, and actor-critic in the model.
1. Model Predictive Control (MPC)
1. Sample K candidate action sequences a⁽¹⁾_{t:T}, a⁽²⁾_{t:T}, …, a⁽ᴷ⁾_{t:T}
2. Score each by its expected return under the model: R(a⁽ᵏ⁾_{t:T}) = 𝔼_{pψ}[ ∑_{τ=t}^T rψ(sτ, a⁽ᵏ⁾_τ) ]
3. Execute the first action of the best sequence, then replan: at = a⁽ᵏ̂⁾_t, where k̂ = argmax_k R(a⁽ᵏ⁾_{t:T}) (a random-shooting sketch follows below)
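A random-shooting MPC sketch; `model_rollout`, standing in for the expected return 𝔼_{pψ}[∑ rψ] of an action sequence, and the unit action bounds are assumptions.

```python
# Random-shooting MPC: sample, score under the model, execute the best first action.
import numpy as np

def mpc_random_shooting(s_t, model_rollout, action_dim, horizon, K=1000, rng=None):
    rng = rng or np.random.default_rng()
    # 1. sample K candidate action sequences a^(k)_{t:T}
    candidates = rng.uniform(-1.0, 1.0, size=(K, horizon, action_dim))
    # 2. score each by its expected return R(a^(k)) under the learned model
    returns = np.array([model_rollout(s_t, a_seq) for a_seq in candidates])
    # 3. execute only the first action of the best sequence, then replan
    return candidates[returns.argmax(), 0]
```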
1. Model Predictive Control (MPC): choosing the candidates in step 1
• Random-sample Shooting (RS): sample the K sequences at random, as in the basic MPC above.
• Cross Entropy Method (CEM): iteratively refit the sampling distribution to the best-scoring sequences (sketch below).
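A CEM sketch under the same assumed `model_rollout` interface; the population size, elite count, and iteration count are assumptions.

```python
# Cross Entropy Method: refit a Gaussian over action sequences to the elites.
import numpy as np

def cem_plan(s_t, model_rollout, action_dim, horizon,
             iters=5, K=500, n_elite=50, rng=None):
    rng = rng or np.random.default_rng()
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        cand = rng.normal(mu, sigma, size=(K, horizon, action_dim))
        returns = np.array([model_rollout(s_t, a_seq) for a_seq in cand])
        elites = cand[np.argsort(returns)[-n_elite:]]        # top-scoring sequences
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu[0]   # first action of the refined mean sequence
```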
2. Policy search through the model
Train the policy on imagined rollouts from the model:
ϕ ← ϕ + η ∇ϕ 𝔼_{pψ,πϕ}[ ∑_{t=1}^T rψ(st, at) ]
2. Policy search through the model (pathwise gradient)
If the learned reward rψ and dynamics are differentiable, backpropagate through the rollout with the reparameterization trick:
∇ϕ 𝔼_{pψ,πϕ}[ ∑_{t=1}^T rψ(st, at) ] = 𝔼_{p(ϵ)}[ ∑_{t=1}^T ∇ϕ rψ(st = fψ(st−1, at−1, ϵ), at = fϕ(st, ϵ)) ]
2. Policy search through the model (score-function gradient)
Alternatively, use the REINFORCE estimator on imagined rollouts:
∇ϕ 𝔼_{pψ,πϕ}[ ∑_{t=1}^T rψ(st, at) ] = 𝔼_{pψ,πϕ}[ ( ∑_{t=1}^T rψ(st, at) ) ( ∑_{t=1}^T ∇ϕ log πϕ(at ∣ st) ) ]
3. Actor-critic in the model
Run actor-critic entirely on imagined rollouts:
actor:  ϕ ← ϕ + ηϕ ∇ϕ 𝔼_{pψ,πϕ}[ Vθ^{πϕ}(s) ]
critic: θ ← θ − ηθ ∇θ 𝔼_{pψ,πϕ}[ ( rψ(st, at) + Vθ^{πϕ}(st+1) − Qθ^{πϕ}(st, at) )² ]
where Vθ^{πϕ}(s) = 𝔼_{πϕ}[ Qθ^{πϕ}(s, a) ]
World Models [Ha and Schmidhuber, 2018]
Model: VAE + MDN-RNN; the policy is optimized with CMA-ES.
https://www.slideshare.net/masa_s/ss-97848402
https://arxiv.org/abs/1803.10122
https://worldmodels.github.io/
PlaNet [Hafner et al., 2019]
Model: Recurrent State Space Model (below); planning with CEM; evaluated on DM Control Suite.
https://arxiv.org/abs/1811.04551
https://planetrl.github.io/
Gaussian State Space Model
A purely stochastic latent dynamics model with DNN mean and variance:
pψ(st+1 ∣ st, at) = Normal(μψ(st, at), diag(σψ²(st, at)))
where μψ and σψ² are DNNs.
[Graphical model: latent states st with observations xt, rewards rt, actions at.]
Recurrent State Space Model (RSSM)
Split the state into a deterministic path h (an LSTM-like RNN) and a stochastic path z:
ht+1 = fψ(ht, zt, at)
pψ(zt ∣ ht) = Normal(μψ(ht), diag(σψ²(ht)))
where fψ is an RNN.
[Graphical model: deterministic ht → ht+1 chain driving stochastic zt, with xt, rt, at.]
RSSM is the model used in PlaNet [Hafner et al., 2019].
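A one-step RSSM cell sketch matching the two equations above; the GRU cell standing in for fψ and the layer sizes are assumptions.

```python
# One RSSM transition: deterministic h_{t+1} = f_psi(h_t, z_t, a_t),
# stochastic z ~ N(mu_psi(h), diag(sigma^2_psi(h))).
import torch
import torch.nn as nn

class RSSMCell(nn.Module):
    def __init__(self, z_dim=30, h_dim=200, a_dim=6):
        super().__init__()
        self.rnn = nn.GRUCell(z_dim + a_dim, h_dim)   # f_psi (GRU is an assumption)
        self.prior = nn.Linear(h_dim, 2 * z_dim)      # mu_psi(h), log sigma^2_psi(h)

    def forward(self, h, z, a):
        h_next = self.rnn(torch.cat([z, a], dim=-1), h)
        mu, logvar = self.prior(h_next).chunk(2, dim=-1)
        z_next = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
        return h_next, z_next
```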
Dreamer
Keeps PlaNet's model (RSSM) but replaces CEM planning with actor-critic learning on imagined rollouts, using λ-returns for the value targets (next slides).
https://arxiv.org/abs/1912.01603
https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
n-step value estimates
One-step: V^π(st) = 𝔼_π[ r(st, at) ] + V^π(st+1)
n-step: V^π_n(st) = 𝔼_π[ ∑_{k=0}^{n−1} r(st+k, at+k) ] + V^π(st+n)
λ-return: an exponentially weighted average of all n-step estimates, n = 1, …, ∞:
V̄^π(st, λ) = (1 − λ) ∑_{n=1}^∞ λ^{n−1} V^π_n(st)
λ interpolates between the one-step target (λ = 0) and the Monte Carlo return (λ → 1).
λ
Dreamer λ
θ ← θ − ηθ ∇θ 𝔼pψ,πϕ [
V
πϕ
θ (st) − ¯Vπ
(st, λ)
2]
H
¯Vπ
(st, λ) ≈ (1 − λ)
H−1
∑
n=1
λn−1
Vπ
n (st) + λH−1
Vπ
H (st)
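A sketch of the truncated λ-return, computed backward over an imagined H-step rollout; the array conventions are assumptions.

```python
# Truncated lambda-return via the backward recursion
#   G_k = r_k + (1 - lam) * V(s_{k+1}) + lam * G_{k+1},  G_{H-1} = r_{H-1} + V(s_{t+H}).
import numpy as np

def lambda_return(rewards, values, lam=0.95):
    """rewards[k] = r(s_{t+k}, a_{t+k}), values[k] = V(s_{t+k+1}), k = 0..H-1."""
    G = values[-1]   # bootstrap with V(s_{t+H})
    for k in reversed(range(len(rewards))):
        G = rewards[k] + (1 - lam) * values[k] + lam * G
    return G         # equals V_bar(s_t, lam) for the truncated sum
```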
[Ablation figure: performance for different λ and horizon H; the "No value" variant omits the learned value function.]
Summary: learning pψ(xt+1, st+1 ∣ st, at) in a POMDP ≒ learning a world model (plus a reward model); deep generative models and RL meet here.
Further reading (books, in Japanese):
https://www.kspub.co.jp/book/detail/1538320.html
https://www.kspub.co.jp/book/detail/5168707.html
https://www.coronasha.co.jp/np/isbn/9784339024623/
References
Control as Inference:
• Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. https://arxiv.org/abs/1805.00909
• UC Berkeley Deep RL course (Lecture 14). http://rail.eecs.berkeley.edu/deeprlcourse-fa19/
• Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. https://arxiv.org/abs/1801.01290
• Reinforcement Learning with Deep Energy-Based Policies. https://arxiv.org/abs/1702.08165
• Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model. https://arxiv.org/abs/1907.00953
World models:
• World Models. https://arxiv.org/abs/1803.10122
• Learning Latent Dynamics for Planning from Pixels (PlaNet). https://arxiv.org/abs/1811.04551
• Dream to Control: Learning Behaviors by Latent Imagination (Dreamer). https://arxiv.org/abs/1912.01603