Stochastic Optimal Control &
Information Theoretic Dualities
MSL Group Meeting, November 10th, 2017
Haruki Nishimura
PART 1: INTRODUCTION TO STOCHASTIC CONTROL
The Big Picture
[Williams et al., 2017]
Stochastic Optimal Control Theory
• “Optimality” is defined by Bellman’s principle of optimality.
• Solution methods are based on stochastic dynamic programming.
Information Theoretic Control Theory
• “Optimality” is defined in the sense of the Legendre transform.
• Solution methods are based on forward sampling of stochastic differential equations.
These are two fundamentally different approaches to stochastic control problems.
The Big Picture
[Williams et al., 2017]
[Figure: diagram contrasting stochastic dynamic programming with forward sampling of SDEs. Labels: V(x0), values of terminal states, x0, space of value function, state space.]
Outline
Part 1 (Today)
Stochastic Optimal Control
› Deterministic optimal control, Bellman’s principle of optimality
› Wiener processes and stochastic differential equations
› Stochastic Hamilton-Jacobi-Bellman equation
Information Theoretic Control
› Legendre transformation
Part 2 (Tentative, 12/1)
› Helmholtz free energy and its interpretations
› Relations to Bellman’s principle of optimality & linearly solvable optimal control problems
› Algorithms and applications
› Limitations
Deterministic Optimal Control Problem
Consider a continuous-time optimization problem of the following form: minimize the sum of a terminal cost and an integrated per-stage cost over the control input profile u, where the state evolves according to dx = f(t,x,u)dt.
How to solve for the optimal control?
1. Pontryagin’s Maximum Principle
• Based on calculus of variations.
• Solve a system of 2n ODEs (Hamiltonian system).
• Open-loop specification.
2. Bellman’s Principle of Optimality
• Based on dynamic programming.
• Solve an n-dimensional PDE (HJB equation).
• Closed-loop specification.
Hamilton-Jacobi-Bellman Equation
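Define the “optimal cost-to-go” or “value function” V(t,x).
Let’s find a recursive structure in V via dynamic programming.
[Figure: trajectories over the time axis t0, t’, tf, one drawn in blue and one in red.]
If the blue trajectory is optimal, then the red one is necessarily optimal as well; this is Bellman’s principle of optimality.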
Hamilton-Jacobi-Bellman Equation
Taylor expand V(t+dt, x(t+dt)) around V(t,x(t)).
Substitute this into the original equation with dx = f(t,x,u)dt.
Rearrange terms and take the limit dt -> 0.
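One standard way to write the three steps above is sketched below. The symbols \phi (terminal cost) and \ell (per-stage cost) are assumed names chosen for illustration; f, V, t, t_f and dt follow the text.

```latex
\begin{align*}
% Value function (optimal cost-to-go); \phi and \ell are assumed symbol names
V(t, x) &= \min_{u(\cdot)} \left[ \phi\big(x(t_f)\big)
           + \int_t^{t_f} \ell\big(s, x(s), u(s)\big)\, ds \right] \\
% Recursive structure from the principle of optimality
V(t, x) &= \min_{u} \left[ \ell(t, x, u)\, dt + V\big(t + dt,\, x + dx\big) \right] \\
% First-order Taylor expansion with dx = f(t, x, u)\, dt
V(t + dt,\, x + dx) &= V(t, x) + \partial_t V\, dt
           + (\nabla_x V)^\top f(t, x, u)\, dt + o(dt) \\
% Substitute, cancel V(t, x), divide by dt, and let dt \to 0
-\partial_t V(t, x) &= \min_{u} \left[ \ell(t, x, u)
           + \big(\nabla_x V(t, x)\big)^\top f(t, x, u) \right]
\end{align*}
```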
Hamilton-Jacobi-Bellman Equation
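Boundary condition: at the final time tf, the value function equals the terminal cost.
Numerically solve backward in time for all (t,x) to obtain the closed-loop optimal control policy, i.e., the control that attains the minimum in the HJB equation at each (t,x).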
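As an illustration of solving backward in time on a grid, here is a minimal sketch for a hypothetical 1-D problem. The dynamics dx = u dt, the quadratic costs, the grids, and the function name are all assumptions made for this example, not taken from the slides. The scheme is a simple discrete-time dynamic programming recursion with linear interpolation, not a production PDE solver.

```python
import numpy as np

# Hypothetical example: dx = u dt, per-stage cost 0.5*(x^2 + u^2),
# terminal cost 0.5*x^2, horizon 1. For this problem V(t, x) = 0.5*x^2
# satisfies the HJB equation, with optimal feedback u = -x.

def solve_hjb_backward(xs, us, dt, n_steps):
    """Dynamic programming on a state grid, stepping backward in time."""
    V = 0.5 * xs**2                                   # boundary condition at tf
    policy = np.zeros((n_steps, xs.size))
    rows = np.arange(xs.size)
    for k in reversed(range(n_steps)):
        x_next = xs[:, None] + us[None, :] * dt       # Euler step for every (x, u)
        stage = 0.5 * (xs[:, None]**2 + us[None, :]**2) * dt
        # np.interp clamps outside the grid: a crude boundary treatment.
        V_next = np.interp(x_next.ravel(), xs, V).reshape(x_next.shape)
        cost_to_go = stage + V_next
        best = np.argmin(cost_to_go, axis=1)          # minimizing control index
        policy[k] = us[best]                          # closed-loop policy u(t_k, x)
        V = cost_to_go[rows, best]
    return V, policy

xs = np.linspace(-2.0, 2.0, 201)
us = np.linspace(-3.0, 3.0, 61)
V0, policy = solve_hjb_backward(xs, us, dt=0.01, n_steps=100)
i = int(np.argmin(np.abs(xs - 1.0)))
print(V0[i], policy[0, i])   # should be close to 0.5 and -1, up to grid error
```

The table `policy[k]` plays the role of the closed-loop policy: it gives a control for every grid state at every time step, which is exactly what makes the storage cost grow quickly with the state dimension.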
Stochastic Optimal Control Problem
The goal is to derive the HJB equation. What differs from the deterministic case?
1. The dynamics are governed by a stochastic differential equation.
2. Future state trajectories are uncertain.
Example: The Drunken Spider Problem
The presence of noise (alcohol) can change the optimal behavior significantly.
• Without noise, the spider will cross the bridge.
• When drunk, the cost of crossing the bridge increases and the spider should go around the lake.
[Kappen, 2005]
What is the stochasticity in the dynamics?
Recall that the dynamics are described by a stochastic differential equation whose noise enters through w_t, a stochastic process called the Wiener process (a.k.a. standard Brownian motion).
The Wiener Process
A type of Gaussian process with “good” properties for modeling random behavior that evolves over time.
Applications:
• Finance
• Physics
• Chemistry
• Stochastic control theory
…and more.
Image URL: https://github.com/matthewfieger/wiener_process
The Wiener Process
1. Continuity: w_t is continuous in t with probability 1.
2. Gaussianity: w_{t+s} - w_s is distributed according to N(0, t).
3. Independence: w_{t+s} - w_s is independent of {w_r} for r <= s.
4. Stationarity: the distribution of w_{t+s} - w_s is independent of s.
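The four properties above suggest a simple sampling recipe on a time grid: accumulate independent Gaussian increments with variance equal to the step size. The sketch below (function name, step size, and sample counts are illustrative assumptions) samples paths and empirically checks the Gaussianity, stationarity, and independence properties.

```python
import numpy as np

def sample_wiener_paths(n_paths, n_steps, dt, rng):
    """Sample Wiener paths on a uniform time grid, starting from w_0 = 0."""
    # Independent Gaussian increments, each with mean 0 and variance dt.
    dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    w0 = np.zeros((n_paths, 1))
    return np.concatenate([w0, np.cumsum(dw, axis=1)], axis=1)

rng = np.random.default_rng(0)
dt, n_steps = 0.01, 100
w = sample_wiener_paths(20000, n_steps, dt, rng)

# Increments over two disjoint windows of length 0.5.
inc_early = w[:, 50] - w[:, 0]
inc_late = w[:, 100] - w[:, 50]

# Gaussianity and stationarity: both variances should be near 0.5.
print(inc_early.var(), inc_late.var())
# Independence: disjoint increments should be (nearly) uncorrelated.
print(np.corrcoef(inc_early, inc_late)[0, 1])
```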
Stochastic Differential Equation
It is easiest to think of dw_t as Gaussian white noise with zero mean and variance dt.
Going back to our original equation, the dynamics consist of a drift term (multiplying dt) and a diffusion term (multiplying dw_t).
Note that the equation is not written in dx/dt form because the Wiener process is nowhere differentiable with probability 1.
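To make the drift/diffusion split concrete, here is a minimal Euler-Maruyama sketch for forward-sampling an SDE. It assumes the form dx = f(t, x) dt + b(x) dw_t with the control already folded into f; the function name, the example drift -x, and the constant diffusion 0.5 are hypothetical choices for illustration.

```python
import numpy as np

def euler_maruyama(f, b, x0, dt, n_steps, n_paths, rng):
    """Forward-sample dx = f(t, x) dt + b(x) dw with the Euler-Maruyama scheme."""
    x = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # dw ~ N(0, dt)
        x = x + f(k * dt, x) * dt + b(x) * dw            # drift step + diffusion step
    return x

# Hypothetical scalar example: drift -x, constant diffusion 0.5, x(0) = 1.
rng = np.random.default_rng(1)
x_T = euler_maruyama(
    f=lambda t, x: -x,
    b=lambda x: 0.5 * np.ones_like(x),
    x0=1.0, dt=0.01, n_steps=100, n_paths=20000, rng=rng,
)

# For this linear SDE the exact mean at T = 1 is exp(-1) and the exact
# variance is 0.125 * (1 - exp(-2)); the samples should land close to both.
print(x_T.mean(), x_T.var())
```

Note that the update uses increments dw rather than a derivative of w, which is consistent with the SDE not being written in dx/dt form.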
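Proof of Non-Differentiability
Assume w_t is differentiable at some t0. Then the derivative w’(t0) must exist; that is, the difference quotient (w_t - w_{t0}) / (t - t0) must converge as t -> t0.
Without loss of generality take t > t0. The definition of the Wiener process gives w_t - w_{t0} ~ N(0, t - t0), and thus (w_t - w_{t0}) / (t - t0) ~ N(0, 1 / (t - t0)).
The variance of the difference quotient diverges as t -> t0, so for any ε > 0 the probability that its distance from any candidate limit exceeds ε is always positive, and can be made arbitrarily close to 1 by taking t sufficiently close to t0. Hence the limit does not exist.
Q.E.D.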
Deterministic HJB (Recap)
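Define the “optimal cost-to-go” or “value function” V(t,x).
Let’s find a recursive structure in V via dynamic programming.
[Figure: trajectories over the time axis t0, t’, tf, one drawn in blue and one in red.]
If the blue trajectory is optimal, then the red one is necessarily optimal as well; this is Bellman’s principle of optimality.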
Deriving Stochastic HJB Equation
Define the value function as before.
Establish a recursive formula for V.
As usual, our strategy is to Taylor expand V(t+dt,x(t+dt)) and consider
only the terms that are O(dt).
Taylor Expansion & Itô’s Lemma
Under the Wiener noise, the chain rule gains an extra term compared with ordinary calculus.
Why? Because dw_t has variance dt, the quadratic term in dx contributes at O(dt), so we need to keep the second-order term of the Taylor expansion. This result can be generalized and is referred to as Itô’s lemma.
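A sketch of the expansion, assuming the SDE has the form dx = f(t,x,u) dt + b(x) dw_t (this exact form and the matrix notation are assumptions; f, b, V and w follow the text), is:

```latex
\begin{align*}
% Second-order Taylor expansion of V along the stochastic trajectory
dV &= \partial_t V\, dt + (\nabla_x V)^\top dx
      + \tfrac{1}{2}\, dx^\top (\nabla_x^2 V)\, dx + o(dt) \\
% Heuristic rules: dw\, dw^\top = I\, dt, \quad dw\, dt = 0, \quad dt^2 = 0
dx^\top (\nabla_x^2 V)\, dx
   &= \operatorname{tr}\!\big( b(x)\, b(x)^\top \nabla_x^2 V \big)\, dt + o(dt) \\
% Ito's lemma for V(t, x)
dV &= \Big[ \partial_t V + (\nabla_x V)^\top f
      + \tfrac{1}{2} \operatorname{tr}\!\big( b\, b^\top \nabla_x^2 V \big) \Big] dt
      + (\nabla_x V)^\top b\, dw_t
\end{align*}
```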
Stochastic HJB Equation
After substitution and taking the limit dt -> 0, we obtain the stochastic HJB equation.
Notice the difference from the deterministic HJB equation: a second-order term involving the diffusion appears, so the magnitude of the diffusion term b(x) affects the optimal control through the value function.
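Under the same assumed SDE form dx = f(t,x,u) dt + b(x) dw_t, and with assumed symbol names \ell (per-stage cost) and \phi (terminal cost), the stochastic HJB equation is commonly written as follows. The dw_t term of Itô's lemma vanishes in expectation, which leaves only the trace term as the new contribution.

```latex
\begin{align*}
% Value function is now an expected cost-to-go
V(t, x) &= \min_{u(\cdot)} \mathbb{E}\!\left[ \phi\big(x(t_f)\big)
           + \int_t^{t_f} \ell\big(s, x(s), u(s)\big)\, ds \;\middle|\; x(t) = x \right] \\
% Stochastic HJB: deterministic HJB plus a second-order diffusion term
-\partial_t V(t, x) &= \min_{u} \Big[ \ell(t, x, u)
           + \big(\nabla_x V\big)^\top f(t, x, u)
           + \tfrac{1}{2} \operatorname{tr}\!\big( b(x)\, b(x)^\top \nabla_x^2 V \big) \Big] \\
% Boundary condition
V(t_f, x) &= \phi(x)
\end{align*}
```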
Solving HJB Equation
In general, the HJB equation is a PDE in the value function V over the n-dimensional state space, which needs to be solved backward in time.
• Second-order, nonlinear PDE.
• The computational and storage requirements grow exponentially as the dimension n increases.
• Exception: LQG control.
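To illustrate why the linear-quadratic case is an exception, here is a minimal scalar sketch. It assumes dynamics dx = (a x + b u) dt + sigma dw and quadratic costs; the function name and all parameter names are hypothetical. With the ansatz V(t, x) = 0.5 P(t) x^2 + c(t), the HJB PDE collapses to two ODEs that are integrated backward in time, so no state grid is needed.

```python
import numpy as np

def scalar_lqg_riccati(a, b, sigma, q, r, q_f, T, n_steps):
    """Integrate the scalar Riccati ODE backward from t = T with explicit Euler.

    Assumed problem: dx = (a x + b u) dt + sigma dw, with expected cost
    0.5*q_f*x(T)^2 + integral of 0.5*(q x^2 + r u^2) dt.
    Ansatz: V(t, x) = 0.5*P(t)*x^2 + c(t); optimal control u = -(b*P/r)*x.
    """
    dt = T / n_steps
    P, c = q_f, 0.0                                   # boundary condition at t = T
    for _ in range(n_steps):
        minus_dP = 2.0 * a * P - (b * P) ** 2 / r + q  # equals -dP/dt
        minus_dc = 0.5 * sigma**2 * P                  # equals -dc/dt
        P, c = P + dt * minus_dP, c + dt * minus_dc    # step backward in time
    return P, c

P0, c0 = scalar_lqg_riccati(a=0.0, b=1.0, sigma=0.5, q=1.0, r=1.0,
                            q_f=1.0, T=1.0, n_steps=1000)
P0_noisy, c0_noisy = scalar_lqg_riccati(a=0.0, b=1.0, sigma=2.0, q=1.0, r=1.0,
                                        q_f=1.0, T=1.0, n_steps=1000)
print(P0, c0, P0_noisy, c0_noisy)
```

In this special case the noise level only shifts the constant c(t); the feedback gain b P / r is the same for both values of sigma, unlike the general nonlinear case where the diffusion changes the optimal control.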
Summary – Stochastic Optimal Control
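Problem formulation: minimize the expected sum of the terminal cost and the integrated per-stage cost, subject to stochastic dynamics with a drift term and a diffusion term.
Solution method: stochastic dynamic programming.
→ Solve the HJB equation, or approximate the problem to alleviate the complexity.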
Outline
Part 1 (Today)
Stochastic Optimal Control
› Deterministic optimal control, Bellman’s principle of optimality
› Wiener processes and stochastic differential equations
› Stochastic Hamilton-Jacobi-Bellman equation
Information Theoretic Control
› Legendre transformation
Part 2 (Tentative, 12/1)
› Helmholtz free energy and its interpretations
› Relations to Bellman’s principle of optimality & linearly solvable optimal control problems
› Algorithms and applications
› Limitations
Basic Idea
• Instead of solving for the value function, we aim at finding a lower bound on the expected cost that can be easily evaluated.
• This uses an information-theoretic inequality.
Theorem (Legendre Transform, Theodorou 2015)
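Let x be a random variable and J(x) a real-valued function of x. Consider two probability distributions p and q over x. Then for λ > 0, the following inequality holds:
  -λ log E_p[exp(-J(x)/λ)] <= E_q[J(x)] + λ KL(q || p)
Note: q is assumed to be absolutely continuous w.r.t. p.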
Proof of the Legendre Transform
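Change of measure from p to q gives
  log E_p[exp(-J(x)/λ)] = log E_q[exp(-J(x)/λ) p(x)/q(x)].
Apply Jensen’s inequality to the right-hand side (log is concave):
  log E_q[exp(-J(x)/λ) p(x)/q(x)] >= E_q[-J(x)/λ + log(p(x)/q(x))] = -(1/λ) E_q[J(x)] - KL(q || p).
Multiply by -λ (< 0) to get the inequality.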
Jensen’s Inequality (Review)
Let f be a convex function and X a real-valued random variable. Then f(E[X]) <= E[f(X)].
Furthermore, if f is strictly convex, equality holds if and only if X = E[X] with probability 1, in which case X is a deterministic constant.
In our case, log() is a strictly concave function, so the direction of the inequality is flipped.
Interpretations of the Legendre Transform
• Think of x as a path in the state space rooted at the current state.
• Think of J(x) as a cost over the path.
• A path x is generated by forward integration of the SDE that defines the controlled dynamics.
→ As we vary the control input profile u, we change the resulting distribution over the state trajectories.
Interpretations of the Legendre Transform
• The LHS is independent of u and is uniquely defined once the cost function J, the system dynamics, and λ are specified. It is a property of the system and is called the (Helmholtz) free energy.
• The first term on the RHS is the expected state-dependent cost.
• The second term is non-negative, since the KL divergence is non-negative.
→ It turns out that this term is an implicit measure of the control effort.
Optimality in the Legendre Sense
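The free energy is the solution to the following optimization problem:
  -λ log E_p[exp(-J(x)/λ)] = min over q of { E_q[J(x)] + λ KL(q || p) }
The optimal distribution is given by
  q*(x) = exp(-J(x)/λ) p(x) / E_p[exp(-J(x)/λ)]
Note: q is assumed to be absolutely continuous w.r.t. p.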
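The inequality and its minimizer can be checked numerically on a small discrete example. The sketch below is an illustration under assumed values: the number of outcomes, the costs J, the base distribution p, and λ = 0.7 are arbitrary, and the function names are hypothetical. It compares the free energy with E_q[J] + λ KL(q || p) for random q and for the exponentially tilted distribution q* proportional to p exp(-J/λ).

```python
import numpy as np

def free_energy(J, p, lam):
    """-lam * log E_p[exp(-J/lam)] for a discrete distribution p."""
    return -lam * np.log(np.sum(p * np.exp(-J / lam)))

def legendre_bound(J, p, q, lam):
    """E_q[J] + lam * KL(q || p) for discrete p, q with full support."""
    return np.sum(q * J) + lam * np.sum(q * np.log(q / p))

rng = np.random.default_rng(0)
n, lam = 6, 0.7
J = rng.uniform(0.0, 3.0, size=n)      # hypothetical costs of n outcomes
p = rng.dirichlet(np.ones(n))          # hypothetical base distribution
F = free_energy(J, p, lam)

# Any q yields a value at least as large as the free energy.
gaps = [legendre_bound(J, p, rng.dirichlet(np.ones(n)), lam) - F
        for _ in range(1000)]

# The tilted distribution q* (proportional to p * exp(-J/lam)) attains equality.
q_star = p * np.exp(-J / lam)
q_star /= q_star.sum()
gap_star = legendre_bound(J, p, q_star, lam) - F
print(min(gaps), gap_star)             # smallest random gap is >= 0, gap_star is ~0
```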
Remarks
• As we will see later, the KL divergence is an implicit measure of the control cost weighted by λ. The control cost thus emerges naturally from the Legendre transform.
• Questions that we might have at this moment:
  • Why is the minimum called the free energy? Where does it come from?
  • How is it related to Bellman’s principle of optimality? Are they the same?
  • How can we develop algorithms based on the information-theoretic control framework?
To be continued…
References
• H. J. Kappen, An Introduction to Stochastic Control Theory, Path Integrals and Reinforcement Learning, AIP Conference Proceedings, 2007.
• H. J. Kappen, Path Integrals and Symmetry Breaking for Optimal Control Theory, Journal of Statistical Mechanics: Theory and Experiment, 2005.
• E. A. Theodorou, Nonlinear Stochastic Control and Information Theoretic Dualities: Connections, Interdependencies and Thermodynamic Interpretations, Entropy, 2015.
• G. Williams, P. Drews, B. Goldfain, J. M. Rehg, E. A. Theodorou, Information Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving, arXiv, 2017.
• C. Shalizi, Diffusions and the Wiener Process, http://www.stat.cmu.edu/~cshalizi/754/notes/lecture-17.pdf, accessed on Nov. 10, 2017.

More Related Content

What's hot

Propriedades dos limites
Propriedades dos limitesPropriedades dos limites
Propriedades dos limites
Calculos Na Veia
 
Design and analysis of robust h infinity controller
Design and analysis of robust h infinity controllerDesign and analysis of robust h infinity controller
Design and analysis of robust h infinity controller
Alexander Decker
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
Hyungjoo Cho
 
Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05
Charlton Inao
 
Calculus a Functions of Several Variables
Calculus a Functions of Several Variables Calculus a Functions of Several Variables
Calculus a Functions of Several Variables
Harington Dinklage
 
Integral Indefinida E Definida
Integral Indefinida E DefinidaIntegral Indefinida E Definida
Integral Indefinida E Definida
educacao f
 
Aula 07 derivadas - regras de derivação - parte 1
Aula 07   derivadas - regras de derivação - parte 1Aula 07   derivadas - regras de derivação - parte 1
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
Sanghyuk Chun
 
Speed Control of DC Motor using PID FUZZY Controller.
Speed Control of DC Motor using PID FUZZY Controller.Speed Control of DC Motor using PID FUZZY Controller.
Speed Control of DC Motor using PID FUZZY Controller.
Binod kafle
 
Formal Concept Analysis
Formal Concept AnalysisFormal Concept Analysis
Formal Concept Analysis
SSA KPI
 
Calculus of variations
Calculus of variationsCalculus of variations
Calculus of variations
sauravpatkotwar
 
03 convexfunctions
03 convexfunctions03 convexfunctions
03 convexfunctions
Sufyan Sahoo
 
Model Reference Adaptive Control.ppt
Model Reference Adaptive Control.pptModel Reference Adaptive Control.ppt
Model Reference Adaptive Control.ppt
niyefa3149
 
Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...
Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...
Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...
준식 최
 
1531 fourier series- integrals and trans
1531 fourier series- integrals and trans1531 fourier series- integrals and trans
1531 fourier series- integrals and trans
Dr Fereidoun Dejahang
 
Week 15 state space rep may 25 2016 final
Week 15 state space rep   may 25  2016 finalWeek 15 state space rep   may 25  2016 final
Week 15 state space rep may 25 2016 final
Charlton Inao
 
Discrete time control systems
Discrete time control systemsDiscrete time control systems
Discrete time control systems
add0103
 
Aula 15: O oscilador harmônico
Aula 15: O oscilador harmônicoAula 15: O oscilador harmônico
Aula 15: O oscilador harmônico
Adriano Silva
 
Convex Optimization
Convex OptimizationConvex Optimization
Convex Optimization
adil raja
 
L7 fuzzy relations
L7 fuzzy relationsL7 fuzzy relations
L7 fuzzy relations
Mohammad Umar Rehman
 

What's hot (20)

Propriedades dos limites
Propriedades dos limitesPropriedades dos limites
Propriedades dos limites
 
Design and analysis of robust h infinity controller
Design and analysis of robust h infinity controllerDesign and analysis of robust h infinity controller
Design and analysis of robust h infinity controller
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
 
Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05Wk 6 part 2 non linearites and non linearization april 05
Wk 6 part 2 non linearites and non linearization april 05
 
Calculus a Functions of Several Variables
Calculus a Functions of Several Variables Calculus a Functions of Several Variables
Calculus a Functions of Several Variables
 
Integral Indefinida E Definida
Integral Indefinida E DefinidaIntegral Indefinida E Definida
Integral Indefinida E Definida
 
Aula 07 derivadas - regras de derivação - parte 1
Aula 07   derivadas - regras de derivação - parte 1Aula 07   derivadas - regras de derivação - parte 1
Aula 07 derivadas - regras de derivação - parte 1
 
K-means and GMM
K-means and GMMK-means and GMM
K-means and GMM
 
Speed Control of DC Motor using PID FUZZY Controller.
Speed Control of DC Motor using PID FUZZY Controller.Speed Control of DC Motor using PID FUZZY Controller.
Speed Control of DC Motor using PID FUZZY Controller.
 
Formal Concept Analysis
Formal Concept AnalysisFormal Concept Analysis
Formal Concept Analysis
 
Calculus of variations
Calculus of variationsCalculus of variations
Calculus of variations
 
03 convexfunctions
03 convexfunctions03 convexfunctions
03 convexfunctions
 
Model Reference Adaptive Control.ppt
Model Reference Adaptive Control.pptModel Reference Adaptive Control.ppt
Model Reference Adaptive Control.ppt
 
Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...
Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...
Paper Summary of Beta-VAE: Learning Basic Visual Concepts with a Constrained ...
 
1531 fourier series- integrals and trans
1531 fourier series- integrals and trans1531 fourier series- integrals and trans
1531 fourier series- integrals and trans
 
Week 15 state space rep may 25 2016 final
Week 15 state space rep   may 25  2016 finalWeek 15 state space rep   may 25  2016 final
Week 15 state space rep may 25 2016 final
 
Discrete time control systems
Discrete time control systemsDiscrete time control systems
Discrete time control systems
 
Aula 15: O oscilador harmônico
Aula 15: O oscilador harmônicoAula 15: O oscilador harmônico
Aula 15: O oscilador harmônico
 
Convex Optimization
Convex OptimizationConvex Optimization
Convex Optimization
 
L7 fuzzy relations
L7 fuzzy relationsL7 fuzzy relations
L7 fuzzy relations
 

Similar to Stochastic Optimal Control & Information Theoretic Dualities

Mechanical Engineering Assignment Help
Mechanical Engineering Assignment HelpMechanical Engineering Assignment Help
Mechanical Engineering Assignment Help
Matlab Assignment Experts
 
Introduction to Quantum Monte Carlo
Introduction to Quantum Monte CarloIntroduction to Quantum Monte Carlo
Introduction to Quantum Monte Carlo
Claudio Attaccalite
 
Lecture cochran
Lecture cochranLecture cochran
Lecture cochran
sabbir11
 
Quantum Mechanics II.ppt
Quantum Mechanics II.pptQuantum Mechanics II.ppt
Quantum Mechanics II.ppt
SKMishra47
 
TR-6.ppt
TR-6.pptTR-6.ppt
TR-6.ppt
ssuserdc5a3d
 
Nature16059
Nature16059Nature16059
Nature16059
Lagal Tchixa
 
2 general properties
2 general properties2 general properties
2 general properties
katamthreveni
 
Unit 5: All
Unit 5: AllUnit 5: All
Unit 5: All
Hector Zenil
 
lec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptx
lec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptxlec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptx
lec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptx
haziq674510
 
Laplace transform
Laplace transformLaplace transform
Laplace transform
Rodrigo Adasme Aguilera
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
irjes
 
Chapter 4. diffrential
Chapter 4. diffrentialChapter 4. diffrential
Chapter 4. diffrential
kidanemariam tesera
 
Control of Uncertain Hybrid Nonlinear Systems Using Particle Filters
Control of Uncertain Hybrid Nonlinear Systems Using Particle FiltersControl of Uncertain Hybrid Nonlinear Systems Using Particle Filters
Control of Uncertain Hybrid Nonlinear Systems Using Particle Filters
Leo Asselborn
 
Concepts and Problems in Quantum Mechanics, Lecture-II By Manmohan Dash
Concepts and Problems in Quantum Mechanics, Lecture-II By Manmohan DashConcepts and Problems in Quantum Mechanics, Lecture-II By Manmohan Dash
Concepts and Problems in Quantum Mechanics, Lecture-II By Manmohan Dash
Manmohan Dash
 
Non equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flowsNon equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flows
Springer
 
Non equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flowsNon equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flows
Springer
 
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONSAPPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
AYESHA JAVED
 
HMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdf
HMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdfHMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdf
HMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdf
RaviShankar269655
 
Numerical method for pricing american options under regime
Numerical method for pricing american options under regime Numerical method for pricing american options under regime
Numerical method for pricing american options under regime
Alexander Decker
 
Variational Principle
Variational PrincipleVariational Principle
Variational Principle
AmeenSoomro1
 

Similar to Stochastic Optimal Control & Information Theoretic Dualities (20)

Mechanical Engineering Assignment Help
Mechanical Engineering Assignment HelpMechanical Engineering Assignment Help
Mechanical Engineering Assignment Help
 
Introduction to Quantum Monte Carlo
Introduction to Quantum Monte CarloIntroduction to Quantum Monte Carlo
Introduction to Quantum Monte Carlo
 
Lecture cochran
Lecture cochranLecture cochran
Lecture cochran
 
Quantum Mechanics II.ppt
Quantum Mechanics II.pptQuantum Mechanics II.ppt
Quantum Mechanics II.ppt
 
TR-6.ppt
TR-6.pptTR-6.ppt
TR-6.ppt
 
Nature16059
Nature16059Nature16059
Nature16059
 
2 general properties
2 general properties2 general properties
2 general properties
 
Unit 5: All
Unit 5: AllUnit 5: All
Unit 5: All
 
lec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptx
lec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptxlec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptx
lec11_OPTIMAL and MULTIVARIABLE CONTROLS.pptx
 
Laplace transform
Laplace transformLaplace transform
Laplace transform
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
Chapter 4. diffrential
Chapter 4. diffrentialChapter 4. diffrential
Chapter 4. diffrential
 
Control of Uncertain Hybrid Nonlinear Systems Using Particle Filters
Control of Uncertain Hybrid Nonlinear Systems Using Particle FiltersControl of Uncertain Hybrid Nonlinear Systems Using Particle Filters
Control of Uncertain Hybrid Nonlinear Systems Using Particle Filters
 
Concepts and Problems in Quantum Mechanics, Lecture-II By Manmohan Dash
Concepts and Problems in Quantum Mechanics, Lecture-II By Manmohan DashConcepts and Problems in Quantum Mechanics, Lecture-II By Manmohan Dash
Concepts and Problems in Quantum Mechanics, Lecture-II By Manmohan Dash
 
Non equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flowsNon equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flows
 
Non equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flowsNon equilibrium thermodynamics in multiphase flows
Non equilibrium thermodynamics in multiphase flows
 
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONSAPPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
APPLICATION OF HIGHER ORDER DIFFERENTIAL EQUATIONS
 
HMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdf
HMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdfHMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdf
HMT CONVhdhdhdhdhdhdh hv vhvh vECTION 1.pdf
 
Numerical method for pricing american options under regime
Numerical method for pricing american options under regime Numerical method for pricing american options under regime
Numerical method for pricing american options under regime
 
Variational Principle
Variational PrincipleVariational Principle
Variational Principle
 

Recently uploaded

Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
SakkaravarthiShanmug
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
UReason
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
gowrishankartb2005
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...
shadow0702a
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
PKavitha10
 
Hematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood CountHematology Analyzer Machine - Complete Blood Count
Hematology Analyzer Machine - Complete Blood Count
shahdabdulbaset
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 

Recently uploaded (20)

Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
 
Data Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason WebinarData Driven Maintenance | UReason Webinar
Data Driven Maintenance | UReason Webinar
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 

Stochastic Optimal Control & Information Theoretic Dualities

  • 1. Stochastic Optimal Control & Information Theoretic Dualities MSL Group Meeting, November 10th, 2017 Haruki Nishimura PART 1: INTRODUCTION TO STOCHASTIC CONTROL
  • 2. The Big Picture [Williams et al., 2017] Stochastic Optimal Control Theory • “Optimality” defined by Bellman’s principle of optimality. • Solution methods based on stochastic dynamic programming. Information Theoretic Control Theory • “Optimality” defined in the sense of Legendre transform. • Solution methods based on forward sampling of stochastic differential equations. Two fundamentally different approaches to stochastic control problems.
  • 3. The Big Picture [Williams et al., 2017] Figure labels: V(x0), Values of terminal states, x0, Stochastic Dynamic Programming, Forward Sampling of SDEs, Space of value function, State space.
  • 4. Outline Part 1 (Today) Stochastic Optimal Control › Deterministic optimal control, Bellman’s principle of optimality › Wiener processes and stochastic differential equations › Stochastic Hamilton-Jacobi-Bellman equation Information Theoretic Control › Legendre transformation Part 2 (Tentative, 12/1) › Helmholtz free energy and its interpretations › Relations to Bellman’s principle of optimality & linearly solvable optimal control problems › Algorithms and applications › Limitations
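  • 5. Deterministic Optimal Control Problem Consider a continuous-time optimization problem of the following form: minimize $J\left(t_0, x_0, u(t_0 \rightarrow t_f)\right)$ over $u(t_0 \rightarrow t_f)$, subject to $x(t_0) = x_0$ and $\dot x(t) = f\left(t,x(t),u(t)\right)$, where $J = \phi\left(x(t_f)\right) + \int_{t_0}^{t_f} c\left(t,x(t),u(t)\right) dt$ (terminal cost plus integrated per-stage cost) and $u(t_0 \rightarrow t_f): [t_0, t_f] \rightarrow U$ is the control input profile.
  • 6. How to solve for the optimal control? 1. Pontryagin’s Maximum Principle • Based on calculus of variations. • Solve a system of 2n ODEs (Hamiltonian system). • Open-loop specification. 2. Bellman’s Principle of Optimality • Based on dynamic programming. • Solve an n-dimensional PDE (HJB equation). • Closed-loop specification.
  • 7. Hamilton-Jacobi-Bellman Equation “Optimal cost-to-go” or “value function”: $V(t,x) \triangleq \min_{u(t \rightarrow t_f)} J\left(t,x,u(t \rightarrow t_f)\right)$. Let’s find a recursive structure in V via Dynamic Programming. (Figure: a trajectory from t0 through t’ to tf.) If the blue trajectory is optimal, then the red trajectory is necessarily optimal as well.
  • 8. Hamilton-Jacobi-Bellman Equation Taylor expand V(t+dt, x(t+dt)) around V(t,x(t)): $V(t+dt,x(t+dt)) = V(t,x) + V_t(t,x)dt + V_x(t,x)dx + o(dt)$. Substitute this into the original equation with dx = f(t,x,u)dt. Rearrange terms and take the limit dt -> 0: $-V_t(t,x) = \min_{u} \left(c(t,x,u) + V_x(t,x)f(t,x,u)\right)$.
  • 9. Hamilton-Jacobi-Bellman Equation Boundary Condition: $V(t_f, x(t_f)) = \phi(x(t_f))$. Numerically solve backwards in time for all (t,x) to obtain the closed-loop optimal control policy: $u^*(t,x) = \arg\min_u \left(c(t,x,u) + V_x(t,x) f(t,x,u)\right)$.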
  • 10. Stochastic Optimal Control Problem Minimize $\mathbb{E}_{x_0 \rightarrow x(t_f)} \left[J\left(t_0, x_0, u(t_0 \rightarrow t_f)\right)\right]$ over $u(t_0 \rightarrow t_f)$, subject to $x(t_0) = x_0$ and $dx = f\left(t,x(t),u(t)\right)dt + b(x(t))dw_t$. Goal is to derive HJB. Any differences from the deterministic case? 1. The dynamics are governed by a stochastic differential equation. 2. Future state trajectories are uncertain.
  • 11. Example: The Drunken Spider Problem [Kappen, 2005] Presence of noise (alcohol) can change the optimal behavior significantly. • Without noise, the spider will cross the bridge. • When drunk, the cost of crossing the bridge increases and the spider should go around the lake.
  • 12. What is the stochasticity in the dynamics? Recall that the dynamics are described by the following stochastic differential equation: $dx = f\left(t,x(t),u(t)\right)dt + b(x(t))dw_t$, where w_t is a stochastic process called the Wiener Process (a.k.a. Standard Brownian Motion).
  • 13. The Wiener Process A type of Gaussian Process with “good” properties to model random behavior that evolves over time. Applications: • Finance • Physics • Chemistry • Stochastic Control Theory …and more. Image URL: https://github.com/matthewfieger/wiener_process
  • 14–17. The Wiener Process 1. Continuity: w_t is continuous in t with probability 1. 2. Gaussianity: w_{t+s} – w_s is distributed according to N(0,t). 3. Independence: w_{t+s} – w_s is independent of {w_r}_{r<=s}. 4. Stationarity: The distribution of w_{t+s} – w_s is independent of s.
  • 18. Stochastic Differential Equation Easiest to think of dw_t as a Gaussian white noise with $dw_t \sim \mathcal{N}(0,dt)$. Going back to our original equation, $dx = f\left(t,x(t),u(t)\right)dt + b(x(t))dw_t$, the first term is the drift term and the second is the diffusion term. Note that it is not written in (dx/dt) form because the Wiener process can be shown to be nowhere-differentiable with probability 1.
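  • 19. Proof of Non-Differentiability Assume w_t is differentiable at some t0. Then the derivative $w' = \lim_{t\rightarrow t_0} \frac{w_t - w_{t_0}}{t - t_0}$ must exist. That is, for every $\epsilon > 0$ there exists $\delta > 0$ such that $t \in [t_0 - \delta, t_0 + \delta]$ implies $\left\lvert\frac{w_t - w_{t_0}}{t - t_0} - w'\right\rvert \leq \epsilon$. Without loss of generality take t > t0. Now the definition of the Wiener process gives $w_t - w_{t_0} \sim \mathcal{N}(0,t-t_0)$ and thus, $\frac{w_t - w_{t_0}}{t - t_0} - w' \sim \mathcal{N}(-w',\frac{1}{t-t_0})$. The probability of the quantity |·| exceeding any positive ε is always positive, and can be made arbitrarily close to 1 by taking t sufficiently close to t0. Q.E.D.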
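  • 20. Deterministic HJB (Recap) “Optimal cost-to-go” or “value function”: $V(t,x) \triangleq \min_{u(t \rightarrow t_f)} J\left(t,x,u(t \rightarrow t_f)\right)$. Let’s find a recursive structure in V via Dynamic Programming. (Figure: a trajectory from t0 through t’ to tf.) If the blue trajectory is optimal, then the red trajectory is necessarily optimal as well.
  • 21. Deriving Stochastic HJB Equation Define the value function as before. Establish a recursive formula for V: $V(t,x) = \min_{u(t \rightarrow t + dt)}\mathbb{E}_{x \rightarrow x(t_f)} \left[ \int_t^{t+dt} c\left(\tau,x(\tau),u(\tau)\right) d\tau + V\left(t+dt,x(t+dt)\right)\right]$. As usual, our strategy is to Taylor expand V(t+dt,x(t+dt)) and consider only the terms that are O(dt).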
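  • 22. Taylor Expansion & Itô’s Lemma Under the Wiener noise, the chain rule gives $\mathbb{E}\left[V(t+dt,x(t+dt))\right] = V(t,x) + V_t(t,x)dt + V_x(t,x)\mathbb{E}[dx] + \frac{1}{2} V_{x^2}(t,x)\mathbb{E}[dx^2] + o(dt)$. Why?
  • 23. Taylor Expansion & Itô’s Lemma Under the Wiener noise, the chain rule gives $\mathbb{E}\left[V(t+dt,x(t+dt))\right] = V(t,x) + V_t(t,x)dt + V_x(t,x) f(t,x(t),u(t))dt + \frac{1}{2} V_{x^2}(t,x)\mathbb{E}[dx^2] + o(dt)$. We need to consider the 2nd order term: $\mathbb{E}[dx^2] = f^2dt^2 + 2fb\mathbb{E}[dw_t]dt + b^2\mathbb{E}[dw_t^2] = b^2 dt + o(dt)$, because $\mathbb{E}[dw_t] = 0$ and $\mathbb{E}[dw_t^2] = dt$. This result can be generalized and is referred to as Itô’s lemma: for $dx = \mu dt + \sigma dw_t$, $df(t,x) = \left(f_t + f_x\mu + \frac{1}{2}\sigma^2 f_{x^2}\right)dt + \sigma f_x dw_t$.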
  • 24. Stochastic HJB Equation After substitution and taking the limit dt -> 0, we obtain $-V_t(t,x) = \min_u\left(c(t,x,u) + V_x(t,x) f(t,x,u) + \frac{1}{2} V_{x^2}(t,x)b^2(x)\right)$. Notice the difference from the deterministic HJB equation: The magnitude of the diffusion term b(x) affects the optimal control through the value function.
  • 25. Solving HJB Equation In general HJB becomes an n-dimensional PDE (in V_x), $-V_t(t,x) = \min_u\left(c(t,x,u) + V_x^{\mathrm{T}}(t,x) f(t,x,u) + \frac{1}{2} \mathrm{tr}\left(V_{xx}(t,x)B(x)B(x)^\mathrm{T}\right)\right)$, which needs to be solved backward in time. • Second-order, nonlinear PDE. • Exponential growth of the computational and storage requirements as the dimension n increases. • Exception: LQG control
  • 26. Summary – Stochastic Optimal Control Problem Formulation: Solution Method: Stochastic Dynamic Programming → Solve HJB or approximate the problem to alleviate the complexity.
  • 27. Outline Part 1 (Today) Stochastic Optimal Control › Deterministic optimal control, Bellman’s principle of optimality › Wiener processes and stochastic differential equations › Stochastic Hamilton-Jacobi-Bellman equation Information Theoretic Control › Legendre transformation Part 2 (Tentative, 12/1) › Helmholtz free energy and its interpretations › Relations to Bellman’s principle of optimality & linearly solvable optimal control problems › Algorithms and applications › Limitations
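  • 28. Basic Idea • Instead of solving for the value function, we aim at finding a lower bound on the expected cost that can be easily evaluated. • Use of an information theoretic inequality. Theorem (Legendre Transform, Theodorou 2015) Let $x \in \Omega$ and $J(\cdot): \Omega \rightarrow \mathbb{R}$. Consider two probability distributions p and q over x. Then for $\lambda > 0$, the following inequality holds: $-\lambda \log\left(\mathbb{E}_{x\sim p} \left[\exp(-\frac{1}{\lambda} J(x))\right]\right) \leq \mathbb{E}_{x\sim q} \left[J(x)\right] + \lambda \mathbb{D}_{\mathrm{KL}}(q \mid\mid p)$. Note: q is assumed to be absolutely continuous w.r.t. p.
  • 29. Proof of the Legendre Transform Change of measure from p to q gives $\log\left(\mathbb{E}_{x\sim p} \left[\exp(-\frac{1}{\lambda} J(x))\right]\right) = \log\left(\mathbb{E}_{x\sim q} \left[\exp(-\frac{1}{\lambda} J(x)) \frac{p(x)}{q(x)}\right]\right)$. Apply Jensen’s inequality to RHS: $\log\left(\mathbb{E}_{x\sim q} \left[\exp(-\frac{1}{\lambda} J(x)) \frac{p(x)}{q(x)}\right]\right) \geq \mathbb{E}_{x \sim q} \left[ \log\left(\exp(-\frac{1}{\lambda} J(x)) \frac{p(x)}{q(x)}\right)\right] = -\frac{1}{\lambda} \mathbb{E}_{x\sim q} \left[J(x)\right] - \mathbb{D}_{\mathrm{KL}} (q \mid\mid p)$. Multiply with –λ (< 0) to get the inequality.
  • 30. Jensen’s Inequality (Review) Let f be a convex function of a real-valued random variable X. Then, $\mathbb{E}[f(X)] \geq f(\mathbb{E}[X])$. Furthermore, if f is strictly convex, the equality holds if and only if X = E[X] with probability 1, in which case X is a deterministic constant. In our case, log() is a strictly concave function, so the direction of the inequality is flipped.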
  • 31. Interpretations of the Legendre Transform • Think of x as a path in the state space rooted at the current state. • Think of J() as a cost over the path. • A path x is generated by a forward integration of an SDE defined by $dx = f\left(t,x(t),u(t)\right)dt + b(x(t))dw_t$. → As we vary the control input profile u, we change the resulting distribution over the state trajectories.
  • 32–34. Interpretations of the Legendre Transform • LHS is independent of u, and is uniquely defined once the cost function J, the system dynamics and λ are specified. It is a property of the system and called the (Helmholtz) free energy. • The first term in RHS is the expected state-dependent cost. • The second term is non-negative since KL-divergence is non-negative. → It turns out that this is an implicit measure of the control effort.
  • 35. Optimality in the Legendre Sense The free energy is the optimal value of the following optimization problem: minimize $\mathbb{E}_{x \sim q} \left[J(x)\right] + \lambda \mathbb{D}_{\mathrm{KL}} (q \mid\mid p)$ over $q(x;u)$, subject to $\int q(x;u) dx = 1$ and $q(x;u) \geq 0$ for all x. The optimal distribution is given by $q^*(x) = \frac{p(x)\exp(-\frac{1}{\lambda} J(x))}{\mathbb{E}_{x\sim p} \left[\exp(-\frac{1}{\lambda} J(x))\right]}$. Note: q is assumed to be absolutely continuous w.r.t. p.
  • 36. Remarks • As we will see later, the KL-divergence is an implicit measure of the control cost weighted by λ. We see that the control cost naturally emerges out of the Legendre transform. • Questions that we might have at this moment: • Why is the minimum called the free energy? Where does it come from? • How is it related to Bellman’s principle of optimality? Are they the same? • How can we develop algorithms based on the information-theoretic control framework? To be continued…
  • 37. References • H. J. Kappen, An Introduction to Stochastic Control Theory, Path Integrals and Reinforcement Learning, AIP Conference Proceedings, 2007. • H. J. Kappen, Path Integrals and Symmetry Breaking for Optimal Control Theory, Journal of Statistical Mechanics: Theory and Experiment, 2005. • E. A. Theodorou, Nonlinear Stochastic Control and Information Theoretic Dualities: Connections, Interdependencies and Thermodynamic Interpretations, Entropy, 2015. • G. Williams, P. Drews, B. Goldfain, J. M. Rehg, E. A. Theodorou, Information Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving, arXiv, 2017. • C. Shalizi, Diffusions and the Wiener Process, http://www.stat.cmu.edu/~cshalizi/754/notes/lecture-17.pdf, accessed on Nov. 10, 2017.
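
The stochastic differential equation on slides 12 and 18 can be integrated forward by treating dw_t as a Gaussian increment with mean 0 and variance dt. The following is a minimal Python sketch under stated assumptions: the name `euler_maruyama`, the scalar state, the step count, the number of paths and the seed are illustrative choices that do not come from the slides, and the control input u is assumed to be already folded into the drift `f(t, x)`. With f = 0 and b = 1 the scheme produces sample paths of the Wiener process itself, so the sample variance of w_1 - w_0 should come out near 1, in line with the Gaussianity property.

```python
import math
import random


def euler_maruyama(f, b, x0, t0, tf, n_steps, rng):
    """Integrate dx = f(t, x) dt + b(x) dw_t forward with the Euler-Maruyama scheme.

    Assumption: scalar state; any control input is already folded into f.
    Returns the list of states at the n_steps + 1 grid points.
    """
    dt = (tf - t0) / n_steps
    t, x = t0, x0
    path = [x]
    for _ in range(n_steps):
        # dw_t ~ N(0, dt): standard deviation is sqrt(dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + f(t, x) * dt + b(x) * dw
        t += dt
        path.append(x)
    return path


# Illustrative check: zero drift and unit diffusion give a Wiener process.
rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n_paths = 2000
finals = [
    euler_maruyama(lambda t, x: 0.0, lambda x: 1.0, 0.0, 0.0, 1.0, 50, rng)[-1]
    for _ in range(n_paths)
]
sample_mean = sum(finals) / n_paths
sample_var = sum((w - sample_mean) ** 2 for w in finals) / (n_paths - 1)
print(sample_mean, sample_var)
```

Replacing the two lambdas with a nonzero drift and a state-dependent `b` gives forward samples of a controlled SDE, which is the kind of sampling the information theoretic approach relies on.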

Editor's Notes

  1. 20pt \begin{aligned} & \underset{u(t_0 \rightarrow t_f)}{\text{minimize}} & & J\left(t_0, x_0, u(t_0 \rightarrow t_f)\right)\\ & \text{subject to} & & x(t_0) = x_0 \\ & & & \dot x(t) = f\left(t,x(t), u(t)\right) \end{aligned} J = \phi\left(x(t_f)\right) + \int_{t_0}^{t_f} c\left(t,x(t),u(t)\right) dt u(t_0 \rightarrow t_f): [t_0, t_f] \rightarrow U
  2. 16pt \begin{aligned} \dot{x^*}(t) &= H_\lambda\left(t,x^*(t),u^*(t),\lambda(t)\right)\\ -\dot\lambda(t) &= H_x\left(t,x^*(t),u^*(t),\lambda(t)\right)\\ u^*(t) &= \arg\min_u H\left(t,x^*(t),u(t),\lambda(t)\right) \end{aligned} In the presence of Wiener noise, the PMP formalism can be generalized and yields a set of coupled stochastic differential equations, but they become difficult to solve due to the boundary conditions at the initial and final times. In contrast, the inclusion of noise in the HJB framework is mathematically quite straightforward.
  3. 20 pts V(t,x) \triangleq \min_{u(t \rightarrow t_f)} J\left(t,x,u(t \rightarrow t_f)\right) 16 pts \begin{aligned} V(t,x) &= \min_{u(t \rightarrow t_f)} \left(\phi(x(t_f)) + \int_t^{t+dt} c(\tau,x(\tau),u(\tau))d\tau + \int_{t+dt}^{t_f} c(\tau,x(\tau),u(\tau))d\tau\right) \\ &= \min_{u(t \rightarrow t+dt)} \left(\int_t^{t+dt} c(\tau,x(\tau),u(\tau))d\tau + \min_{u({t+dt}\rightarrow t_f)} \left(\phi(x(t_f)) + \int_{t+dt}^{t_f} c(\tau,x(\tau),u(\tau))d\tau\right)\right) \end{aligned}
  4. 20 pts V(t+dt,x(t+dt)) = V(t,x) + V_t(t,x)dt + V_x(t,x)dx + o(dt) 16 pts V(t,x) = \min_{u(t \rightarrow t+dt)} \left(\int_t^{t+dt} c(\tau,x(\tau),u(\tau))d\tau + V(t,x) + V_t(t,x)dt + V_x(t,x)f(t,x,u)dt + o(dt) \right) 20 pts -V_t(t,x) = \min_{u} \left(c(t,x,u) + V_x(t,x)f(t,x,u) \right)
  5. 20 pts V(t_f, x({t_f})) = \phi(x(t_f)) 20 pts u^*(t,x) = \arg\min_u \left(c(t,x,u) + V_x(t,x) f(t,x,u)\right)
  6. 20 pts \begin{aligned} & \underset{u(t_0 \rightarrow t_f)}{\text{minimize}} & & \mathbb{E}_{x_0 \rightarrow x(t_f)} \left[J\left(t_0, x_0, u(t_0 \rightarrow t_f)\right)\right]\\ & \text{subject to} & & x(t_0) = x_0 \\ & & & dx = f\left(t,x(t), u(t)\right)dt + b(x(t))dw_t \end{aligned}
  7. I
  8. dx = f\left(t,x(t), u(t)\right)dt + b(x(t))dw_t
  9. Image URL: https://github.com/matthewfieger/wiener_process
  10. 20 pts dw_t \sim \mathcal{N}(0,dt)
  11. 16 pts w' = \lim_{t\rightarrow t_0} \frac{w_t - w_{t_0}}{t - t_0} \forall ~\epsilon > 0 ~\exists ~\delta > 0 ~~t \in [t_0 - \delta, t_0 + \delta] \Longrightarrow \left\lvert\frac{w_t - w_{t_0}}{t - t_0} - w'\right\rvert \leq \epsilon w_t - w_{t_0} \sim \mathcal{N}(0,t-t_0) \frac{w_t - w_{t_0}}{t - t_0} - w' \sim \mathcal{N}(-w',\frac{1}{t-t_0})
  12. 20 pts V(t,x) \triangleq \min_{u(t \rightarrow t_f)} J\left(t,x,u(t \rightarrow t_f)\right) 16 pts \begin{aligned} V(t,x) &= \min_{u(t \rightarrow t_f)} \left(\phi(x(t_f)) + \int_t^{t+dt} c(\tau,x(\tau),u(\tau))d\tau + \int_{t+dt}^{t_f} c(\tau,x(\tau),u(\tau))d\tau\right) \\ &= \min_{u(t \rightarrow t+dt)} \left(\int_t^{t+dt} c(\tau,x(\tau),u(\tau))d\tau + \min_{u({t+dt}\rightarrow t_f)} \left(\phi(x(t_f)) + \int_{t+dt}^{t_f} c(\tau,x(\tau),u(\tau))d\tau\right)\right) \end{aligned}
  13. 20 pts V(t,x) = \min_{u(t \rightarrow t + dt)}\mathbb{E}_{x \rightarrow x(t_f)} \left[ \int_t^{t+dt} c\left(\tau,x(\tau),u(\tau)\right) d\tau + V\left(t+dt,x(t+dt)\right)\right]
  14. 20 pts \mathbb{E}\left[V(t+dt,x(t+dt))\right] = V(t,x) + V_t(t,x)dt + V_x(t,x)\mathbb{E}[dx] + \frac{1}{2} V_{x^2}(t,x)\mathbb{E}[dx^2] + o(dt) \mathbb{E}[dx^2] = f^2dt^2 + 2fb\mathbb{E}[dw_t]dt + b^2\mathbb{E}[dw_t^2]
  15. 20 pts \mathbb{E}\left[V(t+dt,x(t+dt))\right] = V(t,x) + V_t(t,x)dt + V_x(t,x) f(t,x(t),u(t))dt + \frac{1}{2} V_{x^2}(t,x)\mathbb{E}[dx^2] + o(dt) \mathbb{E}[dx^2] = f^2dt^2 + 2fb\mathbb{E}[dw_t]dt + b^2\mathbb{E}[dw_t^2] \begin{aligned} dx &= \mu dt + \sigma dw_t \\ df(t,x) &= \left(f_t + f_x\mu + \frac{1}{2}\sigma^2 f_{x^2}\right)dt + \sigma f_x dw_t \end{aligned}
  16. -V_t(t,x) = \min_u\left(c(t,x,u) + V_x(t,x) f(t,x,u) + \frac{1}{2} V_{x^2}(t,x)b^2(x)\right)
  17. 20 pts -V_t(t,x) = \min_u\left(c(t,x,u) + V_x^{\mathrm{T}}(t,x) f(t,x,u) + \frac{1}{2} \mathrm{tr}\left(V_{xx}(t,x)B(x)B(x)^\mathrm{T}\right)\right) LQ control problem: a notable exception is when f is linear in x and u, b is at most linear in x, and c is quadratic in x and u. The solution for V(t,x) is then quadratic in x with time-varying coefficients. These coefficients satisfy coupled ODEs (Riccati equations) that can be solved efficiently.
  18. A notable exception is when b is linear in x and c is quadratic in x and u. The solution for V(t,)
  19. 20 pts x \in \Omega J(\cdot): \Omega \rightarrow \mathbb{R} \lambda > 0 -\lambda \log\left(\mathbb{E}_{x\sim p} \left[\exp(-\frac{1}{\lambda} J(x))\right]\right) \leq \mathbb{E}_{x\sim q} \left[J(x)\right] + \lambda \mathbb{D}_{\mathrm{KL}}(q \mid\mid p)
  20. 16 pts \log\left(\mathbb{E}_{x\sim p} \left[\exp(-\frac{1}{\lambda} J(x))\right]\right) = \log\left(\mathbb{E}_{x\sim q} \left[\exp(-\frac{1}{\lambda} J(x)) \frac{p(x)}{q(x)}\right]\right) \begin{aligned} \log\left(\mathbb{E}_{x\sim q} \left[\exp(-\frac{1}{\lambda} J(x)) \frac{p(x)}{q(x)}\right]\right) &\geq \mathbb{E}_{x \sim q} \left[ \log\left(\exp(-\frac{1}{\lambda} J(x)) \frac{p(x)}{q(x)}\right)\right] \\ &= -\frac{1}{\lambda} \mathbb{E}_{x\sim q} \left[J(x)\right] - \mathbb{D}_{\mathrm{KL}} (q \mid\mid p) \end{aligned}
  21. \mathbb{E}[f(X)] \geq f(\mathbb{E}[X])
  22. \begin{aligned} & \underset{q(x;u)}{\text{minimize}} & & \mathbb{E}_{x \sim q} \left[J(x)\right] + \lambda \mathbb{D}_{\mathrm{KL}} (q \mid\mid p)\\ & \text{subject to} & & \int q(x;u) dx = 1\\ & & & \forall x \in \Omega ~ q(x;u) \geq 0 \end{aligned}
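
The inequality in note 19 and the optimization problem in note 22 can be checked numerically on a small discrete sample space. The sketch below is an illustration under stated assumptions rather than anything from the slides: the three-point space, the values of `p`, `J` and `lam`, and all function names are made up for the example. The closed form used in `optimal_q`, q*(x) proportional to p(x) exp(-J(x)/λ), is the standard minimizer of this objective and is not recovered from the slide text. At q* the gap between the objective and the free energy should vanish up to rounding, and for any other q it should be positive.

```python
import math


def free_energy(p, J, lam):
    """Left-hand side of the inequality: -lam * log E_{x~p}[exp(-J(x)/lam)]."""
    return -lam * math.log(sum(pi * math.exp(-Ji / lam) for pi, Ji in zip(p, J)))


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions; assumes p > 0 wherever q > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def objective(q, p, J, lam):
    """Right-hand side of the inequality: E_{x~q}[J(x)] + lam * KL(q || p)."""
    expected_cost = sum(qi * Ji for qi, Ji in zip(q, J))
    return expected_cost + lam * kl_divergence(q, p)


def optimal_q(p, J, lam):
    """Standard minimizer: q*(x) proportional to p(x) * exp(-J(x)/lam)."""
    weights = [pi * math.exp(-Ji / lam) for pi, Ji in zip(p, J)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-point example (all numbers are illustrative).
p = [0.5, 0.3, 0.2]
J = [1.0, 0.2, 2.0]
lam = 0.7

F = free_energy(p, J, lam)
q_star = optimal_q(p, J, lam)
q_other = [0.2, 0.5, 0.3]

gap_star = objective(q_star, p, J, lam) - F    # zero up to rounding
gap_other = objective(q_other, p, J, lam) - F  # strictly positive
print(F, gap_star, gap_other)
```

Working on a finite sample space keeps every expectation an exact sum, so the check involves no sampling error; on path space the same expectations would have to be estimated from forward samples of the SDE.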