CSL (Coordinated Science Laboratory)
Mean-field Methods in Estimation & Control
Prashant G. Mehta
Coordinated Science Laboratory
Department of Mechanical Science and Engineering
University of Illinois at Urbana-Champaign
EE Seminar, Harvard
Apr. 1, 2011
Introduction: Kuramoto model
Background: Synchronization of coupled oscillators

    dθ_i(t) = ( ω_i + (κ/N) ∑_{j=1}^{N} sin(θ_j(t) − θ_i(t)) ) dt + σ dξ_i(t),   i = 1, …, N

ω_i taken from a distribution g(ω) supported on [1−γ, 1+γ]
γ : measures the heterogeneity of the population
κ : measures the strength of coupling

[Figure: phase diagram in the (γ, κ) plane; incoherence for κ < κ_c(γ), synchrony (locking) above the critical coupling.]

[10] Y. Kuramoto, 1975; [15] Strogatz et al., J. Stat. Phys., 1991
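Below is a minimal simulation sketch of the noisy Kuramoto model above, using an Euler-Maruyama discretization; the parameter values, step size, and the use of the order parameter r = |(1/N) ∑_j e^{iθ_j}| to distinguish incoherence (r ≈ 0) from synchrony (r ≈ 1) are illustrative assumptions, not taken from the talk.

    # Euler-Maruyama simulation of the noisy Kuramoto model (illustrative parameters)
    import numpy as np

    def kuramoto(N=200, kappa=1.0, gamma=0.05, sigma=0.1, dt=0.01, T=50.0, seed=0):
        rng = np.random.default_rng(seed)
        omega = rng.uniform(1 - gamma, 1 + gamma, N)       # natural frequencies ~ g(ω)
        theta = rng.uniform(0, 2 * np.pi, N)               # initial phases
        for _ in range(int(T / dt)):
            # coupling term (κ/N) Σ_j sin(θ_j − θ_i) for every oscillator i
            coupling = (kappa / N) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
            theta += (omega + coupling) * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
            theta %= 2 * np.pi
        return np.abs(np.exp(1j * theta).mean())           # order parameter r at time T

    print("weak coupling  :", kuramoto(kappa=0.05))        # r typically stays small
    print("strong coupling:", kuramoto(kappa=2.0))         # r typically close to 1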
Introduction: Kuramoto model
Movies of the incoherence and synchrony solutions

[Movies: oscillator phases on the unit circle for the incoherence solution (left) and the synchrony solution (right).]
Introduction: Oscillator examples
Motivation: Oscillators in biology

Oscillators in biology; coupled oscillators.

[Images taken from a Google image search.]
Introduction: Oscillator examples
Hodgkin-Huxley type neuron model

    C dV/dt = −g_T · m²_∞(V) · h · (V − E_T) − g_h · r · (V − E_h) − …
    dh/dt = ( h_∞(V) − h ) / τ_h(V)
    dr/dt = ( r_∞(V) − r ) / τ_r(V)

[Figures: neural spike train (membrane voltage vs. time); limit cycle in the (V, h, r) phase space.]

Normal form reduction:    θ̇_i = ω_i + u_i · Φ(θ_i)

[6] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004
Introduction: Role of Neural Rhythms

Question (Fundamental question in neuroscience)
Why is synchrony (neural rhythms) useful? Does it have a functional role?

1. Synchronization: phase transition in a controlled system (motivated by coupled oscillators)
   H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, "Synchronization of Coupled Oscillators is a Game," IEEE TAC
2. Neuronal computations: Bayesian inference; neural circuits as particle filters (Lee & Mumford)
   T. Yang, P. G. Mehta and S. P. Meyn, "A Control-oriented Approach for Particle Filtering," ACC 2011, CDC 2011
3. Learning: synaptic plasticity via long-term potentiation (Hebbian learning); "Neurons that fire together wire together"
   H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, "Learning in Mean-Field Oscillator Games," CDC 2010

Destexhe & Marder, Nature, 2004; Kopell et al., Neuroscience, 2009; Lee & Mumford, J. Opt. Soc., 2003
Part I
Mean-field Control
Collaborators
Huibing Yin Sean P. Meyn Uday V. Shanbhag
“Synchronization of coupled oscillators is a game,” IEEE TAC
“Learning in Mean-Field Oscillator Games,” CDC 2010
“On the Efficiency of Equilibria in Mean-field Oscillator Games,” ACC 2011
Derivation of model: Problem statement
Oscillators dynamic game

Dynamics of the i-th oscillator:

    dθ_i = ( ω_i + u_i(t) ) dt + σ dξ_i,   i = 1, …, N,  t ≥ 0

u_i(t) : control input

The i-th oscillator seeks to minimize the long-run average cost

    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds
                                            (cost of anarchy)  (cost of control)

θ_{−i} = (θ_j)_{j≠i}
R : control penalty
c(·) : cost function,  c(θ_i; θ_{−i}) = (1/N) ∑_{j≠i} c•(θ_i, θ_j),   c• ≥ 0
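The cost structure can be made concrete with a short Monte Carlo sketch: simulate the population under some admissible decentralized control and time-average the running cost. The control used here is the Kuramoto-type law u_i = −(κ/N) ∑_j sin(θ_i − θ_j), and the pairwise cost is c•(θ_i, θ_j) = (1/2) sin²((θ_i − θ_j)/2), the choice used later in the talk; all numerical parameter values are illustrative assumptions.

    # Monte Carlo estimate of the time-averaged cost η_i under a Kuramoto-type control
    import numpy as np

    def estimate_costs(N=50, kappa=1.0, R=10.0, sigma=0.1, gamma=0.05,
                       dt=0.01, T=100.0, seed=0):
        rng = np.random.default_rng(seed)
        omega = rng.uniform(1 - gamma, 1 + gamma, N)
        theta = rng.uniform(0, 2 * np.pi, N)
        running = np.zeros(N)
        for _ in range(int(T / dt)):
            diff = theta[None, :] - theta[:, None]              # θ_j − θ_i
            u = (kappa / N) * np.sin(diff).sum(axis=1)          # Kuramoto control law
            c = 0.5 * (np.sin(diff / 2) ** 2).sum(axis=1) / N   # cost of anarchy
            running += (c + 0.5 * R * u ** 2) * dt              # plus cost of control
            theta = (theta + (omega + u) * dt
                     + sigma * np.sqrt(dt) * rng.standard_normal(N)) % (2 * np.pi)
        return running / T                                      # η_i estimates, i = 1..N

    print(estimate_costs()[:5])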
Derivation of model: Mean-field model
Mean-field model derivation

    dθ_i = ( ω_i + u_i(t) ) dt + σ dξ_i
    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds

[Schematic: each oscillator influences, and is influenced by, the mass (population).]

1. Mean-field approximation:

    c(θ_i; θ_{−i}(t)) = (1/N) ∑_{j≠i} c•(θ_i, θ_j)  →  c̄(θ_i, t)   as N → ∞

2. Optimal control of a single oscillator
   Decentralized control structure

[8] M. Huang, P. Caines, and R. Malhamé, IEEE TAC, 2007 [HCM]; [13] Lasry & Lions, Japan. J. Math., 2007
Derivation of model: Derivation steps
Single oscillator with given cost

Dynamics of the oscillator:

    dθ_i = ( ω_i + u_i(t) ) dt + σ dξ_i,   t ≥ 0

The cost function c̄(θ_i(s), s) is assumed known, and replaces c(θ_i; θ_{−i}) in

    η_i(u_i; c̄) = lim_{T→∞} (1/T) ∫_0^T E[ c̄(θ_i(s), s) + (1/2) R u_i²(s) ] ds

HJB equation:

    ∂_t h_i + ω_i ∂_θ h_i = (1/(2R)) (∂_θ h_i)² − c̄(θ, t) + η*_i − (σ²/2) ∂²_θθ h_i

Optimal control law:   u*_i(t) = φ_i(θ, t) = −(1/R) ∂_θ h_i(θ, t)

[1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997
Derivation of model: Derivation steps
Single oscillator with optimal control

Dynamics of the oscillator (with the optimal control substituted):

    dθ_i(t) = ( ω_i − (1/R) ∂_θ h_i(θ_i, t) ) dt + σ dξ_i(t)

Kolmogorov forward equation for the pdf p(θ, t, ω_i):

    FPK:  ∂_t p + ω_i ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p
Derivation of model: Derivation steps
Mean-field approximation

HJB equation for the population:

    ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η(ω) − (σ²/2) ∂²_θθ h        ⇒  h(θ, t, ω)

Population density:

    ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p                    ⇒  p(θ, t, ω)

Enforce cost consistency:

    c̄(θ, t) = ∫_Ω ∫_0^{2π} c•(θ, ϑ) p(ϑ, t, ω) g(ω) dϑ dω   ≈   (1/N) ∑_{j≠i} c•(θ, θ_j)
Derivation of model: Derivation steps
Mean-field model

    dθ_i = ( ω_i + u_i(t) ) dt + σ dξ_i
    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds

Letting N → ∞ and assuming c(θ_i; θ_{−i}) = (1/N) ∑_{j≠i} c•(θ_i, θ_j(t)) → c̄(θ_i, t):

    HJB:  ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η* − (σ²/2) ∂²_θθ h    ⇒  h(θ, t, ω)
    FPK:  ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p              ⇒  p(θ, t, ω)
    Consistency:  c̄(ϑ, t) = ∫_Ω ∫_0^{2π} c•(ϑ, θ) p(θ, t, ω) g(ω) dθ dω

[8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math., 2007; [16] Weintraub et al., NIPS, 2006
Solutions of PDE model: ε-Nash equilibrium
Solution of the PDEs gives an ε-Nash equilibrium

Optimal control law:

    u°_i = −(1/R) ∂_θ h(θ(t), t, ω) |_{ω = ω_i}

Theorem
For the control law u°_i = −(1/R) ∂_θ h(θ(t), t, ω) |_{ω = ω_i}, we have the ε-Nash equilibrium property

    η_i(u°_i; u°_{−i}) ≤ η_i(u_i; u°_{−i}) + O(1/√N),   i = 1, …, N,

for any adapted control u_i.

So, we look for solutions of the PDEs.
Solutions of PDE model: Synchronization
Main result: Synchronization as a solution of the game

Game formulation:

    dθ_i = ( ω_i + u_i ) dt + σ dξ_i
    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds

Kuramoto model:

    dθ_i = ( ω_i + (κ/N) ∑_{j=1}^{N} sin(θ_j − θ_i) ) dt + σ dξ_i

[Figure: phase diagrams. Game: incoherence for R > R_c(γ), synchrony (locking) below, in the (γ, R^{−1/2}) plane. Kuramoto model: incoherence for κ < κ_c(γ), synchrony above, in the (γ, κ) plane.]

Yin et al., ACC 2010; Strogatz et al., J. Stat. Phys., 1991
Solutions of PDE model: Synchronization
Incoherence solution (PDE)

Incoherence solution:   h(θ, t, ω) = h_0(θ) := 0,    p(θ, t, ω) = p_0(θ) := 1/(2π)

With h ≡ 0 and p ≡ 1/(2π), the mean-field cost c̄(θ, t) is a constant independent of θ, so both equations below are satisfied with η* equal to that constant:

    h(θ, t, ω) = 0:       ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η* − (σ²/2) ∂²_θθ h
    p(θ, t, ω) = 1/(2π):  ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p
    c̄(θ, t) = ∫_Ω ∫_0^{2π} c•(θ, ϑ) p(ϑ, t, ω) g(ω) dϑ dω
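A quick numerical check (not in the slides) of the statement above: at the incoherence density p = 1/(2π), the mean-field cost c̄(θ) = ∫ c•(θ, ϑ) p(ϑ) dϑ is constant in θ, with the pairwise cost c•(θ, ϑ) = (1/2) sin²((θ − ϑ)/2) used in the talk; the constant (here 1/4) is the value of η*.

    # Verify that c̄(θ) is independent of θ at the incoherence solution p = 1/(2π)
    import numpy as np

    def cbar(theta, n_grid=4000):
        vartheta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
        integrand = 0.5 * np.sin((theta - vartheta) / 2.0) ** 2
        return integrand.mean()            # equals ∫ c•(θ, ϑ) dϑ / (2π) on a uniform grid

    for theta in [0.0, 1.0, 2.5, 5.0]:
        print(theta, cbar(theta))          # all outputs ≈ 0.25, independent of θ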
Solutions of PDE model: Synchronization
Incoherence solution (Finite population)

Closed-loop dynamics (with u_i = 0):   dθ_i = ω_i dt + σ dξ_i(t)

Average cost:

    η_i = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] dt        (u_i = 0)
        = lim_{T→∞} (1/N) ∑_{j≠i} (1/T) ∫_0^T E[ (1/2) sin²( (θ_i(t) − θ_j(t))/2 ) ] dt
        = (1/N) ∑_{j≠i} ∫_0^{2π} E[ (1/2) sin²( (θ_i(t) − ϑ)/2 ) ] (1/(2π)) dϑ
        = ( (N − 1)/N ) η_0

[Figure: snapshot of the phases spread uniformly around the circle.]

ε-Nash property:

    η_i(u°_i; u°_{−i}) ≤ η_i(u_i; u°_{−i}) + O(1/√N),   i = 1, …, N.
Analysis of phase transition: Numerics
Bifurcation diagram

[Figure: (left) phase diagram in the (γ, R^{−1/2}) plane: incoherence for R > R_c(γ), synchrony (locking) for R < R_c(γ); (right) average cost η(ω) vs. R^{−1/2} for ω = 0.95, 1, 1.05: on the incoherence branch (R > R_c) η(ω) = η_0, on the synchrony branch (R < R_c) η(ω) < η_0.]

    dθ_i = ( ω_i + u_i ) dt + σ dξ_i
    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds

1. Convergence theory (as N → ∞)
2. Bifurcation analysis
3. Analytical and numerical computations

Yin et al., IEEE TAC, to appear; [17] Yin et al., ACC 2010
Part II
Mean-field Estimation
Collaborators
Tao Yang Sean P. Meyn
“A Control-oriented Approach for Particle Filtering,” ACC 2011
“Feedback Particle Filter with Mean-field Coupling,” submitted to CDC 2011
Problem Definition
Filtering problem

Signal and observation processes:

    dθ(t) = ω dt + σ_B dB(t)   mod 2π                    (1)
    dZ(t) = h(θ(t)) dt + σ_W dW(t)                       (2)

Particle filter:

    dθ_i(t) = ω dt + σ_B dB_i(t) + U_i(t)   mod 2π,   i = 1, …, N.     (3)

Objective: choose the control U_i(t) for the purpose of filtering.
Problem Definition
Filtering problem

Signal and observation processes:

    dX(t) = a(X(t)) dt + σ_B dB(t)                       (1)
    dZ(t) = h(X(t)) dt + σ_W dW(t)                       (2)

X(t) ∈ R : state,  Z(t) ∈ R : observation
a(·), h(·) : C¹ functions
B(t), W(t) : independent standard Wiener processes,  σ_B ≥ 0, σ_W > 0
X(0) ∼ p_0(·)
Define Z_t := σ{Z(s), 0 ≤ s ≤ t}

Objective: estimate the posterior distribution p* of X(t) given Z_t.

Solution approaches:
Linear dynamics: Kalman filter (R. E. Kalman, 1960)
Nonlinear dynamics: Kushner's equation (H. J. Kushner, 1964)
Numerical methods: particle filtering (N. J. Gordon et al., 1993)
Existing Approach: Kalman Filter
Linear case: Kalman filter

Linear Gaussian system:

    dX(t) = a X(t) dt + σ_B dB(t)                        (1)
    dZ(t) = h X(t) dt + σ_W dW(t)                        (2)

Kalman filter:   p* = N(μ(t), Σ(t))

    dμ(t) = a μ(t) dt + ( h Σ(t)/σ_W² ) ( dZ(t) − h μ(t) dt )
                                        (innovation error)
    dΣ(t)/dt = 2a Σ(t) + σ_B² − h² Σ²(t)/σ_W²

[9] R. E. Kalman, Trans. ASME, Ser. D: J. Basic Eng., 1961
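A minimal sketch of the filter above in discrete time (Euler discretization of both the signal/observation model and the mean/variance equations); the parameter values are illustrative assumptions.

    # Continuous-time Kalman filter, Euler-discretized and run on simulated data
    import numpy as np

    a, h, sigma_B, sigma_W = -0.5, 3.0, 1.0, 0.5
    dt, T = 0.01, 20.0
    rng = np.random.default_rng(1)

    X, mu, Sigma = 1.0, 0.0, 1.0          # true state, filter mean, filter variance
    for _ in range(int(T / dt)):
        # simulate signal and observation increments
        X += a * X * dt + sigma_B * np.sqrt(dt) * rng.standard_normal()
        dZ = h * X * dt + sigma_W * np.sqrt(dt) * rng.standard_normal()
        # Kalman filter update driven by the innovation error dZ − h μ dt
        K = h * Sigma / sigma_W ** 2
        mu += a * mu * dt + K * (dZ - h * mu * dt)
        Sigma += (2 * a * Sigma + sigma_B ** 2 - h ** 2 * Sigma ** 2 / sigma_W ** 2) * dt

    print("state:", X, " estimate:", mu, " variance:", Sigma)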
Existing Approach: Nonlinear case: Kushner's equation
Nonlinear case: K-S equation

Signal and observation processes:

    dX(t) = a(X(t)) dt + σ_B dB(t),                      (1)
    dZ(t) = h(X(t)) dt + σ_W dW(t)                       (2)

B(t), W(t) are independent standard Wiener processes.

Kushner's equation: the posterior distribution p* is the solution of a stochastic PDE

    dp* = L†(p*) dt + (h − ĥ) (σ_W²)^{−1} ( dZ(t) − ĥ dt ) p*

where

    ĥ = E[ h(X(t)) | Z_t ] = ∫ h(x) p*(x, t) dx
    L†(p*) = − ∂( p* · a(x) )/∂x + (1/2) σ_B² ∂² p*/∂x²

No closed-form solution in general (closure problem).

[11] H. J. Kushner, SIAM J. Control, 1964
Existing Approach: Nonlinear case: Kushner's equation
Numerical methods: Particle filter

Idea: approximate the posterior p* in terms of particles {X_i(t)}_{i=1}^N, so that

    p* ≈ (1/N) ∑_{i=1}^N δ_{X_i(t)}(·)

Algorithm outline (see the code sketch below):
Initialization: X_i(0) ∼ p*_0(·).
At each time step:
  Importance sampling (for weight update)
  Resampling (for variance reduction)

[5] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, IEE Proceedings F Radar and Signal Processing, 1993
[4] A. Doucet, N. de Freitas, N. Gordon, Springer-Verlag, 2001
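Here is a self-contained sketch of the bootstrap particle filter for a continuous-discrete version of the problem: the proposal is an Euler step of the signal model, the weights come from a Gaussian likelihood, and resampling is multinomial. The drift a(x) = −x/2, observation h(x) = x, and all parameter values are illustrative assumptions.

    # Bootstrap particle filter: importance sampling + resampling (illustrative model)
    import numpy as np

    def a(x):                 # assumed signal drift
        return -0.5 * x

    def h(x):                 # assumed observation function
        return x

    sigma_B, sigma_W, dt, N, steps = 1.0, 0.5, 0.05, 500, 400
    rng = np.random.default_rng(2)

    X = 1.0                                     # true state
    particles = rng.standard_normal(N)          # X_i(0) ~ p*_0
    for _ in range(steps):
        # simulate the signal and a discrete-time observation z_n = h(X) + noise
        X += a(X) * dt + sigma_B * np.sqrt(dt) * rng.standard_normal()
        z = h(X) + sigma_W * rng.standard_normal()
        # importance sampling: propagate particles with the signal model as proposal
        particles += a(particles) * dt + sigma_B * np.sqrt(dt) * rng.standard_normal(N)
        w = np.exp(-0.5 * (z - h(particles)) ** 2 / sigma_W ** 2)
        w /= w.sum()                            # normalized importance weights
        # resampling (multinomial) for variance reduction
        particles = rng.choice(particles, size=N, p=w)

    print("state:", X, " posterior mean estimate:", particles.mean())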
A Control-Oriented Approach to Particle Filtering: Overview
Mean-field filtering

Signal and observation processes:

    dX(t) = a(X(t)) dt + σ_B dB(t)                       (1)
    dZ(t) = h(X(t)) dt + σ_W dW(t)                       (2)

Controlled system (N particles):

    dX_i(t) = a(X_i(t)) dt + σ_B dB_i(t) + U_i(t),   i = 1, …, N        (3)
                                           (mean-field control)

{B_i(t)}_{i=1}^N are independent standard Wiener processes.

Objective: choose the control U_i(t) such that the empirical distribution of the population approximates the posterior distribution p* as N → ∞.

[7] M. Huang, P. E. Caines, and R. P. Malhamé, IEEE TAC, 2007
[18] H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, IEEE TAC, to appear
[12] J. Lasry and P. Lions, Japanese Journal of Mathematics, 2007
A Control-Oriented Approach to Particle Filtering: Example: Linear case
Example: Linear case

Controlled system, for i = 1, …, N:

    dX_i(t) = a X_i(t) dt + σ_B dB_i(t) + U_i(t),
    U_i(t) = ( h Σ(t)/σ_W² ) [ dZ(t) − h ( (X_i(t) + μ(t))/2 ) dt ]        (3)

where

    μ(t) ≈ μ̄^{(N)}(t) = (1/N) ∑_{i=1}^N X_i(t)                        ← sample mean
    Σ(t) ≈ Σ̄^{(N)}(t) = (1/(N−1)) ∑_{i=1}^N ( X_i(t) − μ(t) )²        ← sample variance
A Control-Oriented Approach to Particle Filtering: Example: Linear case
Example: Linear case

Feedback particle filter:

    dX_i(t) = a X_i(t) dt + σ_B dB_i(t) + ( h Σ̄^{(N)}(t)/σ_W² ) [ dZ(t) − h ( (X_i(t) + μ̄^{(N)}(t))/2 ) dt ]      (3)

Mean-field model: let p denote the conditional distribution of X_i(t) given Z_t, with p(x, 0) = N(μ(0), Σ(0)).  Then p = N(μ(t), Σ(t)) where

    dμ(t) = a μ(t) dt + ( h Σ(t)/σ_W² ) ( dZ(t) − h μ(t) dt )
    dΣ(t)/dt = 2a Σ(t) + σ_B² − h² Σ²(t)/σ_W²

Same as the Kalman filter!  As N → ∞, the empirical distribution approximates the posterior p*.
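A simulation sketch of the linear-Gaussian feedback particle filter above, with the gain built from the particles' sample mean and variance, run side by side with the Kalman filter on the same observation path; parameter values are illustrative assumptions.

    # Linear FPF vs. Kalman filter on a common observation path (illustrative parameters)
    import numpy as np

    a, h, sigma_B, sigma_W = -0.5, 3.0, 1.0, 0.5
    dt, steps, N = 0.01, 2000, 500
    rng = np.random.default_rng(3)

    X = 1.0
    Xp = rng.standard_normal(N)                 # particles X_i(0) ~ N(0, 1)
    mu_k, Sig_k = 0.0, 1.0                      # Kalman filter, initialized consistently
    for _ in range(steps):
        X += a * X * dt + sigma_B * np.sqrt(dt) * rng.standard_normal()
        dZ = h * X * dt + sigma_W * np.sqrt(dt) * rng.standard_normal()
        # feedback particle filter with empirical mean/variance in the gain
        mu_N, Sig_N = Xp.mean(), Xp.var(ddof=1)
        gain = h * Sig_N / sigma_W ** 2
        Xp += (a * Xp * dt + sigma_B * np.sqrt(dt) * rng.standard_normal(N)
               + gain * (dZ - h * (Xp + mu_N) / 2 * dt))
        # Kalman filter on the same observations, for comparison
        K = h * Sig_k / sigma_W ** 2
        mu_k += a * mu_k * dt + K * (dZ - h * mu_k * dt)
        Sig_k += (2 * a * Sig_k + sigma_B ** 2 - h ** 2 * Sig_k ** 2 / sigma_W ** 2) * dt

    print("FPF    mean/var:", Xp.mean(), Xp.var(ddof=1))
    print("Kalman mean/var:", mu_k, Sig_k)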
A Control-Oriented Approach to Particle Filtering: Example: Linear case
Comparison plots

    mse = (1/T) ∫_0^T | ( Σ_t^{(N)} − Σ_t ) / Σ_t |² dt

[Figure: mse vs. number of particles N for the bootstrap particle filter (BPF) and the feedback particle filter (FPF), together with sample trajectories of the two filters.]

Parameters: a ∈ [−1, 1], h = 3, σ_B = 1, σ_W = 0.5, dt = 0.01, T = 50 and N ∈ {20, 50, 100, 200, 500, 1000}.
Methodology
Methodology: Variational problem

Continuous-discrete time problem.

Signal and observation processes:

    dX(t) = a(X(t)) dt + σ_B dB(t)
    Z(t_n) = h(X(t_n)) + W(t_n)

Particle filter:

    Filter:   dX_i(t) = a(X_i(t)) dt + σ_B dB_i(t)
    Control:  X_i(t_n) = X_i(t_n^−) + u( X_i(t_n^−) )

Belief maps:

    p*_n(x): conditional pdf of X(t_n) given Z_{t_n}   (Bayes' rule)
    p_n(x):  conditional pdf of X_i(t_n) given Z_{t_n}

Variational problem:

    K-L metric:      KL( p_n ‖ p*_n ) = E_{p_n}[ ln( p_n / p*_n ) ]
    Control design:  u*(x) = arg min_u KL( p_n ‖ p*_n ).
Feedback particle filter
Feedback particle filter (FPF)

Problem:

    dX(t) = a(X(t)) dt + σ_B dB(t),                      (1)
    dZ(t) = h(X(t)) dt + σ_W dW(t)                       (2)

FPF:

    dX_i(t) = a(X_i(t)) dt + σ_B dB_i(t) + v(X_i(t)) dI_i(t) + (1/2) σ_W² v(X_i(t)) v′(X_i(t)) dt      (3)

    Innovation error:       dI_i(t) := dZ(t) − (1/2) ( h(X_i(t)) + ĥ ) dt
    Population prediction:  ĥ = (1/N) ∑_{i=1}^N h(X_i(t)).

Euler-Lagrange equation for the gain function v:

    − ∂/∂x [ (1/p(x,t)) ∂/∂x { p(x,t) v(x,t) } ] = (1/σ_W²) h′(x),                                      (4)

with boundary condition lim_{x→±∞} v(x,t) p(x,t) = 0.
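Integrating (4) once and using the boundary condition (this is eq. (9) in the appendix, −(pv)′ = (h − ĥ)p/σ_W²) gives p(x,t) v(x,t) = (1/σ_W²) ∫_x^∞ ( h(s) − ĥ ) p(s,t) ds, which can be evaluated on a grid. The sketch below does this for an assumed Gaussian density and h(x) = x, for which the formula reproduces the constant Kalman-type gain Σ/σ_W²; it is an illustration of the E-L equation, not the talk's implementation.

    # Gain function v(x,t) from the once-integrated E-L equation, by quadrature
    import numpy as np

    def fpf_gain(x, p, h_vals, sigma_W):
        """Gain v on a uniform grid x, from density values p and observation values h(x)."""
        dx = x[1] - x[0]
        w = p / (p.sum() * dx)                  # normalize the density on the grid
        h_hat = (h_vals * w).sum() * dx         # ĥ = ∫ h p dx
        integrand = (h_vals - h_hat) * w
        # tail integral ∫_x^∞ (h − ĥ) p ds, trapezoid rule accumulated from the left
        cum = np.concatenate(([0.0],
              np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dx)))
        tail = cum[-1] - cum
        return tail / (sigma_W ** 2 * w)

    x = np.linspace(-6, 6, 2001)
    mu, Sigma, sigma_W = 0.5, 0.8, 0.5
    p = np.exp(-(x - mu) ** 2 / (2 * Sigma)) / np.sqrt(2 * np.pi * Sigma)
    v = fpf_gain(x, p, h_vals=x, sigma_W=sigma_W)
    print(v[len(x) // 2], Sigma / sigma_W ** 2)  # interior values ≈ Σ/σ_W² = 3.2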
Consistency result
Consistency result

p* denotes the conditional pdf of X(t) given Z_t:

    dp* = L†(p*) dt + (h − ĥ)(σ_W²)^{−1} ( dZ(t) − ĥ dt ) p*

p denotes the conditional pdf of X_i(t) given Z_t:

    dp = L† p dt − ∂/∂x ( v p ) dZ(t) − ∂/∂x ( u p ) dt + (σ_W²/2) ∂²/∂x² ( p v² ) dt

Theorem
Consider the two evolution equations for p and p*, defined according to the solution of the Kolmogorov forward equation and the K-S equation, respectively.  Suppose that the control function v(x,t) is obtained according to (4).  Then, provided p(x,0) = p*(x,0), we have for all t ≥ 0,

    p(x,t) = p*(x,t)
Linear Gaussian case
Linear Gaussian case

Linear system:

    dX(t) = a X(t) dt + σ_B dB(t)                        (1)
    dZ(t) = h X(t) dt + σ_W dW(t)                        (2)

    p(x,t) = ( 1/√(2π Σ(t)) ) exp( − (x − μ(t))² / (2 Σ(t)) )

E-L equation:

    − ∂/∂x [ (1/p) ∂/∂x { p v } ] = h/σ_W²    ⇒    v(x,t) = h Σ(t)/σ_W²

FPF:

    dX_i(t) = a X_i(t) dt + σ_B dB_i(t) + ( h Σ(t)/σ_W² ) [ dZ(t) − h ( (X_i(t) + μ(t))/2 ) dt ]      (3)
Example
Example: Filtering of a nonlinear oscillator

Nonlinear oscillator:

    dθ(t) = ω dt + σ_B dB(t)   mod 2π                    (1)
    dZ(t) = h(θ(t)) dt + σ_W dW(t),    h(θ) = (1 + cos θ)/2        (2)

Controlled system (N particles):

    dθ_i(t) = ω dt + σ_B dB_i(t) + v(θ_i(t)) [ dZ(t) − (1/2)( h(θ_i(t)) + ĥ ) dt ]
              + (1/2) σ_W² v v′ dt   mod 2π,   i = 1, …, N,        (3)

where v(θ_i(t)) is obtained via the solution of the E-L equation (with h′(θ) = −(sin θ)/2):

    − ∂/∂θ [ (1/p(θ,t)) ∂/∂θ { p(θ,t) v(θ,t) } ] = − sin θ / (2 σ_W²)      (4)
Example
Example: Filtering of a nonlinear oscillator

Fourier form of p(θ,t):

    p(θ,t) = 1/(2π) + P_s(t) sin θ + P_c(t) cos θ.

Approximate solution of the E-L equation (4) (using a perturbation method):

    v(θ,t)  = (1/(2σ_W²)) { − sin θ + (π/2) [ P_c(t) sin 2θ − P_s(t) cos 2θ ] },
    v′(θ,t) = (1/(2σ_W²)) { − cos θ + π [ P_c(t) cos 2θ + P_s(t) sin 2θ ] }

where

    P_c(t) ≈ P̄_c^{(N)}(t) = (1/(πN)) ∑_{j=1}^N cos θ_j(t),    P_s(t) ≈ P̄_s^{(N)}(t) = (1/(πN)) ∑_{j=1}^N sin θ_j(t).

Reminiscent of coupled oscillators (Winfree).
Example
Simulation Results

Signal and observation processes:

    dθ(t) = 1 dt + 0.5 dB(t)                             (1)
    dZ(t) = ( (1 + cos θ(t))/2 ) dt + 0.4 dW(t)          (2)

Particle system (N = 100):

    dθ_i(t) = 1 dt + σ_B dB_i(t) + U( θ_i(t); P̄_c^{(N)}(t), P̄_s^{(N)}(t) )   mod 2π       (3)

[Figure: (a) first and second Fourier harmonics of the particle distribution vs. time; (b) true state θ(t) and conditional-mean estimate θ̄^N(t) with confidence bounds; (c) estimated pdf over [0, 2π].]
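A simulation sketch of the oscillator FPF with the perturbation (Fourier) approximation of the gain given above. The model parameters follow the simulation slide (ω = 1, σ_B = 0.5, σ_W = 0.4, N = 100); the step size, horizon, and the use of a circular mean as the point estimate are illustrative assumptions.

    # Feedback particle filter for the nonlinear oscillator, with the approximate gain
    import numpy as np

    omega, sigma_B, sigma_W, N = 1.0, 0.5, 0.4, 100
    dt, steps = 0.01, 5000
    rng = np.random.default_rng(4)

    def h(th):
        return 0.5 * (1.0 + np.cos(th))

    theta_true = 0.0
    theta = rng.uniform(0, 2 * np.pi, N)
    for _ in range(steps):
        theta_true = (theta_true + omega * dt
                      + sigma_B * np.sqrt(dt) * rng.standard_normal()) % (2 * np.pi)
        dZ = h(theta_true) * dt + sigma_W * np.sqrt(dt) * rng.standard_normal()
        # empirical Fourier coefficients of the particle population
        Pc = np.cos(theta).sum() / (np.pi * N)
        Ps = np.sin(theta).sum() / (np.pi * N)
        # perturbation solution of the E-L equation: gain v and its derivative v'
        v = (-np.sin(theta) + (np.pi / 2) * (Pc * np.sin(2 * theta)
             - Ps * np.cos(2 * theta))) / (2 * sigma_W ** 2)
        dv = (-np.cos(theta) + np.pi * (Pc * np.cos(2 * theta)
              + Ps * np.sin(2 * theta))) / (2 * sigma_W ** 2)
        h_hat = h(theta).mean()
        dI = dZ - 0.5 * (h(theta) + h_hat) * dt        # innovation error
        theta = (theta + omega * dt + sigma_B * np.sqrt(dt) * rng.standard_normal(N)
                 + v * dI + 0.5 * sigma_W ** 2 * v * dv * dt) % (2 * np.pi)

    est = np.angle(np.exp(1j * theta).mean()) % (2 * np.pi)   # circular mean of particles
    print("true phase:", theta_true, " estimate:", est)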
Thank you!
Website: http://www.mechse.illinois.edu/research/mehtapg
Collaborators
Huibing Yin Tao Yang Sean Meyn Uday Shanbhag
“Synchronization of coupled oscillators is a game,” IEEE TAC, ACC 2010
“Learning in Mean-Field Oscillator Games,” CDC 2010
“On the Efficiency of Equilibria in Mean-field Oscillator Games,” ACC 2011
“A Control-oriented Approach for Particle Filtering,” ACC2011, CDC2011
Part III
Learning in mean-field game
Learning: Q-function approximation
Approx. DP

Optimality equation:

    min_{u_i} H_i(θ, u_i; θ_{−i}(t)) = η*_i,    H_i(θ, u_i; θ_{−i}(t)) := c(θ; θ_{−i}(t)) + (1/2) R u_i² + D_{u_i} h_i(θ, t)

Optimal control law:   u*_i = −(1/R) ∂_θ h_i(θ, t)
Kuramoto law:          u_i^{(Kur)} = −(κ/N) ∑_{j≠i} sin(θ_i − θ_j(t))

Parameterization:

    H_i^{(A_i, φ_i)}(θ, u_i; θ_{−i}(t)) = c(θ; θ_{−i}(t)) + (1/2) R u_i² + (ω_i − 1 + u_i) A_i S^{(φ_i)} + (σ²/2) A_i C^{(φ_i)}

where

    S^{(φ)}(θ, θ_{−i}) = (1/N) ∑_{j≠i} sin(θ − θ_j − φ),    C^{(φ)}(θ, θ_{−i}) = (1/N) ∑_{j≠i} cos(θ − θ_j − φ)

Approximate optimal control (see the sketch below):

    u_i^{(A_i, φ_i)} = arg min_{u_i} { H_i^{(A_i, φ_i)}(θ, u_i; θ_{−i}(t)) } = −(A_i/R) S^{(φ_i)}(θ, θ_{−i})

Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009
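The parameterized control law is simple to evaluate; the sketch below computes u_i = −(A_i/R) S^{(φ_i)}(θ_i, θ_{−i}) for a population of phases, using the basis functions S^{(φ)} and C^{(φ)} defined above. The values of A_i, φ_i and R are placeholders (in the talk they are learned via stochastic approximation); note that A_i = κR and φ_i = 0 recovers the Kuramoto law.

    # Evaluating the approximate optimal control u_i = -(A_i/R) S^(φ_i)(θ_i, θ_{-i})
    import numpy as np

    def S_phi(theta_i, theta_others, phi):
        """S^(φ)(θ, θ_{-i}) = (1/N) Σ_{j≠i} sin(θ − θ_j − φ)."""
        N = theta_others.size + 1
        return np.sin(theta_i - theta_others - phi).sum() / N

    def C_phi(theta_i, theta_others, phi):
        """C^(φ)(θ, θ_{-i}) = (1/N) Σ_{j≠i} cos(θ − θ_j − φ)."""
        N = theta_others.size + 1
        return np.cos(theta_i - theta_others - phi).sum() / N

    def approx_control(theta, A, phi, R):
        """u_i^{(A_i, φ_i)} for every oscillator i in the population."""
        u = np.empty_like(theta)
        for i, th_i in enumerate(theta):
            others = np.delete(theta, i)
            u[i] = -(A[i] / R) * S_phi(th_i, others, phi[i])
        return u

    rng = np.random.default_rng(5)
    theta = rng.uniform(0, 2 * np.pi, 10)
    A, phi, R = np.ones(10), np.zeros(10), 10.0
    print(approx_control(theta, A, phi, R))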
Learning: Stochastic approximation
Learning algorithm

Bellman error (pointwise):

    L^{(A_i, φ_i)}(θ, t) = min_{u_i} { H_i^{(A_i, φ_i)} } − η_i^{(A*_i, φ*_i)}

Stochastic approximation algorithm:

    ẽ(A_i, φ_i) = ∑_{k=1}^{2} | ⟨ L^{(A_i, φ_i)}, φ̃_k(θ) ⟩ |²

    dA_i/dt = −ε dẽ(A_i, φ_i)/dA_i,    dφ_i/dt = −ε dẽ(A_i, φ_i)/dφ_i

Convergence analysis:  A_i(t) → A*,  φ_i(t) → φ*

[Figure: parameter trajectories of the learning algorithm in the synchrony and incoherence regimes.]

Yin et al., CDC 2010
Learning: Bifurcation diagram
Comparison of average cost

    dθ_i = ( ω_i + u_i ) dt + σ dξ_i
    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds

[Figure: average cost obtained with the learning algorithm, compared against the mean-field (PDE) solution.]
Part IV
Appendix
Analysis of phase transition: Incoherence solution
Overview of the steps

    HJB:  ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ,t) + η* − (σ²/2) ∂²_θθ h    ⇒  h(θ,t,ω)
    FPK:  ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p             ⇒  p(θ,t,ω)
    Consistency:  c̄(ϑ,t) = ∫_Ω ∫_0^{2π} c•(ϑ,θ) p(θ,t,ω) g(ω) dθ dω

Assume c•(ϑ,θ) = c•(ϑ − θ) = (1/2) sin²( (ϑ − θ)/2 ).

Incoherence solution:   h(θ,t,ω) = h_0(θ) := 0,    p(θ,t,ω) = p_0(θ) := 1/(2π)
Analysis of phase transition: Bifurcation analysis
Linearization and spectra

Linearized PDE (about the incoherence solution), with z̃ = (h̃, p̃):

    ∂/∂t z̃(θ,t,ω) = [ −ω ∂_θ h̃ − c̄ − (σ²/2) ∂²_θθ h̃ ;
                       −ω ∂_θ p̃ + (1/(2πR)) ∂²_θθ h̃ + (σ²/2) ∂²_θθ p̃ ]  =:  L_R z̃(θ,t,ω)

Spectrum of the linear operator:

1. Continuous spectrum {S^{(k)}}_{k=−∞}^{+∞},

    S^{(k)} := { λ ∈ C : λ = ± (σ²/2) k² − kωi  for all ω ∈ Ω }

2. Discrete spectrum, given by the characteristic equation

    (1/(8R)) ∫_Ω g(ω) / ( (λ − σ²/2 + ωi)(λ + σ²/2 + ωi) ) dω + 1 = 0.

[Figure: continuous and discrete spectra in the complex plane (γ = 0.1); the discrete eigenvalues (k = 1, 2 branches) move as R decreases.]
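A rough numerical sketch of the discrete-spectrum computation: assuming g(ω) is uniform on [1−γ, 1+γ] (as in the background slide), the characteristic equation can be evaluated by quadrature on a grid of λ and approximate roots located where |F(λ)| is smallest. The values σ² = 0.1 and γ = 0.1 are read off the spectra figure and, like the grid and the R values, are assumptions.

    # Scan the characteristic function F(λ) = (1/8R) ∫ g(ω) dω /((λ−σ²/2+iω)(λ+σ²/2+iω)) + 1
    import numpy as np

    sigma2, gamma = 0.1, 0.1
    omega = np.linspace(1 - gamma, 1 + gamma, 201)
    g = np.full_like(omega, 1.0 / (2 * gamma))          # assumed uniform frequency density
    dw = omega[1] - omega[0]

    def char_fn(lam, R):
        """F(λ) for an array of λ values; quadrature over ω on the last axis."""
        lam = lam[..., None]
        integrand = g / ((lam - sigma2 / 2 + 1j * omega) * (lam + sigma2 / 2 + 1j * omega))
        return integrand.sum(axis=-1) * dw / (8 * R) + 1.0

    re = np.linspace(-0.3, 0.3, 61)
    im = np.linspace(-1.5, -0.5, 101)
    lam_grid = re[None, :] + 1j * im[:, None]

    for R in [60.0, 30.0, 15.0]:
        absF = np.abs(char_fn(lam_grid, R))
        i, j = np.unravel_index(absF.argmin(), absF.shape)
        lam = lam_grid[i, j]
        print(f"R = {R:5.1f}: approximate discrete eigenvalue λ ≈ {lam.real:.3f} {lam.imag:+.3f}i")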
Analysis of phase transition: Bifurcation analysis
Bifurcation diagram (Hamiltonian Hopf)

Characteristic equation:

    (1/(8R)) ∫_Ω g(ω) / ( (λ − σ²/2 + ωi)(λ + σ²/2 + ωi) ) dω + 1 = 0.

Stability proof.

[Figure: (a) continuous spectrum (independent of R) and discrete spectrum (a function of R) in the complex plane, with the bifurcation point where a discrete pair reaches the imaginary axis; (c) the resulting critical curve R_c(γ): incoherence for R > R_c(γ), synchrony below.]

[3] Dellnitz et al., Int. Series Num. Math., 1992
Analysis of phase transition: Numerics
Numerical solution of the PDEs

[Movies: incoherence solution for R = 60; synchrony solution for R = 10.]
Analysis of phase transition: Numerics
Bifurcation diagram

[Figure: phase diagram in the (γ, R^{−1/2}) plane (incoherence for R > R_c(γ), synchrony for R < R_c(γ)), and average cost η(ω) vs. R^{−1/2} for ω = 0.95, 1, 1.05: η(ω) = η_0 on the incoherence branch, η(ω) < η_0 on the synchrony branch.]

    dθ_i = ( ω_i + u_i ) dt + σ dξ_i
    η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds
Analysis of phase transition: Numerics
Bifurcation diagram with another cost

c•(θ, ϑ) = 0.25 ( 1 − cos(θ − ϑ) ):

[Figure: bifurcation diagram vs. R^{−1/2}, and the stationary density p(θ) on the synchrony branch at R = 23.8359.]

c•(θ, ϑ) = 1 − cos(θ − ϑ) − cos 3(θ − ϑ):

[Figure: bifurcation diagram vs. R^{−1/2}, and the corresponding multi-modal stationary densities p(θ).]
Feedback particle filter
Numerical method: particle filter

Basic algorithm (discretized time):

Step 0: Initialization.
  For i = 1, …, N, sample x_0^{(i)} ∼ p_{0|0}(x) and set n = 0.

Step 1: Importance sampling step.
  For i = 1, …, N, sample x̃_n^{(i)} ∼ p^N_{n−1|n−1} K(x_n) = (1/N) ∑_{k=1}^N K( x_n | x_{n−1}^{(k)} ).
  For i = 1, …, N, evaluate the normalized importance weight w_n^{(i)}:

      w_n^{(i)} ∝ g( z_n | x̃_n^{(i)} ),    ∑_{i=1}^N w_n^{(i)} = 1,
      p^N_{n|n}(x) = ∑_{i=1}^N w_n^{(i)} δ_{x̃_n^{(i)}}(x)

Step 2: Resampling.
  For i = 1, …, N, sample x_n^{(i)} ∼ p^N_{n|n}(x).
Feedback particle filter
Calculation of the K-L divergence

Recall the definition of the K-L divergence for densities,

    KL( p_n ‖ p̂*_n ) = ∫_R p_n(s) ln( p_n(s) / p̂*_n(s) ) ds.

We make a coordinate transformation s = x + u(x) and express the K-L divergence as:

    KL( p_n ‖ p̂*_n ) = ∫_R  [ p_n^−(x) / |1 + u′(x)| ]  ln(  p_n^−(x) / ( |1 + u′(x)| p̂*_n( x + u(x) | z_n ) )  )  |1 + u′(x)| dx
Feedback particle filter
Derivation of the Euler-Lagrange equation

Denote:

    L(x, u, u′) = −p_n^−(x) [ ln |1 + u′(x)| + ln( p_n^−(x + u) p_{Z|X}(z_n | x + u) ) ].      (4)

The optimization problem thus is a calculus of variations problem:

    min_u ∫ L(x, u, u′) dx.

The minimizer is obtained via the analysis of the first variation, given by the well-known Euler-Lagrange equation:

    ∂L/∂u = d/dx ( ∂L/∂u′ ).

Explicitly substituting the expression (4) for L, we obtain the expected result.
Feedback particle filter
Simulation Results

When the particle filter does not work well:

[Figure: (a) first and second harmonics; (b) true state θ(t) and conditional-mean estimate θ̄^N(t) with bounds; (c) pdf estimate.]
Feedback particle filter
Derivation of the FPK equation for the FPF

Controlled system (N particles):

    dX_i(t) = a(X_i(t)) dt + U_i(t) + σ_B dB_i(t),   i = 1, …, N        (3)
              (mean-field control)

Using the ansatz U_i(t) = u(X_i(t)) dt + v(X_i(t)) dZ(t), the particle system becomes:

    dX_i(t) = ( a(X_i(t)) + u(X_i(t)) ) dt + σ_B dB_i(t) + v(X_i(t)) dZ(t),   i = 1, …, N        (3)

Then the conditional distribution of X_i(t) given the filtration Z_t, p(x,t), satisfies the forward equation:

    dp = L† p dt − ∂/∂x ( v p ) dZ_t − ∂/∂x ( u p ) dt + (σ_W²/2) ∂²/∂x² ( p v² ) dt.

Note that, after a short calculation, we have

    u(x,t) = v(x,t) [ −(1/2)( h(x) + ĥ ) + (1/2) σ_W² v′(x,t) ].        (4)
Feedback particle filter
Derivation of the FPF for the linear case

The density p is a function of the stochastic process μ_t.  Using Itô's formula,

    dp(x,t) = (∂p/∂μ) dμ_t + (∂p/∂Σ) dΣ_t + (1/2) (∂²p/∂μ²) dμ_t²,

where

    ∂p/∂μ = ( (x − μ_t)/Σ_t ) p,
    ∂p/∂Σ = (1/(2Σ_t)) ( (x − μ_t)²/Σ_t − 1 ) p,
    ∂²p/∂μ² = (1/Σ_t) ( (x − μ_t)²/Σ_t − 1 ) p.

Substituting these into the forward equation, we obtain a quadratic equation A x² + B x = 0, where

    A = dΣ_t − ( 2a Σ_t + σ_B² − h² Σ_t²/σ_W² ) dt,
    B = dμ_t − ( a μ_t dt + ( h Σ_t/σ_W² ) ( dZ_t − h μ_t dt ) ).

This leads to the Kalman filter.
Feedback particle filter
Derivation of the E-L equation

In the continuous-time case, the control is of the form:

    U_t^i = u(X_t^i, t) dt + v(X_t^i, t) dZ_t.        (5)

Substituting this in the E-L BVP for the continuous-discrete time case, we arrive at the formal equation:

    ∂/∂x [ p(x,t) / ( 1 + u′ dt + v′ dZ_t ) ]
        = p(x,t) ∂/∂ũ [ ln p( x + u dt + v dZ_t, t ) + ln p_{dZ|X}( dZ_t | x + u dt + v dZ_t ) ],        (6)

where p_{dZ|X}( dZ_t | · ) = ( 1/√(2π σ_W²) ) exp( −( dZ_t − h(·) )²/(2σ_W²) )  and  ũ = u dt + v dZ_t.

For notational ease, we use primes to denote partial derivatives with respect to x: p is used to denote p(x,t), p′ := ∂p/∂x (x,t), p″ := ∂²p/∂x² (x,t), u′ := ∂u/∂x (x,t), v′ := ∂v/∂x (x,t), etc.  Note that the time t is fixed.
Feedback particle filter
Derivation of the E-L equation, cont. 1

Step 1: The three terms in (6) are simplified as:

    ∂/∂x [ p / (1 + u′ dt + v′ dZ_t) ] = p′ − f_1 dt − ( p′ v′ + p v″ ) dZ_t
    p ∂/∂ũ [ ln p(x + u dt + v dZ_t) ] = p′ + f_2 dt + ( p″ v − p′² v / p ) dZ_t
    p ∂/∂ũ [ ln p_{dZ|X}( dZ_t | x + u dt + v dZ_t ) ] = (p/σ_W²) [ h′ dZ_t + ( σ_W² h′ v − h h′ ) dt ]

where we have used Itô's rules dZ_t² = σ_W² dt, dZ_t dt = 0, etc., and

    f_1 = ( p′ u′ + p u″ ) − σ_W² ( p′ v′² + 2 p v′ v″ ),
    f_2 = ( p″ u − p′² u / p ) + σ_W² v² [ (1/2) p‴ − 3 p″ p′/(2p) + p′³/p² ].

Collecting terms in O(dZ_t) and O(dt), after some simplification, leads to the following ODEs:
Feedback particle filter
Derivation of the E-L equation, cont. 2

    E(v) = (1/σ_W²) h′(x)                                                  (7)
    E(u) = −(1/σ_W²) h(x) h′(x) + h′(x) v + σ_W² G(x, t)                   (8)

where E(v) = −∂/∂x [ (1/p(x,t)) ∂/∂x { p(x,t) v(x,t) } ], and

    G = −2 v′ v″ − (v′)² (ln p)′ + (1/2) v² (ln p)‴.
Feedback particle filter
Derivation of the E-L equation, cont. 3

Step 2: Suppose (u, v) are admissible solutions of the E-L BVP (7)-(8).  Then it is claimed that

    −(p v)′ = ( (h − ĥ)/σ_W² ) p                                           (9)
    −(p u)′ = −( (h − ĥ) ĥ / σ_W² ) p − (1/2) σ_W² ( p v² )″.              (10)

Recall that admissible here means

    lim_{x→±∞} p(x,t) u(x,t) = 0,    lim_{x→±∞} p(x,t) v(x,t) = 0.         (11)

To show (9), integrate (7) once to obtain

    −(p v)′ = (1/σ_W²) h p + C p,
Feedback particle filter
Derivation of E-L equation cont.4
where the constant of integration C = −ĥ/σ_W² is obtained by integrating once more from −∞ to ∞ and using the boundary condition for v in (11). This gives (9). To show (10), we denote its right-hand side by R and claim that

R/p = −hh'/σ_W² + h''v + σ_W² G.   (12)

Equation (10) then follows by using the ODE (8) together with the boundary condition for u in (11). The verification of the claim involves a straightforward calculation, in which (7) is used to obtain expressions for h' and v''. The details of this calculation are omitted on account of space.
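Equation (9) also suggests a direct numerical recipe for the gain: integrate its right-hand side once and divide by p. The sketch below is an editorial addition (the function and variable names are placeholders), checked against the Gaussian case where the gain should equal hΣ/σ_W².

```python
import numpy as np

def trapz(f, x):
    """Simple trapezoidal quadrature on a grid."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def fpf_gain(x, p, h, sigma_W):
    """Solve -(p v)' = (h - h_hat) p / sigma_W^2 (Eq. (9)) for v on a grid,
    using the boundary condition p(x) v(x) -> 0 as x -> +/- infinity."""
    p = p / trapz(p, x)                              # normalize the density
    h_hat = trapz(h * p, x)
    f = (h - h_hat) * p / sigma_W**2                 # right-hand side of (9)
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))
    pv = cum[-1] - cum                               # p(x) v(x) = integral of f from x to +infinity
    return pv / np.maximum(p, 1e-300)

# Sanity check in the Gaussian/linear case, where v should equal h*Sigma/sigma_W^2:
x = np.linspace(-6.0, 6.0, 4001)
Sigma, h_lin, sigma_W = 0.7, 2.0, 0.5
p = np.exp(-x**2 / (2.0 * Sigma)) / np.sqrt(2.0 * np.pi * Sigma)
v = fpf_gain(x, p, h_lin * x, sigma_W)
print(v[len(x) // 2], h_lin * Sigma / sigma_W**2)    # both should be close to 5.6
```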
Feedback particle filter
Derivation of E-L equation cont.5
Step 3. The E-L equation for v is given by (7). After a short calculation, we have

u(x,t) = v(x,t) ( −(1/2)(h(x) + ĥ) + (1/2) σ_W² v'(x,t) ).   (13)
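As a small worked check of (13) (an editorial addition, with placeholder values): in the linear-Gaussian case h(x) = hx, ĥ = hμ, and v = hΣ/σ_W² is constant (so v' = 0), and (13) reduces to u(x) = −(h²Σ/(2σ_W²))(x + μ), the mean-field control appearing in the linear feedback particle filter.

```python
import numpy as np

# Check that (13) reduces to u(x) = -(h^2 Sigma / (2 sigma_W^2)) (x + mu) when
# h(x) = h*x, h_hat = h*mu, and v = h*Sigma/sigma_W^2 is constant in x.
h, mu, Sigma, sigma_W = 2.0, 0.3, 0.7, 0.5     # placeholder values
x = np.linspace(-3.0, 3.0, 7)
v = h * Sigma / sigma_W**2
u_from_13 = v * (-0.5 * (h * x + h * mu) + 0.5 * sigma_W**2 * 0.0)   # v' = 0
u_closed = -(h**2 * Sigma / (2.0 * sigma_W**2)) * (x + mu)
print(np.allclose(u_from_13, u_closed))        # True
```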
Bibliography
[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, 1995.
[2] E. Brown, J. Moehlis, and P. Holmes. On the phase reduction and response dynamics of neural oscillator populations. Neural Computation, 16(4):673–715, 2004.
[3] M. Dellnitz, J. E. Marsden, I. Melbourne, and J. Scheurle. Generic bifurcations of pendula. Int. Series Num. Math., 104:111–122, 1992.
[4] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte-Carlo Methods in Practice. Springer-Verlag, April 2001.
[5] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F Radar and Signal Processing, 140(2):107–113, 1993.
[6] J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1:259–273, 1975.
[7] M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007.
[8] M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007.
[9] R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.
[10] Y. Kuramoto. International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics. Springer-Verlag, 1975.
[11] H. J. Kushner. On the differential equations satisfied by conditional probability densities of Markov processes. SIAM J. Control, 2:106–119, 1964.
[12] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(2):229–260, 2007.
[13] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(2):229–260, 2007.
[14] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Transactions on Automatic Control, 42(12):1663–1680, December 1997.
[15] S. H. Strogatz and R. E. Mirollo. Stability of incoherence in a population of coupled oscillators. Journal of Statistical Physics, 63:613–635, May 1991.
[16] G. Y. Weintraub, L. Benkard, and B. Van Roy. Oblivious equilibrium: A mean field approximation for large-scale dynamic games. In Advances in Neural Information Processing Systems, volume 18. MIT Press, 2006.
[17] H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. In Proc. of 2010 American Control Conference, pages 1783–1790, Baltimore, MD, 2010.
[18] H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. IEEE Transactions on Automatic Control (to appear).
More Related Content

Similar to Mean-field Methods in Estimation & Control

Neural Power Amplifier
Neural Power AmplifierNeural Power Amplifier
Neural Power Amplifier
Andrew Doyle
 
PaperNo11-GHEZELBASHHABIBISAFARI-IJMMS
PaperNo11-GHEZELBASHHABIBISAFARI-IJMMSPaperNo11-GHEZELBASHHABIBISAFARI-IJMMS
PaperNo11-GHEZELBASHHABIBISAFARI-IJMMS
Mezban Habibi
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Umberto Picchini
 
Somatic Niche Construction
Somatic Niche ConstructionSomatic Niche Construction
Somatic Niche Construction
joshbrsn
 
A comparative study of living cell micromechanical properties by oscillatory ...
A comparative study of living cell micromechanical properties by oscillatory ...A comparative study of living cell micromechanical properties by oscillatory ...
A comparative study of living cell micromechanical properties by oscillatory ...
Angela Zaorski
 
SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...
SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...
SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...
SMART Infrastructure Facility
 
J.Jochim_Poster_CNS_2016
J.Jochim_Poster_CNS_2016J.Jochim_Poster_CNS_2016
J.Jochim_Poster_CNS_2016
Janina Jochim
 
Kate Toland - Poster Neuro Conference Sept 2015 MK
Kate Toland - Poster Neuro Conference Sept 2015 MKKate Toland - Poster Neuro Conference Sept 2015 MK
Kate Toland - Poster Neuro Conference Sept 2015 MK
Kate Toland
 
Top Trending Article - International Journal of Chaos, Control, Modeling and ...
Top Trending Article - International Journal of Chaos, Control, Modeling and ...Top Trending Article - International Journal of Chaos, Control, Modeling and ...
Top Trending Article - International Journal of Chaos, Control, Modeling and ...
ijccmsjournal
 
Survival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive ModelsSurvival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive Models
Christos Argyropoulos
 
The biological nature of free will
The biological nature of free willThe biological nature of free will
The biological nature of free will
Björn Brembs
 
Mc intosh 2003
Mc intosh 2003Mc intosh 2003
Mc intosh 2003
Jacob Sheu
 
New Insights and Applications of Eco-Finance Networks and Collaborative Games
New Insights and Applications of Eco-Finance Networks and Collaborative GamesNew Insights and Applications of Eco-Finance Networks and Collaborative Games
New Insights and Applications of Eco-Finance Networks and Collaborative Games
SSA KPI
 
isi
isiisi
isi
ahfdv
 
Finite Element Analysis - UNIT-5
Finite Element Analysis - UNIT-5Finite Element Analysis - UNIT-5
Finite Element Analysis - UNIT-5
propaul
 
Theoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in IndiaTheoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in India
PFHub PFHub
 
Theoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in IndiaTheoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in India
Daniel Wheeler
 
pso2015.pdf
pso2015.pdfpso2015.pdf
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
IJERD Editor
 
Neural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and ProspectsNeural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and Prospects
SSA KPI
 

Similar to Mean-field Methods in Estimation & Control (20)

Neural Power Amplifier
Neural Power AmplifierNeural Power Amplifier
Neural Power Amplifier
 
PaperNo11-GHEZELBASHHABIBISAFARI-IJMMS
PaperNo11-GHEZELBASHHABIBISAFARI-IJMMSPaperNo11-GHEZELBASHHABIBISAFARI-IJMMS
PaperNo11-GHEZELBASHHABIBISAFARI-IJMMS
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
 
Somatic Niche Construction
Somatic Niche ConstructionSomatic Niche Construction
Somatic Niche Construction
 
A comparative study of living cell micromechanical properties by oscillatory ...
A comparative study of living cell micromechanical properties by oscillatory ...A comparative study of living cell micromechanical properties by oscillatory ...
A comparative study of living cell micromechanical properties by oscillatory ...
 
SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...
SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...
SMART Seminar Series: "A journey in the zoo of Turing patterns: the topology ...
 
J.Jochim_Poster_CNS_2016
J.Jochim_Poster_CNS_2016J.Jochim_Poster_CNS_2016
J.Jochim_Poster_CNS_2016
 
Kate Toland - Poster Neuro Conference Sept 2015 MK
Kate Toland - Poster Neuro Conference Sept 2015 MKKate Toland - Poster Neuro Conference Sept 2015 MK
Kate Toland - Poster Neuro Conference Sept 2015 MK
 
Top Trending Article - International Journal of Chaos, Control, Modeling and ...
Top Trending Article - International Journal of Chaos, Control, Modeling and ...Top Trending Article - International Journal of Chaos, Control, Modeling and ...
Top Trending Article - International Journal of Chaos, Control, Modeling and ...
 
Survival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive ModelsSurvival Analysis With Generalized Additive Models
Survival Analysis With Generalized Additive Models
 
The biological nature of free will
The biological nature of free willThe biological nature of free will
The biological nature of free will
 
Mc intosh 2003
Mc intosh 2003Mc intosh 2003
Mc intosh 2003
 
New Insights and Applications of Eco-Finance Networks and Collaborative Games
New Insights and Applications of Eco-Finance Networks and Collaborative GamesNew Insights and Applications of Eco-Finance Networks and Collaborative Games
New Insights and Applications of Eco-Finance Networks and Collaborative Games
 
isi
isiisi
isi
 
Finite Element Analysis - UNIT-5
Finite Element Analysis - UNIT-5Finite Element Analysis - UNIT-5
Finite Element Analysis - UNIT-5
 
Theoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in IndiaTheoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in India
 
Theoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in IndiaTheoretical and Applied Phase-Field: Glimpses of the activities in India
Theoretical and Applied Phase-Field: Glimpses of the activities in India
 
pso2015.pdf
pso2015.pdfpso2015.pdf
pso2015.pdf
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Neural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and ProspectsNeural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and Prospects
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

Mean-field Methods in Estimation & Control

  • 1. CSLCOORDINATED SCIENCE LABORATORY Mean-field Methods in Estimation & Control Prashant G. Mehta Coordinated Science Laboratory Department of Mechanical Science and Engineering University of Illinois at Urbana-Champaign EE Seminar, Harvard Apr. 1, 2011
  • 2. Introduction Kuramoto model Background: Synchronization of coupled oscillators dθi(t) = ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling [10] Y. Kuramoto, 1975; [15] Strogatz et al., J. Stat. Phy., 1991 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 2 / 38
  • 3. Introduction Kuramoto model Background: Synchronization of coupled oscillators dθi(t) = ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling 1- 1+1 [10] Y. Kuramoto, 1975; [15] Strogatz et al., J. Stat. Phy., 1991 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 2 / 38
  • 4. Introduction Kuramoto model Background: Synchronization of coupled oscillators dθi(t) = ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling [10] Y. Kuramoto, 1975; [15] Strogatz et al., J. Stat. Phy., 1991 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 2 / 38
  • 5. Introduction Kuramoto model Background: Synchronization of coupled oscillators dθi(t) = ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling 0 0.1 0.2 0.1 0.15 0.2 0.25 0.3 Locking Incoherence κ κ < κc(γ) R γ Synchrony Incoherence [10] Y. Kuramoto, 1975; [15] Strogatz et al., J. Stat. Phy., 1991 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 2 / 38
  • 6. Introduction Kuramoto model Movies of incoherence and synchrony solution −1 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8 1 −1 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8 1 Incoherence Synchrony P. G. Mehta (Illinois) Harvard Apr. 1, 2011 3 / 38
  • 7. Introduction Oscillator examples Motivation: Oscillators in biology Oscillators in biology Coupled oscillators Images taken from google search. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 4 / 38
  • 8. Introduction Oscillator examples Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET ) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train [6] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 5 / 38
  • 9. Introduction Oscillator examples Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET ) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train [6] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 5 / 38
  • 10. Introduction Oscillator examples Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET ) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train −100 −50 0 50 100 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 Vh r Limit cyle r h v [6] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 5 / 38
  • 11. Introduction Oscillator examples Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET ) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train −100 −50 0 50 100 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 Vh r Limit cyle r h v Normal form reduction −−−−−−−−−−−−−→ ˙θi = ωi +ui ·Φ(θi) [6] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 5 / 38
  • 12. Introduction Role of Neural Rhythms Question (Fundamental question in Neuroscience) Why is synchrony (neural rhythms) useful? Does it have a functional role? 1 Synchronization Phase transition in controlled system (motivated by coupled oscillators) H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of Coupled Oscillators is a Game,” TAC 2 Neuronal computations Bayesian inference Neural circuits as particle filters (Lee & Mumford) T. Yang, P. G. Mehta and S. P. Meyn, “ A Control-oriented Approach for Particle Filtering,” ACC 2011, CDC 2011 3 Learning Synaptic plasticity via long term potentiation (Hebbian learning) “Neurons that fire together wire together” H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “ Learning in Mean-Field Oscillator Games,” CDC 2010 Destexhe & Marder, Nature, 2004; Kopell et al., Neuroscience, 2009; Lee & Mumford, J. Opt. Soc, 2003 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 6 / 38
  • 13. Introduction Role of Neural Rhythms Question (Fundamental question in Neuroscience) Why is synchrony (neural rhythms) useful? Does it have a functional role? 1 Synchronization Phase transition in controlled system (motivated by coupled oscillators) H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of Coupled Oscillators is a Game,” TAC 2 Neuronal computations Bayesian inference Neural circuits as particle filters (Lee & Mumford) T. Yang, P. G. Mehta and S. P. Meyn, “ A Control-oriented Approach for Particle Filtering,” ACC 2011, CDC 2011 3 Learning Synaptic plasticity via long term potentiation (Hebbian learning) “Neurons that fire together wire together” H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “ Learning in Mean-Field Oscillator Games,” CDC 2010 Destexhe & Marder, Nature, 2004; Kopell et al., Neuroscience, 2009; Lee & Mumford, J. Opt. Soc, 2003 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 6 / 38
  • 14. Introduction Role of Neural Rhythms Question (Fundamental question in Neuroscience) Why is synchrony (neural rhythms) useful? Does it have a functional role? 1 Synchronization Phase transition in controlled system (motivated by coupled oscillators) H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of Coupled Oscillators is a Game,” TAC 2 Neuronal computations Bayesian inference Neural circuits as particle filters (Lee & Mumford) T. Yang, P. G. Mehta and S. P. Meyn, “ A Control-oriented Approach for Particle Filtering,” ACC 2011, CDC 2011 3 Learning Synaptic plasticity via long term potentiation (Hebbian learning) “Neurons that fire together wire together” H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “ Learning in Mean-Field Oscillator Games,” CDC 2010 Destexhe & Marder, Nature, 2004; Kopell et al., Neuroscience, 2009; Lee & Mumford, J. Opt. Soc, 2003 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 6 / 38
  • 15. Introduction Role of Neural Rhythms Question (Fundamental question in Neuroscience) Why is synchrony (neural rhythms) useful? Does it have a functional role? 1 Synchronization Phase transition in controlled system (motivated by coupled oscillators) H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of Coupled Oscillators is a Game,” TAC 2 Neuronal computations Bayesian inference Neural circuits as particle filters (Lee & Mumford) T. Yang, P. G. Mehta and S. P. Meyn, “ A Control-oriented Approach for Particle Filtering,” ACC 2011, CDC 2011 3 Learning Synaptic plasticity via long term potentiation (Hebbian learning) “Neurons that fire together wire together” H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “ Learning in Mean-Field Oscillator Games,” CDC 2010 Destexhe & Marder, Nature, 2004; Kopell et al., Neuroscience, 2009; Lee & Mumford, J. Opt. Soc, 2003 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 6 / 38
  • 17. Collaborators Huibing Yin Sean P. Meyn Uday V. Shanbhag “Synchronization of coupled oscillators is a game,” IEEE TAC “Learning in Mean-Field Oscillator Games,” CDC 2010 “On the Efficiency of Equilibria in Mean-field Oscillator Games,” ACC 2011 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 8 / 38
  • 18. Derivation of model Problem statement Oscillators dynamic game Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control 1- 1+1 ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T T 0 E[ c(θi;θ−i) cost of anarchy + 1 2 Ru2 i cost of control ]ds θ−i = (θj)j=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j=i c• (θi,θj), c• ≥ 0 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 9 / 38
  • 19. Derivation of model Problem statement Oscillators dynamic game Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control 1- 1+1 ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T T 0 E[ c(θi;θ−i) cost of anarchy + 1 2 Ru2 i cost of control ]ds θ−i = (θj)j=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j=i c• (θi,θj), c• ≥ 0 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 9 / 38
  • 20. Derivation of model Problem statement Oscillators dynamic game Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T T 0 E[ c(θi;θ−i) cost of anarchy + 1 2 Ru2 i cost of control ]ds θ−i = (θj)j=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j=i c• (θi,θj), c• ≥ 0 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 9 / 38
  • 21. Derivation of model Mean-field model Mean-field model derivation dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds Influence Influence Mass 1 Mean-field approximation c(θi;θ−i(t)) = 1 N ∑ j=i c• (θi,θj) N→∞ −−−−−−→ ¯c(θi,t) 2 Optimal control of single oscillator Decentralized control structure [8] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM]; [13] Lasry & Lions, Japan. J. Math, 2007; P. G. Mehta (Illinois) Harvard Apr. 1, 2011 10 / 38
  • 22. Derivation of model Derivation steps Single oscillator with given cost Dynamics of the oscillator dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0 The cost function is assumed known ηi(ui; ¯c) = lim T→∞ 1 T T 0 E c(θi;θ−i) + 1 2 Ru2 i (s) ds ⇑ ¯c(θi(s),s) HJB equation: ∂thi +ωi∂θ hi = 1 2R (∂θ hi)2 − ¯c(θ,t)+η∗ i − σ2 2 ∂2 θθ hi Optimal control law: u∗ i (t) = ϕi(θ,t) = − 1 R ∂θ hi(θ,t) [1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 11 / 38
  • 23. Derivation of model Derivation steps Single oscillator with given cost Dynamics of the oscillator dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0 The cost function is assumed known ηi(ui; ¯c) = lim T→∞ 1 T T 0 E c(θi;θ−i) + 1 2 Ru2 i (s) ds ⇑ ¯c(θi(s),s) HJB equation: ∂thi +ωi∂θ hi = 1 2R (∂θ hi)2 − ¯c(θ,t)+η∗ i − σ2 2 ∂2 θθ hi Optimal control law: u∗ i (t) = ϕi(θ,t) = − 1 R ∂θ hi(θ,t) [1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 11 / 38
  • 24. Derivation of model Derivation steps Single oscillator with given cost Dynamics of the oscillator dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0 The cost function is assumed known ηi(ui; ¯c) = lim T→∞ 1 T T 0 E c(θi;θ−i) + 1 2 Ru2 i (s) ds ⇑ ¯c(θi(s),s) HJB equation: ∂thi +ωi∂θ hi = 1 2R (∂θ hi)2 − ¯c(θ,t)+η∗ i − σ2 2 ∂2 θθ hi Optimal control law: u∗ i (t) = ϕi(θ,t) = − 1 R ∂θ hi(θ,t) [1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 11 / 38
  • 25. Derivation of model Derivation steps Single oscillator with given cost Dynamics of the oscillator dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0 The cost function is assumed known ηi(ui; ¯c) = lim T→∞ 1 T T 0 E c(θi;θ−i) + 1 2 Ru2 i (s) ds ⇑ ¯c(θi(s),s) HJB equation: ∂thi +ωi∂θ hi = 1 2R (∂θ hi)2 − ¯c(θ,t)+η∗ i − σ2 2 ∂2 θθ hi Optimal control law: u∗ i (t) = ϕi(θ,t) = − 1 R ∂θ hi(θ,t) [1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 11 / 38
  • 26. Derivation of model Derivation steps Single oscillator with given cost Dynamics of the oscillator dθi = (ωi +ui(t))dt +σ dξi, t ≥ 0 The cost function is assumed known ηi(ui; ¯c) = lim T→∞ 1 T T 0 E c(θi;θ−i) + 1 2 Ru2 i (s) ds ⇑ ¯c(θi(s),s) HJB equation: ∂thi +ωi∂θ hi = 1 2R (∂θ hi)2 − ¯c(θ,t)+η∗ i − σ2 2 ∂2 θθ hi Optimal control law: u∗ i (t) = ϕi(θ,t) = − 1 R ∂θ hi(θ,t) [1] D. P. Bertsekas (1995); [14] S. P. Meyn, IEEE TAC, 1997 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 11 / 38
  • 27. Derivation of model Derivation steps Single oscillator with optimal control Dynamics of the oscillator dθi(t) = ωi − 1 R ∂θ hi(θi,t) dt +σ dξi(t) Kolmogorov forward equation for pdf p(θ,t,ωi) FPK: ∂t p+ωi∂θ p = 1 R ∂θ [p(∂θ h)]+ σ2 2 ∂2 θθ p P. G. Mehta (Illinois) Harvard Apr. 1, 2011 12 / 38
  • 28. Derivation of model Derivation steps Mean-field Approximation HJB equation for population ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t)+η(ω)− σ2 2 ∂2 θθ h h(θ,t,ω) Population density ∂t p+ω∂θ p = 1 R ∂θ [p(∂θ h)]+ σ2 2 ∂2 θθ p p(θ,t,ω) Enforce cost consistency ¯c(θ,t) = Ω 2π 0 c• (θ,ϑ)p(ϑ,t,ω)g(ω)dϑ dω ≈ 1 N ∑ j=i c• (θ,ϑ) P. G. Mehta (Illinois) Harvard Apr. 1, 2011 13 / 38
  • 29. Derivation of model Derivation steps Mean-field model dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds c(θi;θ−i) = 1 N ∑ j=i c• (θi,θj(t)) N→∞ −−−→ ¯c(θi,t) Letting N → ∞ and assume c(θi,θ−i) → ¯c(θi,t) HJB: ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t) +η∗ − σ2 2 ∂2 θθ h ⇒ h(θ,t,ω) FPK: ∂t p+ω∂θ p = 1 R ∂θ [p( ∂θ h )]+ σ2 2 ∂2 θθ p ⇒ p(θ,t,ω) ¯c(ϑ,t) = Ω 2π 0 c• (ϑ,θ) p(θ,t,ω) g(ω)dθ dω [8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math, 2007; [16] Weintraub et al., NIPS, 2006; P. G. Mehta (Illinois) Harvard Apr. 1, 2011 14 / 38
  • 30. Derivation of model Derivation steps Mean-field model dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds Influence Influence Mass Letting N → ∞ and assume c(θi,θ−i) → ¯c(θi,t) HJB: ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t) +η∗ − σ2 2 ∂2 θθ h ⇒ h(θ,t,ω) FPK: ∂t p+ω∂θ p = 1 R ∂θ [p( ∂θ h )]+ σ2 2 ∂2 θθ p ⇒ p(θ,t,ω) ¯c(ϑ,t) = Ω 2π 0 c• (ϑ,θ) p(θ,t,ω) g(ω)dθ dω [8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math, 2007; [16] Weintraub et al., NIPS, 2006; P. G. Mehta (Illinois) Harvard Apr. 1, 2011 14 / 38
  • 31. Derivation of model Derivation steps Mean-field model dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds Influence Influence Mass Letting N → ∞ and assume c(θi,θ−i) → ¯c(θi,t) HJB: ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t) +η∗ − σ2 2 ∂2 θθ h ⇒ h(θ,t,ω) FPK: ∂t p+ω∂θ p = 1 R ∂θ [p( ∂θ h )]+ σ2 2 ∂2 θθ p ⇒ p(θ,t,ω) ¯c(ϑ,t) = Ω 2π 0 c• (ϑ,θ) p(θ,t,ω) g(ω)dθ dω [8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math, 2007; [16] Weintraub et al., NIPS, 2006; P. G. Mehta (Illinois) Harvard Apr. 1, 2011 14 / 38
  • 32. Derivation of model Derivation steps Mean-field model dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds Influence Influence Mass Letting N → ∞ and assume c(θi,θ−i) → ¯c(θi,t) HJB: ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t) +η∗ − σ2 2 ∂2 θθ h ⇒ h(θ,t,ω) FPK: ∂t p+ω∂θ p = 1 R ∂θ [p( ∂θ h )]+ σ2 2 ∂2 θθ p ⇒ p(θ,t,ω) ¯c(ϑ,t) = Ω 2π 0 c• (ϑ,θ) p(θ,t,ω) g(ω)dθ dω [8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math, 2007; [16] Weintraub et al., NIPS, 2006; P. G. Mehta (Illinois) Harvard Apr. 1, 2011 14 / 38
  • 33. Derivation of model Derivation steps Mean-field model dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds Influence Influence Mass Letting N → ∞ and assume c(θi,θ−i) → ¯c(θi,t) HJB: ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t) +η∗ − σ2 2 ∂2 θθ h ⇒ h(θ,t,ω) FPK: ∂t p+ω∂θ p = 1 R ∂θ [p( ∂θ h )]+ σ2 2 ∂2 θθ p ⇒ p(θ,t,ω) ¯c(ϑ,t) = Ω 2π 0 c• (ϑ,θ) p(θ,t,ω) g(ω)dθ dω [8] Huang et al., IEEE TAC, 2007; [13] Lasry & Lions, Japan. J. Math, 2007; [16] Weintraub et al., NIPS, 2006; P. G. Mehta (Illinois) Harvard Apr. 1, 2011 14 / 38
  • 34. Solutions of PDE model ε-Nash equilibrium Solution of PDEs gives ε-Nash equilibrium Optimal control law uo i = − 1 R ∂θ h(θ(t),t,ω) ω=ωi Theorem For the control law uo i = − 1 R ∂θ h(θ(t),t,ω) ω=ωi , we have the ε-Nash equilibrium property ηi(uo i ;uo −i) ≤ ηi(ui;uo −i)+O( 1 √ N ), i = 1,...,N, for any adapted control ui. So, we look for solutions of PDEs. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 15 / 38
  • 35. Solutions of PDE model ε-Nash equilibrium Solution of PDEs gives ε-Nash equilibrium Optimal control law uo i = − 1 R ∂θ h(θ(t),t,ω) ω=ωi Theorem For the control law uo i = − 1 R ∂θ h(θ(t),t,ω) ω=ωi , we have the ε-Nash equilibrium property ηi(uo i ;uo −i) ≤ ηi(ui;uo −i)+O( 1 √ N ), i = 1,...,N, for any adapted control ui. So, we look for solutions of PDEs. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 15 / 38
  • 36. Solutions of PDE model ε-Nash equilibrium Solution of PDEs gives ε-Nash equilibrium Optimal control law uo i = − 1 R ∂θ h(θ(t),t,ω) ω=ωi Theorem For the control law uo i = − 1 R ∂θ h(θ(t),t,ω) ω=ωi , we have the ε-Nash equilibrium property ηi(uo i ;uo −i) ≤ ηi(ui;uo −i)+O( 1 √ N ), i = 1,...,N, for any adapted control ui. So, we look for solutions of PDEs. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 15 / 38
  • 37. Solutions of PDE model Synchronization Main result: Synchronization as a solution of the game Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γγ Incoherence R > Rc(γ) Synchrony Incoherence dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 1- 1+1 Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1991 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 16 / 38
  • 38. Solutions of PDE model Synchronization Main result: Synchronization as a solution of the game Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γγ Incoherence R > Rc(γ) Synchrony Incoherence dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 0 0.1 0.2 0.1 0.15 0.2 0.25 0.3 Locking Incoherence κ κ < κc(γ) R γ Synchrony Incoherence dθi = ωi + κ N N ∑ j=1 sin(θj −θi) dt +σ dξi Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1991 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 16 / 38
  • 39. Solutions of PDE model Synchronization Incoherence solution (PDE) Incoherence solution h(θ,t,ω) = h0(θ) := 0 p(θ,t,ω) = p0(θ) := 1 2π h(θ,t,ω) = 0 ⇒ ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t)+η∗ − σ2 2 ∂2 θθ h ∂t p+ω∂θ p = 1 R ∂θ [p(∂θ h)]+ σ2 2 ∂2 θθ p ¯c(θ,t) = Ω 2π 0 c• (θ,ϑ)p(ϑ,t,ω)g(ω)dϑ dω P. G. Mehta (Illinois) Harvard Apr. 1, 2011 17 / 38
  • 40. Solutions of PDE model Synchronization Incoherence solution (PDE) Incoherence solution h(θ,t,ω) = h0(θ) := 0 p(θ,t,ω) = p0(θ) := 1 2π h(θ,t,ω) = 0 ⇒ ∂th+ω∂θ h = 1 2R (∂θ h)2 − ¯c(θ,t)+η∗ − σ2 2 ∂2 θθ h p(θ,t,ω) = 1 2π ⇒ ∂t p+ω∂θ p = 1 R ∂θ [p(∂θ h)]+ σ2 2 ∂2 θθ p ¯c(θ,t) = Ω 2π 0 c• (θ,ϑ)p(ϑ,t,ω)g(ω)dϑ dω P. G. Mehta (Illinois) Harvard Apr. 1, 2011 17 / 38
  • 41. Solutions of PDE model Synchronization Incoherence solution (Finite population) Closed-loop dynamics dθi = (ωi + ui =0 )dt +σ dξi(t) Average cost ηi = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i =0 ]dt = lim T→∞ 1 N ∑ j=i 1 T T 0 E[1 2 sin2 θi(t)−θj(t) 2 ]dt = 1 N ∑ j=i 2π 0 E[1 2 sin2 θi(t)−ϑ 2 ] 1 2π dϑ = N −1 N η0 −1 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8 1 ε-Nash property ηi(uo i ;uo −i) ≤ ηi(ui;uo −i)+O( 1 √ N ), i = 1,...,N. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 18 / 38
  • 42. Solutions of PDE model Synchronization Incoherence solution (Finite population) Closed-loop dynamics dθi = (ωi + ui =0 )dt +σ dξi(t) Average cost ηi = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i =0 ]dt = lim T→∞ 1 N ∑ j=i 1 T T 0 E[1 2 sin2 θi(t)−θj(t) 2 ]dt = 1 N ∑ j=i 2π 0 E[1 2 sin2 θi(t)−ϑ 2 ] 1 2π dϑ = N −1 N η0 −1 0.8 0.6 0.4 0.2 0 0.2 0.4 0.6 0.8 1 ε-Nash property ηi(uo i ;uo −i) ≤ ηi(ui;uo −i)+O( 1 √ N ), i = 1,...,N. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 18 / 38
  • 43. Analysis of phase transition Numerics Bifurcation diagram Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γγ Incoherence R > Rc(γ) Synchrony Incoherence R−1/2 η(ω) 0. 1 0.15 0. 2 0.25 0. 3 0.35 0. 1 0.15 0. 2 0.25 ω = 0.95 ω = 1 ω = 1.05 R > Rc η(ω) = η0 R < Rc η(ω) < η0 dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 1 Convergence theory (as N → ∞) 2 Bifurcation analysis 3 Analytical and numerical computations Yin et al., IEEE TAC, To appear; [17] Yin et al., ACC2010 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 19 / 38
  • 44. Analysis of phase transition Numerics Bifurcation diagram Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γγ Incoherence R > Rc(γ) Synchrony Incoherence R−1/2 η(ω) 0. 1 0.15 0. 2 0.25 0. 3 0.35 0. 1 0.15 0. 2 0.25 ω = 0.95 ω = 1 ω = 1.05 R > Rc η(ω) = η0 R < Rc η(ω) < η0 incoherence soln. dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 1 Convergence theory (as N → ∞) 2 Bifurcation analysis 3 Analytical and numerical computations Yin et al., IEEE TAC, To appear; [17] Yin et al., ACC2010 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 19 / 38
  • 45. Analysis of phase transition Numerics Bifurcation diagram Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γγ Incoherence R > Rc(γ) Synchrony Incoherence R−1/2 η(ω) 0. 1 0.15 0. 2 0.25 0. 3 0.35 0. 1 0.15 0. 2 0.25 ω = 0.95 ω = 1 ω = 1.05 R > Rc η(ω) = η0 R < Rc η(ω) < η0 synchrony soln. dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 1 Convergence theory (as N → ∞) 2 Bifurcation analysis 3 Analytical and numerical computations Yin et al., IEEE TAC, To appear; [17] Yin et al., ACC2010 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 19 / 38
  • 46. Analysis of phase transition Numerics Bifurcation diagram Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γγ Incoherence R > Rc(γ) Synchrony Incoherence R−1/2 η(ω) 0. 1 0.15 0. 2 0.25 0. 3 0.35 0. 1 0.15 0. 2 0.25 ω = 0.95 ω = 1 ω = 1.05 R > Rc η(ω) = η0 R < Rc η(ω) < η0 dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 1 Convergence theory (as N → ∞) 2 Bifurcation analysis 3 Analytical and numerical computations Yin et al., IEEE TAC, To appear; [17] Yin et al., ACC2010 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 19 / 38
  • 48. Collaborators Tao Yang Sean P. Meyn “A Control-oriented Approach for Particle Filtering,” ACC 2011 “Feedback Particle Filter with Mean-field Coupling,” submitted to CDC 2011 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 21 / 38
  • 49. Problem Definition Filtering problem Signal, observation processes: dθ(t) = ω dt +σB dB(t) mod 2π (1) dZ(t) = h(θ(t))dt +σW dW(t) (2) - Particle filter: dθi(t) = ω dt +σB dBi(t)+Ui(t) mod 2π, i = 1,...,N. (3) Objective: To choose control Ui(t) for the purpose of filtering. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 22 / 38
  • 50. Problem Definition Filtering problem Signal, observation processes: dθ(t) = ω dt +σB dB(t) mod 2π (1) dZ(t) = h(θ(t))dt +σW dW(t) (2) - Particle filter: dθi(t) = ω dt +σB dBi(t)+Ui(t) mod 2π, i = 1,...,N. (3) Objective: To choose control Ui(t) for the purpose of filtering. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 22 / 38
  • 51. Problem Definition Filtering problem Signal, observation processes: dθ(t) = ω dt +σB dB(t) mod 2π (1) dZ(t) = h(θ(t))dt +σW dW(t) (2) - Particle filter: dθi(t) = ω dt +σB dBi(t)+Ui(t) mod 2π, i = 1,...,N. (3) Objective: To choose control Ui(t) for the purpose of filtering. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 22 / 38
  • 52. Problem Definition Filtering problem Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t) (1) dZ(t) = h(X(t))dt +σW dW(t) (2) X(t) ∈ R – state, Z(t) ∈ R – observation a(·),h(·) – C1 functions B(t),W(t) –ind. standard Wiener processes, σB ≥ 0,σW > 0 X(0) ∼ p0(·) Define Z t := σ{Z(s),0 ≤ s ≤ t} Objective: estimate the posterior distribution p∗ of X(t) given Z t . Solution approaches: Linear dynamics: Kalman filter (R. E. Kalman, 1960) Nonlinear dynamics: Kushner’s equation (H. J. Kushner, 1964) Numerical Methods: Particle filtering (N. J. Gordon et. al, 1993) P. G. Mehta (Illinois) Harvard Apr. 1, 2011 23 / 38
  • 53. Problem Definition Filtering problem Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t) (1) dZ(t) = h(X(t))dt +σW dW(t) (2) X(t) ∈ R – state, Z(t) ∈ R – observation a(·),h(·) – C1 functions B(t),W(t) –ind. standard Wiener processes, σB ≥ 0,σW > 0 X(0) ∼ p0(·) Define Z t := σ{Z(s),0 ≤ s ≤ t} Objective: estimate the posterior distribution p∗ of X(t) given Z t . Solution approaches: Linear dynamics: Kalman filter (R. E. Kalman, 1960) Nonlinear dynamics: Kushner’s equation (H. J. Kushner, 1964) Numerical Methods: Particle filtering (N. J. Gordon et. al, 1993) P. G. Mehta (Illinois) Harvard Apr. 1, 2011 23 / 38
  • 54. Problem Definition Filtering problem Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t) (1) dZ(t) = h(X(t))dt +σW dW(t) (2) X(t) ∈ R – state, Z(t) ∈ R – observation a(·),h(·) – C1 functions B(t),W(t) –ind. standard Wiener processes, σB ≥ 0,σW > 0 X(0) ∼ p0(·) Define Z t := σ{Z(s),0 ≤ s ≤ t} Objective: estimate the posterior distribution p∗ of X(t) given Z t . Solution approaches: Linear dynamics: Kalman filter (R. E. Kalman, 1960) Nonlinear dynamics: Kushner’s equation (H. J. Kushner, 1964) Numerical Methods: Particle filtering (N. J. Gordon et. al, 1993) P. G. Mehta (Illinois) Harvard Apr. 1, 2011 23 / 38
  • 55. Existing Approach Kalman Filter Linear case: Kalman Filter Linear Gaussian system: dX(t) = aX(t)dt +σB dB(t) (1) dZ(t) = hX(t)dt +σW dW(t) (2) Kalman filter: p∗ = N(µ(t),Σ(t)) dµ(t) = aµ(t)dt + hΣ(t) σ2 W (dZ(t)−hµ(t)dt) innovation error dΣ(t) dt = 2aΣ(t)+σ2 B − h2Σ2(t) σ2 W [9] R. E. Kalman, Trans. ASME, Ser. D: J. Basic Eng.,1961 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 24 / 38
  • 56. Existing Approach Kalman Filter Linear case: Kalman Filter Linear Gaussian system: dX(t) = aX(t)dt +σB dB(t) (1) dZ(t) = hX(t)dt +σW dW(t) (2) Kalman filter: p∗ = N(µ(t),Σ(t)) dµ(t) = aµ(t)dt + hΣ(t) σ2 W (dZ(t)−hµ(t)dt) innovation error dΣ(t) dt = 2aΣ(t)+σ2 B − h2Σ2(t) σ2 W [9] R. E. Kalman, Trans. ASME, Ser. D: J. Basic Eng.,1961 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 24 / 38
  • 57. Existing Approach Nonlinear case: Kushner’s equation Nonlinear case: K-S equation Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t), (1) dZ(t) = h(X(t))dt +σW dW(t) (2) B(t),W(t) are independent standard Wiener processes. Kushner’s equation: the posterior distribution p∗ is a solution of a stochastic PDE: dp∗ = L † (p∗ )dt +(h− ˆh)(σ2 W )−1 (dZ(t)− ˆhdt)p∗ where ˆh = E[h(X(t))|Z t ] = h(x)p∗ (x,t)dx L † (p∗ ) = − ∂(p∗ ·a(x)) ∂x + 1 2 σ2 B ∂2 p∗ ∂x2 No closed-form solution in general. Closure problem. [11] H. J. Kushner, SIAM J. Control, 1964 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 25 / 38
  • 58. Existing Approach Nonlinear case: Kushner’s equation Nonlinear case: K-S equation Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t), (1) dZ(t) = h(X(t))dt +σW dW(t) (2) B(t),W(t) are independent standard Wiener processes. Kushner’s equation: the posterior distribution p∗ is a solution of a stochastic PDE: dp∗ = L † (p∗ )dt +(h− ˆh)(σ2 W )−1 (dZ(t)− ˆhdt)p∗ where ˆh = E[h(X(t))|Z t ] = h(x)p∗ (x,t)dx L † (p∗ ) = − ∂(p∗ ·a(x)) ∂x + 1 2 σ2 B ∂2 p∗ ∂x2 No closed-form solution in general. Closure problem. [11] H. J. Kushner, SIAM J. Control, 1964 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 25 / 38
  • 59. Existing Approach Nonlinear case: Kushner’s equation Nonlinear case: K-S equation Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t), (1) dZ(t) = h(X(t))dt +σW dW(t) (2) B(t),W(t) are independent standard Wiener processes. Kushner’s equation: the posterior distribution p∗ is a solution of a stochastic PDE: dp∗ = L † (p∗ )dt +(h− ˆh)(σ2 W )−1 (dZ(t)− ˆhdt)p∗ where ˆh = E[h(X(t))|Z t ] = h(x)p∗ (x,t)dx L † (p∗ ) = − ∂(p∗ ·a(x)) ∂x + 1 2 σ2 B ∂2 p∗ ∂x2 No closed-form solution in general. Closure problem. [11] H. J. Kushner, SIAM J. Control, 1964 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 25 / 38
  • 60. Existing Approach Nonlinear case: Kushner’s equation Numerical methods: Particle filter Idea: Approximate posterior p∗ in terms of particles {Xi(t)}N i=1 so p∗ ≈ 1 N N ∑ i=1 δXi(t)(·) Algorithm outline Initialization Xi(0) ∼ p∗ 0(·). At each time step: Importance sampling (for weight update) Resampling (for variance reduction) [5], N. J. Gordon, D. J. Salmond, and A. F. M. Smith, IEEE Proceedings on Radar and Signal Processing, 1993 [4], A. Doucet, N. Freitas, N. Gordon, Springer-Verlag, 2001 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 26 / 38
  • 61. Existing Approach Nonlinear case: Kushner’s equation Numerical methods: Particle filter Idea: Approximate posterior p∗ in terms of particles {Xi(t)}N i=1 so p∗ ≈ 1 N N ∑ i=1 δXi(t)(·) Algorithm outline Initialization Xi(0) ∼ p∗ 0(·). At each time step: Importance sampling (for weight update) Resampling (for variance reduction) [5], N. J. Gordon, D. J. Salmond, and A. F. M. Smith, IEEE Proceedings on Radar and Signal Processing, 1993 [4], A. Doucet, N. Freitas, N. Gordon, Springer-Verlag, 2001 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 26 / 38
  • 62. A Control-Oriented Approach to Particle Filtering Overview Mean-field filtering Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t) (1) dZ(t) = h(X(t))dt +σW dW(t) (2) Controlled system (N particles): dXi(t) = a(Xi(t))dt +σB dBi(t)+ Ui(t) mean field control , i = 1,...,N (3) {Bi(t)}N i=1 are ind. standard Wiener Processes. Objective: Choose control Ui(t) such that the empirical distribution of the population approximates the posterior distribution p∗ as N → ∞. [7] M. Huang, P. E. Caines, and R. P. Malhame, IEEE TAC, 2007 [18] H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, IEEE TAC, To be appeared [12] J. Lasry and P. Lions, Japanese Journal of Mathematics, 2007 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 27 / 38
  • 63. A Control-Oriented Approach to Particle Filtering Overview Mean-field filtering Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t) (1) dZ(t) = h(X(t))dt +σW dW(t) (2) Controlled system (N particles): dXi(t) = a(Xi(t))dt +σB dBi(t)+ Ui(t) mean field control , i = 1,...,N (3) {Bi(t)}N i=1 are ind. standard Wiener Processes. Objective: Choose control Ui(t) such that the empirical distribution of the population approximates the posterior distribution p∗ as N → ∞. [7] M. Huang, P. E. Caines, and R. P. Malhame, IEEE TAC, 2007 [18] H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, IEEE TAC, To be appeared [12] J. Lasry and P. Lions, Japanese Journal of Mathematics, 2007 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 27 / 38
  • 64. A Control-Oriented Approach to Particle Filtering Overview Mean-field filtering Signal, observation processes: dX(t) = a(X(t))dt +σB dB(t) (1) dZ(t) = h(X(t))dt +σW dW(t) (2) Controlled system (N particles): dXi(t) = a(Xi(t))dt +σB dBi(t)+ Ui(t) mean field control , i = 1,...,N (3) {Bi(t)}N i=1 are ind. standard Wiener Processes. Objective: Choose control Ui(t) such that the empirical distribution of the population approximates the posterior distribution p∗ as N → ∞. [7] M. Huang, P. E. Caines, and R. P. Malhame, IEEE TAC, 2007 [18] H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, IEEE TAC, To be appeared [12] J. Lasry and P. Lions, Japanese Journal of Mathematics, 2007 P. G. Mehta (Illinois) Harvard Apr. 1, 2011 27 / 38
  • 65. A Control-Oriented Approach to Particle Filtering Example: Linear case Example: Linear case Controlled system: for i = 1,...N: dXi(t) = aXi(t)dt +σB dBi(t)+ hΣ(t) σ2 W [dZ(t)−h Xi(t)+ µ(t) 2 dt] Ui(t) (3) where µ(t) ≈ ¯µ(N) (t) = 1 N N ∑ i=1 Xi(t) ←− sample mean Σ(t) ≈ ¯Σ(N) (t) = 1 N −1 N ∑ i=1 (Xi(t)− µ(t))2 ←− sample variance P. G. Mehta (Illinois) Harvard Apr. 1, 2011 28 / 38
  • 66. A Control-Oriented Approach to Particle Filtering Example: Linear case Example: Linear case Feedback particle filter: dXi(t) = aXi(t)dt +σB dBi(t)+ h¯Σ(N)(t) σ2 W [dZ(t)−h Xi(t)+ ¯µ(N)(t) 2 dt] (3) Mean-field model: Let p denote cond. dist. of Xi(t) given Z t with p(x,0) = N(µ(0),Σ(0)). Then p = N(µ(t),Σ(t)) where dµ(t) = aµ(t)dt + hΣ(t) σ2 W (dZ(t)−hµ(t)) dΣ(t) dt = 2aΣ(t)+σ2 B − h2Σ2(t) σ2 W Same as Kalman filter! As N → ∞, the empirical distribution approximates the posterior p∗ P. G. Mehta (Illinois) Harvard Apr. 1, 2011 29 / 38
  • 67. A Control-Oriented Approach to Particle Filtering Example: Linear case Comparison plots mse = 1 T T 0 Σ (N) t −Σt Σt 2 dt 10 1 10 2 10 3 10 −3 10 −2 10 −1 10 0 N Error FPF −0.5 0 0.5 BPF −0.5 0 Bootstrap (BPF) Feedback (FPF) a ∈ [−1,1],h = 3,σB = 1,σW = 0.5, dt = 0.01,T = 50 and N ∈ [20,50,100,200,500,1000]. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 30 / 38
  • 68. Methodology Methodology: Variational problem Continuous-discrete time problem. ....... Signal, observ processes: dX(t) = a(X(t))dt +σB dB(t) Z(tn) = h(X(tn))+W(tn) Particle filter Filter: dXi(t) = a(Xi(t)dt +σB dBi(t) Control: Xi(tn) = Xi(t− n )+u(Xi(t− n )) control Belief map. p∗ n(x): cond. pdf of X(t)|Z t Bayes rule pn(x): cond. pdf of Xi(t)|Z t Variational problem: K-L Metric: KL(pn p∗ n) = Epn [ln pn p∗ n ] Control design: u∗ (x) = arg minuKL(pn p∗ n). P. G. Mehta (Illinois) Harvard Apr. 1, 2011 31 / 38
  • 69. Methodology Methodology: Variational problem Continuous-discrete time problem. ....... Signal, observ processes: dX(t) = a(X(t))dt +σB dB(t) Z(tn) = h(X(tn))+W(tn) Particle filter Filter: dXi(t) = a(Xi(t)dt +σB dBi(t) Control: Xi(tn) = Xi(t− n )+u(Xi(t− n )) control Belief map. p∗ n(x): cond. pdf of X(t)|Z t Bayes rule pn(x): cond. pdf of Xi(t)|Z t Variational problem: K-L Metric: KL(pn p∗ n) = Epn [ln pn p∗ n ] Control design: u∗ (x) = arg minuKL(pn p∗ n). P. G. Mehta (Illinois) Harvard Apr. 1, 2011 31 / 38
  • 70. Methodology Methodology: Variational problem Continuous-discrete time problem. ....... Signal, observ processes: dX(t) = a(X(t))dt +σB dB(t) Z(tn) = h(X(tn))+W(tn) Particle filter Filter: dXi(t) = a(Xi(t)dt +σB dBi(t) Control: Xi(tn) = Xi(t− n )+u(Xi(t− n )) control Belief map. p∗ n(x): cond. pdf of X(t)|Z t Bayes rule pn(x): cond. pdf of Xi(t)|Z t Variational problem: K-L Metric: KL(pn p∗ n) = Epn [ln pn p∗ n ] Control design: u∗ (x) = arg minuKL(pn p∗ n). P. G. Mehta (Illinois) Harvard Apr. 1, 2011 31 / 38
  • 71. Feedback particle filter Feedback particle filter(FPF) Problem: dX(t) = a(X(t))dt +σB dB(t), (1) dZ(t) = h(X(t))dt +σW dW(t) (2) FPF: dXi(t) = a(Xi(t))dt +σB dBi(t) +v(Xi(t))dIi(t)+ 1 2 σ2 W v(Xi(t))v (Xi(t))dt (3) Innovation error dIi(t)=:dZ(t)− 1 2 (h(Xi(t))+ ˆh)dt Population prediction ˆh = 1 N N ∑ i=1 h(Xi(t)). Euler-Lagrange equation: − ∂ ∂x 1 p(x,t) ∂ ∂x {p(x,t)v(x,t)} = 1 σ2 W h (x), (4) with boundary condition lim x→±∞ v(x,t)p(x,t) = 0. P. G. Mehta (Illinois) Harvard Apr. 1, 2011 32 / 38
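Integrating the E-L equation (4) once (as in the appendix derivation, equation (9)) gives −(p v)′ = ((h − ĥ)/σ_W²) p, so the gain can be computed from p and h by a single quadrature. A minimal numerical sketch of that computation, with an assumed grid and a sanity check against the linear-Gaussian case, where v should reduce to the constant Kalman gain hΣ/σ_W²:

```python
import numpy as np

def fpf_gain(x, p, h, sigW):
    """Gain v(x) from the integrated E-L equation -(p v)' = (h - h_hat) p / sigW^2."""
    dx = x[1] - x[0]
    p = p / np.trapz(p, x)                     # normalize the density
    h_hat = np.trapz(h * p, x)                 # population prediction of h
    integrand = (h - h_hat) * p / sigW**2
    pv = -np.cumsum(integrand) * dx            # p(x) v(x) = -(1/sigW^2) ∫_{-inf}^x (h - h_hat) p ds
    return pv / np.maximum(p, 1e-12)

# Check on the linear-Gaussian case: v should be ~ h * Sigma / sigW^2 (a constant)
x = np.linspace(-8, 8, 4001)
mu, Sig, hcoef, sigW = 0.3, 0.7, 3.0, 0.5
p = np.exp(-(x - mu)**2 / (2 * Sig)) / np.sqrt(2 * np.pi * Sig)
v = fpf_gain(x, p, hcoef * x, sigW)
print(v[len(x) // 2], hcoef * Sig / sigW**2)   # both approximately 8.4
```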
  • 74. Consistency result. p* denotes the conditional pdf of X(t) given Z_t: dp* = L†(p*) dt + (h − ĥ)(σ_W²)^{−1} (dZ(t) − ĥ dt) p*. p denotes the conditional pdf of X_i(t) given Z_t: dp = L† p dt − (∂/∂x)(v p) dZ(t) − (∂/∂x)(u p) dt + (σ_W²/2) (∂²/∂x²)(p v²) dt. Theorem: Consider the two evolution equations for p and p*, defined according to the solution of the Kolmogorov forward equation and the Kushner-Stratonovich equation, respectively. Suppose the control function v(x,t) is obtained according to (4). Then, provided p(x,0) = p*(x,0), we have p(x,t) = p*(x,t) for all t ≥ 0.
  • 75. Linear Gaussian case. Linear system: dX(t) = a X(t) dt + σ_B dB(t) (1); dZ(t) = h X(t) dt + σ_W dW(t) (2). Gaussian density: p(x,t) = (1/√(2πΣ(t))) exp(−(x − µ(t))²/(2Σ(t))). E-L equation: −(∂/∂x)[(1/p)(∂/∂x){p v}] = h/σ_W² ⇒ v(x,t) = h Σ(t)/σ_W². FPF: dX_i(t) = a X_i(t) dt + σ_B dB_i(t) + (h Σ(t)/σ_W²) [dZ(t) − h (X_i(t) + µ(t))/2 dt] (3).
  • 79. Example: Filtering of a nonlinear oscillator. Nonlinear oscillator: dθ(t) = ω dt + σ_B dB(t) mod 2π (1); dZ(t) = h(θ(t)) dt + σ_W dW(t), with h(θ) = (1 + cos θ)/2 (2). Controlled system (N particles): dθ_i(t) = ω dt + σ_B dB_i(t) + v(θ_i(t)) [dZ(t) − (1/2)(h(θ_i(t)) + ĥ) dt] + (1/2) σ_W² v v′ dt mod 2π, i = 1,...,N (3), where v(θ_i(t)) is obtained from the solution of the E-L equation −(∂/∂θ)[(1/p(θ,t))(∂/∂θ){p(θ,t) v(θ,t)}] = h′(θ)/σ_W² = −sin θ/(2σ_W²) (4).
  • 81. Example: Filtering of a nonlinear oscillator. Fourier form of p(θ,t): p(θ,t) = 1/(2π) + P_s(t) sin θ + P_c(t) cos θ. Approximate solution of the E-L equation (4) (using a perturbation method): v(θ,t) = (1/(2σ_W²)) { −sin θ + (π/2) [P_c(t) sin 2θ − P_s(t) cos 2θ] }, v′(θ,t) = (1/(2σ_W²)) { −cos θ + π [P_c(t) cos 2θ + P_s(t) sin 2θ] }, where P_c(t) ≈ P̄_c^(N)(t) = (1/(πN)) ∑_{j=1}^N cos θ_j(t) and P_s(t) ≈ P̄_s^(N)(t) = (1/(πN)) ∑_{j=1}^N sin θ_j(t). Reminiscent of coupled oscillators (Winfree).
  • 84. Example: Simulation results. Signal and observation processes: dθ(t) = 1 dt + 0.5 dB(t) (1); dZ(t) = ((1 + cos θ(t))/2) dt + 0.4 dW(t) (2). Particle system (N = 100): dθ_i(t) = 1 dt + σ_B dB_i(t) + U(θ_i(t); P̄_c^(N)(t), P̄_s^(N)(t)) mod 2π (3).
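A minimal sketch of this particle system, using the perturbation-method gain from the previous slide; the model constants follow the slide (ω = 1, σ_B = 0.5, σ_W = 0.4, N = 100), while the step size, horizon, and the circular-mean estimator are assumptions:

```python
import numpy as np

omega, sigB, sigW = 1.0, 0.5, 0.4
dt, T, N = 0.01, 20.0, 100
rng = np.random.default_rng(1)

h = lambda th: 0.5 * (1.0 + np.cos(th))          # observation function
theta = rng.uniform(0, 2 * np.pi)                # true oscillator phase
thetai = rng.uniform(0, 2 * np.pi, size=N)       # particles

for _ in range(int(T / dt)):
    theta = (theta + omega * dt + sigB * np.sqrt(dt) * rng.normal()) % (2 * np.pi)
    dZ = h(theta) * dt + sigW * np.sqrt(dt) * rng.normal()

    # Empirical Fourier coefficients of the particle population
    Pc = np.mean(np.cos(thetai)) / np.pi
    Ps = np.mean(np.sin(thetai)) / np.pi
    # Perturbation-method gain and its derivative (previous slide)
    v  = (-np.sin(thetai) + (np.pi / 2) * (Pc * np.sin(2 * thetai) - Ps * np.cos(2 * thetai))) / (2 * sigW**2)
    dv = (-np.cos(thetai) + np.pi * (Pc * np.cos(2 * thetai) + Ps * np.sin(2 * thetai))) / (2 * sigW**2)

    hhat = np.mean(h(thetai))
    dI = dZ - 0.5 * (h(thetai) + hhat) * dt      # innovation error
    dBi = np.sqrt(dt) * rng.normal(size=N)
    thetai = (thetai + omega * dt + sigB * dBi + v * dI + 0.5 * sigW**2 * v * dv * dt) % (2 * np.pi)

# Circular mean of the particles as the state estimate
est = np.angle(np.mean(np.exp(1j * thetai))) % (2 * np.pi)
print("true phase:", theta, " estimate:", est)
```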
  • 85. Example: Simulation results. [Figure: (a) state θ(t) and conditional-mean estimate θ̄_N(t) versus time; (b) PDF estimate on [0, 2π] versus time; (c) first and second harmonics of the estimate, with bounds.]
  • 86. Thank you! Website: http://www.mechse.illinois.edu/research/mehtapg. Collaborators: Huibing Yin, Tao Yang, Sean Meyn, Uday Shanbhag. Papers: "Synchronization of coupled oscillators is a game," IEEE TAC, ACC 2010; "Learning in Mean-Field Oscillator Games," CDC 2010; "On the Efficiency of Equilibria in Mean-field Oscillator Games," ACC 2011; "A Control-oriented Approach for Particle Filtering," ACC 2011, CDC 2011.
  • 87. Part III Learning in mean-field game
  • 88. Learning: Q-function approximation. Approximate DP. Optimality equation: min_{u_i} H_i(θ, u_i; θ_{−i}(t)) = η_i*, where H_i(θ, u_i; θ_{−i}(t)) := c(θ; θ_{−i}(t)) + (1/2) R u_i² + D_{u_i} h_i(θ,t). Optimal control law: u_i* = −(1/R) ∂_θ h_i(θ,t); Kuramoto law: u_i^(Kur) = −(κ/N) ∑_{j≠i} sin(θ_i − θ_j(t)). Parameterization: H_i^(A_i,φ_i)(θ, u_i; θ_{−i}(t)) = c(θ; θ_{−i}(t)) + (1/2) R u_i² + (ω_i − 1 + u_i) A_i S^(φ_i) + (σ²/2) A_i C^(φ_i), where S^(φ)(θ, θ_{−i}) = (1/N) ∑_{j≠i} sin(θ − θ_j − φ) and C^(φ)(θ, θ_{−i}) = (1/N) ∑_{j≠i} cos(θ − θ_j − φ). Approximate optimal control: u_i^(A_i,φ_i) = arg min_{u_i} { H_i^(A_i,φ_i)(θ, u_i; θ_{−i}(t)) } = −(A_i/R) S^(φ_i)(θ, θ_{−i}). Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009
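A minimal sketch of evaluating the approximate optimal control u_i = −(A_i/R) S^(φ_i)(θ, θ_{−i}); the population size and parameter values are illustrative. With φ_i = 0 and A_i = κR it reduces to the Kuramoto law, which is the comparison the slide draws:

```python
import numpy as np

def approx_control(i, theta, A, phi, R):
    """u_i = -(A_i / R) * S^(phi_i)(theta_i, theta_{-i}),
    with S^(phi)(theta, theta_{-i}) = (1/N) * sum_{j != i} sin(theta - theta_j - phi)."""
    N = len(theta)
    others = np.delete(theta, i)
    S = np.sum(np.sin(theta[i] - others - phi[i])) / N
    return -(A[i] / R) * S

# Illustrative use: with phi_i = 0 and A_i = kappa * R this is the Kuramoto coupling term
rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, size=50)
A = np.full(50, 2.0)       # A_i = kappa * R with kappa = 2, R = 1
phi = np.zeros(50)
print(approx_control(0, theta, A, phi, R=1.0))
```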
  • 92. Learning: Stochastic approximation. Learning algorithm. Bellman error (pointwise): L^(A_i,φ_i)(θ,t) = min_{u_i} { H_i^(A_i,φ_i) } − η_i^(A_i*,φ_i*). Stochastic approximation algorithm: ẽ(A_i,φ_i) = ∑_{k=1}^{2} |⟨L^(A_i,φ_i), ϕ̃_k(θ)⟩|², with dA_i/dt = −ε dẽ(A_i,φ_i)/dA_i and dφ_i/dt = −ε dẽ(A_i,φ_i)/dφ_i. Convergence analysis: A_i(t) → A*, φ_i(t) → φ*. [Figure: parameter trajectories for the synchrony and incoherence cases.] Yin et al., CDC 2010
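A heavily simplified sketch of the parameter update: the pointwise Bellman error is projected onto two basis functions and (A_i, φ_i) descend the squared projection with finite-difference gradients. The Bellman-error evaluator, the sin/cos basis, and the toy error used in the check are all placeholder assumptions; the slides define the error through H_i^(A_i,φ_i) and η_i:

```python
import numpy as np

def projected_error(A, phi, bellman_error, theta_grid, bases):
    """Squared projection of the pointwise Bellman error onto the basis functions."""
    L = bellman_error(theta_grid, A, phi)
    dth = theta_grid[1] - theta_grid[0]
    coeffs = [np.sum(L * b(theta_grid)) * dth for b in bases]
    return sum(c**2 for c in coeffs)

def sa_step(A, phi, bellman_error, theta_grid, bases, eps=1e-2, d=1e-4):
    """One gradient-descent step on the projected Bellman error (finite-difference gradients)."""
    e0 = projected_error(A, phi, bellman_error, theta_grid, bases)
    dA = (projected_error(A + d, phi, bellman_error, theta_grid, bases) - e0) / d
    dp = (projected_error(A, phi + d, bellman_error, theta_grid, bases) - e0) / d
    return A - eps * dA, phi - eps * dp

# Toy Bellman error with a known minimum at A = 1, phi = 0 (purely illustrative)
toy_error = lambda th, A, phi: (A - 1.0) * np.sin(th) + phi * np.cos(th)
grid = np.linspace(0, 2 * np.pi, 200, endpoint=False)
bases = [np.sin, np.cos]
A, phi = 2.0, 0.5
for _ in range(500):
    A, phi = sa_step(A, phi, toy_error, grid, bases)
print(A, phi)   # converges to approximately 1.0, 0.0
```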
  • 94. Learning: Bifurcation diagram. Comparison of average cost. Dynamics: dθ_i = (ω_i + u_i) dt + σ dξ_i. Average cost: η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds. [Figure: average cost of the learning solution compared with the mean-field solution.]
  • 98. Analysis of phase transition: Incoherence solution. Overview of the steps. HJB: ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ,t) + η* − (σ²/2) ∂²_θθ h ⇒ h(θ,t,ω). FPK: ∂_t p + ω ∂_θ p = (1/R) ∂_θ [p (∂_θ h)] + (σ²/2) ∂²_θθ p ⇒ p(θ,t,ω). Mean-field cost: c̄(ϑ,t) = ∫_Ω ∫_0^{2π} c•(ϑ,θ) p(θ,t,ω) g(ω) dθ dω. Assume c•(ϑ,θ) = c•(ϑ − θ) = (1/2) sin²((ϑ − θ)/2). Incoherence solution: h(θ,t,ω) = h_0(θ) := 0, p(θ,t,ω) = p_0(θ) := 1/(2π).
  • 99. Analysis of phase transition: Bifurcation analysis. Linearization and spectra. Linearized PDE (about the incoherence solution), for z̃ = (h̃, p̃): ∂_t h̃ = −ω ∂_θ h̃ − c̄ − (σ²/2) ∂²_θθ h̃, ∂_t p̃ = −ω ∂_θ p̃ + (1/(2πR)) ∂²_θθ h̃ + (σ²/2) ∂²_θθ p̃; compactly, ∂_t z̃(θ,t,ω) = L_R z̃(θ,t,ω). Spectrum of the linear operator: 1. Continuous spectrum {S^(k)}_{k=−∞}^{+∞}, S^(k) := { λ ∈ C : λ = ±(σ²/2) k² − i k ω, ω ∈ Ω }. 2. Discrete spectrum, with characteristic equation (1/(8R)) ∫_Ω g(ω) / ((λ − σ²/2 + iω)(λ + σ²/2 + iω)) dω + 1 = 0. [Figure: spectrum in the complex plane for γ = 0.1; the k = 1, 2 branches are shown and the discrete eigenvalues move as R decreases.]
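A minimal numerical sketch for locating a discrete eigenvalue from the characteristic equation, assuming a uniform frequency distribution g(ω) on [1 − γ, 1 + γ]; the quadrature, the parameter values (R, γ, σ), and the initial guess are all assumptions, and the root returned depends on that guess:

```python
import numpy as np
from scipy.optimize import fsolve

def char_eq(lam_ri, R, gamma, sigma):
    """Real/imag parts of 1/(8R) * ∫ g(ω) dω / ((λ - σ²/2 + iω)(λ + σ²/2 + iω)) + 1."""
    lam = lam_ri[0] + 1j * lam_ri[1]
    omega = np.linspace(1 - gamma, 1 + gamma, 2001)
    g = np.full_like(omega, 1.0 / (2 * gamma))       # uniform g(ω) on [1-γ, 1+γ] (assumed)
    integrand = g / ((lam - sigma**2 / 2 + 1j * omega) * (lam + sigma**2 / 2 + 1j * omega))
    val = np.trapz(integrand, omega) / (8 * R) + 1.0
    return [val.real, val.imag]

R, gamma, sigma = 32.0, 0.1, 0.3                     # illustrative values
root = fsolve(char_eq, x0=[-0.02, -0.98], args=(R, gamma, sigma))
print("discrete eigenvalue approx.:", root[0] + 1j * root[1])
```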
  • 103. Analysis of phase transition: Bifurcation analysis. Bifurcation diagram (Hamiltonian Hopf). Characteristic equation: (1/(8R)) ∫_Ω g(ω) / ((λ − σ²/2 + iω)(λ + σ²/2 + iω)) dω + 1 = 0. Stability proof. [Figures: (a) continuous spectrum (independent of R) and discrete spectrum (a function of R) in the complex plane, with the bifurcation point marked; (c) phase diagram in the (γ, R) plane: incoherence for R > R_c(γ), synchrony otherwise.] [3] Dellnitz et al., Int. Series Num. Math., 1992
  • 106. Analysis of phase transition: Numerics. Numerical solution of the PDEs. [Figures: incoherence solution for R = 60; synchrony solution for R = 10.]
  • 108. Analysis of phase transition: Numerics. Bifurcation diagram. [Figures: phase diagram in the (γ, R^{−1/2}) plane, with incoherence (locking) for R > R_c(γ) and synchrony otherwise; average cost η(ω) for ω = 0.95, 1, 1.05, with η(ω) = η_0 on the incoherence branch (R > R_c) and η(ω) < η_0 on the synchrony branch (R < R_c).] Dynamics: dθ_i = (ω_i + u_i) dt + σ dξ_i. Average cost: η_i(u_i; u_{−i}) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_{−i}) + (1/2) R u_i² ] ds.
  • 111. Analysis of phase transition: Numerics. Bifurcation diagrams with other costs. For c•(θ,ϑ) = 0.25 (1 − cos(θ − ϑ)): [Figure: bifurcation diagram in R^{−1/2}, together with the stationary density p(θ) at R = 23.8359.] For c•(θ,ϑ) = 1 − cos(θ − ϑ) − cos 3(θ − ϑ): [Figure: bifurcation diagram in R^{−1/2} and the corresponding stationary densities p(θ).]
  • 115. Feedback particle filter: Numerical method, particle filter. Basic algorithm (discretized time). Step 0, Initialization: for i = 1,...,N, sample x_0^(i) ∼ p_{0|0}(x) and set n = 0. Step 1, Importance sampling: for i = 1,...,N, sample x̃_n^(i) ∼ p^N_{n−1|n−1} K(x_n) = (1/N) ∑_{k=1}^N K(x_n | x_{n−1}^(k)); for i = 1,...,N, evaluate the normalized importance weights w_n^(i) ∝ g(z_n | x̃_n^(i)), with ∑_{i=1}^N w_n^(i) = 1, and set p^N_{n|n}(x) = ∑_{i=1}^N w_n^(i) δ_{x̃_n^(i)}(x). Step 2, Resampling: for i = 1,...,N, sample x_n^(i) ∼ p^N_{n|n}(x).
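A minimal sketch of this bootstrap algorithm for the linear-Gaussian model used in the earlier comparison, with the Euler-discretized dynamics playing the role of the kernel K, a Gaussian likelihood g, and multinomial resampling; step size, horizon, and noise seeds are assumptions:

```python
import numpy as np

a, h, sigB, sigW = -0.5, 3.0, 1.0, 0.5
dt, T, N = 0.01, 5.0, 500
rng = np.random.default_rng(3)

X = 1.0
particles = rng.normal(1.0, 1.0, size=N)          # Step 0: initialization

for _ in range(int(T / dt)):
    # Simulate the signal and the (discretized) observation z_n
    X += a * X * dt + sigB * np.sqrt(dt) * rng.normal()
    z = h * X * dt + sigW * np.sqrt(dt) * rng.normal()

    # Step 1: importance sampling through the transition kernel K,
    # then weights proportional to the likelihood g(z_n | x)
    particles += a * particles * dt + sigB * np.sqrt(dt) * rng.normal(size=N)
    logw = -(z - h * particles * dt)**2 / (2 * sigW**2 * dt)
    w = np.exp(logw - logw.max())
    w /= w.sum()

    # Step 2: (multinomial) resampling from the weighted empirical distribution
    particles = rng.choice(particles, size=N, p=w)

print("true state:", X, " BPF estimate:", particles.mean())
```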
  • 116. Feedback particle filter: Calculation of the K-L divergence. Recall the definition of the K-L divergence for densities: KL(p_n ‖ p̂*_n) = ∫_R p_n(s) ln( p_n(s)/p̂*_n(s) ) ds. With the coordinate transformation s = x + u(x), the K-L divergence is expressed as KL(p_n ‖ p̂*_n) = ∫_R ( p_n^−(x)/|1 + u′(x)| ) ln( ( p_n^−(x)/|1 + u′(x)| ) / p̂*_n(x + u(x) | z_n) ) |1 + u′(x)| dx.
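A small numerical check of this K-L objective: trapezoidal quadrature of ∫ p ln(p/q), compared against the closed form for two Gaussians (purely illustrative; the grid and the test densities are assumptions):

```python
import numpy as np

def kl_divergence(p, q, x):
    """KL(p || q) = ∫ p ln(p/q) dx on a grid, with a small floor to avoid log(0)."""
    p = p / np.trapz(p, x)
    q = q / np.trapz(q, x)
    return np.trapz(p * np.log(np.maximum(p, 1e-300) / np.maximum(q, 1e-300)), x)

x = np.linspace(-10, 10, 4001)
gauss = lambda x, m, s2: np.exp(-(x - m)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
p, q = gauss(x, 0.0, 1.0), gauss(x, 1.0, 2.0)

# Closed form for Gaussians: ln(s_q/s_p) + (s_p^2 + (m_p - m_q)^2) / (2 s_q^2) - 1/2
exact = 0.5 * np.log(2.0) + (1.0 + 1.0) / (2 * 2.0) - 0.5
print(kl_divergence(p, q, x), exact)   # both approximately 0.3466
```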
  • 117. Feedback particle filter: Derivation of the Euler-Lagrange equation. Denote L(x, u, u′) = −p_n^−(x) [ ln|1 + u′(x)| + ln( p_n^−(x + u) p_{Z|X}(z_n | x + u) ) ] (4). The optimization problem is thus a calculus-of-variations problem: min_u ∫ L(x, u, u′) dx. The minimizer is obtained via the analysis of the first variation, given by the well-known Euler-Lagrange equation ∂L/∂u = (d/dx)(∂L/∂u′). Explicitly substituting the expression (4) for L, we obtain the expected result.
  • 118. Feedback particle filter: Simulation results. A case where the (bootstrap) particle filter does not work well. [Figure: (a) state θ(t) and conditional-mean estimate θ̄_N(t) versus time; (b) PDF estimate on [0, 2π] versus time; (c) first and second harmonics of the estimate, with bounds.]
  • 119. Feedback particle filter: Derivation of the FPK equation for the FPF. Controlled system (N particles): dX_i(t) = a(X_i(t)) dt + U_i(t) (mean-field control) + σ_B dB_i(t), i = 1,...,N (3). Using the ansatz U_i(t) = u(X_i(t)) dt + v(X_i(t)) dZ(t), the particle system becomes dX_i(t) = ( a(X_i(t)) + u(X_i(t)) ) dt + σ_B dB_i(t) + v(X_i(t)) dZ(t), i = 1,...,N (3). Then the conditional distribution of X_i(t) given the filtration Z_t, p(x,t), satisfies the forward equation dp = L† p dt − (∂/∂x)(v p) dZ_t − (∂/∂x)(u p) dt + (σ_W²/2) (∂²/∂x²)(p v²) dt. A short calculation then gives u(x,t) = v(x,t) [ −(1/2)(h(x) + ĥ) + (1/2) σ_W² v′(x,t) ] (4).
  • 120. Feedback particle filter: Derivation of the FPF for the linear case. The density p is a function of the stochastic processes µ_t and Σ_t. Using Itô's formula, dp(x,t) = (∂p/∂µ) dµ_t + (∂p/∂Σ) dΣ_t + (1/2)(∂²p/∂µ²) (dµ_t)², where ∂p/∂µ = ((x − µ_t)/Σ_t) p, ∂p/∂Σ = (1/(2Σ_t)) [ (x − µ_t)²/Σ_t − 1 ] p, and ∂²p/∂µ² = (1/Σ_t) [ (x − µ_t)²/Σ_t − 1 ] p. Substituting these into the forward equation, we obtain a quadratic equation A x² + B x = 0, where A = dΣ_t − ( 2a Σ_t + σ_B² − h² Σ_t²/σ_W² ) dt and B = dµ_t − ( a µ_t dt + (h Σ_t/σ_W²)(dZ_t − h µ_t dt) ). This leads to the Kalman filter.
  • 121. Feedback particle filter: Derivation of the E-L equation. In the continuous-time case, the control is of the form U_t^i = u(X_t^i, t) dt + v(X_t^i, t) dZ_t (5). Substituting this into the E-L BVP for the continuous-discrete time case, we arrive at the formal equation (∂/∂x) { p(x,t) / (1 + u′ dt + v′ dZ_t) } = p(x,t) (∂/∂ũ) { ln p(x + u dt + v dZ_t, t) + ln p_{dZ|X}(dZ_t | x + u dt + v dZ_t) } (6), where p_{dZ|X}(dZ_t | ·) = (1/√(2πσ_W²)) exp( −(dZ_t − h(·))²/(2σ_W²) ) and ũ = u dt + v dZ_t. For notational ease, primes denote partial derivatives with respect to x: p denotes p(x,t), p′ := ∂p/∂x(x,t), p″ := ∂²p/∂x²(x,t), u′ := ∂u/∂x(x,t), v′ := ∂v/∂x(x,t), etc. Note that the time t is fixed.
  • 122. Feedback particle filter: Derivation of the E-L equation, cont. 1. Step 1: The three terms in (6) are simplified as (∂/∂x) { p / (1 + u′ dt + v′ dZ_t) } = p′ − f_1 dt − (p′ v′ + p v″) dZ_t; p (∂/∂ũ) ln p(x + u dt + v dZ_t, t) = p′ + f_2 dt + (p″ v − p′² v/p) dZ_t; p (∂/∂ũ) ln p_{dZ|X}(dZ_t | x + u dt + v dZ_t) = (p/σ_W²) [ h′ dZ_t + (h′ v − h h′) dt ], where we have used Itô's rules dZ_t² = σ_W² dt, dZ_t dt = 0, etc., and f_1 = (p′ u′ + p u″) − σ_W² ( p′ (v′)² + 2 p v′ v″ ), f_2 = (p″ u − p′² u/p) + σ_W² v² ( (1/2) p‴ − 3 p″ p′/(2p) + p′³/p² ). Collecting terms in O(dZ_t) and O(dt), after some simplification, leads to the following ODEs:
  • 123. Feedback particle filter: Derivation of the E-L equation, cont. 2. E(v) = (1/σ_W²) h′(x) (7), E(u) = −(1/σ_W²) h(x) h′(x) + h′(x) v + σ_W² G(x,t) (8), where E(v) = −(∂/∂x) [ (1/p(x,t)) (∂/∂x) { p(x,t) v(x,t) } ] and G = −2 v′ v″ − (v′)² (ln p)′ + (1/2) v² (ln p)‴.
  • 124. Feedback particle filter: Derivation of the E-L equation, cont. 3. Step 2: Suppose (u, v) are admissible solutions of the E-L BVP (7)-(8). Then it is claimed that −(p v)′ = ((h − ĥ)/σ_W²) p (9) and −(p u)′ = −((h − ĥ) ĥ/σ_W²) p − (1/2) σ_W² (p v²)″ (10). Recall that admissible here means lim_{x→±∞} p(x,t) u(x,t) = 0 and lim_{x→±∞} p(x,t) v(x,t) = 0 (11). To show (9), integrate (7) once to obtain −(p v)′ = (1/σ_W²) h p + C p,
  • 125. Feedback particle filter: Derivation of the E-L equation, cont. 4. where the constant of integration C = −ĥ/σ_W² is obtained by integrating once again from −∞ to ∞ and using the boundary conditions (11) for v. This gives (9). To show (10), we denote its right-hand side as R and claim that (R/p)′ = −h h′/σ_W² + h′ v + σ_W² G (12). The equation (10) then follows by using the ODE (8) together with the boundary conditions (11) for u. The verification of the claim involves a straightforward calculation, where (7) is used to obtain expressions for h′ and v′. The details of this calculation are omitted on account of space.
  • 126. Feedback particle filter: Derivation of the E-L equation, cont. 5. Step 3: The E-L equation for v is given by (7). A short calculation then gives u(x,t) = v(x,t) [ −(1/2)(h(x) + ĥ) + (1/2) σ_W² v′(x,t) ] (13).
  • 127. Bibliography. Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, Massachusetts, 1995. Eric Brown, Jeff Moehlis, and Philip Holmes. On the phase reduction and response dynamics of neural oscillator populations. Neural Computation, 16(4):673–715, 2004. M. Dellnitz, J. E. Marsden, I. Melbourne, and J. Scheurle. Generic bifurcations of pendula. Int. Series Num. Math., 104:111–122, 1992. A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001. N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F: Radar and Signal Processing, 140(2):107–113, 1993. J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1:259–273, 1975. M. Huang, P. E. Caines, and R. P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007. R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960. Y. Kuramoto. International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics. Springer-Verlag, 1975. H. J. Kushner. On the differential equations satisfied by conditional probability densities of Markov processes. SIAM J. Control, 2:106–119, 1964. J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(2):229–260, 2007. Sean P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Transactions on Automatic Control, 42(12):1663–1680, 1997. S. H. Strogatz and R. E. Mirollo. Stability of incoherence in a population of coupled oscillators. Journal of Statistical Physics, 63:613–635, 1991. G. Y. Weintraub, L. Benkard, and B. Van Roy. Oblivious equilibrium: a mean field approximation for large-scale dynamic games. In Advances in Neural Information Processing Systems, volume 18. MIT Press, 2006. H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. In Proc. of 2010 American Control Conference, pages 1783–1790, Baltimore, MD, 2010. H. Yin, P. G. Mehta, S. P. Meyn, and U. V. Shanbhag. Synchronization of coupled oscillators is a game. IEEE Trans. Automat. Control, to appear.