Synchronization of coupled oscillators is a game
Prashant G. Mehta1
1Coordinated Science Laboratory
Department of Mechanical Science and Engineering
University of Illinois at Urbana-Champaign
University of Maryland, March 4, 2010
Acknowledgment: AFOSR, NSF
Huibing Yin Sean P. Meyn Uday V. Shanbhag
H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010
P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 2 / 69
Millennium bridge
Video of the London Millennium Bridge (from YouTube)
[11] S. H. Strogatz et al., Nature, 2005
Classical Kuramoto model

$$ d\theta_i(t) = \Big( \omega_i + \frac{\kappa}{N} \sum_{j=1}^{N} \sin(\theta_j(t) - \theta_i(t)) \Big)\, dt + \sigma\, d\xi_i(t), \qquad i = 1,\dots,N $$

ωi taken from distribution g(ω) over [1−γ, 1+γ]
γ — measures the heterogeneity of the population
κ — measures the strength of coupling

[Figure: phase diagram in the (γ, κ) plane — incoherence for κ < κc(γ), synchrony (locking) above the boundary]

[6] Y. Kuramoto (1975)
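The model above can be simulated directly. Below is a minimal Euler-Maruyama sketch (all parameter values are illustrative assumptions, not from the talk) that distinguishes synchrony from incoherence via the order parameter r = |N⁻¹ Σⱼ exp(iθⱼ)|:

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt, T = 100, 0.01, 20.0
gamma, sigma = 0.05, 0.05                    # illustrative heterogeneity and noise
# frequencies omega_i drawn from a uniform g(omega) on [1-gamma, 1+gamma]
omega = rng.uniform(1 - gamma, 1 + gamma, N)

def order_parameter(kappa):
    """Euler-Maruyama for d(theta_i) = (omega_i + (kappa/N) sum_j sin(theta_j - theta_i)) dt + sigma d(xi_i)."""
    theta = rng.uniform(0, 2 * np.pi, N)     # random initial phases
    for _ in range(int(T / dt)):
        coupling = (kappa / N) * np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        theta += (omega + coupling) * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
    return abs(np.exp(1j * theta).mean())    # r in [0, 1]: near 0 incoherent, near 1 locked

r_sync = order_parameter(kappa=4.0)  # strong coupling, well above kappa_c: locking
r_inc = order_parameter(kappa=0.0)   # no coupling: incoherence
print(r_sync, r_inc)
```

With strong coupling the phases lock and r approaches 1; with no coupling the heterogeneous frequencies keep the phases spread out and r stays small.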
Movies of incoherence and synchrony solution
Incoherence Synchrony
Problem statement

Dynamics of the i-th oscillator:

$$ d\theta_i = (\omega_i + u_i(t))\, dt + \sigma\, d\xi_i, \qquad i = 1,\dots,N, \quad t \ge 0 $$

ui(t) — control

The i-th oscillator seeks to minimize

$$ \eta_i(u_i; u_{-i}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\Big[ \underbrace{c(\theta_i; \theta_{-i})}_{\text{cost of anarchy}} + \underbrace{\tfrac{1}{2} R u_i^2}_{\text{cost of control}} \Big]\, ds $$

θ−i = (θj)j≠i
R — control penalty
c(·) — cost function:

$$ c(\theta_i; \theta_{-i}) = \frac{1}{N} \sum_{j \ne i} c^{\bullet}(\theta_i, \theta_j), \qquad c^{\bullet} \ge 0 $$
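The cost of anarchy is easy to evaluate numerically. The sketch below uses the pairwise cost c•(θ, ϑ) = ½ sin²((θ − ϑ)/2) adopted later in the talk; the particular phase values are illustrative:

```python
import numpy as np

def pairwise_cost(theta, vartheta):
    # c_bullet(theta, vartheta) = (1/2) sin^2((theta - vartheta)/2) >= 0
    return 0.5 * np.sin((theta - vartheta) / 2) ** 2

def anarchy_cost(i, theta):
    # c(theta_i; theta_{-i}) = (1/N) sum_{j != i} c_bullet(theta_i, theta_j)
    others = np.delete(theta, i)
    return pairwise_cost(theta[i], others).sum() / len(theta)

theta = np.array([0.0, 0.0, np.pi])  # two aligned oscillators, one anti-phase
aligned = anarchy_cost(0, theta)     # pays only for the anti-phase neighbor: (1/3)(0 + 1/2) = 1/6
print(aligned)
```

An oscillator aligned with the rest of the population pays nothing; the cost grows as its phase separates from its neighbors.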
1 Motivation
Why a game?
Why Oscillators?
2 Problems and results
Problem statement
Main results
3 Derivation of model
Overview
Derivation steps
PDE model
4 Analysis of phase transition
Incoherence solution
Bifurcation analysis
Numerics
5 Learning
Q-function approximation
Steepest descent algorithm
Motivation Why a game?
Quiz
In the video you just watched, why were the
individuals walking strangely?
A. To show respect to the Queen.
B. Anarchists in the crowd were trying to destabilize the bridge.
C. They were stepping to the beat of the soundtrack "Walk Like an
Egyptian."
D. The individuals were trying to maintain their balance.
Motivation Why a game?
“Rational irrationality”
“—behavior that, on the individual level, is perfectly reasonable but
that, when aggregated in the marketplace, produces calamity.”
Examples
Millennium bridge
Financial market
John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009
Motivation Why Oscillators?
Hodgkin-Huxley type neuron model

$$ C \frac{dV}{dt} = -g_T \cdot m_\infty^2(V) \cdot h \cdot (V - E_T) - g_h \cdot r \cdot (V - E_h) - \dots $$

$$ \frac{dh}{dt} = \frac{h_\infty(V) - h}{\tau_h(V)}, \qquad \frac{dr}{dt} = \frac{r_\infty(V) - r}{\tau_r(V)} $$

[Figure: neural spike train — voltage vs. time]
[Figure: limit cycle in the (V, h, r) state space]

Normal form reduction:

$$ \dot{\theta}_i = \omega_i + u_i \cdot \Phi(\theta_i) $$

[4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004
Problems and results Problem statement
Finite oscillator model

Dynamics of the i-th oscillator:

$$ d\theta_i = (\omega_i + u_i(t))\, dt + \sigma\, d\xi_i, \qquad i = 1,\dots,N, \quad t \ge 0 $$

ui(t) — control

The i-th oscillator seeks to minimize

$$ \eta_i(u_i; u_{-i}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\Big[ \underbrace{c(\theta_i; \theta_{-i})}_{\text{cost of anarchy}} + \underbrace{\tfrac{1}{2} R u_i^2}_{\text{cost of control}} \Big]\, ds $$

θ−i = (θj)j≠i, R — control penalty, and the cost function is

$$ c(\theta_i; \theta_{-i}) = \frac{1}{N} \sum_{j \ne i} c^{\bullet}(\theta_i, \theta_j), \qquad c^{\bullet} \ge 0 $$
Problems and results Main results
1. Synchronization is a solution of the game

Mean-field game model:

$$ d\theta_i = (\omega_i + u_i)\, dt + \sigma\, d\xi_i, \qquad \eta_i(u_i; u_{-i}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\big[ c(\theta_i; \theta_{-i}) + \tfrac{1}{2} R u_i^2 \big]\, ds $$

[Figure: phase diagram in the (γ, R^{-1/2}) plane — incoherence for R > Rc(γ), synchrony (locking) otherwise] (Yin et al., ACC 2010)

Classical Kuramoto model:

$$ d\theta_i = \Big( \omega_i + \frac{\kappa}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i) \Big)\, dt + \sigma\, d\xi_i $$

[Figure: phase diagram in the (γ, κ) plane — incoherence for κ < κc(γ), synchrony (locking) otherwise] (Strogatz et al., J. Stat. Phys., 1992)
Problems and results Main results
2. Kuramoto control is approximately optimal

[Figure: population density and control laws vs. θ ∈ [0, 2π] for ω = 1, compared with the Kuramoto law]

Approximately optimal control law:

$$ u_i = -\frac{A_i^*}{R} \frac{1}{N} \sum_{j \ne i} \sin(\theta - \theta_j(t)) $$

Learning algorithm:

$$ \frac{dA_i}{dt} = -\varepsilon\, \dots $$

[Figure: trajectories of Ai(t) converging to A*; κ = 0.01, R = 1000]

Yin et al., CDC 2010
Derivation of model Overview
Overview of model derivation

$$ d\theta_i = (\omega_i + u_i(t))\, dt + \sigma\, d\xi_i, \qquad \eta_i(u_i; u_{-i}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\big[ \bar{c}(\theta_i, t) + \tfrac{1}{2} R u_i^2 \big]\, ds $$

[Diagram: each oscillator influences, and is influenced by, the mass]

1 Mean-field approximation. Assumption:

$$ c(\theta_i; \theta_{-i}(t)) = \frac{1}{N} \sum_{j \ne i} c^{\bullet}(\theta_i, \theta_j) \;\xrightarrow{\;N\to\infty\;}\; \bar{c}(\theta, t) $$

2 Optimal control of single oscillator
Decentralized control structure

[5] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM]
Derivation of model Derivation steps
Single oscillator with given cost

Dynamics of the oscillator:

$$ d\theta_i = (\omega_i + u_i(t))\, dt + \sigma\, d\xi_i, \qquad t \ge 0 $$

The cost function is assumed known: c(θi; θ−i) is replaced by the mean-field cost c̄(θi(s), s) in

$$ \eta_i(u_i; \bar{c}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\big[ \bar{c}(\theta_i(s), s) + \tfrac{1}{2} R u_i^2(s) \big]\, ds $$

HJB equation:

$$ \partial_t h_i + \omega_i \partial_\theta h_i = \frac{1}{2R} (\partial_\theta h_i)^2 - \bar{c}(\theta, t) + \eta_i^* - \frac{\sigma^2}{2} \partial^2_{\theta\theta} h_i $$

Optimal control law:

$$ u_i^*(t) = \varphi_i(\theta, t) = -\frac{1}{R} \partial_\theta h_i(\theta, t) $$

[1] D. P. Bertsekas (1995); [9] S. P. Meyn, IEEE TAC, 1997
Derivation of model Derivation steps
Single oscillator with optimal control

Dynamics of the oscillator under the optimal control law:

$$ d\theta_i(t) = \Big( \omega_i - \frac{1}{R} \partial_\theta h_i(\theta_i, t) \Big)\, dt + \sigma\, d\xi_i(t) $$

Fokker-Planck equation for the pdf p(θ, t, ωi):

$$ \text{FPK:}\quad \partial_t p + \omega_i \partial_\theta p = \frac{1}{R} \partial_\theta \big[ p\, (\partial_\theta h) \big] + \frac{\sigma^2}{2} \partial^2_{\theta\theta} p $$

[7] A. Lasota and M. C. Mackey, “Chaos, Fractals and Noise,” Springer 1994
Derivation of model Derivation steps
Mean-field approximation

HJB equation for the population, solved for h(θ, t, ω):

$$ \partial_t h + \omega \partial_\theta h = \frac{1}{2R} (\partial_\theta h)^2 - \bar{c}(\theta, t) + \eta(\omega) - \frac{\sigma^2}{2} \partial^2_{\theta\theta} h $$

Population density p(θ, t, ω):

$$ \partial_t p + \omega \partial_\theta p = \frac{1}{R} \partial_\theta \big[ p\, (\partial_\theta h) \big] + \frac{\sigma^2}{2} \partial^2_{\theta\theta} p $$

Enforce cost consistency:

$$ \bar{c}(\theta, t) = \int_\Omega \int_0^{2\pi} c^{\bullet}(\theta, \vartheta)\, p(\vartheta, t, \omega)\, g(\omega)\, d\vartheta\, d\omega \;\approx\; \frac{1}{N} \sum_{j \ne i} c^{\bullet}(\theta, \theta_j) $$
Derivation of model PDE model
Summary

$$ \text{HJB:}\quad \partial_t h + \omega \partial_\theta h = \frac{1}{2R} (\partial_\theta h)^2 - \bar{c}(\theta, t) + \eta^* - \frac{\sigma^2}{2} \partial^2_{\theta\theta} h \;\Rightarrow\; h(\theta, t, \omega) $$

$$ \text{FPK:}\quad \partial_t p + \omega \partial_\theta p = \frac{1}{R} \partial_\theta \big[ p\, (\partial_\theta h) \big] + \frac{\sigma^2}{2} \partial^2_{\theta\theta} p \;\Rightarrow\; p(\theta, t, \omega) $$

$$ \text{Mean-field approx.:}\quad \bar{c}(\vartheta, t) = \int_\Omega \int_0^{2\pi} c^{\bullet}(\vartheta, \theta)\, p(\theta, t, \omega)\, g(\omega)\, d\theta\, d\omega $$

1 Bellman’s optimality principle (H, J, B)
2 Propagation of chaos (F, P, K, McKean, Vlasov, . . . )
3 Mean-field approximation (Boltzmann, Kac, . . . )
4 Connection to Nash game (Weintraub, HCM, Altman, . . . )
Derivation of model PDE model
1. Solution of PDE gives ε-Nash equilibrium

Optimal control law:

$$ u_i^{o} = -\frac{1}{R} \partial_\theta h(\theta(t), t, \omega) \Big|_{\omega = \omega_i} $$

ε-Nash property (as N → ∞):

$$ \eta_i(u_i^{o}; u_{-i}^{o}) \le \eta_i(u_i; u_{-i}^{o}) + O\Big( \frac{1}{\sqrt{N}} \Big), \qquad i = 1,\dots,N. $$

So, we look for solutions of the PDEs.
Derivation of model PDE model
2. Incoherence solution (PDE)

Assume

$$ c^{\bullet}(\vartheta, \theta) = c^{\bullet}(\vartheta - \theta) = \tfrac{1}{2} \sin^2\Big( \frac{\vartheta - \theta}{2} \Big) $$

Incoherence solution:

$$ h(\theta, t, \omega) = h_0(\theta) := 0, \qquad p(\theta, t, \omega) = p_0(\theta) := \frac{1}{2\pi} $$

This pair solves the coupled HJB, FPK, and cost-consistency equations. The optimal control is

$$ u = -\frac{1}{R} \partial_\theta h = 0 $$

so there is no cost of control. The average cost is

$$ \bar{c}(\theta, t) = \int_\Omega \int_0^{2\pi} \tfrac{1}{2} \sin^2\Big( \frac{\theta - \vartheta}{2} \Big) \frac{1}{2\pi}\, g(\omega)\, d\vartheta\, d\omega = \frac{1}{4} $$

$$ \eta^*(\omega) = \bar{c}(\theta, t) = \frac{1}{4} =: \eta_0 \quad \text{for all } \omega \in \Omega $$
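The value η0 = 1/4 is easy to verify numerically: averaging ½ sin²((θ − ϑ)/2) against the uniform density 1/(2π) gives 1/4 for every θ. A quick sketch:

```python
import numpy as np

# Average of (1/2) sin^2((theta - vartheta)/2) over vartheta ~ Uniform[0, 2*pi).
# A uniform grid without the right endpoint is a midpoint-type rule, which is
# highly accurate for smooth periodic integrands.
grid = np.linspace(0, 2 * np.pi, 100000, endpoint=False)
cbars = [np.mean(0.5 * np.sin((theta - grid) / 2) ** 2) for theta in (0.0, 1.0, 2.5)]
print(cbars)  # each value is 1/4, independent of theta
```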
Derivation of model PDE model
2. Incoherence solution (Finite population)

Closed-loop dynamics with ui = 0:

$$ d\theta_i = (\omega_i + \underbrace{u_i}_{=0})\, dt + \sigma\, d\xi_i(t) $$

Average cost:

$$ \eta_i = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\big[ c(\theta_i; \theta_{-i}) + \underbrace{\tfrac{1}{2} R u_i^2}_{=0} \big]\, dt = \lim_{T\to\infty} \frac{1}{N} \sum_{j \ne i} \frac{1}{T} \int_0^T \mathsf{E}\Big[ \tfrac{1}{2} \sin^2\Big( \frac{\theta_i(t) - \theta_j(t)}{2} \Big) \Big]\, dt $$

$$ = \frac{1}{N} \sum_{j \ne i} \int_0^{2\pi} \mathsf{E}\Big[ \tfrac{1}{2} \sin^2\Big( \frac{\theta_i(t) - \vartheta}{2} \Big) \Big] \frac{1}{2\pi}\, d\vartheta = \frac{N-1}{N}\, \eta_0 $$

ε-Nash property:

$$ \eta_i(u_i^{o}; u_{-i}^{o}) \le \eta_i(u_i; u_{-i}^{o}) + O\Big( \frac{1}{\sqrt{N}} \Big), \qquad i = 1,\dots,N. $$
Derivation of model PDE model
3. Synchronization is a solution of the game

$$ d\theta_i = (\omega_i + u_i)\, dt + \sigma\, d\xi_i, \qquad \eta_i(u_i; u_{-i}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\big[ c(\theta_i; \theta_{-i}) + \tfrac{1}{2} R u_i^2 \big]\, ds, \qquad \eta(\omega) = \min_{u_i} \eta_i(u_i; u_{-i}^{o}) $$

[Figure: phase diagram in the (γ, R^{-1/2}) plane — incoherence for R > Rc(γ), synchrony (locking) otherwise]
[Figure: η(ω) vs. R^{-1/2} for ω = 0.95, 1, 1.05 — η(ω) = η0 on the incoherence branch (R > Rc), η(ω) < η0 on the synchrony branch (R < Rc)]
[Figure: snapshot of the synchrony solution density at t = 38.24]

Yin et al., “Synchronization of coupled oscillators is a game,” ACC 2010
Analysis of phase transition Incoherence solution
Overview of the steps

$$ \text{HJB:}\quad \partial_t h + \omega \partial_\theta h = \frac{1}{2R} (\partial_\theta h)^2 - \bar{c}(\theta, t) + \eta^* - \frac{\sigma^2}{2} \partial^2_{\theta\theta} h \;\Rightarrow\; h(\theta, t, \omega) $$

$$ \text{FPK:}\quad \partial_t p + \omega \partial_\theta p = \frac{1}{R} \partial_\theta \big[ p\, (\partial_\theta h) \big] + \frac{\sigma^2}{2} \partial^2_{\theta\theta} p \;\Rightarrow\; p(\theta, t, \omega) $$

$$ \bar{c}(\vartheta, t) = \int_\Omega \int_0^{2\pi} c^{\bullet}(\vartheta, \theta)\, p(\theta, t, \omega)\, g(\omega)\, d\theta\, d\omega $$

Assume

$$ c^{\bullet}(\vartheta, \theta) = c^{\bullet}(\vartheta - \theta) = \tfrac{1}{2} \sin^2\Big( \frac{\vartheta - \theta}{2} \Big) $$

Incoherence solution:

$$ h(\theta, t, \omega) = h_0(\theta) := 0, \qquad p(\theta, t, \omega) = p_0(\theta) := \frac{1}{2\pi} $$
Analysis of phase transition Bifurcation analysis
Linearization and spectra

Linearized PDE (about the incoherence solution), with z̃ = (h̃, p̃):

$$ \frac{\partial}{\partial t} \tilde{z}(\theta, t, \omega) = \begin{pmatrix} -\omega \partial_\theta \tilde{h} - \tilde{\bar{c}} - \frac{\sigma^2}{2} \partial^2_{\theta\theta} \tilde{h} \\[4pt] -\omega \partial_\theta \tilde{p} + \frac{1}{2\pi R} \partial^2_{\theta\theta} \tilde{h} + \frac{\sigma^2}{2} \partial^2_{\theta\theta} \tilde{p} \end{pmatrix} =: \mathcal{L}_R \tilde{z}(\theta, t, \omega) $$

Spectrum of the linear operator:

1 Continuous spectrum {S^{(k)}}, k = −∞, …, +∞:

$$ S^{(k)} := \Big\{ \lambda \in \mathbb{C} \;\Big|\; \lambda = \pm \frac{\sigma^2}{2} k^2 - k \omega i \text{ for some } \omega \in \Omega \Big\} $$

[Figure: spectrum in the complex plane for γ = 0.1 — branches k = 1, 2; the discrete eigenvalues move as R decreases]

2 Discrete spectrum. Characteristic equation:

$$ \frac{1}{8R} \int_\Omega \frac{g(\omega)}{\big(\lambda - \frac{\sigma^2}{2} + \omega i\big)\big(\lambda + \frac{\sigma^2}{2} + \omega i\big)}\, d\omega + 1 = 0. $$
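In the homogeneous case γ = 0, where g(ω) concentrates at ω = 1, the characteristic equation reduces to a quadratic in λ and can be checked in closed form. The sketch below (parameter values illustrative) verifies the roots λ = −i ± (σ⁴/4 − 1/(8R))^{1/2} and the collision point at which the eigenvalue pair merges, the Hamiltonian Hopf scenario discussed next:

```python
import numpy as np

sigma, R = 1.0, 1.0  # illustrative values
# For g(omega) = delta(omega - 1) the characteristic equation
#   1/(8R) * 1/((lam - sigma^2/2 + 1j)*(lam + sigma^2/2 + 1j)) + 1 = 0
# reduces to (lam + 1j)^2 = sigma^4/4 - 1/(8R).
s = np.sqrt(complex(sigma**4 / 4 - 1 / (8 * R)))
residuals = []
for lam in (-1j + s, -1j - s):
    term = 1 / (8 * R * (lam - sigma**2 / 2 + 1j) * (lam + sigma**2 / 2 + 1j))
    residuals.append(abs(term + 1))
print(residuals)  # both ~ 0: the closed-form roots satisfy the equation

# The two roots collide when sigma^4/4 = 1/(8R), i.e. at R_c = 1/(2*sigma^4),
# matching the phase-transition boundary R_c(0) stated later in the talk.
R_c = 1 / (2 * sigma**4)
print(R_c)
```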
Analysis of phase transition Bifurcation analysis
Bifurcation diagram (Hamiltonian Hopf)

Characteristic equation:

$$ \frac{1}{8R} \int_\Omega \frac{g(\omega)}{\big(\lambda - \frac{\sigma^2}{2} + \omega i\big)\big(\lambda + \frac{\sigma^2}{2} + \omega i\big)}\, d\omega + 1 = 0. $$

Stability proof: [3] Dellnitz et al., Int. Series Num. Math., 1992

[Figure (a): spectrum in the complex plane — continuous spectrum independent of R, discrete spectrum a function of R, bifurcation point where the eigenvalue pair collides]
[Figure (c): phase diagram in the (γ, R) plane — incoherence for R > Rc(γ), synchrony below the boundary]
Analysis of phase transition Numerics
Numerical solution of PDEs

[Movie: incoherence solution; R = 60]
[Movie: synchrony solution; R = 10]
Analysis of phase transition Numerics
Bifurcation diagram

$$ d\theta_i = (\omega_i + u_i)\, dt + \sigma\, d\xi_i, \qquad \eta_i(u_i; u_{-i}) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathsf{E}\big[ c(\theta_i; \theta_{-i}) + \tfrac{1}{2} R u_i^2 \big]\, ds $$

[Figure: phase diagram in the (γ, R^{-1/2}) plane — incoherence for R > Rc(γ), synchrony (locking) otherwise]
[Figure: η(ω) vs. R^{-1/2} for ω = 0.95, 1, 1.05 — η(ω) = η0 on the incoherence branch (R > Rc), η(ω) < η0 on the synchrony branch (R < Rc)]
Learning Q-function approximation
Comparison to Kuramoto law

Control law u = ϕ(θ, t, ω)

[Figure: population density and control laws vs. θ ∈ [0, 2π] for ω = 0.95, 1, 1.05, compared with the Kuramoto law]

Equivalent control law in the Kuramoto oscillator:

$$ u_i^{(\text{Kur})} = \frac{\kappa}{N} \sum_{j=1}^{N} \sin(\theta_j(t) - \theta_i) \;\overset{N\to\infty}{\approx}\; \kappa_0 \sin(\vartheta_0 + t - \theta_i) $$
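The N → ∞ limit on the right can be illustrated numerically: when the population is concentrated around the synchronized phase ϑ0 + t, the averaged coupling collapses to a single sinusoid. In this sketch the Gaussian spread and the effective gain κ0 = κ E[cos δ] are illustrative assumptions, not quantities from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, t, vartheta0 = 1.0, 2.0, 0.3   # illustrative values
delta = rng.normal(0.0, 0.1, 200000)  # small deviations from synchrony
theta_j = vartheta0 + t + delta       # population phases theta_j(t)

theta_i = 1.0                                   # phase of the i-th oscillator
lhs = kappa * np.sin(theta_j - theta_i).mean()  # (kappa/N) sum_j sin(theta_j - theta_i)
kappa0 = kappa * np.cos(delta).mean()           # effective amplitude
rhs = kappa0 * np.sin(vartheta0 + t - theta_i)  # kappa0 sin(vartheta0 + t - theta_i)
print(lhs, rhs)  # the two agree up to O(1/sqrt(N)) sampling error
```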
Learning Q-function approximation

Optimality equation:

$$ \min_{u_i} \Big\{ \underbrace{c(\theta; \theta_{-i}(t)) + \tfrac{1}{2} R u_i^2 + D^{u_i} h_i(\theta, t)}_{=:\, H_i(\theta, u_i; \theta_{-i}(t))} \Big\} = \eta_i^* $$

Optimal control law and Kuramoto law:

$$ u_i^* = -\frac{1}{R} \partial_\theta h_i(\theta, t), \qquad u_i^{(\text{Kur})} = -\frac{\kappa}{N} \sum_{j \ne i} \sin(\theta_i - \theta_j(t)) $$

Parameterization:

$$ H_i^{(A_i, \phi_i)}(\theta, u_i; \theta_{-i}(t)) = c(\theta; \theta_{-i}(t)) + \tfrac{1}{2} R u_i^2 + (\omega_i - 1 + u_i)\, A_i S^{(\phi_i)} + \frac{\sigma^2}{2} A_i C^{(\phi_i)} $$

where

$$ S^{(\phi)}(\theta, \theta_{-i}) = \frac{1}{N} \sum_{j \ne i} \sin(\theta - \theta_j - \phi), \qquad C^{(\phi)}(\theta, \theta_{-i}) = \frac{1}{N} \sum_{j \ne i} \cos(\theta - \theta_j - \phi) $$

Approximately optimal control:

$$ u_i^{(A_i, \phi_i)} = \arg\min_{u_i} \big\{ H_i^{(A_i, \phi_i)}(\theta, u_i; \theta_{-i}(t)) \big\} = -\frac{A_i}{R} S^{(\phi_i)}(\theta, \theta_{-i}) $$
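The parameterized control can be written down directly. A minimal sketch of the basis functions S^(φ), C^(φ) and the resulting law u = −(A/R) S^(φ); this sketch averages over the N − 1 neighbors, and the parameter values are illustrative:

```python
import numpy as np

def S(theta, others, phi):
    # S_phi(theta, theta_{-i}): average of sin(theta - theta_j - phi) over j != i
    return np.mean(np.sin(theta - others - phi))

def C(theta, others, phi):
    # C_phi(theta, theta_{-i}): average of cos(theta - theta_j - phi) over j != i
    return np.mean(np.cos(theta - others - phi))

def approx_control(theta, others, A, phi, R):
    # Minimizing (1/2) R u^2 + u * A * S over u gives u = -(A/R) * S.
    return -(A / R) * S(theta, others, phi)

others = np.array([0.1, 0.2, 0.3])
u = approx_control(0.2, others, A=2.0, phi=0.0, R=10.0)
print(u)  # theta sits at the mean neighbor phase, so S = 0 and the control vanishes
```

An oscillator ahead of the mean neighbor phase receives a negative (retarding) control, which is the Kuramoto-like attraction toward the population.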
Learning Steepest descent algorithm

Bellman error, pointwise:

$$ \mathcal{L}^{(A_i, \phi_i)}(\theta, t) = \min_{u_i} \big\{ H_i^{(A_i, \phi_i)} \big\} - \eta_i^{(A_i^*, \phi_i^*)} $$

Simple gradient descent algorithm:

$$ \tilde{e}(A_i, \phi_i) = \sum_{k=1}^{2} \big| \langle \mathcal{L}^{(A_i, \phi_i)}, \tilde{\varphi}_k(\theta) \rangle \big|^2 $$

$$ \frac{dA_i}{dt} = -\varepsilon \frac{d\tilde{e}(A_i, \phi_i)}{dA_i}, \qquad \frac{d\phi_i}{dt} = -\varepsilon \frac{d\tilde{e}(A_i, \phi_i)}{d\phi_i} \qquad (*) $$

Theorem (Convergence)
Assume the population is in synchrony and the i-th oscillator updates according to (∗). Then

$$ A_i(t) \to A^* = \frac{1}{2\sigma^2} $$

and the pointwise Bellman error is $ \mathcal{L}^{(A_i, 0)}(\theta, t) = \varepsilon(R) \cos^2(\theta - t) $, where $ \varepsilon(R) = \frac{1}{16 R \sigma^4} $.
Learning Steepest descent algorithm
Phase transition

Suppose all oscillators use the approximately optimal control law

$$ u_i = -\frac{A^*}{R} \frac{1}{N} \sum_{j \ne i} \sin(\theta_i - \theta_j(t)) $$

Then the phase transition boundary is

$$ R_c(\gamma) = \begin{cases} \dfrac{1}{2\sigma^4} & \text{if } \gamma = 0 \\[8pt] \dfrac{1}{4\sigma^2 \gamma} \tan^{-1}\Big( \dfrac{2\gamma}{\sigma^2} \Big) & \text{if } \gamma > 0 \end{cases} $$

[Figure: trajectories of Ai(t) converging to A*; κ = 0.01, R = 1000]
[Figure: phase transition boundary in the (γ, R) plane, PDE vs. learning — incoherence above, synchrony below]
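The boundary formula is straightforward to evaluate. A small sketch checking that the γ → 0 limit of the heterogeneous branch recovers 1/(2σ⁴) (since tan⁻¹ x ≈ x for small x), and that heterogeneity lowers R_c:

```python
import numpy as np

def R_c(gamma, sigma):
    # Phase-transition boundary from the slide above.
    if gamma == 0:
        return 1.0 / (2 * sigma**4)
    return np.arctan(2 * gamma / sigma**2) / (4 * sigma**2 * gamma)

sigma = 1.0
print(R_c(0.0, sigma))    # homogeneous population: 1/(2 sigma^4)
print(R_c(1e-9, sigma))   # continuous in gamma at 0
print(R_c(0.2, sigma))    # heterogeneity lowers the critical R
```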
Thank you!
Website: http://www.mechse.illinois.edu/research/mehtapg
Huibing Yin Sean P. Meyn Uday V. Shanbhag
H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010
Bibliography

[1] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, Massachusetts, 1995.
[2] Eric Brown, Jeff Moehlis, and Philip Holmes. On the phase reduction and response dynamics of neural oscillator populations. Neural Computation, 16(4):673–715, 2004.
[3] M. Dellnitz, J. E. Marsden, I. Melbourne, and J. Scheurle. Generic bifurcations of pendula. Int. Series Num. Math., 104:111–122, 1992.
[4] J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1:259–273, 1975.
[5] Minyi Huang, Peter E. Caines, and Roland P. Malhame. Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007.
[6] Y. Kuramoto. International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics. Springer-Verlag, 1975.
[7] Andrzej Lasota and Michael C. Mackey. Chaos, Fractals and Noise. Springer, 1994.
[8] P. Mehta and S. Meyn. Q-learning and Pontryagin’s Minimum Principle. To appear, 48th IEEE Conference on Decision and Control, December 16–18, 2009.
[9] Sean P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Transactions on Automatic Control, 42(12):1663–1680, December 1997.
[10] S. H. Strogatz and R. E. Mirollo. Stability of incoherence in a population of coupled oscillators. Journal of Statistical Physics, 63:613–635, May 1991.
[11] Steven H. Strogatz, Daniel M. Abrams, Bruno Eckhardt, and Edward Ott. Theoretical mechanics: Crowd synchrony on the millennium bridge. Nature, 438:43–44, 2005.


Maryland 2010

  • 1. CSLCOORDINATED SCIENCE LABORATORY Synchronization of coupled oscillators is a game Prashant G. Mehta1 1Coordinated Science Laboratory Department of Mechanical Science and Engineering University of Illinois at Urbana-Champaign University of Maryland, March 4, 2010 Acknowledgment: AFOSR, NSF
  • 2. Huibing Yin Sean P. Meyn Uday V. Shanbhag H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 2 / 69
  • 3. Millennium bridge Video of London Millennium bridge from youtube [11] S. H. Strogatz et al., Nature, 2005 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 3 / 69
  • 4. Classical Kuramoto model dθi(t) = � ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) � dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69
  • 5. Classical Kuramoto model dθi(t) = � ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) � dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling 1- 1+1 [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69
  • 6. Classical Kuramoto model dθi(t) = � ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) � dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69
  • 7. Classical Kuramoto model dθi(t) = � ωi + κ N N ∑ j=1 sin(θj(t)−θi(t)) � dt +σ dξi(t), i = 1,...,N ωi taken from distribution g(ω) over [1−γ,1+γ] γ — measures the heterogeneity of the population κ — measures the strength of coupling 0 0.1 0.2 0.1 0.15 0.2 0.25 0.3 Locking Incoherence κ κ < κc(γ) γ Synchrony Incoherence [6] Y. Kuramoto (1975) P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 4 / 69
  • 8. Movies of incoherence and synchrony solution Incoherence Synchrony P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 5 / 69
  • 9. Problem statement Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control 1- 1+1 ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[ c(θi;θ−i) � �� � cost of anarchy + 1 2Ru2 i � �� � cost of control ]ds θ−i = (θj)j�=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j�=i c• (θi,θj), c• ≥ 0 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 6 / 69
  • 10. Problem statement Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control 1- 1+1 ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[ c(θi;θ−i) � �� � cost of anarchy + 1 2Ru2 i � �� � cost of control ]ds θ−i = (θj)j�=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j�=i c• (θi,θj), c• ≥ 0 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 6 / 69
  • 11. Problem statement Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control 1- 1+1 ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[ c(θi;θ−i) � �� � cost of anarchy + 1 2Ru2 i � �� � cost of control ]ds θ−i = (θj)j�=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j�=i c• (θi,θj), c• ≥ 0 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 6 / 69
  • 12. 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm
  • 13. Motivation Why a game? Quiz In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69
  • 14. Motivation Why a game? Quiz In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69
  • 15. Motivation Why a game? Quiz In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69
  • 16. Motivation Why a game? Quiz In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69
  • 17. Motivation Why a game? Quiz In the video you just watched, why were the individuals walking strangely? A. To show respect to the Queen. B. Anarchists in the crowd were trying to destabilize the bridge. C. They were stepping to the beat of the soundtrack "Walk Like an Egyptian." D. The individuals were trying to maintain their balance. P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 8 / 69
  • 18. Motivation Why a game? “Rational irrationality” “—behavior that, on the individual level, is perfectly reasonable but that, when aggregated in the marketplace, produces calamity.” Examples Millennium bridge Financial market John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 9 / 69
  • 19. Motivation Why a game? “Rational irrationality” “—behavior that, on the individual level, is perfectly reasonable but that, when aggregated in the marketplace, produces calamity.” Examples Millennium bridge Financial market John Cassidy, “Rational Irrationality: The real reason that capitalism is so crash-prone,” The New Yorker, 2009 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 9 / 69
  • 20. 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm
  • 21. Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69
  • 22. Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69
  • 23. Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train −100 −50 0 50 100 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 Vh r Limit cyle r h v [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69
  • 24. Motivation Why Oscillators? Hodgkin-Huxley type Neuron model C dV dt = −gT ·m2 ∞(V)·h·(V −ET) −gh ·r ·(V −Eh)−...... dh dt = h∞(V)−h τh(V) dr dt = r∞(V)−r τr(V) 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 −150 −100 −50 0 50 100 Voltage time Neural spike train −100 −50 0 50 100 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 Vh r Limit cyle r h v Normal form reduction −−−−−−−−−−−−−→ ˙θi = ωi +ui ·Φ(θi) [4] J. Guckenheimer, J. Math. Biol., 1975; [2] J. Moehlis et al., Neural Computation, 2004 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 11 / 69
  • 25. 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm
  • 26. Problems and results Problem statement Finite oscillator model Dynamics of ith oscillator dθi = (ωi +ui(t))dt +σ dξi, i = 1,...,N, t ≥ 0 ui(t) — control 1- 1+1 ith oscillator seeks to minimize ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[ c(θi;θ−i) � �� � cost of anarchy + 1 2Ru2 i � �� � cost of control ]ds θ−i = (θj)j�=i R — control penalty c(·) — cost function c(θi;θ−i) = 1 N ∑ j�=i c• (θi,θj), c• ≥ 0 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 13 / 69
  • 27. 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm
  • 28. Problems and results Main results 1. Synchronization is a solution of game Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γ Incoherence R > Rc(γ) Synchrony Incoherence dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 1- 1+1 Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 15 / 69
  • 29. Problems and results Main results 1. Synchronization is a solution of game Locking 0 0.1 0.2 0.15 0.2 0.25 R−1/ 2 γ Incoherence R > Rc(γ) Synchrony Incoherence dθi = (ωi +ui)dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[c(θi;θ−i)+ 1 2 Ru2 i ]ds 0 0.1 0.2 0.1 0.15 0.2 0.25 0.3 Locking Incoherence κ κ < κc(γ) γ Synchrony Incoherence dθi = � ωi + κ N N ∑ j=1 sin(θj −θi) � dt +σ dξi Yin et al., ACC 2010 Strogatz et al., J. Stat. Phy., 1992 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 15 / 69
  • 30. Problems and results Main results 2. Kuramoto control is approximately optimal −0.2 0 0.2 0.4 0.6 ω = 1 Kuramoto Population Density Control laws 0 π 2π θ ui = − A∗ i R 1 N ∑ j�=i sin(θ −θj(t)) 0 50 100 150 200 250 300 2 2.5 3 3.5 4 4.5 5 5.5 6 t k = 0.01; R = 1000 A i A* Learning algorithm: dAi dt = −ε ... Yin et.al. CDC 2010 P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 16 / 69
  • 31. 1 Motivation Why a game? Why Oscillators? 2 Problems and results Problem statement Main results 3 Derivation of model Overview Derivation steps PDE model 4 Analysis of phase transition Incoherence solution Bifurcation analysis Numerics 5 Learning Q-function approximation Steepest descent algorithm
  • 32. Derivation of model Overview Overview of model derivation dθi = (ωi +ui(t))dt +σ dξi ηi(ui;u−i) = lim T→∞ 1 T � T 0 E[¯c(θi,t)+ 1 2 Ru2 i ]ds Influence Influence Mass 1 Mean-field approximation Assumption: c(θi;θ−i(t)) = 1 N ∑ j�=i c• (θi,θj) N→∞ −−−−−−→ ¯c(θ,t) 2 Optimal control of single oscillator Decentralized control structure [5] M. Huang, P. Caines, and R. Malhame, IEEE TAC, 2007 [HCM] P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 18 / 69
  • 34. Derivation of model Derivation steps Single oscillator with given cost
Dynamics of the oscillator: dθ_i = (ω_i + u_i(t)) dt + σ dξ_i, t ≥ 0
The cost function is assumed known, with c(θ_i; θ_−i) replaced by c̄(θ_i(s), s):
η_i(u_i; c̄) = lim_{T→∞} (1/T) ∫_0^T E[ c̄(θ_i(s), s) + ½ R u_i²(s) ] ds
HJB equation: ∂_t h_i + ω_i ∂_θ h_i = (1/(2R)) (∂_θ h_i)² − c̄(θ, t) + η_i* − (σ²/2) ∂²_θθ h_i
Optimal control law: u_i*(t) = ϕ_i(θ, t) = −(1/R) ∂_θ h_i(θ, t)
[1] D. P. Bertsekas (1995); [9] S. P. Meyn, IEEE TAC, 1997
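The control law simply extracts the phase gradient of the relative value function. A tiny sketch, with a hypothetical one-mode value function h(θ) = A cos(θ − φ) standing in for the HJB solution (A, φ, R are illustrative, not from the slides):

```python
import math

R = 10.0          # control penalty (illustrative)
A, phi = 0.5, 0.3 # hypothetical one-mode value function h(theta) = A*cos(theta - phi)

def h(theta):
    return A * math.cos(theta - phi)

def u_star(theta, d=1e-6):
    # u* = -(1/R) * dh/dtheta, via a central difference
    dh = (h(theta + d) - h(theta - d)) / (2 * d)
    return -dh / R

theta0 = 1.2
# analytically dh/dtheta = -A*sin(theta - phi), so u* = (A/R)*sin(theta - phi)
analytic = (A / R) * math.sin(theta0 - phi)
```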
  • 35. Derivation of model Derivation steps Single oscillator with optimal control
Dynamics of the oscillator: dθ_i(t) = ( ω_i − (1/R) ∂_θ h_i(θ_i, t) ) dt + σ dξ_i(t)
Fokker-Planck (FPK) equation for the pdf p(θ, t, ω_i):
∂_t p + ω_i ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p
[7] A. Lasota and M. C. Mackey, "Chaos, Fractals and Noise," Springer 1994
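The FPK equation can be checked numerically in the uncontrolled case (h ≡ 0, so the (1/R) term drops out). A minimal explicit finite-difference sketch on the circle, with illustrative grid and parameter choices, relaxes a perturbed initial density toward the uniform density 1/(2π) while conserving probability mass:

```python
import math

# Solve dp/dt + omega*dp/dtheta = (sigma^2/2)*d2p/dtheta2 on [0, 2*pi), periodic.
n, omega, sigma = 64, 1.0, 0.5
dx = 2 * math.pi / n
dt, steps = 0.005, 8000          # CFL-safe for these illustrative parameters
D = sigma ** 2 / 2               # diffusion coefficient

# perturbed initial density: p(theta, 0) = (1 + 0.5*cos(theta)) / (2*pi)
p = [(1 + 0.5 * math.cos(j * dx)) / (2 * math.pi) for j in range(n)]
for _ in range(steps):
    # central differences for advection and diffusion, periodic wrap via % n
    p = [p[j]
         - omega * dt * (p[(j + 1) % n] - p[(j - 1) % n]) / (2 * dx)
         + D * dt * (p[(j + 1) % n] - 2 * p[j] + p[(j - 1) % n]) / dx ** 2
         for j in range(n)]

mass = sum(p) * dx
dev = max(abs(v - 1 / (2 * math.pi)) for v in p)
```

By final time t = 40 the first Fourier mode has decayed by roughly e^{-Dt}, so the density is flat to well within plotting accuracy.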
  • 36. Derivation of model Derivation steps Mean-field approximation
HJB equation for the population: ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η(ω) − (σ²/2) ∂²_θθ h ⇒ h(θ, t, ω)
Population density: ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p ⇒ p(θ, t, ω)
Enforce cost consistency: c̄(θ, t) = ∫_Ω ∫_0^{2π} c•(θ, ϑ) p(ϑ, t, ω) g(ω) dϑ dω ≈ (1/N) ∑_{j≠i} c•(θ, θ_j)
  • 38. Derivation of model PDE model Summary
HJB: ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η* − (σ²/2) ∂²_θθ h ⇒ h(θ, t, ω)
FPK: ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p ⇒ p(θ, t, ω)
Mean-field approx.: c̄(ϑ, t) = ∫_Ω ∫_0^{2π} c•(ϑ, θ) p(θ, t, ω) g(ω) dθ dω
1 Bellman's optimality principle (H, J, B)
2 Propagation of chaos (F, P, K, McKean, Vlasov, ...)
3 Mean-field approximation (Boltzmann, Kac, ...)
4 Connection to Nash game (Weintraub, HCM, Altman, ...)
  • 43. Derivation of model PDE model 1. Solution of PDE gives ε-Nash equilibrium
Optimal control law: u_i° = −(1/R) ∂_θ h(θ(t), t, ω) |_{ω=ω_i}
ε-Nash property (as N → ∞): η_i(u_i°; u_−i°) ≤ η_i(u_i; u_−i°) + O(1/√N), i = 1, ..., N.
So, we look for solutions of PDEs.
  • 47. Derivation of model PDE model 2. Incoherence solution (PDE)
Incoherence solution: h(θ, t, ω) = h_0(θ) := 0, p(θ, t, ω) = p_0(θ) := 1/(2π)
h = 0 solves the HJB equation: ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η* − (σ²/2) ∂²_θθ h
p = 1/(2π) solves the FPK equation: ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p
c̄(θ, t) = ∫_Ω ∫_0^{2π} c•(θ, ϑ) p(ϑ, t, ω) g(ω) dϑ dω
  • 48. Derivation of model PDE model 2. Incoherence solution (PDE)
Assume c•(ϑ, θ) = c•(ϑ − θ) = ½ sin²((ϑ − θ)/2)
Incoherence solution: h(θ, t, ω) = h_0(θ) := 0, p(θ, t, ω) = p_0(θ) := 1/(2π)
Optimal control: u = −(1/R) ∂_θ h = 0 (no cost of control)
Average cost: c̄(θ, t) = ∫_Ω ∫_0^{2π} ½ sin²((θ − ϑ)/2) (1/(2π)) g(ω) dϑ dω
η*(ω) = c̄(θ, t) = 1/4 =: η_0 for all ω ∈ Ω
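The value η_0 = 1/4 follows because ½ sin²((θ − ϑ)/2) = ¼ (1 − cos(θ − ϑ)) and the cosine averages to zero against the uniform density, independently of θ. A quick quadrature check:

```python
import math

def cbar(theta, n=2000):
    # c̄(theta) = ∫_0^{2π} (1/2) sin^2((theta - v)/2) * (1/(2π)) dv, midpoint rule
    s = 0.0
    for k in range(n):
        v = 2 * math.pi * (k + 0.5) / n
        s += 0.5 * math.sin((theta - v) / 2) ** 2
    return s * (2 * math.pi / n) / (2 * math.pi)
```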
  • 49. Derivation of model PDE model 2. Incoherence solution (Finite population)
Closed-loop dynamics: dθ_i = (ω_i + u_i) dt + σ dξ_i(t), with u_i = 0
Average cost (the control term ½ R u_i² vanishes):
η_i = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_−i) + ½ R u_i² ] dt
= lim_{T→∞} (1/N) ∑_{j≠i} (1/T) ∫_0^T E[ ½ sin²((θ_i(t) − θ_j(t))/2) ] dt
= (1/N) ∑_{j≠i} ∫_0^{2π} E[ ½ sin²((θ_i(t) − ϑ)/2) ] (1/(2π)) dϑ
= ((N − 1)/N) η_0
ε-Nash property: η_i(u_i°; u_−i°) ≤ η_i(u_i; u_−i°) + O(1/√N), i = 1, ..., N.
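The (N − 1)/N factor comes purely from the 1/N normalization of the coupling cost: each of the N − 1 pairwise terms averages to η_0 = ¼ when the phase difference is uniform. A sketch of the bookkeeping:

```python
import math

def eta_i(N, n=1000):
    # average one pairwise cost over a uniform phase difference on [0, 2*pi):
    # E[(1/2) sin^2(delta/2)] with delta = 2*pi*(k+0.5)/n (midpoint rule)
    pair = sum(0.5 * math.sin(math.pi * (k + 0.5) / n) ** 2 for k in range(n)) / n
    # eta_i = (1/N) * sum over the N-1 terms j != i
    return (N - 1) / N * pair
```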
  • 51. Derivation of model PDE model 3. Synchronization is a solution of game
dθ_i = (ω_i + u_i) dt + σ dξ_i
η_i(u_i; u_−i) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_−i) + ½ R u_i² ] ds
η(ω) = min_{u_i} η_i(u_i; u_−i°)
Phase diagram in (γ, R^{−1/2}): incoherence for R > R_c(γ), synchrony otherwise.
Cost curves for ω = 0.95, 1, 1.05: η(ω) = η_0 for R > R_c; η(ω) < η_0 for R < R_c.
Snapshot of the synchrony solution (population density at t = 38.24).
Yin et al., "Synchronization of coupled oscillators is a game," ACC 2010
  • 55. Analysis of phase transition Incoherence solution Overview of the steps
HJB: ∂_t h + ω ∂_θ h = (1/(2R)) (∂_θ h)² − c̄(θ, t) + η* − (σ²/2) ∂²_θθ h ⇒ h(θ, t, ω)
FPK: ∂_t p + ω ∂_θ p = (1/R) ∂_θ [ p (∂_θ h) ] + (σ²/2) ∂²_θθ p ⇒ p(θ, t, ω)
c̄(ϑ, t) = ∫_Ω ∫_0^{2π} c•(ϑ, θ) p(θ, t, ω) g(ω) dθ dω
Assume c•(ϑ, θ) = c•(ϑ − θ) = ½ sin²((ϑ − θ)/2)
Incoherence solution: h(θ, t, ω) = h_0(θ) := 0, p(θ, t, ω) = p_0(θ) := 1/(2π)
  • 57. Analysis of phase transition Bifurcation analysis Linearization and spectra
Linearized PDE (about the incoherence solution), for z̃ = (h̃, p̃):
∂_t z̃(θ, t, ω) = ( −ω ∂_θ h̃ − c̄ − (σ²/2) ∂²_θθ h̃ ,  −ω ∂_θ p̃ + (1/(2πR)) ∂²_θθ h̃ + (σ²/2) ∂²_θθ p̃ ) =: L_R z̃(θ, t, ω)
Spectrum of the linear operator:
1 Continuous spectrum {S^(k)}_{k=−∞}^{+∞}, S^(k) := { λ ∈ C : λ = ±(σ²/2) k² − ikω, ω ∈ Ω }
2 Discrete spectrum, given by the characteristic eqn:
(1/(8R)) ∫_Ω g(ω) / [ (λ − σ²/2 + iω)(λ + σ²/2 + iω) ] dω + 1 = 0.
  • 61. Analysis of phase transition Bifurcation analysis Bifurcation diagram (Hamiltonian Hopf)
Characteristic eqn: (1/(8R)) ∫_Ω g(ω) / [ (λ − σ²/2 + iω)(λ + σ²/2 + iω) ] dω + 1 = 0.
Stability proof: [3] Dellnitz et al., Int. Series Num. Math., 1992
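A quick check is possible in the special case γ = 0 (g a point mass at ω = 1), where the characteristic equation reduces to the closed form (λ + i)² = σ⁴/4 − 1/(8R): the discrete eigenvalue pair collides at λ = −i exactly when R = 1/(2σ⁴), which matches the γ = 0 value of R_c(γ) quoted on the phase-transition slide. A small sketch (σ illustrative):

```python
import cmath

sigma = 0.5
Rc = 1 / (2 * sigma ** 4)   # collision point for gamma = 0

def discrete_eigs(R, sigma):
    # roots of (1/(8R)) / ((lam - s^2/2 + i)(lam + s^2/2 + i)) + 1 = 0 with omega = 1,
    # i.e. (lam + i)^2 = sigma^4/4 - 1/(8R)
    root = cmath.sqrt(sigma ** 4 / 4 - 1 / (8 * R))
    return -1j + root, -1j - root

at_rc = discrete_eigs(Rc, sigma)          # eigenvalue collision at lam = -i
below = discrete_eigs(0.5 * Rc, sigma)    # pair on the imaginary axis
above = discrete_eigs(2.0 * Rc, sigma)    # pair split into +/- real parts
```

The ± symmetry of the real parts reflects the Hamiltonian structure of the linearized operator.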
  • 66. Analysis of phase transition Numerics Numerical solution of PDEs
Incoherence solution for R = 60; synchrony solution for R = 10.
  • 67. Analysis of phase transition Numerics Bifurcation diagram
dθ_i = (ω_i + u_i) dt + σ dξ_i
η_i(u_i; u_−i) = lim_{T→∞} (1/T) ∫_0^T E[ c(θ_i; θ_−i) + ½ R u_i² ] ds
Phase diagram in (γ, R^{−1/2}): incoherence for R > R_c(γ), synchrony otherwise.
Cost curves for ω = 0.95, 1, 1.05: η(ω) = η_0 for R > R_c; η(ω) < η_0 for R < R_c.
  • 72. Learning Q-function approximation Comparison to Kuramoto law
Control law u = ϕ(θ, t, ω), plotted for ω = 0.95, 1, 1.05 over θ ∈ [0, 2π] against the population density and the Kuramoto law.
Equivalent control law in the Kuramoto oscillator:
u_i^(Kur) = (κ/N) ∑_{j=1}^N sin(θ_j(t) − θ_i) ≈ κ_0 sin(ϑ_0 + t − θ_i) as N → ∞
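The N → ∞ form of the Kuramoto law rests on the order-parameter identity (1/N) ∑_j sin(θ_j − θ_i) = r sin(ψ − θ_i), where r e^{iψ} = (1/N) ∑_j e^{iθ_j}; in the mean-field limit the order parameter tends to a deterministic rotating wave, giving the one-mode law κ_0 sin(ϑ_0 + t − θ_i). The finite-N identity can be verified directly (phases below are illustrative):

```python
import cmath
import math

theta = [0.3, 1.7, 2.9, 4.2, 5.5]   # illustrative phases
i = 2

# left-hand side: the raw coupling sum seen by oscillator i
lhs = sum(math.sin(t - theta[i]) for t in theta) / len(theta)

# right-hand side via the order parameter r*exp(1j*psi)
z = sum(cmath.exp(1j * t) for t in theta) / len(theta)
r, psi = abs(z), cmath.phase(z)
rhs = r * math.sin(psi - theta[i])
```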
  • 73. Learning Q-function approximation Optimality equation
min_{u_i} { c(θ; θ_−i(t)) + ½ R u_i² + D_{u_i} h_i(θ, t) } = η_i*, with H_i(θ, u_i; θ_−i(t)) denoting the bracketed expression
Optimal control law: u_i* = −(1/R) ∂_θ h_i(θ, t)
Kuramoto law: u_i^(Kur) = −(κ/N) ∑_{j≠i} sin(θ_i − θ_j(t))
Parameterization:
H_i^(A_i,φ_i)(θ, u_i; θ_−i(t)) = c(θ; θ_−i(t)) + ½ R u_i² + (ω_i − 1 + u_i) A_i S^(φ_i) + (σ²/2) A_i C^(φ_i)
where S^(φ)(θ, θ_−i) = (1/N) ∑_{j≠i} sin(θ − θ_j − φ), C^(φ)(θ, θ_−i) = (1/N) ∑_{j≠i} cos(θ − θ_j − φ)
Approx. optimal control: u_i^(A_i,φ_i) = argmin_{u_i} { H_i^(A_i,φ_i)(θ, u_i; θ_−i(t)) } = −(A_i/R) S^(φ_i)(θ, θ_−i)
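The closed form of the approximate optimal control comes from minimizing only the u-dependent part of H^(A_i,φ_i), which is the quadratic ½ R u² + u A_i S^(φ_i), with argmin u = −(A_i/R) S^(φ_i). A brute-force check over a grid, with illustrative phases and parameters:

```python
import math

R, A, phi = 10.0, 0.8, 0.2
theta, others = 1.0, [0.5, 2.0, 3.1, 4.7]   # theta_i and theta_{-i} (illustrative)
N = len(others) + 1

# S^(phi)(theta, theta_{-i}) = (1/N) sum_{j != i} sin(theta - theta_j - phi)
S = sum(math.sin(theta - tj - phi) for tj in others) / N

def H_u_part(u):
    # only the u-dependent terms of H matter for the argmin
    return 0.5 * R * u ** 2 + u * A * S

grid = [k / 10000 - 1 for k in range(20001)]   # u in [-1, 1], step 1e-4
u_best = min(grid, key=H_u_part)
```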
  • 78. Learning Steepest descent algorithm
Pointwise Bellman error: L^(A_i,φ_i)(θ, t) = min_{u_i} { H_i^(A_i,φ_i) } − η_i^(A_i*,φ_i*)
Simple gradient descent algorithm:
ẽ(A_i, φ_i) = ∑_{k=1}^2 |⟨L^(A_i,φ_i), ϕ̃_k(θ)⟩|²
dA_i/dt = −ε dẽ(A_i, φ_i)/dA_i,  dφ_i/dt = −ε dẽ(A_i, φ_i)/dφ_i   (∗)
Theorem (Convergence). Assume the population is in synchrony and the ith oscillator updates according to (∗). Then A_i(t) → A* = 1/(2σ²), and the pointwise Bellman error is L^(A_i,0)(θ, t) = ε(R) cos²(θ − t), where ε(R) = 1/(16Rσ⁴).
  • 79. Learning Steepest descent algorithm Phase transition
Suppose all oscillators use the approx. optimal control law u_i = −(A*/R) (1/N) ∑_{j≠i} sin(θ_i − θ_j(t)).
Then the phase transition boundary is
R_c(γ) = 1/(2σ⁴) if γ = 0,  R_c(γ) = (1/(4σ²γ)) tan⁻¹(2γ/σ²) if γ > 0
Learning trace: A_i → A* (k = 0.01, R = 1000). In the (γ, R) plane the PDE and learning boundaries between incoherence and synchrony agree.
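The boundary formula is easy to evaluate. The sketch below (σ illustrative) checks that the γ > 0 branch is continuous with the γ = 0 value 1/(2σ⁴), and that R_c decreases as the population becomes more heterogeneous, consistent with the phase diagram (incoherence for R > R_c, so heterogeneity favors incoherence):

```python
import math

def Rc(gamma, sigma):
    # phase-transition boundary under the approximately optimal control law
    if gamma == 0:
        return 1 / (2 * sigma ** 4)
    return math.atan(2 * gamma / sigma ** 2) / (4 * sigma ** 2 * gamma)

sigma = 0.5
```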
  • 80. Thank you! Website: http://www.mechse.illinois.edu/research/mehtapg Huibing Yin Sean P. Meyn Uday V. Shanbhag H. Yin, P. G. Mehta, S. P. Meyn and U. V. Shanbhag, “Synchronization of coupled oscillators is a game,” ACC 2010
  • 81. Bibliography Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, Massachusetts, 1995. Eric Brown, Jeff Moehlis, and Philip Holmes. On the phase reduction and response dynamics of neural oscillator populations. Neural Computation, 16(4):673–715, 2004. M. Dellnitz, J.E. Marsden, I. Melbourne, and J. Scheurle. Generic bifurcations of pendula. Int. Series Num. Math., 104:111–122, 1992. J. Guckenheimer. Isochrons and phaseless sets. J. Math. Biol., 1:259–273, 1975. Minyi Huang, Peter E. Caines, and Roland P. Malhame. P. G. Mehta (UIUC) Univ. of Maryland Mar. 4, 2010 69 / 69
  • 82. Bibliography Large-population cost-coupled LQG problems with nonuniform agents: Individual-mass behavior and decentralized ε-Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007. Y. Kuramoto. International Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics. Springer-Verlag, 1975. Andrzej Lasota and Michael C. Mackey. Chaos, Fractals and Noise. Springer, 1994. P. Mehta and S. Meyn. Q-learning and Pontryagin's Minimum Principle. To appear, 48th IEEE Conference on Decision and Control, December 16-18, 2009. Sean P. Meyn.
  • 83. Bibliography The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Transactions on Automatic Control, 42(12):1663–1680, December 1997. S. H. Strogatz and R. E. Mirollo. Stability of incoherence in a population of coupled oscillators. Journal of Statistical Physics, 63:613–635, May 1991. Steven H. Strogatz, Daniel M. Abrams, Bruno Eckhardt, and Edward Ott. Theoretical mechanics: Crowd synchrony on the Millennium Bridge. Nature, 438:43–44, 2005.