Control Techniques for Complex Systems

The systems & control research community has developed a range of tools for understanding and controlling complex systems. Some of these techniques are model-based: using a simple model, we obtain insight into the structure of effective control policies. The talk surveys how this point of view can be applied to resource allocation problems, such as those that will arise in the next-generation energy grid. We also show how insight from this kind of analysis can be used to construct architectures for reinforcement learning algorithms used in a broad range of applications.

Much of the talk surveys material from a recent book by the author with a similar title:
Control Techniques for Complex Networks, Cambridge University Press, 2007.
https://netfiles.uiuc.edu/meyn/www/spm_files/CTCN/CTCN.html

    Presentation Transcript

    • Control Techniques for Complex Systems. Sean P. Meyn, Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA. Presented at the Department of Electrical & Computer Engineering, University of Florida, April 21, 2011. (slide 1/26)
    • Outline: 1. Control Techniques; 2. Complex Networks; 3. Architectures for Adaptation & Learning; 4. Next Steps. (The slide shows the covers of Control Techniques for Complex Networks by Sean Meyn and Markov Chains and Stochastic Stability by S. P. Meyn and R. L. Tweedie, with the stability conditions ||P^n(x, ·) − π||_f → 0, sup_C E_x[S_{τ_C}(f)] < ∞, π(f) < ∞, and the drift condition ΔV(x) ≤ −f(x) + b I_C(x).) (slide 2/26)
    • Control Techniques. System model (shown truncated on the slide): dα/dt = µσ − Cα + ..., dq/dt = (1/2) µ I^{-1} (C − ...), dθ/dt = q. Control techniques? ??? (slide 3/26)
    • Control Techniques: Typical steps to control design (the system model from the previous slide is shown alongside).
      – Obtain a simple model that captures essential structure; an equilibrium model if the goal is regulation.
      – Obtain a feedback design, using dynamic programming, LQG, loop shaping, ...
      – Design for performance and reliability.
      – Test via simulations and experiments, and refine the design.
      If these steps fail, we may have to re-engineer the system (e.g., introduce new sensors) and start over. This point of view is unique to control. (slide 4/26)
    • Control Techniques: Typical steps to scheduling. Inventory model: controlled work-release, controlled routing, uncertain demand; a simplified model of a semiconductor manufacturing facility with two demand streams. Similar demand-driven models can be used to model allocation of locational reserves in a power grid.
      – Obtain a simple model, frequently based on simple statistics, to obtain a Markov model.
      – Obtain a feedback design based on heuristics, or dynamic programming.
      – Performance evaluation via computation (e.g., Neuts' matrix-geometric methods). (slide 5/26)
    • Control Techniques: Typical steps to scheduling, for the same inventory model. Difficulty: a Markov model is not simple enough!
      – Obtain a simple model, frequently based on exponential statistics, to obtain a Markov model.
      – Obtain a feedback design based on heuristics, or dynamic programming.
      – Performance evaluation via computation (e.g., Neuts' matrix-geometric methods).
      With the 16 buffers truncated to 0 ≤ x ≤ 10 (11 values per buffer), policy synthesis reduces to a linear program of dimension 11^16! (slide 6/26)
    • Control Techniques: Control-theoretic approach to scheduling. Fluid model for the same inventory model: dq/dt = Bu + α, where
      – q: queue lengths, evolving on R^16_+
      – u: scheduling/routing decisions (a convex relaxation)
      – α: mean exogenous arrivals of work
      – B: captures the network topology
      Control-theoretic approach to scheduling: the dimension is reduced from a linear program of dimension 11^16 ... to an HJB equation of dimension 16. Does this solve the problem? (slide 7/26)
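
As an illustrative aside (not from the slides): the fluid model dq/dt = Bu + α is just an ODE, so a few lines of code can expose the behavior of a scheduling policy. The sketch below simulates a hypothetical two-buffer tandem network under a non-idling allocation; the matrix B, the arrival rate α, and the unit service rates are assumptions made for the example, not the 16-buffer model of the talk.

```python
# Minimal sketch (illustrative assumptions): Euler simulation of the fluid model
#   dq/dt = B u + alpha
# for a hypothetical two-buffer tandem queue under a non-idling allocation.
import numpy as np

B = np.array([[-1.0,  0.0],    # activity 1 drains buffer 1 ...
              [ 1.0, -1.0]])   # ... and feeds buffer 2; activity 2 drains buffer 2
alpha = np.array([0.9, 0.0])   # exogenous arrival rate into buffer 1

def policy(q):
    """Non-idling allocation: work on a buffer whenever it is non-empty."""
    return np.array([1.0 if q[0] > 0 else 0.0,
                     1.0 if q[1] > 0 else 0.0])

def simulate(q0, T=20.0, dt=1e-3):
    q, traj = np.array(q0, dtype=float), []
    for _ in range(int(T / dt)):
        u = policy(q)
        q = np.maximum(q + dt * (B @ u + alpha), 0.0)   # keep queue lengths >= 0
        traj.append(q.copy())
    return np.array(traj)

print(simulate([5.0, 3.0])[-1])   # queue levels after the transient has drained
```

In principle the same loop applies to the 16-dimensional model once B and α are specified; the point of the slide is that control design can then be done on this low-dimensional ODE rather than on the enormous Markov chain.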
    • Complex Networks. (Section divider; figures show networks ranging from uncongested, to congested, to highly congested.) First, a review of some control theory... (slide 8/26)
    • Complex Networks: Dynamic programming equations, deterministic model ẋ = f(x, u).
      Controlled generator: D_u h(x) = (d/dt) h(x(t)) |_{t=0, x(0)=x, u(0)=u} = f(x, u) · ∇h(x)
      Minimal total cost: J*(x) = inf_U ∫_0^∞ c(x(t), u(t)) dt, x(0) = x
      HJB equation: min_u [ c(x, u) + D_u J*(x) ] = 0 (slide 9/26)
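
To make the HJB equation concrete, here is a standard scalar linear-quadratic example (not taken from the talk; the dynamics and cost below are assumptions): with ẋ = ax + bu and c(x, u) = qx² + ru², a quadratic guess for J* reduces the HJB equation to a scalar algebraic Riccati equation.

```latex
% Scalar LQR illustration (assumed model: \dot{x} = a x + b u, c(x,u) = q x^2 + r u^2).
% Guess J^*(x) = p x^2 and substitute into the HJB equation:
\[
0 = \min_u \Bigl[ q x^2 + r u^2 + (a x + b u)\, \frac{dJ^*}{dx}(x) \Bigr]
  = \min_u \Bigl[ q x^2 + r u^2 + 2 p x (a x + b u) \Bigr],
\]
\[
u^*(x) = -\frac{b p}{r}\, x,
\qquad
0 = q + 2 a p - \frac{b^2 p^2}{r}
\quad \text{(take the positive root of this Riccati equation for a stabilizing solution).}
\]
```

The later slides apply the same dynamic-programming machinery to stochastic and fluid approximations of network models, where closed-form solutions like this are no longer available.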
    • Complex Networks: Dynamic programming equations, diffusion model dX = f(X, U) dt + σ(X) dN.
      Controlled generator: D_u h(x) = (d/dt) E[h(X(t))] |_{t=0, x(0)=x, u(0)=u} = f(x, u) · ∇h(x) + (1/2) trace( σ(x)^T ∇²h(x) σ(x) )
      Minimal average cost: η* = inf_U lim_{T→∞} (1/T) ∫_0^T c(X(t), U(t)) dt
      ACOE (Average Cost Optimality Equation): min_u [ c(x, u) + D_u h*(x) ] = η*, where h* is the relative value function. (slide 10/26)
    • Complex Networks: Dynamic programming equations, MDP model X(t+1) − X(t) = f(X(t), U(t), N(t+1)).
      Controlled generator: D_u h(x) = E[h(X(1)) − h(X(0))] = E[h(x + f(x, u, N))] − h(x)
      Minimal average cost: η* = inf_U lim_{T→∞} (1/T) Σ_{t=0}^{T−1} c(X(t), U(t))
      ACOE (Average Cost Optimality Equation): min_u [ c(x, u) + D_u h*(x) ] = η*, where h* is the relative value function. (slide 11/26)
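
For a finite MDP the ACOE can be solved numerically. The sketch below runs relative value iteration on a toy single-server queue with a switchable server; the state space, arrival and service probabilities, and cost function are illustrative assumptions, not the 16-buffer network from the talk.

```python
# Minimal numerical sketch (illustrative, not the 16-buffer network): relative
# value iteration for the average-cost optimality equation on a toy queue.
# States x = 0..N (queue length); action u in {0, 1} switches the server on.
import numpy as np

N, arrival = 20, 0.3
serve = {0: 0.0, 1: 0.5}          # assumed per-slot service probabilities
cost = lambda x, u: x + 2.0 * u   # holding cost plus an effort cost

def transitions(x, u):
    """Return [(prob, next_state), ...] for a discrete-time birth-death chain."""
    up, down = arrival * (x < N), serve[u] * (x > 0)
    return [(up, min(x + 1, N)), (down, max(x - 1, 0)), (1.0 - up - down, x)]

h = np.zeros(N + 1)
for _ in range(5000):                 # relative value iteration
    Th = np.array([min(cost(x, u) + sum(p * h[y] for p, y in transitions(x, u))
                       for u in (0, 1))
                   for x in range(N + 1)])
    eta = Th[0]                       # average-cost estimate (reference state 0)
    h = Th - eta                      # h approximates the relative value function

print("approximate minimal average cost eta* =", eta)
```

Even on this toy example the message of the earlier slides is visible: the computation scales with the size of the state space, which is why the fluid and diffusion approximations that follow are so valuable.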
    • Complex Networks: Approximate dynamic programming. ODE model from the MDP model X(t+1) − X(t) = f(X(t), U(t), N(t+1)).
      Mean drift: f(x, u) = E[X(t+1) − X(t) | X(t) = x, U(t) = u]
      Fluid model: ẋ(t) = f(x(t), u(t))
      First-order Taylor series approximation: D_u h(x) = E[h(x + f(x, u, N))] − h(x) ≈ f(x, u) · ∇h(x)
      A second-order Taylor series expansion leads to a diffusion model. (slide 12/26)
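
A small numerical check of the first step above (illustrative assumptions throughout): for a single-queue recursion with Poisson increments, the mean drift f(x, u) = E[X(t+1) − X(t) | X(t) = x, U(t) = u] can be estimated by Monte Carlo, and away from the boundary it reduces to the familiar λ − uµ that drives the fluid model.

```python
# Small numerical check (assumed single-queue recursion with Poisson increments):
# estimate the mean drift f(x, u) = E[ X(t+1) - X(t) | X(t) = x, U(t) = u ]
# by Monte Carlo; this drift is the right-hand side of the fluid model x_dot = f(x, u).
import numpy as np

rng = np.random.default_rng(0)
lam, mu = 0.3, 0.5          # assumed arrival and service rates

def step(x, u, n_arrivals, n_services):
    """One-step recursion X(t+1) = max(X(t) + A(t+1) - U(t) S(t+1), 0)."""
    return max(x + n_arrivals - u * n_services, 0)

def mean_drift(x, u, samples=200_000):
    arrivals = rng.poisson(lam, samples)
    services = rng.poisson(mu, samples)
    return np.mean([step(x, u, a, s) - x for a, s in zip(arrivals, services)])

# Away from the boundary the drift is approximately lam - u * mu:
print(mean_drift(x=10, u=1))   # about -0.2
print(mean_drift(x=10, u=0))   # about +0.3
```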
    • Complex Networks: ADP for stochastic networks. Conclusions as of April 21, 2011.
      Stochastic model: Q(t+1) − Q(t) = B(t+1) U(t) + A(t+1), with relative value function h*.
      Fluid model: dq/dt = Bu(t) + α, with total-cost value function J*. Cost c(x, u) = |x|.
      Inventory model (controlled work-release, controlled routing, uncertain demand): q evolves on R^16_+; u are the scheduling/routing decisions (a convex relaxation); α is the mean exogenous arrival of work; B captures the network topology. (slide 13/26)
    • Complex Networks: ADP for stochastic networks, same stochastic and fluid models as the previous slide.
      Key conclusions (analytical):
      – Stability of q implies stochastic stability of Q [Dai 1995; Dai & M. 1995]
      – h*(x) ≈ J*(x) for large |x| [M. 1996–2011]
      – In many cases, the translation of the optimal policy for q is approximately optimal, with logarithmic regret [M. 2005 & 2009]. (slide 14/26)
    • Complex Networks: ADP for stochastic networks, same stochastic and fluid models as before.
      Key conclusions (engineering):
      – Stability of q implies stochastic stability of Q.
      – Simple decentralized policies based on q [Tassiulas, 1995 –]
      – Workload relaxation for model reduction [M. 2003 –], following "heavy traffic" theory: Laws, Kelly, Harrison, Dai, ...
      – Intuition regarding the structure of good policies. (slide 15/26)
    • Complex Networks: ADP for stochastic networks, workload relaxations. (Figure: the fluid-optimal region R* and the stochastic-optimal region R_STO in the (w1, w2) workload plane for the inventory model with controlled work-release, controlled routing, and uncertain demand.)
      Workload process: W evolves on R^2.
      Relaxation: only lower bounds on rates are preserved.
      Effective cost: c̄(w) is the minimum of c(x) over all x consistent with w.
      Optimal policy for the fluid relaxation: non-idling on the region R*.
      Optimal policy for the stochastic relaxation: introduce hedging. (slide 16/26)
    • Complex Networks: ADP for stochastic networks, policy translation. (Same workload-plane figure as the previous slide.)
      Complete policy synthesis:
      1. Optimal control of the relaxation.
      2. Translation to the physical system:
         2a. Achieve the approximation c(Q(t)) ≈ c̄(W(t)).
         2b. Address boundary constraints ignored in fluid approximations, achieved using safety stocks. (slide 17/26)
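
A minimal sketch of what "hedging" means in this setting, using a scalar surplus/backlog model rather than the workload relaxation on the slide (all parameters below are assumptions): the fluid-optimal policy idles at surplus zero, while a hedging point ȳ > 0 carries extra surplus to guard against demand variability, trading a small holding cost for a large reduction in backlog cost.

```python
# Minimal sketch (all parameters are assumptions): a hedging-point policy for a
# scalar surplus/backlog process Y(t+1) = Y(t) + U(t) - D(t+1).  Produce at full
# rate while the surplus is below the hedging point y_bar, and idle above it.
import numpy as np

rng = np.random.default_rng(1)

def average_cost(y_bar, T=200_000, capacity=1.0, mean_demand=0.9,
                 c_plus=1.0, c_minus=10.0):
    y, total = 0.0, 0.0
    for _ in range(T):
        u = capacity if y < y_bar else 0.0           # hedging-point policy
        y += u - rng.exponential(mean_demand)        # random demand each slot
        total += c_plus * max(y, 0.0) + c_minus * max(-y, 0.0)
    return total / T

# y_bar = 0 is the non-idling (fluid-optimal) boundary; y_bar > 0 hedges
# against demand variability at the price of a small holding cost.
for y_bar in (0.0, 2.0, 5.0):
    print(f"hedging point {y_bar}: average cost ~ {average_cost(y_bar):.2f}")
```

Sweeping y_bar exposes the trade-off directly; in the network setting the analogous devices are the hedging region for the stochastic workload relaxation and the safety stocks used in the translation back to the physical system.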
    • Architectures for Adaptation & Learning. (Section divider; figures include singular perturbations and mean-field games (individual vs. ensemble state, with a "barely controllable" agent), workload relaxations for the 16-buffer network of Stations 1–5 with two demand streams, the fluid and diffusion models with the optimal policy in the workload plane, and average cost versus iteration for the standard VIA initialized with a quadratic versus the optimal fluid value function.) (slide 18/26)
    • Architectures for Adaptation & Learning: Reinforcement learning. Approximating a value function: Q-learning.
      ACOE: min_u [ c(x, u) + D_u h*(x) ] = η*, with h* the relative value function and η* the minimal average cost.
      "Q-function": Q*(x, u) = c(x, u) + D_u h*(x) [Watkins 1989 ... "Machine Intelligence Lab"@ece.ufl.edu]
      Q-learning: given a parameterized family {Q^θ : θ ∈ R^d}, Q^θ is an approximation of the Q-function, or Hamiltonian [Mehta & M. 2009].
      Compute θ* based on observations, without using a system model. (slide 19/26)
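
A minimal sketch of Q-learning in its textbook form (Watkins-style, tabular, discounted cost), applied to the same toy service-rate queue as the earlier sketch; the talk's setting, with average cost and a parameterized family Q^θ, is more general, and all parameters below are assumptions.

```python
# Minimal sketch (illustrative): Watkins-style tabular Q-learning for the toy
# service-rate queue, in the discounted-cost setting.  The talk's setting
# (average cost, parameterized Q^theta) is more general than this.
import numpy as np

rng = np.random.default_rng(2)
N, gamma = 20, 0.98
arrival, serve = 0.3, {0: 0.0, 1: 0.5}
cost = lambda x, u: x + 2.0 * u

def sample_next(x, u):
    r = rng.random()
    if r < arrival:
        return min(x + 1, N)              # arrivals beyond N are blocked
    if r < arrival + serve[u]:
        return max(x - 1, 0)
    return x

Q, x = np.zeros((N + 1, 2)), 0
for n in range(1, 500_001):
    u = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmin(Q[x]))
    x_next = sample_next(x, u)
    target = cost(x, u) + gamma * Q[x_next].min()        # sampled Bellman target
    Q[x, u] += (1.0 / (1.0 + 1e-3 * n)) * (target - Q[x, u])
    x = x_next

print("greedy policy by state:", np.argmin(Q, axis=1))
```

The update uses only observed transitions and costs, which is the point of the slide: θ* (here, the table Q) is computed from data, without using a system model.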
    • Architectures for Adaptation & Learning: Reinforcement learning. Approximating a value function: TD-learning.
      Value functions: for a given policy U(t) = φ(X(t)), η = lim_{T→∞} (1/T) ∫_0^T c(X(t), U(t)) dt.
      Poisson's equation: h is again called a relative value function, c(x, u) + D_u h(x) = η, evaluated at u = φ(x).
      TD-learning: given a parameterized family {h^θ : θ ∈ R^d}, solve min{ ||h − h^θ|| : θ ∈ R^d } [Sutton 1988; Tsitsiklis & Van Roy 1997].
      Compute θ* based on observations, without using a system model. (slide 20/26)
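
A minimal sketch of TD(0) with a linear parameterization h^θ(x) = Σ θ_i ψ_i(x), for a fixed policy on the toy queue and again in the discounted setting for simplicity; the quadratic basis below is an assumption chosen in the spirit of a fluid value function, which for a stable single queue is quadratic (see the basis-selection slide that follows).

```python
# Minimal sketch (illustrative assumptions throughout): TD(0) with a linear
# parameterization h_theta(x) = theta_1 * psi_1(x) + theta_2 * psi_2(x),
# evaluated for the fixed "always serve" policy on the toy queue.
import numpy as np

rng = np.random.default_rng(3)
N, gamma = 20, 0.9
arrival, mu = 0.3, 0.5
cost = lambda x: float(x)
psi = lambda x: np.array([x / N, (x / N) ** 2])   # normalized linear/quadratic basis

def sample_next(x):
    r = rng.random()
    if r < arrival:
        return min(x + 1, N)
    if r < arrival + mu:
        return max(x - 1, 0)
    return x

theta, x = np.zeros(2), 0
for n in range(1, 300_001):
    x_next = sample_next(x)
    td_error = cost(x) + gamma * psi(x_next) @ theta - psi(x) @ theta
    theta += (0.1 / (1.0 + 1e-3 * n)) * td_error * psi(x)   # decreasing step size
    x = x_next

print("fitted value-function coefficients theta =", theta)
```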
    • Architectures for Adaptation & Learning: Reinforcement learning. Approximating a value function: how do we choose a basis?
      Basis selection: h^θ(x) = Σ_i θ_i ψ_i(x), with
      – ψ1: Linearize
      – ψ2: Fluid model with relaxation
      – ψ3: Diffusion model with relaxation
      – ψ4: Mean-field game
      Examples: decentralized control, nonlinear control, processor speed-scaling. (Figures: three panels labeled Mean-Field Game, Linearization, and Fluid Model, comparing the optimal policy, the approximate relative value function h, the fluid value function J*, and the relative value function h*.) (slide 21/26)
    • Next Steps. (Section divider; figure shows nodal power prices in New Zealand, $/MWh, at Otahuhu and Stratford from 4am to 7pm on March 25 and March 26: prices on March 25 stay near 100 $/MWh or below, while on March 26 they spike toward 20,000 $/MWh. Source: http://www.electricityinfo.co.nz/) (slide 22/26)
    • Next Steps: Complex systems, mainly energy.
      Entropic Grid: advances in systems theory...
      – Complex systems: model reduction specialized to tomorrow's grid; short-term operations and long-term planning.
      – Resource allocation: controlling supply, storage, and demand; resource allocation with shared constraints.
      – Statistics and learning: for planning and forecasting; both rare and common events.
      Economics for an Entropic Grid: incorporate dynamics and uncertainty in a strategic setting. How to create policies to protect participants on both sides of the market, while creating incentives for R&D on renewable energy? (slide 23/26)
    • Next Steps: Complex systems, mainly energy.
      How to create policies to protect participants on both sides of the market, while creating incentives for R&D on renewable energy?
      Our community must consider long-term planning and policy, along with traditional systems operations.
      Planning and policy includes markets & competition. Evolution? Too slow! What we need is Intelligent Design. (slide 24/26)
    • Next Steps: Conclusions.
      The control community has created many techniques for understanding complex systems, and a valuable philosophy for thinking about control design.
      In particular, stylized models can have great value:
      – insight in the formulation of control policies
      – analysis of closed-loop behavior, such as stability via ODE methods
      – architectures for learning algorithms
      – building bridges between the OR, CS, and control disciplines
      The ideas surveyed here arose from partnerships with researchers in mathematics, economics, computer science, and operations research.
      Besides the many technical open questions, my hope is to extend the application of these ideas to long-range planning, especially in applications to sustainable energy. (slide 25/26)
    • Next Steps: References.
      – S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.
      – S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Second edition, Cambridge University Press, Cambridge Mathematical Library, 2009.
      – S. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. SIAM J. Control Optim., 47(6):3259–3294, 2009.
      – V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447–469, 2000.
      – S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. SIAM J. Control Optim., 42(1):178–217, 2003.
      – P. G. Mehta and S. P. Meyn. Q-learning and Pontryagin's minimum principle. In Proc. of the 48th IEEE Conf. on Dec. and Control, pp. 3598–3605, Dec. 2009.
      – W. Chen, D. Huang, A. A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. In Proc. of the 48th IEEE Conf. on Dec. and Control, pp. 3575–3580, Dec. 2009. (slide 26/26)