Tools for Discrete Time Control; Application to Power Systems

Three main algorithms from the state of the art:
- Model Predictive Control
- Stochastic Dynamic Programming
- Direct Policy Search
==> and our proposal, a modified Direct Policy Search termed Direct Value Search



  1-3. I do not speak Chinese! And my English is extremely French (when native English speakers listen to my English, they sometimes believe that they suddenly, by miracle, understand French). For the moment, if I gave a talk in Chinese, it would be boring, with only: ● hse-hse ● nirao ● pukachi. Interrupt me as much as you want to facilitate understanding :-)
  4. High-Scale Power Systems: Simulation & Optimization. Olivier Teytaud, Inria-TAO + Artelys. TAO project-team, INRIA Saclay Île-de-France. O. Teytaud, Research Fellow, olivier.teytaud@inria.fr http://www.lri.fr/~teytaud/
  5. Ilab METIS www.lri.fr/~teytaud/metis.html ● Metis = TAO + Artelys ● TAO tao.lri.fr, Machine Learning & Optimization: a joint INRIA / CNRS / Univ. Paris-Sud team; 12 researchers, 17 PhDs, 3 post-docs, 3 engineers ● Artelys www.artelys.com: SME, France / US / Canada, 50 persons ==> collaboration through a common platform ● Activities: optimization (uncertainties, sequential); application to power systems. O. Teytaud, Research Fellow, olivier.teytaud@inria.fr http://www.lri.fr/~teytaud/
  6. Importantly, it is not a lie. ● It is a tradition, in research institutes, to claim some links with industry. ● I don't claim that having such links is necessary, or always a great achievement in itself. ● But I do claim that in my case it is true that I have links with industry. ● My four students here in Taiwan, and others in France, all have real salaries based on industrial funding.
  7. All in one slide. Consider an electric system. Decisions: ● Strategic decisions (a few time steps): building a nuclear power plant; building a Spain-Morocco connection; building a wind farm ● Tactical decisions (many time steps): switching on hydroelectric plant #7; switching on thermal plant #4; ... Strategic decisions are based on simulations of the tactical level; the tactical level depends on the strategic level.
  8. A bit more precisely: the strategic level. Brute-force approach for the strategic level: ● Simulate each possible strategic decision (e.g. 20,000), 1,000 times each, each with optimal tactical decisions ==> 20,000 optimizations, 1,000 simulations each ● Choose the best one. Better: more simulations on the best strategic decisions. However, this talk will not focus on that part. (A minimal sketch of this loop follows below.)
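As a concrete illustration, here is a minimal sketch of that brute-force strategic loop. All names and numbers (sample_scenario, optimal_tactical_cost, the toy capacity/demand model) are hypothetical stand-ins, not from the talk:

```python
import random

def sample_scenario(rng, horizon=52):
    # Toy random process: weekly demand levels (invented for the example).
    return [rng.gauss(100.0, 20.0) for _ in range(horizon)]

def optimal_tactical_cost(capacity, scenario):
    # Stand-in for a full tactical optimization (the subject of later slides):
    # here, cost = unserved demand given the installed capacity.
    return sum(max(0.0, demand - capacity) for demand in scenario)

def best_investment(candidates, n_simulations=1000, seed=0):
    rng = random.Random(seed)
    scenarios = [sample_scenario(rng) for _ in range(n_simulations)]
    def expected_cost(capacity):
        return sum(optimal_tactical_cost(capacity, s) for s in scenarios) / len(scenarios)
    return min(candidates, key=expected_cost)  # simulate all candidates, keep the best

print(best_investment([80.0, 100.0, 120.0]))
```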
  9. A bit more precisely: the tactical level. Brute-force approach for the tactical level: ● Simplify: replace each random process by its expectation, then optimize decisions deterministically ● But reality is stochastic: water inflows, wind farms. Better: optimize a policy (i.e. reactive, closed-loop).
  10. Specialization on Power Systems ● Planning/control (tactical level): multi-year planning, evaluating the marginal costs of hydroelectricity; taking into account stochasticity and uncertainties ==> IOMCA (ANR) ● High-scale investment studies (e.g. Europe + North Africa): long term (2030-2050); huge (non-stochastic) uncertainties; investments in interconnections, storage, smart grids, power plants... ==> POST (ADEME) ● Moderate scale (cities, factories; the tactical level is simpler): master plan optimization; stochastic uncertainties ==> Citines project (FP7)
  11. Example: interconnection studies (demand levelling, stabilized supply).
  12. The POST project: supergrid simulation and optimization. Mature technology: HVDC links (high-voltage direct current). European subregions: ● Case 1: electric corridor France / Spain / Morocco ● Case 2: south-west (France / Spain / Italy / Tunisia / Morocco) ● Case 3: Maghreb - Central West Europe ==> towards a European supergrid. Related ideas in Asia.
  13. Tactical level: unit commitment at the scale of a country looks like a game. ● Many time steps ● Many power plants ● Some of them have stocks (hydroelectricity) ● Many constraints (rules) ● Uncertainties (water inflows, temperature, ...) ==> make decisions: when should I switch on (for each power plant)? At which power?
  14. Investment decisions through simulations ● Issues: demand varying in time, with limited predictability; transportation introduces constraints; renewables ==> variability ++ ● Methods: Markovian assumptions ==> wrong; simplified models ==> model error >> optimization error ● Our approach: Machine Learning on top of Mathematical Programming
  15. Hybridization of reinforcement learning and mathematical programming ● Math programming (mathematicians doing discrete-time control): nearly exact solutions for a simplified problem; high-dimensional constrained action space; but small state space & not anytime ● Reinforcement learning (artificially intelligent people doing discrete-time control :-)): unstable; small model bias; small/simple action space; but high-dimensional state space & anytime
  16-20. Now the technical part: Model Predictive Control, Stochastic Dynamic Programming, Direct Policy Search, and Direct Value Search (new), combining Direct Policy Search and Stochastic Dynamic Programming. (3/4 of this talk is about the state of the art, only 1/4 our work.)
  21. Many optimization tools (SDP, MPC): ● strong constraints on forecasts ● strong constraints on model structure. Direct Policy Search: ● arbitrary forecasts, arbitrary structure ● but not scalable in the number of decision variables. → merge: Direct Value Search. Jean-Joseph.Christophe@inria.fr Jeremie.Decock@inria.fr Pierre.Defreminville@artelys.com Olivier.Teytaud@inria.fr
  22. Outline ● Stochastic Dynamic Optimization ● Classical solutions: Bellman (old & new): anticipativity; Markov chains; overfitting; SDP, SDDP ● Alternate solution: Direct Policy Search: no problem with anticipativity, but a scalability issue ● The best of both worlds: Direct Value Search
  23. Stochastic control. [Diagram: a random process feeds random values into the system; a controller with memory observes the system state and issues commands; the system returns a cost.] ● For an optimal representation, you need access to the whole archive or to forecasts (generative model / probabilistic forecasts) (Astrom 1965).
  24. (Outline recap; next: anticipativity and its dirty solution.)
  25. Anticipative solutions: the maximum over strategic decisions, of the average over random processes, of optimized decisions given the random processes & strategic decisions. Pros/cons: ● much simpler (deterministic optimization) ● but in real life you cannot guess November rains in January ● rather optimistic decisions. (A sketch of this evaluation follows below.)
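A minimal sketch of the anticipative evaluation, under a toy hydro model invented for the example: decisions are optimized as if each random scenario were fully known in advance, then averaged. The slide's warning is visible in the code: real controllers cannot see the future, so this value is optimistic.

```python
import random

def perfect_information_cost(inflows, prices):
    # With hindsight, sell each unit of water at the highest remaining prices.
    stock = int(sum(inflows))
    revenue = sum(sorted(prices, reverse=True)[:stock])
    return -revenue  # cost = negative revenue

rng = random.Random(0)
scenarios = [([rng.random() for _ in range(10)],
              [rng.random() for _ in range(10)]) for _ in range(1000)]
# Average of per-scenario optima: an optimistic (anticipative) bound.
bound = sum(perfect_information_cost(i, p) for i, p in scenarios) / len(scenarios)
print(bound)
```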
  26-27. MODEL PREDICTIVE CONTROL. Anticipative solutions, made less optimistic: the maximum over strategic decisions, of pessimistic forecasts (e.g. a quantile), of optimized decisions given the forecasts & strategic decisions. Pros/cons: ● much simpler (deterministic optimization) ● but in real life you cannot guess November rains in January ● not so optimistic; convenient, simple. Ok, we have done one of the four targets: model predictive control. (A sketch follows below.)
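A minimal MPC sketch under toy assumptions: the random inflow process is replaced by a pessimistic per-step quantile, decisions are then optimized deterministically (here, a crude greedy rule stands in for the deterministic optimizer), and only the first decision is applied before re-planning. All numbers and the hydro model are invented:

```python
import numpy as np

def mpc_decision(stock, inflow_samples, prices, quantile=0.2):
    # Pessimistic forecast: a low quantile of sampled future inflows.
    forecast = np.quantile(inflow_samples, quantile, axis=0)
    # Deterministic optimization against that single forecast;
    # greedy stand-in: release everything at the best future price.
    best_t = int(np.argmax(prices))
    plan = np.zeros(len(prices))
    plan[best_t] = stock + forecast[:best_t + 1].sum()
    return plan[0]  # apply only the first decision, then re-plan next step

rng = np.random.default_rng(0)
samples = rng.gamma(2.0, 5.0, size=(500, 24))  # 500 scenarios, 24-step horizon
print(mpc_decision(stock=50.0, inflow_samples=samples, prices=np.linspace(1, 2, 24)))
```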
  28. (Outline recap; next: Markov chains and Bellman's method.)
  29. Markov solution. Representation as a Markov process (a tree): this is the representation of the random process. Let us see how to represent the rest.
  30-31. How to solve, simple case, binary stock, one day. [Diagram: it is December 30th and I have water (future cost = 0). Edge "I use water (cost = 0)" leads to "No more water, December 31st"; edge "I do not use" leads to "I have water, December 31st".]
  32-34. How to solve, simple case, binary stock, 3 days, no random process. [Diagram: a lattice of stock states over three days with per-edge costs; the slides progressively compute, backwards from the last day, the cheapest cost-to-go at each node.]
  35. This was deterministic. How to add a random process? Just multiply nodes :-)
  36. How to solve, simple case, binary stock, 3 days, random parts. [Diagram: the deterministic lattice duplicated for each random outcome, with branches labelled probability 1/3 and probability 2/3.]
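The backward recursion these slides illustrate fits in a few lines. This sketch mirrors the 3-day binary-stock example with the 1/3 and 2/3 inflow probabilities from the diagram; the prices (and hence the costs) are made up, not the slide's numbers:

```python
T, STOCKS, ACTIONS = 3, (0, 1), (0, 1)    # days, stock levels, release or not
INFLOW = ((0, 2/3), (1, 1/3))             # (inflow, probability), as on the slide
price = [3.0, 1.0, 2.0]                   # hypothetical per-day prices

V = {s: 0.0 for s in STOCKS}              # terminal cost-to-go (December 31st)
for t in reversed(range(T)):              # walk backwards through the days
    V = {s: min(sum(p * (-price[t] * a + V[min(1, s - a + w)])
                    for w, p in INFLOW)
                for a in ACTIONS if a <= s)  # can only release available water
         for s in STOCKS}
print(V)  # expected optimal cost-to-go from each initial stock level
```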
  37-38. Markov solution: ok, you have understood stochastic dynamic programming (Bellman). Representation as a Markov process (a tree): this is the representation of the random process; in each node, there are the state-nodes with decision-edges. Ok, we have done the 2nd of the four targets: stochastic dynamic programming.
  39. Markov solution. Representation as a Markov process (a tree): optimize decisions for each state. This means you are not cheating, but it is difficult to use: the strategy is optimized for very specific forecasting models. Might be ok for your problem?
  40. (Outline recap; next: overfitting.)
  41. Overfitting ● Representation as a Markov process (a tree): how do you actually make decisions when the random values are not exactly those observed? (heuristics...) ● Check on random realizations which have not been used for building the tree: does it work correctly? ● Overfitting = when it works only on the scenarios used in the optimization process. (A validation sketch follows below.)
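A sketch of that check, with a toy simulator and a toy one-parameter policy (everything here is invented for illustration): optimize on one set of scenarios, then compare costs on scenarios the optimizer never saw.

```python
import random

rng = random.Random(0)
scenarios = [[rng.gauss(0, 1) for _ in range(24)] for _ in range(1000)]

def simulate(threshold, scenario):
    # Toy simulator: cost accrues whenever the signal exceeds the threshold.
    return sum(max(0.0, x - threshold) for x in scenario)

train, test = scenarios[:100], scenarios[100:]
best = min((t / 10 for t in range(-20, 21)),          # crude "optimizer"
           key=lambda t: sum(simulate(t, s) for s in train))
avg = lambda S: sum(simulate(best, s) for s in S) / len(S)
print("train:", avg(train), "test:", avg(test))       # a large gap => overfitting
```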
  42. (Outline recap; next: SDP and SDDP.)
  43-49. SDP / SDDP: Stochastic (Dual) Dynamic Programming ● Representation of the controller with Linear Programming (the value function as a piecewise-linear function) → ok for 100,000 decision variables per time step (tens of time steps, hundreds of plants, several decisions each) ● But solving by expensive SDP/SDDP (curse of dimensionality: exponential in the number of state variables) ● Constraints: needs an LP approximation (ok for you?); SDDP requires convex Bellman values (ok for you?); needs Markov random processes (ok for you? possibly after some random process extension...) ● Goal: keep the scalability but get rid of the SDP/SDDP solving. (A sketch of one piecewise-linear decision step follows below.)
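To make the piecewise-linear representation concrete: if the value function is kept as a max of linear cuts, one decision is one LP. This sketch writes a single SDDP-style decision step with SciPy's linprog via an epigraph variable theta; the cut coefficients, price, and one-dimensional stock are invented for the example, not from any real study:

```python
from scipy.optimize import linprog

CUTS = [(-2.0, 10.0), (-0.5, 4.0), (0.0, 1.0)]  # V(x') = max_k a_k * x' + b_k

def sddp_decision(stock, inflow, price):
    # Variables z = [release, theta]; minimize -price*release + theta
    # subject to theta >= a_k*(stock - release + inflow) + b_k for every cut.
    c = [-price, 1.0]
    A_ub = [[-a, -1.0] for a, _ in CUTS]
    b_ub = [-(a * (stock + inflow) + b) for a, b in CUTS]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, stock), (None, None)])
    return res.x[0]  # optimal release for this time step

print(sddp_decision(stock=5.0, inflow=1.0, price=1.5))  # -> 2.0 with these cuts
```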
  50. Summary ● Most classical solution = SDP and variants ● Or MPC (model predictive control), replacing the stochastic parts by deterministic pessimistic forecasts ● Statistical modeling is "cast" into a tree model & the (probabilistic) forecasting modules are essentially lost
  51. (Outline recap; next: Direct Policy Search.)
  52. Direct Policy Search ● Requires a parametric controller ● Principle: optimize the parameters on simulations ● Unusual in large-scale power systems (we will see why) ● Usual in other areas (finance, evolutionary robotics)
  53-54. [Same control-loop diagram as slide 23.] Optimize the controller thanks to a simulator: ● Command = Controller(w, state, forecasts) ● Simulate(w) = stochastic loss with parameter w ● w* = argmin [Simulate(w)]. Ok, we have done the 3rd of the four targets: direct policy search.
  55-61. Direct Policy Search (DPS) ● Requires a parametric controller, e.g. a neural network: Controller(w, x) = W3 + W2.tanh(W1.x + W0) ● Noisy black-box optimization ● Advantages: non-linear is ok; forecasts are included (forecasts are inputs), so the strategy is optimized given the real forecasting module you have ● Issue: too slow; hundreds of parameters for even 20 decision variables (depends on the structure) ● Idea: a special structure for DPS (inspired by SDP). (A sketch follows below.)
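A self-contained DPS sketch using the tanh controller from the slide, with plain random search over w as the simplest stand-in for the talk's actual optimizers (SAES and Fabian's method appear later). The toy simulator, state encoding, and network sizes are all invented:

```python
import numpy as np

DIM_X, HIDDEN = 3, 8                                      # state size, hidden units
SHAPES = [(HIDDEN, DIM_X), (HIDDEN,), (1, HIDDEN), (1,)]  # W1, W0, W2, W3
N_PARAMS = sum(int(np.prod(s)) for s in SHAPES)

def controller(w, x):
    # Controller(w, x) = W3 + W2.tanh(W1.x + W0), as on the slide.
    pieces, i = [], 0
    for shape in SHAPES:
        n = int(np.prod(shape))
        pieces.append(w[i:i + n].reshape(shape)); i += n
    W1, W0, W2, W3 = pieces
    return (W3 + W2 @ np.tanh(W1 @ x + W0)).item()

def simulate(w, seed, T=24):
    rng, stock, cost = np.random.default_rng(seed), 10.0, 0.0
    for t in range(T):
        x = np.array([stock, t / T, rng.random()])        # state + a noisy forecast
        release = np.clip(controller(w, x), 0.0, stock)
        cost -= (1.0 + np.sin(t)) * release               # revenue at a toy price
        stock = min(20.0, stock - release + rng.gamma(2.0, 0.5))
    return cost

rng = np.random.default_rng(0)
w_star = min((rng.normal(size=N_PARAMS) for _ in range(200)),   # random search
             key=lambda w: np.mean([simulate(w, s) for s in range(20)]))
```

The scalability issue named on the slide is visible here: even this tiny controller already has N_PARAMS = 41 parameters to optimize through noisy simulations.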
  62. (Outline recap; next: Direct Value Search, the best of both worlds.)
  63-64. Direct Value Search: the SDP representation inside DPS. Controller(state) = argmin Cost(decision) + V(next state), with: ● V(nextState) = alpha × nextState (an LP) ● alpha = NeuralNetwork(w, state) (not an LP; or a more sophisticated LP) ==> given w, decision making is solved as an LP ==> a non-linear mapping chooses the parameters of the LP from the current state. Drawback: requires the optimization of w (= a noisy black-box optimization problem).
  65. Summary: the best of both worlds. Controller(w, state), the structure of the controller (fast, scalable by structure): ● V(w, state, .) is non-linear ● optimizing Cost(dec) + V(w, state, nextState) is an LP. Simul(w), a simulator (you can put anything you want in it, even if it is not linear, nothing Markovian...): ● do a simulation with w ● return the cost. DirectValueSearch, the optimization (will do its best, given the simulator and the structure): ● optimize w* = argmin Simul(w) ● return the Controller with w*. (A sketch follows below.)
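Putting the pieces together, a minimal Direct Value Search sketch: a small tanh network (the non-linear part, sized arbitrarily) maps the state to the value coefficient alpha, and the per-step decision is the LP argmin of Cost(decision) + alpha * nextStock. With a one-dimensional stock the LP is tiny; w would then be optimized exactly as in the DPS sketch above, by noisy black-box optimization of the simulated cost:

```python
import numpy as np
from scipy.optimize import linprog

def alpha_net(w, stock):
    # The "alpha = NeuralNetwork(w, state)" box: one tanh layer, 4 hidden units.
    W1, W0, W2, W3 = w[:4], w[4:8], w[8:12], w[12]
    return float(W3 + W2 @ np.tanh(W1 * stock + W0))

def dvs_decision(w, stock, inflow, price):
    alpha = alpha_net(w, stock)          # non-linear mapping, chosen by w
    # LP over the release r: min -price*r + alpha*(stock - r + inflow),
    # i.e. min -(price + alpha)*r up to a constant, with r in [0, stock].
    res = linprog(c=[-(price + alpha)], bounds=[(0.0, stock)])
    return res.x[0]

w = np.random.default_rng(0).normal(size=13)
print(dvs_decision(w, stock=5.0, inflow=1.0, price=1.5))
```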
  66. Summary: the best of both worlds (continued). Three optimizers: ● SAES ● Fabian's method: gradient descent with redundant finite differences ● a Newton version. (A minimal SAES sketch follows below.)
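For reference, a minimal (mu, lambda) self-adaptive evolution strategy, the "SAES" named on the slide, under standard textbook settings (the exact variant used in the talk is not specified): each individual carries its own step size sigma, mutated log-normally before the parameters.

```python
import numpy as np

def saes(f, dim, mu=5, lam=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    tau = 1.0 / np.sqrt(2 * dim)                       # step-size learning rate
    pop = [(rng.normal(size=dim), 1.0) for _ in range(lam)]   # (w, sigma) pairs
    for _ in range(iters):
        pop.sort(key=lambda ind: f(ind[0]))            # rank by (noisy) fitness
        parents, pop = pop[:mu], []
        for _ in range(lam):
            w, sigma = parents[rng.integers(mu)]
            sigma = sigma * np.exp(tau * rng.normal()) # self-adapt the step size
            pop.append((w + sigma * rng.normal(size=dim), sigma))
    return min(pop, key=lambda ind: f(ind[0]))[0]

print(saes(lambda w: float(w @ w), dim=10)[:3])        # sanity check on a sphere
```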
  67. Ok, we have done the 4th of the four targets: direct value search.
  68. State of the art in discrete-time control, a few tools: ● Model Predictive Control: for making a decision in a given state, (i) make forecasts, (ii) replace the random processes by pessimistic forecasts, (iii) optimize as if the problem were deterministic ● Stochastic Dynamic Programming: Markov model; compute the "cost to go" backwards ● Direct Policy Search: parametric controller, optimized on simulations
  69. Conclusion ● Still rather preliminary (less tested than MPC or SDDP) but promising: ● forecasts naturally included in the optimization ● anytime algorithm (the user immediately gets approximate results) ● no convexity constraints ● room for detailed simulations (e.g. with a very small time scale, for volatility) ● no random process constraints (need not be Markov) ● can handle large state spaces (as DPS) ● can handle large action spaces (as SDP) ==> can work on the "real" problem, without "cast"
  70. Bibliography ● D. Bertsekas. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC. 2005. (MPC = deterministic forecasts) ● K. J. Astrom. Optimal Control of Markov Processes with Incomplete State Information. 1965. ● P. Pinson. Renewable energy forecasts ought to be probabilistic! 2013 (WIPFOR talk). ● Y. Bengio. Training a neural network with a financial criterion rather than a prediction criterion. 1997. (a quite practical application of direct policy search, with convincing experiments)
  71. Questions?
  72. Appendix
  73. SDP / SDDP: Stochastic (Dual) Dynamic Programming ● Representation of the controller: decision(current state) = argmin Cost(decision) + Bellman(next state) ● This is a linear program (LP) if, for a given current state, next state = LP(decision) and Cost(decision) = LP(decision) → 100,000 decision variables per time step
