- Stochastic dynamic programming (SDP) and stochastic dual dynamic programming (SDDP) are algorithms for solving sequential decision making problems under uncertainty.
- They represent the value function and controller as piecewise linear functions that can be encoded in linear programming formulations. This allows solving problems with up to around 100,000 decision variables per time step.
- However, solving the full problem using SDP/SDDP can be computationally expensive due to the "curse of dimensionality" as the number of states increases. The methods also require linear programming approximations and convex value functions.
4. Power systems
● Many jobs in AI for power systems
● Important for the economy and for the world
● Join the force, do Machine Learning for Power Systems!
5. All in one slide
● Noisy optimization
● Direct Policy Search
● Model Predictive Control + receding horizon
● Reinforcement learning / Markov Decision Processes
● Stochastic (dual) dynamic programming
● Bootstrap, bias correction, sample average approximation
● Non-stochastic uncertainties (Wald, Savage)
● + power systems (stability of networks, capacity markets, domino effect, HVDC, unit commitment, dispatch, UC by sort)
6. Most important slide of this talk
Something unclear? Something wrong?
==> INTERRUPT ME !!!
If there is a problem for you, there is probably a problem for 50 others :-)
Be a hero! Interrupt this presentation at least once :-)
8. First, the power systems part
● Maybe you don't care about power systems
● But I promise it's cool and fun :-)
9. Energy matters!
The “Trente Glorieuses” (1945-73) in some western countries:
● No unemployment
● Growth
● Baby boom
==> stopped at the 1973 oil crisis
==> correlation between economy and energy.
10. Pollution is complicated: numbers from nextbigfuture (not super recent)
The death toll is not the only criterion, and you can disagree with these numbers (they are so hard to evaluate).
Is coal more radioactive than nuclear power? Will this improve?
11. Specifying costs: what scientists and engineers do not decide
● Economic costs
● Ecological costs
● Air
● CO2 (+ other greenhouse gases)
● Water
● Waste storage
● ...
● Externalities
● Maintenance deaths
● Faults, quality of service
12. Energy pollutes
Climate change... yes it matters, but business as usual (will it change?)
Air pollution: kills more than AIDS + malaria?
Coal = cheap + huge reserves.
Nuclear power: Chernobyl + Fukushima.
13. Electricity is tricky
Too much production? Frequency increases.
Not enough production (fault)? Frequency decreases.
==> both can be harmful
==> electricity needs instantaneous equilibrium
==> but some energy sources are intermittent and volatile (wind) or slow (coal, nuclear), and there are faults.
Comparing power plants only in terms of euros per MWh is very approximate
==> study an energy mix, not a single energy source.
14. Alternating current
Frequency must be stable.
Some power plants contribute to real-time stabilization:
(frequency++ ==> power--)
(frequency-- ==> power++)
(a minimal sketch below)
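A minimal sketch of this stabilization rule (a proportional "droop" response): output moves against the frequency deviation. The gain and all numbers are illustrative, not from the talk.

```python
# Minimal droop-style frequency response (illustrative numbers).
def droop_power(f_hz, f_nominal=50.0, p_setpoint=500.0, k_mw_per_hz=100.0):
    # frequency above nominal ==> reduce output; below nominal ==> increase it
    return p_setpoint - k_mw_per_hz * (f_hz - f_nominal)

print(droop_power(50.1))  # 490.0 MW: frequency++ ==> power--
print(droop_power(49.9))  # 510.0 MW: frequency-- ==> power++
```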
15. Plants paid for... doing “nothing”?
Intermittent wind or variable demand
+ “prod = demand” constraint
==> need fast/reactive power plants (reserves).
These PPs need more than the market price
==> capacity market (paid for being there “in case”)
==> complex economic model in deregulated markets.
16. Energy is (almost) a collaborative game
Sharing (peak hours, reserves...) is great,
excellent for renewable energies.
Collaboration is not that bad in Europe.
==> Towards a European energy mix (solar in the South, wind in the North).
A possible paradigm:
● Maximize social surplus assuming collaboration (decision variables = investments).
● Assume that the legislator will take care of incentives.
17. Optimization & energy
I. The most important question in the universe
II. Examples
III. A key problem: uncertainties
IV. Algorithms
18. Denmark
Wind power covers 33% of local consumption.
Implies a need for
● connections (++)
● storage
● and/or gas plants
==> sometimes negative prices
==> 8.4 t CO2 per person (France 6.1, USA 17.2)
Storage: electric vehicles?
19. China
● Coal, massively ==> air pollution
● PV unit production cheap thanks to lax environmental constraints + labor laws
● Wind power + long-distance DC connections
Imports from countries w/o ecological norms?
Intoxicate babies in China rather than in Europe?
26. Kerguelen: let's be crazy?
Big surface area.
Wind: 35 km/h frequent, 150 km/h usual, peaks at 200 km/h.
Perfect for wind power. No consumption around.
==> H2 synthesis? Or move industries there?
Important place for wildlife.
27. Greenland: yet a bit more crazy
Wind power on all shores?
Connect to America and Europe? (different peak hours)
28. Scandinavia
Still good locations for hydropower.
Big connections to the rest of Europe?
Hydro storage is convenient for smoothing intermittent energy sources.
All of Europe using storage in Sweden?
Or H2 storage (not yet technically ok)?
29. Beautiful problems!
● Definitely important
● All time scales:
●building power plants (decades)
●building connections (decades)
●hydro planning (years)
●nuclear planning (months, years)
●thermal plants (hours, days)
●faults, reserves (< second to months)
● Nonlinear effects
● Plenty of constraints (non-separable!)
● High dimensional:
●action spaces (~10 000)
●state spaces (~100)
32. Peak shaving by pumped storage
(figure: daily price curve with “expensive energy” peaks and a “cheap energy” trough)
33. Peak shaving by pumped storage
(figure: pumping during the cheap-energy trough, hydro power during the expensive-energy peaks)
34. Also: take care of networks!
● Domino effect
● Overloaded line
● ==> failure
● ==> other lines overloaded
● ==> other lines fail
● ==> Baouuuum!!!!!
35. United States: the 2003 blackout
● Overloaded line + bug (race condition, parallel programming)
● Domino effect! 45 million people with no electricity (2 days), plus various damages
37. The POST project – supergrids simulation and optimization
Mature technology: HVDC links (high-voltage direct current).
Related ideas in Asia (more political issues).
38. HVDC might change the world
● Transmission networks are high-voltage alternating current
● But some connections are high-voltage direct current:
– Reduces losses (for long distances)
– Removes the need for frequency stabilization
40. Power systems: decision variables
Decisions =
● Strategic decisions (a few time steps):
●building a nuclear power plant
●building a Spain-Morocco connection
●building a wind farm
● Tactical decisions (many time steps):
●switching on hydro PP #7 at 6:00
●switching on thermal PP #4 at 7:15
●....
(Strategic decisions are based on simulations of the tactical level; the tactical level depends on the strategic level.)
41. Sequential decision making
● Issues
– Demand varying in time, limited predictability
– Transmission introduces constraints (no “copper plate”)
– Renewables ==> variability ++ (no “deterministic approach”)
● Methods
– Markovian assumptions ==> sometimes wrong!!!!
– Simplified models ==> model error >> optimization error
● Approaches
● Machine Learning / Mathematical Programming
42. Stochastic Control
Reinforcement learning: black-box stochastic control.
Implicit assumption “state = observation”?
Sometimes “state” is a huge unknown thing.
(diagram: a random process sends random values into the system (known structure? or black-box?); a controller with memory sends commands to the system and receives back an observation (= state?) and a cost)
43. Hybridization of reinforcement learning / mathematical programming
● Math programming (mathematicians doing discrete-time control)
– Nearly exact solutions for a simplified problem
– High-dimensional constrained action space
– But small state space, linearization, Markov assumptions & not anytime
==> 99% of what I've seen in industry
● Reinforcement learning (geeks doing DTC)
– Unstable :-( (except DPS)
– Small model bias
– Small / simple action space <== often the main issue
– But high-dimensional state space & anytime
44. 3 examples of algorithms
==> Model Predictive Control,
Stochastic Dynamic Programming,
Direct Policy Search
45. MODEL PREDICTIVE CONTROL
● Anticipative solutions:
● replace all random parts by deterministic parts
● optimize deterministically
● Pros/cons:
● much simpler (deterministic optimization)
● but in real life you cannot guess November rains in January
● rather optimistic decisions
46. MODEL PREDICTIVE CONTROL
● Looks like pure bullshit: 100% deterministic
● Still so convenient:
● so many constraints
● huge state spaces
● just having a bug-free simulator is so hard
● so many uncertainties
e.g. the paper “Newave vs Odin”: other methods have worse assumptions (convexity, Markovian random processes, etc. ==> later)
47. Shrinking horizon / receding horizon
1 Assume you know the next 48 hours
2 Optimize the reward over these 48 hours
3 In fact, just apply two hours of decisions
4 Go back to 1 and “t ← t + 2 hours”
==> operational horizon = 2 hours
==> tactical horizon = 48 hours
All effects lasting more than 48 hours are neglected!!!! (a minimal sketch below)
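A minimal sketch of this loop, assuming hypothetical helpers `forecast`, `optimize_deterministic` and `apply_decisions` (none of these names come from the talk):

```python
# Receding-horizon loop: optimize 48 h deterministically, apply 2 h, repeat.
TACTICAL_H, OPERATIONAL_H = 48, 2

def receding_horizon(state, t_end, forecast, optimize_deterministic, apply_decisions):
    t = 0
    while t < t_end:
        scenario = forecast(state, t, horizon=TACTICAL_H)      # 1. pretend the next 48 h are known
        plan = optimize_deterministic(state, scenario)         # 2. list of hourly decisions
        state = apply_decisions(state, plan[:OPERATIONAL_H])   # 3. apply only two hours
        t += OPERATIONAL_H                                     # 4. t <- t + 2 hours
    return state
```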
48. Receding horizon + valorization
1 Assume you know the next 48 hours
2 Optimize the reward over these 48 hours + a bonus, e.g. 5 euros per MWh left in each stock
3 In fact, just apply two hours of decisions
4 Go back to 1 and “t ← t + 2 hours”
==> Much better, but sometimes still plain wrong
49. Receding horizon + constraint
1 Assume you know the next 48 hours
2 Optimize the reward over these 48 hours + a constraint: stay above a lower bound given by humans (history)
3 In fact, just apply two hours of decisions
4 Go back to 1 and “t ← t + 2 hours”
==> not principled (imitation), but convenient + polynomial time with a linear model
50. Receding horizon + learnt valorization
1 Assume you know the next 48 hours
2 Optimize the reward over these 48 hours + learntFunction(currentState, stocks)
3 In fact, just apply two hours of decisions
4 Go back to 1 and “t ← t + 2 hours”
==> still polynomial if learntFunction is LP-representable as a function of the actions
==> I love that (but I might be biased :-) )
(Direct Value Search)
How to learn that function? By Direct Policy Search! (a sketch below)
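A minimal sketch of an LP-friendly learnt valorization, assuming a cost-minimization formulation where the learnt value-to-go is a maximum of affine cuts over the stock vector (the cuts below are invented). Such a function stays inside linear programming via the epigraph trick: add a variable v with v ≥ a_k·stocks + b_k for all k, and minimize cost + v.

```python
import numpy as np

# Hypothetical learnt value-to-go: a max of affine cuts in the stocks,
# so "minimize cost + v, with v >= a_k.stocks + b_k for all k" stays an LP.
cuts = [(np.array([4.8, 5.1]), 0.0),    # (slope per stock, offset), invented
        (np.array([2.0, 6.0]), 30.0)]

def learnt_function(stocks):
    return max(a @ stocks + b for a, b in cuts)

print(learnt_function(np.array([10.0, 5.0])))
```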
51. 3 examples of algorithms
Model Predictive Control,
==> Stochastic Dynamic Programming,
Direct Policy Search
52-55. How to solve, simple case: three states, 3 days, no random process
(figures: a trellis of three states over three days with edge costs; backward induction fills in the cost-to-go of each state one day at a time, e.g. (2, 2, 2), then (3, 4, 6), then (4, 5, 7))
(a code sketch below)
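A minimal backward-induction sketch in the spirit of these slides; the edge costs below are invented, not the ones from the original figure:

```python
# Backward induction: V[t][s] = min over next states s2 of (edge cost + V[t+1][s2]).
STATES, DAYS = [1, 2, 3], 3
cost = {(s, s2): abs(s - s2) + 1 for s in STATES for s2 in STATES}  # toy costs

V = {DAYS: {s: 0 for s in STATES}}          # terminal values
for t in range(DAYS - 1, -1, -1):
    V[t] = {s: min(cost[s, s2] + V[t + 1][s2] for s2 in STATES) for s in STATES}

print(V[0])  # optimal cost-to-go from each starting state on day 0
```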
56. This was deterministic
● How to add a random process?
● Don't believe that the world is limited to compact MDPs :-)
● Remember Åström '65? Sometimes you need the history of observations (or latent variables) in the state ==> you can't make optimal decisions from the current observations alone.
● Build a huge tree of possible futures and multiply nodes?
58-59. The huge MDP necessary for solving a non-Markovian problem
Representation as a Markov process (a tree): this is the representation of the random process. In each node, there are the state-nodes with decision-edges.
Huge representation. Value-based approaches intractable.
60. Overfitting
● Representation as a Markov process (a tree): how do you actually make decisions when the random values are not exactly those observed? (heuristics...)
● Check on random realizations which have not been used for building the tree.
● Does it work correctly? ===> cross-validation (a sketch below)
● Overfitting = when it works only on the scenarios used in the optimization process.
(see B. Defourny, D. Ernst and L. Wehenkel, INFORMS Journal on Computing, Vol. 25(3), 2013)
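A minimal sketch of that check, with hypothetical placeholders `optimize_policy` and `simulate` (a large train/test gap signals overfitting):

```python
import random

def out_of_sample_gap(scenarios, optimize_policy, simulate, train_frac=0.7):
    # Optimize on one part of the scenarios, score on scenarios never seen.
    random.shuffle(scenarios)
    k = int(train_frac * len(scenarios))
    train, test = scenarios[:k], scenarios[k:]
    policy = optimize_policy(train)   # may overfit the training scenarios
    score = lambda subset: sum(simulate(policy, s) for s in subset) / len(subset)
    return score(train), score(test)  # in-sample vs out-of-sample cost
```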
61-69. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear) (often)
● Maximum of linear functions = can be encoded in linear programming ==> each argmax is polynomial (an LP sketch below)
(caution if the noise is multiplied: strict convexity required)
● → ok for 100 000 decision variables per time step!!! (tens of time steps, hundreds of plants, several decisions each)
● But solving by SDP/SDDP is expensive (curse of dimensionality: exponential in the number of state variables)
● Constraints:
● needs an LP approximation: ok for you?
● SDDP requires convex Bellman values: ok for you?
● needs Markov random processes: ok for you? (possibly after some random process extension...)
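A minimal sketch of one such polynomial step with scipy, on a toy two-reservoir problem (all numbers invented): the piecewise-linear value-to-go max_k(a_k·x' + b_k) is encoded with an extra variable v and constraints v ≥ a_k·x' + b_k.

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([10.0, 8.0])              # current stocks
c = np.array([-3.0, -2.5])             # immediate cost = -price per unit released
cuts = [(np.array([1.0, 1.2]), 0.0),   # cuts (a_k, b_k) of the value-to-go
        (np.array([2.0, 0.5]), -4.0)]

# Variables z = [u1, u2, v]; next stocks x' = x - u; minimize c.u + v.
obj = np.concatenate([c, [1.0]])
A_ub = np.array([np.concatenate([-a, [-1.0]]) for a, _ in cuts])  # -a.u - v <= -(a.x + b)
b_ub = np.array([-(a @ x + b) for a, b in cuts])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, x[0]), (0, x[1]), (None, None)])
print(res.x[:2], res.fun)  # optimal releases, immediate + to-go cost
```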
70. Summary
● Most classical solution = SDP and variants
● Or MPC (model-predictive control), replacing the stochastic parts by deterministic pessimistic forecasts
71. 3 examples of algorithms
Model Predictive Control,
Stochastic Dynamic Programming,
==> Direct Policy Search
72. Direct Policy Search
● Requires a parametric controller
● Principle: optimize the parameters on simulations (= simulation-based optimization)
● Unusual in large-scale power systems (we will see why)
● Usual in other areas (evolutionary robotics)
73. Stochastic Control by DPS
(diagram: same control loop as in slide 42, with parameters w entering the controller)
Optimize the controller thanks to a simulator:
● Command = Controller(w, state, forecasts)
● Simulate(w) = stochastic loss with parameter w
● w* = argmin_w Simulate(w) <== noisy optimization
74. Stochastic Control by DPS
(diagram: same control loop as in slide 42, with parameters w entering the controller)
So simple.
Does not work under this simple form when you have large-scale action spaces and/or many constraints.
Still, nice representations can make it relevant.
76-81. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. a neural network:
Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy black-box optimization
● Advantages: non-linear ok, forecasts included
● Issue: too slow (hundreds of parameters for even 20 decision variables; depends on structure)
● Idea: a special structure for DPS (inspired from SDP)
Strategy optimized given the real forecasting module you have, given arbitrarily precise simulations (forecasts are inputs).
Great for fine-tuning:
1. Optimize by another approach (MPC?)
2. Fine-tune by DPS
(a sketch below)
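A minimal DPS sketch with the controller above, assuming a hypothetical `simulate_episode(policy, rng)` returning a stochastic cost; the optimizer is a plain noisy random search standing in for real black-box optimizers (CMA-ES & co):

```python
import numpy as np

DIM_X, DIM_H, DIM_U = 4, 8, 2   # toy sizes: state, hidden layer, commands

def controller(w, x):
    # Unpack w into (W0, W1, W2, W3) and compute W3 + W2.tanh(W1.x + W0).
    i = DIM_H * DIM_X
    W1 = w[:i].reshape(DIM_H, DIM_X)
    W0 = w[i:i + DIM_H]
    W2 = w[i + DIM_H:i + DIM_H + DIM_U * DIM_H].reshape(DIM_U, DIM_H)
    W3 = w[-DIM_U:]
    return W3 + W2 @ np.tanh(W1 @ x + W0)

def dps(simulate_episode, iters=200, sigma=0.1, n_eval=16, seed=0):
    rng = np.random.default_rng(seed)
    dim = DIM_H * DIM_X + DIM_H + DIM_U * DIM_H + DIM_U
    score = lambda w: np.mean([simulate_episode(lambda x: controller(w, x), rng)
                               for _ in range(n_eval)])   # resample to fight noise
    w = rng.normal(0, 0.1, dim)
    best = score(w)
    for _ in range(iters):
        cand = w + rng.normal(0, sigma, dim)   # Gaussian mutation of the parameters
        s = score(cand)
        if s < best:
            w, best = cand, s
    return w, best
```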
82. Noisy optimization
Two very different frameworks:
● We have a generative model (or a huge sample)
● The problem is computational
● Gradient-based optimization, or black box
● We have a finite sample (e.g. 8 samples)
● The problem is statistical
● Let us compute the optimum on average over the sample
● Is it really a good solution?
(Interesting papers in recent NIPS / ICML; also results from the 50s and 60s.)
83-84. Noise-free optimization
● Hessian + gradient ==> apply Newton:
H ( x(n+1) - x(n) ) = - g(n)
i.e. minimum of the second-order Taylor approximation
==> distance(n+1) = O(distance(n)^2)
● Only gradient: quasi-Newton
==> guess the Hessian, thanks to e.g. BFGS
==> distance(n+1) / distance(n) = o(1)
● No gradient:
evolutionary algorithms / pattern search methods, finite differences
==> log || x(n) - x* || ~ -Cn (very assumption dependent)
(a Newton sketch below)
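A minimal Newton sketch matching H (x(n+1) - x(n)) = -g(n), solving the linear system instead of inverting H; the toy objective is invented:

```python
import numpy as np

def newton(grad, hess, x, steps=5):
    for _ in range(steps):
        x = x + np.linalg.solve(hess(x), -grad(x))   # solve H dx = -g
    return x

# Toy: f(x) = (x0 - 1)^2 + 10 (x1 + 2)^2  ==> minimum at (1, -2)
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
hess = lambda x: np.diag([2.0, 20.0])
print(newton(grad, hess, np.zeros(2)))   # one step suffices on a quadratic
```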
85-89. Noisy black-box optimization
= request f(x) and get e.g. f(x, random)
Finite differences with noise (3rd derivative ≠ 0):
● Dupac 57: log distance ~ -2/3 log(n)
● Fabian 67: log distance ~ -log(n) with sophisticated finite differences and assuming “many” derivatives exist
● Spall 00: log distance ~ -2/3 log(n) with better dependency on the dimension and a simpler algorithm
● Recent works: evolutionary algorithms with resamplings ==> -1/2 log(n)
● Shamir 2012: non-asymptotically, log distance ~ -1/2 log(n) (or -log(n) with quadratic functions)
(a sketch below)
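A minimal sketch of noisy finite differences in the Kiefer-Wolfowitz spirit (close to the Dupac line above); the gain sequences and the toy objective are standard-looking guesses, not tuned:

```python
import numpy as np

def kiefer_wolfowitz(noisy_f, x, n_iters=2000, a0=0.5, c0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    for n in range(1, n_iters + 1):
        a_n, c_n = a0 / n, c0 / n ** 0.25    # step size and difference width
        g = np.zeros(len(x))
        for i in range(len(x)):              # two-point noisy finite differences
            e = np.zeros(len(x)); e[i] = c_n
            g[i] = (noisy_f(x + e, rng) - noisy_f(x - e, rng)) / (2 * c_n)
        x = x - a_n * g
    return x

# Toy noisy objective, optimum at (1, -1):
noisy_f = lambda x, rng: (x[0] - 1) ** 2 + (x[1] + 1) ** 2 + rng.normal(0, 0.1)
print(kiefer_wolfowitz(noisy_f, np.zeros(2)))
```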
90. Sample average approximation
Two very different frameworks:
● We have a generative model (or a huge sample)
● The problem is computational
● Gradient-based optimization
● We have a finite sample (e.g. 8 samples...)
● The problem is statistical
● Let us compute the optimum on average over the sample
● Is it really a good solution?
91. SAA: sample average approximation
● I want x* = argmin_x E[ f(x) + noise(x) ]
● But I compute x = argmin_x g(x), where
g(x) = (1/N) Σ_{i=1..N} ( f(x) + noise_i(x) ) (SAA)
● E[noise(x)] = 0 and the noise_i are i.i.d.
● Then E[x] ≠ x* because N is finite: bias
● Bias correction: evaluate b = E[x] - x*, propose x - b?
92. b = E[x] - x* depends on the problem: how to evaluate it?
● We want to know the difference between
● the optimum on average over ∞ i.i.d. cases
● the optimum on average over N i.i.d. cases
● Efron: let's compute the same difference for another probability distribution, uniform over the sample:
● the optimum on average over ∞ (= N distinct) i.i.d. cases
● the optimum on average over N i.i.d. cases (among N!)
93-94. Looks strange, doesn't it?
● Efron and others designed such tools
● “Bootstrap”: find a solution with what you have
● Expensive:
● compute the optimum x on your sample
● compute the expected optimum x' on average over multiple “resamplings”
● compute b = x - E[x']
● return x + b = 2x - E[x']
● Many other resampling methods (jackknife, variants of bootstrap...)
A beautiful example of a case in which sophisticated mathematics help. (a toy demo below)
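A toy numeric illustration of this recipe; the problem (a newsvendor-style quantile decision) is invented for the demo, but the bootstrap steps follow the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
q = 0.8                                      # decision = 0.8-quantile of demand
sample = rng.lognormal(0.0, 1.0, size=12)    # small sample: statistical regime

x = np.quantile(sample, q)                   # SAA optimum on the sample
# Resample from the sample itself, recompute the "optimum" each time:
xs = [np.quantile(rng.choice(sample, size=sample.size), q) for _ in range(2000)]
b = x - np.mean(xs)                          # plug-in analogue of b = x - E[x']
print("raw:", x, "bias-corrected:", x + b)   # x + b = 2x - E[x']
```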
95. State of the art in discrete-time control, a few tools:
● Model Predictive Control: for making a decision in a given state,
(i) do forecasts
(ii) replace random processes -> pessimistic forecasts
(iii) optimize as if the problem were deterministic
● Stochastic Dynamic Programming:
● ~Markov model
● compute the “cost to go” backwards
● Direct Policy Search:
● parametric controller
● optimized on simulations
● problems for high-dimensional constrained action spaces
96. State of the art in discrete-time control, a few tools:

                            MPC            SDP         DPS
Random process              -              Markovian   Ok
Long-term effects           Heuristic      Ok          Ok
Constrained action spaces   Ok             Ok          -
Optimal if                  Deterministic  Markovian   Good structure

(MPC: so convenient. DPS: good for tuning?)
97. DPS as a fine-tuning upper layer
● Bengio '97:
● handcraft a policy
● smooth it, replace constants by parameters
● optimize the parameters by DPS
● Decock et al. '13:
● construct a policy as in SDP
● but learn a parametric V by DPS (instead of backward induction)
Basically, it never fails... Find suboptimality, counterbalance it by DPS...
98. Power systems optimization (1 slide)
Consider an electric system.
Decisions =
● Strategic decisions (a few time steps):
●building a nuclear power plant
●building a Spain-Morocco connection
●building a wind farm
● Tactical decisions (many time steps):
●switching on hydro PP #7 at 6:00
●switching on thermal PP #4 at 7:15
●....
(Strategic decisions are based on simulations of the tactical level; the tactical level depends on the strategic level.)
99. Outline
1. Overview
2. Sequential decision making
3. Strategic decisions
4. Conclusions
I have nightmares with this problem. No idea how to tackle that.
100. Strategic decisions
● Typical situation
● A system, to be optimized by RL / SDP / etc
● Strategic decisions on top of it
● Non-stochastic uncertainties
● Tools
● Bandits (including adversarial)
● Wald / Savage / Nash criteria
● Bilevel optimization
● Still quite an open problem
101. Strategic decisions
● Decisions (very simplified):
● 100% renewable + demand-side management
● plenty of nuclear power
● 100% coal + gas
● microgrids
● concentrated solar power (storage with molten salt...)...
● Scenarios:
● a Fukushima in Europe?
● little improvement of renewables?
● no more international cooperation?
● ... new gravity storage? flywheels? compressed air? H2? fusion? breakthrough in PV? Putin?...
102. Strategic decisions
R(d,s) = reward for decision d in scenario s
● On average: d* = argmax_d E_s R(d,s) ==> what if we have no probabilities?
● Worst case: d* = argmax_d min_s R(d,s)
A bit conservative? As if Nature decided “against” us and “after” we have chosen d.
● Regret: R'(d,s) = R(d,s) - max_{d'} R(d',s), and d* = argmax_d min_s R'(d,s)
● Nash: d* = argmax_d min_s R(d,s) with d allowed to be random (no “after”) ==> stochastic strategies? (==> fired)
103. Strategic decisions
● Criteria:
● best (deterministic) choice for the worst scenario (Wald)
● best (deterministic) choice in terms of regret (Savage)
● best simultaneous choice (Nash)
● combination: best simultaneous regret (Nash + Savage)
● Formally:
● Wald: argmax_c min_s Reward(c,s)
● Savage: argmax_c min_s ( Reward(c,s) - max_g Reward(g,s) )
● Nash-Savage & Nash-Wald: allow stochastic decisions
==> strange fact: optimal policies are then stochastic... A stochastic nuclear decision? (a toy computation below)
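A toy reward matrix with these criteria computed; the mixed (Nash-style maximin against an adversarial Nature) strategy is obtained by LP. All numbers are invented:

```python
import numpy as np
from scipy.optimize import linprog

R = np.array([[3.0, -2.0,  1.0],   # rewards R[d, s]: 3 decisions x 3 scenarios
              [1.0,  1.0,  0.0],
              [2.0,  0.0, -1.0]])

wald = np.argmax(R.min(axis=1))          # best worst-case reward
regret = R.max(axis=0) - R               # regret of decision d in scenario s
savage = np.argmin(regret.max(axis=1))   # smallest worst-case regret

# Mixed maximin: max v s.t. sum_d p_d R[d, s] >= v for all s, p in the simplex.
n_d, n_s = R.shape
obj = np.concatenate([np.zeros(n_d), [-1.0]])   # variables [p, v]; minimize -v
A_ub = np.hstack([-R.T, np.ones((n_s, 1))])     # v - p.R[:, s] <= 0 for each s
res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(n_s),
              A_eq=[np.concatenate([np.ones(n_d), [0.0]])], b_eq=[1.0],
              bounds=[(0, None)] * n_d + [(None, None)])
print("Wald:", wald, "Savage:", savage, "mixed strategy:", res.x[:n_d].round(3))
```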
104. Strategic decisions
● Assume you have 100 000 000 000 euros for investing in power plants / networks. What do you do?
● Different points of view:
● too many uncertainties! minimum investments
● so many uncertainties! investments everywhere (will solve unemployment :-) )
● no idea :-)
● What do you think? What would you do if your life depended on it?
105. Conclusions
● Dynamic optimization problems in power systems: beautiful + crucial
● 2 levels: “strategic + tactical”
● Strategic: >100 decision variables in a noisy non-linear optimization problem with non-stochastic uncertainties
● Tactical (dynamic):
– 10 000 constrained action variables
– 100 state variables
– hundreds of time steps
– non-linear effects
– non-Markovian random processes
106-109. Conclusions
● RL did not invade power systems because of high-dim. constrained action spaces
● Dynamic optimization does not boil down to MDP solving ==> Markov assumption!
● DPS on top of other algorithms? Tuning in front of “realistic” simulations with realistic random processes.
● Long-term investments:
● difficult to make decisions (nobody trusts criteria for non-stochastic uncertainties)
● but negative conclusions matter: “removing X without adding Y does not work”.
110. Conclusions
Main strengths of the RL (machine learning) communities:
● really cares about nonlinear effects (model error)
● really cares about overfitting (clean cross-validation)
==> should invade power systems
==> at least if high-dim action spaces are handled (SDP / SDDP great for that)
Our proposal: DPS as a fine-tuning upper layer, over MPC (which is sooo convenient!).
111. Bibliography
● Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC. Bertsekas, 2005. (MPC = deterministic forecasts)
● “Newave vs Odin”: why MPC survives in spite of theoretical shortcomings.
● Dallagi and Simovic (EDF R&D): “Optimisation des actifs hydrauliques d'EDF : besoins métiers, méthodes actuelles et perspectives”, PGMO. (importance of precise simulations)
● Ernst: The Global Grid, 2013, and all his slides/studies on the web.
● Renewable energy forecasts ought to be probabilistic! Pinson, 2013. (WIPFOR talk)
● Training a neural network with a financial criterion rather than a prediction criterion. Bengio, 1997.
● Direct Model Predictive Control. Decock et al., 2014. (combining DPS and MPC)
112. Summary :-)
● Noisy optimization = black box stochastic optimization
● Dynamic optimization (DO) = multistage optimization
● Reinforcement learning = black box DO
● Direct Policy Search = RL by parametric optimization
● Model Predictive Control = DPS with simplified model
● Receding horizon = neglect long-term + frequent reoptimize
● (Stoc.) (dual) dynamic prog. = DO backwards in time
● Bootstrap, bias correction <== tricky statistics for sample average approximation
● Non stochastic uncertainties (Wald, Savage)
● Unit commitment/dispatch = DP for power systems
● UC by sort = unit commitment with marginal costs only
113. All in one slide
● Noisy optimization
● Direct Policy Search
● Model Predictive Control + receding horizon
● Reinforcement learning / Markov Decision Processes
● Stochastic (dual) dynamic programming
● Bootstrap, bias correction, sample average approximation
● Non-stochastic uncertainties (Wald, Savage)
● + power systems (stability of networks, capacity markets, domino effect, HVDC, unit commitment, dispatch, UC by sort, think of time (storage) & space (transmission))
116. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller:
● decision(current state) = argmin Cost(decision) + Bellman(next state)
● Linear programming (LP) if:
– for a given current state, next state = LP(decision)
– Cost(decision) = LP(decision)
● → 100 000 decision variables per time step
117. Our activities
● Planning/control (tactical level)
● pluriannual planning: evaluate marginal costs of hydroelectricity
● taking into account stochasticity and uncertainties
● Moderate scale (cities, factories) (tactical level simpler)
● master plan optimization
● stochastic uncertainties
● Large-scale investment studies (e.g. Europe + North Africa)
● long term (2030 - 2050)
● huge (non-stochastic) uncertainties
● investments: interconnections, storage, smart grids, power plants...
118. Energy is expensive (or not?)
Desertec = hundreds of billions of euros for renewables in Africa.
Medgrid = transmission network for these renewables.
==> worth taking time for making decisions