1. TAO - Inria Saclay-IDF
● Machine Learning & Optimization
● Tao-uctsig:
● Sequential decision making
● One permanent, full time + others part time
● Applications to energy management
● Strong collaboration with:
● Taiwan
● Artelys (Ilab Metis; joint software)
● Others
O. Teytaud, Research Fellow,
olivier.teytaud@inria.fr
http://www.lri.fr/~teytaud/
2. Power systems, high scale
[Figure: production, network and demand, linked by a feedback mechanism (smart grids)]
3. For choosing investments, we want to simulate systems
● Difficulties:
● Demand varies in time, with limited predictability
● Transportation introduces constraints
● Renewables ==> variability ++
● Problems:
● Limited predictability has an impact ==> anticipative high-level techniques underestimate the need for storage / smoothing
● Markovian assumptions ==> wrong
● A system which neglects “base ≠ peak” cannot be used.
==> Model error >> optimization error
==> Machine Learning on top of Math. Programming
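To make the “base ≠ peak” point concrete, here is a toy computation. It assumes a hypothetical hourly demand profile and a constant production level equal to mean demand, and measures the storage that this production plan needs. The real profile needs storage to shift the night-time surplus to the evening peak, while a model that only sees the daily average concludes that no storage is needed at all. The profile, the units and the function name `storage_needed` are illustrative assumptions, not data from this work.

```python
# Hypothetical hourly demand (MW) over one day: low "base" at night,
# higher daytime load and a 4-hour evening "peak". Illustrative numbers only.
demand = [60.0] * 8 + [80.0] * 10 + [120.0] * 4 + [80.0] * 2


def storage_needed(demand, production):
    """Smallest storage capacity (MWh) that lets a constant hourly
    production level serve the demand profile, assuming the initial
    charge of the storage can be chosen freely."""
    level, lowest, highest = 0.0, 0.0, 0.0
    for d in demand:
        level += production - d  # charge on surplus, discharge on deficit
        lowest = min(lowest, level)
        highest = max(highest, level)
    return highest - lowest


mean_demand = sum(demand) / len(demand)   # 80.0 MW
flat_model = [mean_demand] * len(demand)  # a model that neglects base != peak

print(storage_needed(demand, mean_demand))      # 160.0
print(storage_needed(flat_model, mean_demand))  # 0.0
```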
4. Math programming and machine learning
● Math programming:
● Nearly exact solutions
● High-dimensional constrained action space
● But small state space & not anytime
● Reinforcement learning:
● Unstable
● Small / simple action space
● But high dimensional state space & anytime
5. Stochastic dynamic programming
● Step 1: compute Bellman's function
● Step 2: make decisions
● Can work with huge constrained action spaces
● But huge computation time, assumes Markovian models, neglects non-linearities
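The two steps can be sketched on a hypothetical toy storage problem: integer stock levels, i.i.d. random demand, time-dependent prices and a penalty for unmet demand. Step 1 computes Bellman's function by backward induction over every state; step 2 takes each decision by a one-step lookahead on that function. The model, its numbers and all function names are assumptions made for illustration. Here the step-2 minimization is plain enumeration; in the setting of this slide it could instead be a math program over a large constrained action space.

```python
# Hypothetical toy storage problem: integer stock s in 0..S, i.i.d. random
# demand (an instance of the Markovian assumption), production p in 0..P
# bought at a time-dependent price, and a penalty per unit of unmet demand.
MODEL = {
    "prices": [1.0, 1.0, 5.0, 5.0],    # cheap periods, then expensive ones
    "S": 3,                            # storage capacity
    "P": 2,                            # max production per period
    "demand_probs": {0: 0.5, 1: 0.5},  # demand distribution
    "penalty": 20.0,                   # cost per unit of unmet demand
}


def transition(s, p, d, S):
    """Next stock level and unmet demand after producing p and serving d."""
    level = s + p - d
    return min(S, max(0, level)), max(0, -level)


def q_value(t, s, p, d, V, m):
    """Immediate cost plus Bellman value of the resulting state."""
    nxt, shortage = transition(s, p, d, m["S"])
    return m["prices"][t] * p + m["penalty"] * shortage + V[t + 1][nxt]


def compute_bellman(m):
    """Step 1: backward induction, V[t][s] = E_d[min_p q_value(t, s, p, d)]."""
    T, S = len(m["prices"]), m["S"]
    V = [[0.0] * (S + 1) for _ in range(T + 1)]  # V[T] = 0: no terminal cost
    for t in range(T - 1, -1, -1):
        for s in range(S + 1):
            V[t][s] = sum(
                prob * min(q_value(t, s, p, d, V, m) for p in range(m["P"] + 1))
                for d, prob in m["demand_probs"].items()
            )
    return V


def decide(t, s, d, V, m):
    """Step 2: one-step lookahead using the precomputed Bellman values."""
    return min(range(m["P"] + 1), key=lambda p: q_value(t, s, p, d, V, m))


V = compute_bellman(MODEL)
print(decide(0, 0, 1, V, MODEL))  # cheap period, empty stock: produce 2
print(decide(2, 3, 1, V, MODEL))  # expensive period, full stock: produce 0
```

The loop over all states in `compute_bellman` is what makes the computation time explode as the state dimension grows, and the i.i.d. demand shows the kind of Markovian model the method relies on.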
6. Direct Policy Search
● Define a parametric function, with parameter θ:
● Neural network
● Handcrafted function
● Non-linear optimization:
● The best θ is the one which performs best on simulations
==> obtained by non-linear stochastic optimization
● Non-linearities OK, arbitrary stochastic process, large state space ==> little model bias
● No solution for huge constrained action spaces
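A minimal Direct Policy Search sketch, assuming a hypothetical storage simulator and a handcrafted two-parameter policy (one target stock level for cheap periods, one for expensive periods). The best θ is searched by a (1+1)-evolution strategy, which here stands in for the non-linear stochastic optimizer; the simulator, the policy form, the optimizer and its constants are illustrative choices, not the ones used in this work.

```python
import random

# Hypothetical simulator: 24 periods alternating cheap and expensive prices,
# storage capacity 3, production at most 2 per period, penalty 20 per unit
# of unmet demand. All numbers are illustrative assumptions.
PRICES = [1.0, 1.0, 5.0, 5.0] * 6
CAPACITY, MAX_PROD, PENALTY = 3.0, 2.0, 20.0


def policy(theta, stock, price):
    """Handcrafted parametric policy: refill up to a target stock level,
    theta[0] in cheap periods and theta[1] in expensive ones."""
    target = theta[0] if price <= 1.0 else theta[1]
    return min(MAX_PROD, max(0.0, target - stock))


def simulate(theta, rng):
    """Total cost of one simulated episode. Demand only needs to be
    sampled, so any stochastic process could be plugged in here."""
    stock, total = 0.0, 0.0
    for price in PRICES:
        demand = rng.uniform(0.0, 1.0)
        p = policy(theta, stock, price)
        level = stock + p - demand
        total += price * p + PENALTY * max(0.0, -level)
        stock = min(CAPACITY, max(0.0, level))
    return total


def average_cost(theta, n_sims=100, seed=0):
    """Mean cost over simulations; the fixed seed reuses the same
    scenarios for every theta (common random numbers)."""
    rng = random.Random(seed)
    return sum(simulate(theta, rng) for _ in range(n_sims)) / n_sims


def direct_policy_search(iterations=200, seed=1):
    """(1+1)-evolution strategy with a one-fifth-style step-size rule:
    keep a mutated theta only if it performs better on the simulations."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    best = average_cost(theta)
    sigma = 1.0
    for _ in range(iterations):
        candidate = [x + sigma * rng.gauss(0.0, 1.0) for x in theta]
        value = average_cost(candidate)
        if value < best:
            theta, best = candidate, value
            sigma *= 1.5            # success: enlarge the mutation step
        else:
            sigma *= 1.5 ** -0.25   # failure: shrink it more gently
    return theta, best


theta, best = direct_policy_search()
print(theta, best, average_cost([0.0, 0.0]))
```

Because the policy is only evaluated through simulations, the demand process could be replaced by any sampler without touching the optimizer. Choosing each decision is trivial here; with a huge constrained action space a parametric policy alone would not guarantee a feasible decision, which is the limitation noted on this slide.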
7. Math prog & reinforcement learning
● Here, we consider “math prog = heuristic”, because it is fast but has a strong model bias
● Proposals:
● DPS-style (little model bias, arbitrary random process, large state space):
– MCTS (Monte-Carlo Tree Search) + heuristic
– DPS (Direct Policy Search) + heuristic
● Bellman-style, for example (OK for large constrained action spaces):
– Non-linear in (θ, x(t))
– Linear in x(t+1)
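One way to read the Bellman-style example: the value given to the next state is non-linear in (θ, x(t)) but linear in x(t+1), so each decision remains a linear program even when the value weights come from a neural network. The sketch below assumes a tiny one-hidden-layer tanh network for the weights, a single energy price and box bounds on x(t+1); all names, shapes and numbers are hypothetical.

```python
import math


def marginal_values(theta, state):
    """Non-linear in (theta, x(t)): a tiny one-hidden-layer tanh network
    returning one marginal value per storage for the next state x(t+1)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, state)) + b)
        for row, b in zip(theta["W1"], theta["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(theta["W2"], theta["b2"])
    ]


def decide(theta, state, price, lower, upper):
    """Maximize sum_i v_i * x(t+1)_i - price * sum_i (x(t+1)_i - x(t)_i)
    subject to lower <= x(t+1) <= upper. Dropping the constant price * x(t)
    term leaves a linear objective with coefficients v_i - price; over a box,
    each coordinate goes to the bound selected by the sign of its coefficient."""
    values = marginal_values(theta, state)
    return [hi if v - price > 0 else lo for v, lo, hi in zip(values, lower, upper)]


# Hypothetical parameters: 2 storages, 2 hidden units.
THETA = {
    "W1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "W2": [[2.0, 0.0], [0.0, -2.0]], "b2": [1.0, 1.0],
}
print(decide(THETA, [0.0, 0.0], 0.5, [0.0, 0.0], [1.0, 1.0]))  # [1.0, 1.0]
print(decide(THETA, [1.0, 1.0], 0.5, [0.0, 0.0], [1.0, 1.0]))  # [1.0, 0.0]
```

With box bounds the linear program has this closed-form solution; with real network and capacity constraints the same linear objective would be handed to an LP solver, and θ could then be tuned on simulations.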
8. Works in Tao
● Noisy non-linear optimization
● Fabian's algorithm
● Anytime properties (for bilevel problems)
● Evolutionary algorithms
● Reinforcement learning
● MCTS (Monte Carlo Tree Search) on top of heuristics
● DPS (combined with MCTS or heuristics)
● Links with Artelys:
● Joint software
● Experiments
– Non-anticipativity
– Non-linearities
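For the noisy non-linear optimization line of work, a basic Kiefer-Wolfowitz finite-difference scheme gives the flavor. Fabian's algorithm is, roughly, a refinement of this scheme that uses more evaluation points per iteration to cancel higher-order error terms; the sketch below implements only the basic step, on a hypothetical noisy quadratic, with gain sequences chosen for illustration.

```python
import random


def kiefer_wolfowitz(noisy_f, x0, iterations=2000, a=1.0, c=1.0, seed=0):
    """Minimize E[noisy_f(x)] over a scalar x using central finite
    differences of noisy evaluations, with gains a_n = a/n and
    c_n = c/n**0.25 (sum a_n diverges, while sum a_n*c_n and
    sum (a_n/c_n)**2 converge)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, iterations + 1):
        a_n, c_n = a / n, c / n ** 0.25
        grad = (noisy_f(x + c_n, rng) - noisy_f(x - c_n, rng)) / (2 * c_n)
        x -= a_n * grad
    return x


def noisy_quadratic(x, rng):
    """Hypothetical objective: (x - 2)^2 observed with Gaussian noise."""
    return (x - 2.0) ** 2 + rng.gauss(0.0, 1.0)


print(kiefer_wolfowitz(noisy_quadratic, x0=0.0))  # expected to be near 2.0
```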