3slides

TAO - Inria Saclay-IDF

● Machine Learning & Optimization
● Tao-uctsig:

● Sequential decision making

● One permanent, full time + others part time

● Applications to energy management

● Strong collaboration with

● Taiwan

● Artelys (Ilab Metis; joint software)

● Others

O. Teytaud, Research Fellow,
olivier.teytaud@inria.fr
http://www.lri.fr/~teytaud/

Power systems, large scale

[Figure: production, network and demand, with a feedback mechanism (smart grids)]

For choosing investments we want
to simulate systems
● Difficulties:
● Demand varies over time, with limited predictability
● Transportation introduces constraints
● Renewable ==> variability ++
● Problems:
● Limited predictability has an impact ==> anticipative high-level
techniques underestimate the need for storage / smoothing
● Markovian assumptions ==> wrong
● A system which neglects “base ≠ peak” cannot be used.

==> Model error >> optimization error
==> Machine Learning on top of Math. Programming

Math programming and machine
learning

● Math programming:
● Nearly exact solutions
● High-dimensional constrained action space
● But small state space & not anytime

● Reinforcement learning
● Unstable
● Small / simple action space
● But high dimensional state space & anytime

Stochastic dyn. programming
● Step 1: compute Bellman's function
● Step 2: make decisions
● Huge computation time
● Assumes Markovian models
● Neglects non-linearities
● Can work with huge constrained action space
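The two steps can be sketched on a toy storage problem. This is only an illustration under stated assumptions, not the software mentioned in these slides: the demand levels, probabilities, inflow, stock bound and peak cost are all hypothetical, and demand is drawn independently at each time step (the Markovian assumption criticized earlier).

```python
# Toy stock-management problem; every constant below is a hypothetical choice.
DEMANDS = (0, 1, 2)        # possible demand levels
PROBS = (0.2, 0.5, 0.3)    # their probabilities, independent over time
INFLOW = 1                 # units entering the stock at each step
MAX_STOCK = 4              # storage capacity
PEAK_COST = 3.0            # cost per unit of demand not covered by the stock


def q_value(V, t, stock, release):
    """Expected immediate cost of `release` at time t, plus cost-to-go."""
    next_stock = min(stock - release + INFLOW, MAX_STOCK)
    shortfall = sum(p * max(d - release, 0) for d, p in zip(DEMANDS, PROBS))
    return PEAK_COST * shortfall + V[t + 1][next_stock]


def bellman_function(horizon):
    """Step 1: backward induction. V[t][s] is the optimal expected cost from stock s."""
    V = [[0.0] * (MAX_STOCK + 1) for _ in range(horizon + 1)]
    for t in reversed(range(horizon)):
        for stock in range(MAX_STOCK + 1):
            V[t][stock] = min(q_value(V, t, stock, a) for a in range(stock + 1))
    return V


def decide(V, t, stock):
    """Step 2: pick the release that minimizes immediate cost plus Bellman value."""
    return min(range(stock + 1), key=lambda a: q_value(V, t, stock, a))


V = bellman_function(horizon=6)
first_decision = decide(V, t=0, stock=3)
```

Step 1 loops over every (time, stock) pair, which is why the approach is limited to small state spaces. Step 2 here enumerates a handful of actions; in a realistic setting that minimization would be handed to a math programming solver.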

Direct Policy Search
● Define a parametric function

● Neural network
● Handcrafted function

● Non-linear optimization
● The best θ is the one which performs best on simulations
==> obtained by non-linear stochastic optimization
● Non-linearities ok, arbitrary stochastic process, large state
space ==> little model bias
● No solution for huge constrained action spaces
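A minimal sketch of direct policy search on a toy storage simulator follows. Everything in it is an assumption made for illustration: the handcrafted affine policy, the autocorrelated demand process, the constants, and the simple accept-if-better random search standing in for the non-linear stochastic optimizer. To keep the comparison between candidates deterministic, every θ is scored on the same fixed set of simulated scenarios, which is a simplification of true noisy optimization.

```python
import random


def policy(theta, stock):
    """Handcrafted parametric function: affine in the stock, clipped to what is feasible."""
    return max(0.0, min(stock, theta[0] + theta[1] * stock))


def simulate(theta, rng, horizon=24, max_stock=5.0, peak_cost=3.0):
    """Cost of one simulated trajectory under the policy with parameters theta."""
    stock, demand, cost = max_stock / 2, 1.0, 0.0
    for _ in range(horizon):
        # Hypothetical demand process: autocorrelated, non-negative, non-linear clipping.
        demand = max(0.0, 0.7 * demand + 0.3 + rng.gauss(0.0, 0.3))
        release = policy(theta, stock)
        cost += peak_cost * max(demand - release, 0.0)
        stock = min(stock - release + 0.5, max_stock)
    return cost


def average_cost(theta, n_sims=30, scenario_seed=123):
    """Average cost over a fixed set of scenarios (same seed for every theta)."""
    rng = random.Random(scenario_seed)
    return sum(simulate(theta, rng) for _ in range(n_sims)) / n_sims


def direct_policy_search(n_iters=200, sigma=0.3, seed=0):
    """Random search: keep the theta which performs best on simulations."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    best_cost = average_cost(best)
    for _ in range(n_iters):
        candidate = [b + sigma * rng.gauss(0.0, 1.0) for b in best]
        cost = average_cost(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


theta, cost = direct_policy_search()
```

The simulator is only ever called as a black box, so non-linearities and arbitrary random processes cost nothing extra. The action here is a single number; nothing in this scheme handles a huge constrained action space.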

Math prog & reinforcement learning
● Here, we consider “math prog = heuristic”,
because it's fast but with strong model bias
● Proposals:
● MCTS (Monte-Carlo Tree Search) + heuristic
● DPS (Direct Policy Search) + heuristic
● DPS-style: little model bias, arbitrary random process, large state space
● Bellman-style: ok for large constrained action spaces
● Example: non-linear ~ (θ, xt), linear ~ x(t+1)

Works in Tao
● Noisy non-linear optimization
● Fabian's algorithm
● Anytime properties (for bilevel problems)
● Evolutionary algorithms
● Reinforcement learning
● MCTS (Monte Carlo Tree Search) on top of heuristics
● DPS (combined with MCTS or heuristics)
● Links with Artelys:
● Joint software
● Experiments
– Non-anticipativity
– Non-linearities
