TAO - Inria Saclay-IDF

● Machine Learning & Optimization
● Tao-uctsig:

    ● Sequential decision making

    ● One permanent, full time + others part time

    ● Applications to energy management

    ● Strong collaboration with

       ● Taiwan

       ● Artelys (Ilab Metis; joint software)

       ● Others

                               O. Teytaud, Research Fellow,
                                  olivier.teytaud@inria.fr
                                 http://www.lri.fr/~teytaud/
Power systems, large scale

[Figure: Production, Network, Demand; feedback mechanism (smart grids)]
For choosing investments we want to simulate systems
●   Difficulties:
       ●   Demand varying in time, limited predictability
       ●   Transportation introduces constraints
       ●   Renewables ==> increased variability
●   Problems:
       ●   Limited predictability has an impact ==> anticipative high-level
           techniques underestimate the need for storage / smoothing
       ●   Markovian assumptions ==> wrong
       ●   A system which neglects “base ≠ peak” cannot be used.


    ==> Model error >> optimization error
    ==> Machine Learning on top of Math. Programming
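
The bias of anticipative techniques can be illustrated on a toy problem. The sketch below is not from the original slides: the one-step commitment problem, its costs and its demand scenarios are all illustrative assumptions. It compares a decision maker who sees the demand before committing (anticipative) with one who must commit first (non-anticipative).

```python
# Toy one-step commitment problem. Every number here is an illustrative
# assumption, not data from the slides.
DEMANDS = [0.0, 2.0]          # two equally likely demand scenarios
PRODUCTION_COST = 1.0         # cost per unit committed in advance
SHORTAGE_COST = 10.0          # penalty per unit of unserved demand
CANDIDATES = [0.0, 1.0, 2.0]  # commitments the operator may choose


def cost(commitment, demand):
    """Cost of one scenario: committed production plus shortage penalty."""
    shortage = max(demand - commitment, 0.0)
    return PRODUCTION_COST * commitment + SHORTAGE_COST * shortage


def expected_cost(commitment):
    """Average cost of a single commitment over all demand scenarios."""
    return sum(cost(commitment, d) for d in DEMANDS) / len(DEMANDS)


# Anticipative: the best commitment is picked separately for each scenario,
# as if the demand were known in advance.
anticipative_cost = sum(
    min(cost(q, d) for q in CANDIDATES) for d in DEMANDS
) / len(DEMANDS)

# Non-anticipative: one commitment has to cover every scenario.
best_commitment = min(CANDIDATES, key=expected_cost)
non_anticipative_cost = expected_cost(best_commitment)

# Average unused margin held by the non-anticipative operator.
average_margin = sum(best_commitment - d for d in DEMANDS) / len(DEMANDS)

print(anticipative_cost, non_anticipative_cost, average_margin)
```

In this toy setting the anticipative evaluation reports a lower cost and no margin at all, while the non-anticipative operator has to hold a margin. That margin is the kind of storage / smoothing need that an anticipative study would miss.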
Math programming and machine learning

●   Math programming:
       ●   Nearly exact solutions
       ●   High-dimensional constrained action space
       ●   But small state space & not anytime

●   Reinforcement learning
       ●   Unstable
       ●   Small / simple action space
       ●   But high dimensional state space & anytime
Stochastic dyn. programming

●   Step 1: compute Bellman's function:
       ●   Huge computation time
       ●   Assumes Markovian models
       ●   Neglects non-linearities

●   Step 2: make decisions:
       ●   Can work with huge constrained action space
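
The two steps can be sketched on a tiny reservoir problem. This is a minimal illustration under assumptions that are not in the slides: the horizon, capacity, demand profile, inflow distribution and quadratic thermal cost are all hypothetical, and the names `compute_bellman` and `decide` are made up for this sketch. Inflows are drawn independently at each step, which is exactly the Markovian assumption mentioned on the slide.

```python
# Toy reservoir: discrete stock, random inflow, demand met by release
# or by costly thermal generation. All values are illustrative assumptions.
HORIZON = 3
CAPACITY = 2
DEMAND = [1, 2, 1]
INFLOWS = [(0, 0.5), (1, 0.5)]  # (inflow, probability), independent in time


def thermal_cost(shortfall):
    """Convex cost of covering unmet demand with thermal generation."""
    return float(shortfall) ** 2


def feasible_releases(t, stock):
    """Releases are limited by the stock and by the demand at time t."""
    return range(min(stock, DEMAND[t]) + 1)


def q_value(t, stock, release, value):
    """Immediate cost plus expected Bellman value of the next stock."""
    immediate = thermal_cost(DEMAND[t] - release)
    future = sum(
        p * value[t + 1][min(CAPACITY, stock - release + w)]
        for w, p in INFLOWS
    )
    return immediate + future


def compute_bellman():
    """Step 1: backward induction over time and stock levels."""
    value = [[0.0] * (CAPACITY + 1) for _ in range(HORIZON + 1)]
    for t in reversed(range(HORIZON)):
        for stock in range(CAPACITY + 1):
            value[t][stock] = min(
                q_value(t, stock, u, value)
                for u in feasible_releases(t, stock)
            )
    return value


def decide(t, stock, value):
    """Step 2: pick the release minimizing cost-to-go at the current state."""
    return min(
        feasible_releases(t, stock),
        key=lambda u: q_value(t, stock, u, value),
    )


value = compute_bellman()
print(decide(0, CAPACITY, value))
```

Step 1 enumerates every (time, stock) pair, which is why the state space must stay small. Step 2 only needs one minimization at the state actually visited, and in a real system that minimization can be handed to a mathematical-programming solver over a large constrained action space.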
Direct Policy Search
●   Define a parametric function



       ●   Neural network
       ●   Handcrafted function

●   Non-linear optimization
       ●   The best θ is the one which performs best on simulations
                 ==> obtained by non-linear stochastic optimization
       ●   Non-linearities ok, arbitrary stochastic process, large state
           space ==> little model bias
       ●   No solution for huge constrained action spaces
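
A minimal sketch of Direct Policy Search follows. It is an illustration only: the storage simulator, the handcrafted sigmoid policy and the simple elitist random search are assumptions chosen for brevity, not the models or optimizers used by the authors. The parameter vector θ is scored by its average cost over a fixed set of simulated scenarios, and the best θ found is kept.

```python
import math
import random

# Illustrative constants (assumptions, not from the slides).
HORIZON = 24
CAPACITY = 5.0
N_SCENARIOS = 50


def make_scenarios(seed):
    """Draw fixed (demand, inflow) sequences so every theta sees the same noise."""
    rng = random.Random(seed)
    return [
        [(rng.uniform(0.5, 1.5), rng.uniform(0.0, 1.0)) for _ in range(HORIZON)]
        for _ in range(N_SCENARIOS)
    ]


def policy(theta, stock, demand):
    """Handcrafted parametric function: fraction of the feasible release."""
    z = theta[0] + theta[1] * stock / CAPACITY + theta[2] * demand
    z = max(-50.0, min(50.0, z))  # keep exp() in a safe range
    fraction = 1.0 / (1.0 + math.exp(-z))
    return fraction * min(stock, demand)


def simulate(theta, scenario):
    """Total cost of one scenario under the policy; unmet demand costs squared."""
    stock, total = CAPACITY / 2, 0.0
    for demand, inflow in scenario:
        release = policy(theta, stock, demand)
        total += (demand - release) ** 2
        stock = min(CAPACITY, stock - release + inflow)
    return total


def average_cost(theta, scenarios):
    return sum(simulate(theta, s) for s in scenarios) / len(scenarios)


def direct_policy_search(scenarios, iterations=200, sigma=0.5, seed=0):
    """Elitist random search on theta: keep a mutation only if it is no worse."""
    rng = random.Random(seed)
    best_theta = [0.0, 0.0, 0.0]
    best_cost = average_cost(best_theta, scenarios)
    for _ in range(iterations):
        candidate = [t + sigma * rng.gauss(0.0, 1.0) for t in best_theta]
        candidate_cost = average_cost(candidate, scenarios)
        if candidate_cost <= best_cost:
            best_theta, best_cost = candidate, candidate_cost
    return best_theta, best_cost


scenarios = make_scenarios(seed=1)
initial_cost = average_cost([0.0, 0.0, 0.0], scenarios)
theta, cost = direct_policy_search(scenarios)
print(initial_cost, cost)
```

The simulator is used as a black box, so non-linear costs or non-Markovian scenario generators would fit without changing the search. The policy, however, outputs a single scalar, and nothing here scales to a huge constrained action space, which is the limitation stated in the last bullet.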
Math prog & reinforcement learning
●   Here, we consider “math prog = heuristic”,
    because it's fast but with strong model bias
●   Proposals:
       ●   MCTS (Monte-Carlo Tree Search) + heuristic
       ●   DPS (Direct Policy Search) + heuristic
       ●   Example:
            – DPS-style: little model bias, arbitrary random process,
              large state space
            – Bellman-style: ok for large constrained action spaces
            – Non-linear ~ (θ, xt); linear ~ x(t+1)
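
The formula of the "Example" item did not survive in this transcript. One plausible reading of its annotations (non-linear in (θ, xt), linear in x(t+1)) is a decision rule of the following form. The symbols c (immediate cost), U (feasible actions), f (transition) and α_θ (a parametric valuation of the next state, for instance a neural network) are assumptions introduced for this sketch, not notation from the slides.

```latex
u_t \in \arg\min_{u \in U(x_t)} \; c(x_t, u) + \alpha_\theta(x_t)^\top x_{t+1},
\qquad x_{t+1} = f(x_t, u)
```

Under this reading, the term α_θ(x_t) may depend non-linearly on θ and x_t, yet the objective stays linear in x_{t+1}. If c, f and U(x_t) are linear in u, each decision is a linear program, so large constrained action spaces remain tractable (the Bellman-style part), while θ is tuned on simulations (the DPS-style part).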
Work in TAO
●   Noisy non-linear optimization
      ●   Fabian's algorithm
      ●   Anytime properties (for bilevel problems)
      ●   Evolutionary algorithms
●   Reinforcement learning
      ●   MCTS (Monte Carlo Tree Search) on top of heuristics
      ●   DPS (combined with MCTS or heuristics)
●   Links with Artelys:
      ●   Joint software
      ●   Experiments
           – Non-anticipativity
           – Non-linearities
