The document discusses the computational complexity of partially observable games. Some key points:
1. Two-player unobservable games are EXPSPACE-complete, as strategies are just sequences of actions with no observability.
2. Encoding a Turing machine as a game shows the hardness of the unobservable case. The tape configurations can be represented in a game state of size logarithmic in the tape size.
3. Two-player partially observable games or one-player partially observable games against randomness are 2EXPTIME-complete, even more complex than the unobservable case.
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s... - Olivier Teytaud
Here are a few suggestions on how to improve the Zermelo algorithm when it is too slow:
1. Add a depth limit. Stop recursion when a maximum search depth is reached. Return a heuristic evaluation instead of continuing search.
2. Use alpha-beta pruning. Track the best value found (alpha) and prune branches that cannot improve on it.
3. Iterative deepening. Run successive searches with increasing depth limits to get progressively better approximations.
4. Move ordering. Evaluate better moves earlier in the search tree. This prunes bad moves earlier.
5. Transposition tables. Store previously computed move evaluations to avoid re-expanding the same position.
6. Parallelize the search. Distribute the evaluation of independent subtrees across several cores or machines.
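Suggestions 1, 2, 4 and 5 combine naturally into a single search routine. Below is a minimal Python sketch on a toy one-heap Nim game (not one of the games from the slides); note that a production transposition table would also store bound flags, since values stored after a cutoff are in general only bounds.

```python
def negamax(stones, depth, alpha=-2, beta=2, table=None):
    """Value of a one-heap Nim position (+1 win, -1 loss, 0 unknown)
    for the player to move. Each player removes 1 or 2 stones; the
    player who cannot move loses. Illustrates suggestions 1, 2, 4, 5.
    """
    if table is None:
        table = {}
    if stones == 0:
        return -1                       # player to move has no move: loss
    if depth == 0:
        return 0                        # 1. depth limit: heuristic value
    key = (stones, depth)
    if key in table:                    # 5. transposition table
        return table[key]
    best = -2
    for take in (2, 1):                 # 4. move ordering: bigger move first
        if take <= stones:
            best = max(best, -negamax(stones - take, depth - 1,
                                      -beta, -alpha, table))
            alpha = max(alpha, best)
            if alpha >= beta:           # 2. alpha-beta cutoff
                break
    table[key] = best
    return best
```

For this game, positions with a multiple of 3 stones are losses for the player to move, which the search recovers once the depth suffices; with depth 0 remaining it falls back to the heuristic value.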
- The document discusses energy management in France and potential areas of research collaboration between France and Taiwan.
- Key areas discussed include optimizing long-term investment policies for electricity generation using tools like reinforcement learning and stochastic programming to account for uncertainties.
- Specific questions mentioned are around optimal connections between Europe and Africa, impacts of subsidizing solar power or switching off nuclear plants, and benefits of demand reduction contracts.
- The researcher proposes combining methods like direct policy search and Monte Carlo tree search to better optimize long-term planning while accounting for short-term effects. Plans are discussed to test new ideas, share data and codes, and potentially organize joint work between the two regions.
This document provides an overview of distributed decision making in partially observable dynamic games and multiobjective policy optimization. It discusses applying these techniques to optimization problems in games like chess and Go, as well as industrial applications like managing groups of power plants involving renewable energy, nuclear power, coal, hydroelectric power, and interactions with electricity consumers and networks. The goal is to optimize strategies using parallel computing and test these approaches on games and energy systems.
Hydroelectricity uses water to produce electricity and has advantages for electricity storage. It provides daily, yearly, and negative electricity production by pumping water to higher reservoirs. However, expanding hydroelectricity is challenging due to its large infrastructure requirements and local environmental impacts. New technologies may improve energy storage capabilities and grid stability in the future, but developing large-scale annual storage remains difficult given constraints. Hydroelectricity will continue playing an important role in energy systems alongside other renewable technologies and efficiency strategies.
Choosing between several options in uncertain environments - Olivier Teytaud
The document discusses bandit problems with strategic choices and small budgets. It defines bandit problems, strategic bandit problems, and compares the two. It presents algorithms for exploring options and making recommendations in both one-player and two-player settings. Experimental results on a Go positioning problem and an online card game show that TEXP3 outperforms other algorithms in two-player settings. The document concludes with discussions on extensions to structured bandits and using strategic bandits to model investment choices.
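For reference, the plain EXP3 algorithm that TEXP3 truncates can be sketched in a few lines; the reward function below is a toy stand-in, not the Go positioning or card-game benchmark from the slides.

```python
import math
import random

def exp3(reward, n_arms, horizon, gamma=0.1, rng=None):
    """Minimal EXP3 for adversarial bandits with rewards in [0, 1].

    `reward(arm, t)` is a toy stand-in for the environment.
    Returns the empirical pull counts per arm.
    """
    rng = rng or random.Random(0)
    weights = [1.0] * n_arms
    counts = [0] * n_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward(arm, t)                 # observed reward in [0, 1]
        estimate = x / probs[arm]          # importance-weighted estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        counts[arm] += 1
    return counts

# Toy environment: arm 2 pays 0.9, the other arms pay 0.4.
counts = exp3(lambda a, t: 0.9 if a == 2 else 0.4, n_arms=4, horizon=3000)
```

The importance-weighted estimate keeps the update unbiased even though only the pulled arm's reward is observed, which is what makes EXP3 suitable for the adversarial (strategic) setting.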
Tools for Discrete Time Control; Application to Power Systems - Olivier Teytaud
3 main algorithms from the state of the art:
- Model Predictive Control
- Stochastic Dynamic Programming
- Direct Policy Search
==> and our proposal, a modified Direct Policy Search termed Direct Value Search
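The Direct Policy Search idea (parameterize the policy, then optimize the parameters directly on simulated episodes) fits in a short sketch; the storage problem and the threshold policy below are hypothetical illustrations, not Direct Value Search itself.

```python
import random

def simulate(theta, prices, capacity=5.0):
    """Profit of a hypothetical threshold policy: buy one unit when the
    price is below theta, sell one unit when it is at or above theta."""
    level, profit = 0.0, 0.0
    for p in prices:
        if p < theta and level < capacity:
            level += 1.0
            profit -= p              # buy cheap
        elif p >= theta and level > 0.0:
            level -= 1.0
            profit += p              # sell expensive
    return profit

def direct_policy_search(prices, iters=200, rng=None):
    """(1+1)-style random search directly over the policy parameter."""
    rng = rng or random.Random(0)
    best_theta = 50.0
    best_value = simulate(best_theta, prices)
    for _ in range(iters):
        cand = best_theta + rng.gauss(0.0, 5.0)   # mutate the parameter
        value = simulate(cand, prices)            # one simulated episode
        if value > best_value:                    # keep improvements only
            best_theta, best_value = cand, value
    return best_theta, best_value

env = random.Random(1)
prices = [env.uniform(20.0, 80.0) for _ in range(1000)]
theta, value = direct_policy_search(prices)
```

The point of the approach is that the simulator can be arbitrarily detailed: the optimizer only sees episode returns, never the model's internals.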
This document discusses blind Go, a variant of the game where players do not look at the board and must memorize positions. It explores strategies for blind Go, such as playing unusual moves that are harder for the opponent to remember. Experiments found that providing an empty board as a visual aid helped players. When playing against professionals in blind 9x9 Go, the computer won 2 of 3 games. In a 19x19 game against a top human player, the computer won through an unexpected, unusual move where the human made a rare mistake due to not seeing the board. Further research is needed, but playing unconventional moves seems beneficial in blind Go.
This document discusses how to save money by using open source software instead of proprietary software like Microsoft Office. It recommends downloading and using OpenOffice or LibreOffice instead, as they are free alternatives that work very well. It also recommends installing a free open source operating system like Linux, as this can save a lot of money on software costs over time. Open source is discussed as an economic model where the marginal cost of sharing and distributing code is very low, enabling new business models to earn money through services, support or customization rather than just software licenses. A variety of important open source software projects are listed across different domains like operating systems, office suites, web servers and more.
Ilab Metis: we optimize power systems and we are not afraid of direct policy ... - Olivier Teytaud
Ilab METIS is a collaboration between TAO, a machine learning and optimization team within INRIA, and Artelys, an SME focused on optimization. They work on optimizing energy policies through simulations of power systems while taking into account uncertainties and stochastic variables. Their methodologies use a hybrid of reinforcement learning, mathematical programming, and direct policy search to optimize investments and operational decisions for power grids over multiple timescales while handling constraints. They have applied their approaches to problems involving interconnection planning, demand balancing, and renewable integration on scales from cities to entire continents.
Theory of games, with a short reminder of computational complexity and an independent appendix on human complexity and the game of Go
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {Auger, David and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) that it is in the jump 0', and that, even in the stochastic case, (iii) it becomes decidable if we add the requirement that the game halts almost surely whatever may be the strategies of the players.}},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scientific},
journal = {International Journal on Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press},
audience = {international},
year = {2012},
}
Noisy Optimization combining Bandits and Evolutionary Algorithms - Olivier Teytaud
@inproceedings{rolet:inria-00437140,
hal_id = {inria-00437140},
url = {http://hal.inria.fr/inria-00437140},
title = {{Bandit-based Estimation of Distribution Algorithms for Noisy Optimization: Rigorous Runtime Analysis}},
author = {Rolet, Philippe and Teytaud, Olivier},
abstract = {{We show complexity bounds for noisy optimization, in frameworks in which noise is stronger than in previously published papers [19]. We also propose an algorithm based on bandits (variants of [16]) that reaches the bound within logarithmic factors. We emphasize the differences with empirically derived published algorithms.}},
keywords = {noisy optimization evolutionary algorithms bandits},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Futurs , TAO - INRIA Saclay - Ile de France},
booktitle = {{Lion4}},
address = {Venice, Italy},
audience = {international},
year = {2010},
pdf = {http://hal.inria.fr/inria-00437140/PDF/lion4long.pdf},
}
@inproceedings{coulom:hal-00517157,
hal_id = {hal-00517157},
url = {http://hal.archives-ouvertes.fr/hal-00517157},
title = {{Handling Expensive Optimization with Large Noise}},
author = {Coulom, R{\'e}mi and Rolet, Philippe and Sokolovska, Nataliya and Teytaud, Olivier},
abstract = {{This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of number of fitness evaluations. Fitnesses considered are monotonic transformations of the {\em sphere} function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum---it is nonetheless also valid for any monotonic polynomial of degree p>2. Upper bounds are derived via a bandit-based estimation of distribution algorithm that relies on Bernstein races called R-EDA. It is known that the algorithm is consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum, (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression based on surrogate models---although QLR requires a probabilistic prior on the fitness class.}},
keywords = {Noisy optimization, Bernstein races},
language = {English},
affiliation = {SEQUEL - INRIA Lille - Nord Europe , TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI},
booktitle = {{Foundations of Genetic Algorithms (FOGA 2011)}},
pages = {TBA},
address = {Austria},
editor = {ACM},
audience = {international},
year = {2011},
month = Jan,
pdf = {http://hal.archives-ouvertes.fr/hal-00517157/PDF/foga10noise.pdf},
}
Artificial Intelligence and Optimization with Parallelism - Olivier Teytaud
This document discusses parallelism in artificial intelligence and evolutionary computation. It explains that comparison-based optimization algorithms, which include many evolutionary algorithms, can be naturally parallelized by speculatively running multiple branches in parallel with a branching factor of 3 or more. This allows theoretical logarithmic speedups to be achieved in practice through simple parallelization tricks.
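The speculative branching scheme in the slides is more subtle than what fits here; shown below is the simpler, standard form of parallelism in evolutionary computation, evaluating all offspring of a generation concurrently. Threads are used for brevity; a CPU-bound fitness function would use processes instead. The problem and all parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def sphere(x):
    """Toy fitness to minimize: squared distance to the origin."""
    return sum(v * v for v in x)

def parallel_es(dim=5, lam=8, generations=60, rng=None):
    """(1+lambda)-ES with parallel fitness evaluation of each generation."""
    rng = rng or random.Random(0)
    parent = [rng.uniform(-1, 1) for _ in range(dim)]
    sigma = 0.3
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(generations):
            candidates = [[p + rng.gauss(0, sigma) for p in parent]
                          for _ in range(lam)] + [parent]    # elitist
            fitnesses = list(pool.map(sphere, candidates))   # parallel step
            parent = candidates[fitnesses.index(min(fitnesses))]
            sigma *= 0.97                                    # slow cooling
    return parent

best = parallel_es()
```

Because selection only compares fitness values, the whole generation can be evaluated in any order and in parallel without changing the algorithm's behavior, which is the property the slides exploit further with speculation.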
A simple tutorial on Monte-Carlo Tree Search
Contains a description of dynamic programming and alpha-beta search, then MCTS. Special cases for simultaneous actions are discussed.
I should add comments so that it can be used without preliminary knowledge of MCTS; if there is at least one request for doing so, I'll do it.
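In the meantime, the core of MCTS (UCT) fits in a page. Here is a minimal sketch on a toy one-heap Nim game (each player removes 1 or 2 stones; the player unable to move loses), not covering the simultaneous-action cases from the tutorial. Statistics are kept per (state, move) edge and selection uses the UCB1 formula.

```python
import math
import random

def uct_best_move(stones, n_sims=2000, c=1.4, rng=None):
    rng = rng or random.Random(0)
    N, W = {}, {}                      # visits and wins per (state, move)

    def legal(s):
        return [m for m in (1, 2) if m <= s]

    def rollout(s):
        """Random playout; True iff the player to move at s wins."""
        k = 0
        while s:
            s -= rng.choice(legal(s))
            k += 1
        return k % 2 == 1              # mover at s == 0 has lost

    def simulate(s):
        """One tree-policy descent; True iff the mover at s wins."""
        if s == 0:
            return False
        untried = [m for m in legal(s) if (s, m) not in N]
        if untried:                    # expansion, then random playout
            m = rng.choice(untried)
            win = not rollout(s - m)
        else:                          # UCB1 selection, then recurse
            log_n = math.log(sum(N[(s, m)] for m in legal(s)))
            m = max(legal(s), key=lambda m: W[(s, m)] / N[(s, m)]
                    + c * math.sqrt(log_n / N[(s, m)]))
            win = not simulate(s - m)
        N[(s, m)] = N.get((s, m), 0) + 1
        W[(s, m)] = W.get((s, m), 0) + (1 if win else 0)
        return win

    for _ in range(n_sims):
        simulate(stones)
    return max(legal(stones), key=lambda m: N[(stones, m)])  # robust child
```

From 4 stones the winning move is to take 1 (leaving a multiple of 3), from 5 stones to take 2; the simulation counts concentrate on those moves.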
@article{gelly:hal-00695370,
hal_id = {hal-00695370},
url = {http://hal.inria.fr/hal-00695370},
title = {{The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions}},
author = {Gelly, Sylvain and Kocsis, Levente and Schoenauer, Marc and Sebag, Mich{\`e}le and Silver, David and Szepesvari, Csaba and Teytaud, Olivier},
abstract = {{The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.}},
language = {English},
affiliation = {TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI , LPDS , Microsoft Research - Inria Joint Centre - MSR - INRIA , University of Alberta, Canada , Department of Computing Science},
publisher = {ACM},
pages = {106-113},
journal = {Communications of the ACM},
volume = {55},
number = {3},
audience = {international},
year = {2012},
pdf = {http://hal.inria.fr/hal-00695370/PDF/CACM-MCTS.pdf},
}
Don't believe what is written in these slides.
These are just provocative statements, most of them found on the internet, included here for discussion and brainstorming.
- The document discusses games with simultaneous actions and hidden information. It presents games as directed graphs with actions, players, observations, rewards, and loops.
- Games with simultaneous actions and short-term hidden information can be represented as games with hidden information by removing intermediate turns.
- Questions about the existence of a sure-win strategy for one player (the "UD" question) are only relevant for games with full observability, not matrix games.
Dynamic Optimization without Markov Assumptions: application to power systems - Olivier Teytaud
Ilab METIS is a collaboration between TAO, a machine learning and optimization team at INRIA, and Artelys, an SME focused on optimization. They work on optimizing energy policies through modeling power systems and simulating operational and investment decisions. Their methodologies hybridize reinforcement learning, mathematical programming, and direct policy search to optimize complex, constrained problems with uncertainties while minimizing model error. They have applied these techniques to problems involving European-scale power grids with stochastic renewables.
- Ilab METIS is a collaboration between Inria-Tao, a research team focused on optimization and machine learning problems, and Artelys, an SME focused on power systems modeling.
- They develop black-box planning tools for power systems that aim to minimize model error by using direct policy search techniques on high-fidelity simulations.
- These tools are applied to problems like optimizing investments in new power plants, transmission lines, and other infrastructure for power grids under uncertainty.
Ilab Metis works on optimizing energy policies through power system simulation and modeling. It uses principles of classical unit commitment modeling, but also accounts for reserves, recourse actions, and network constraints. While this basic model can determine short-term operations, it does not adequately address uncertainties like variable hydro inflows. More advanced techniques model the problem as a multi-stage decision process or reinforcement learning problem to compute optimal policies over long time horizons under uncertainty. Future work may integrate real-time control, game-theoretic approaches, or bilevel optimization to better represent the complex, dynamic nature of modern power systems.
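The core of the classical unit-commitment model mentioned above is merit-order dispatch: plants are dispatched in increasing order of marginal cost until demand is met. A toy sketch with hypothetical plant data follows; the real model adds reserves, recourse actions, and network constraints.

```python
def merit_order_dispatch(plants, demand):
    """plants: list of (name, capacity_mw, cost_per_mwh).
    Returns ({name: output_mw}, total_cost)."""
    dispatch, cost, remaining = {}, 0.0, demand
    for name, capacity, price in sorted(plants, key=lambda p: p[2]):
        output = min(capacity, remaining)    # cheapest plants run first
        dispatch[name] = output
        cost += output * price
        remaining -= output
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    return dispatch, cost

# Hypothetical fleet: (name, capacity in MW, marginal cost per MWh).
plants = [("nuclear", 900, 10.0), ("hydro", 300, 5.0),
          ("coal", 500, 30.0), ("gas", 400, 60.0)]
dispatch, cost = merit_order_dispatch(plants, demand=1400)
```

With this fleet and a 1400 MW demand, hydro and nuclear run at full capacity, coal covers the remaining 200 MW, and gas is never started.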
- Stochastic dynamic programming (SDP) and stochastic dual dynamic programming (SDDP) are algorithms for solving sequential decision making problems under uncertainty.
- They represent the value function and controller as piecewise linear functions that can be encoded in linear programming formulations. This allows solving problems with up to around 100,000 decision variables per time step.
- However, solving the full problem using SDP/SDDP can be computationally expensive due to the "curse of dimensionality" as the number of states increases. The methods also require linear programming approximations and convex value functions.
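A tabular SDP backward recursion on a toy reservoir shows the mechanics (SDDP replaces the table by piecewise linear cuts, which is what enables the large problem sizes above; all numbers here are hypothetical).

```python
# Toy reservoir: state = stored units, decision = units released and
# sold at a known stage price, inflow = random. V[t][s] is the expected
# future revenue from state s at stage t.

LEVELS = 6                        # reservoir stores 0..5 units
INFLOWS = [(0, 0.5), (1, 0.5)]    # (inflow, probability)
PRICES = [30, 10, 50, 20]         # unit price at each stage

def solve():
    T = len(PRICES)
    V = [[0.0] * LEVELS for _ in range(T + 1)]   # V[T] = 0: terminal
    policy = [[0] * LEVELS for _ in range(T)]
    for t in reversed(range(T)):                 # backward recursion
        for s in range(LEVELS):
            best_value, best_u = float("-inf"), 0
            for u in range(s + 1):               # release u units now
                future = sum(prob * V[t + 1][min(s - u + w, LEVELS - 1)]
                             for w, prob in INFLOWS)
                value = PRICES[t] * u + future
                if value > best_value:
                    best_value, best_u = value, u
            V[t][s], policy[t][s] = best_value, best_u
    return V, policy

V, policy = solve()
```

The computed policy hoards water through the cheap stage (price 10) to sell at the expensive one (price 50), and empties the reservoir at the final stage; the curse of dimensionality shows up immediately when the scalar state becomes a vector of reservoirs.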
Online Machine Learning: introduction and examples - Felipe
In this talk I introduce the topic of Online Machine Learning, which deals with techniques for doing machine learning in an online setting, i.e. where you train your model a few examples at a time, rather than using the full dataset (off-line learning).
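A minimal example of the online setting: stochastic gradient descent on a linear model, updating on one example at a time. The data stream is synthetic (y = 2x + 1 plus a little noise), chosen only for illustration.

```python
import random

def sgd_online(stream, lr=0.05):
    """Online least squares: one gradient step per incoming example."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y
        w -= lr * err * x        # gradient of 0.5*err**2 w.r.t. w
        b -= lr * err            # ... and w.r.t. b
    return w, b

rng = random.Random(0)
stream = [(x, 2 * x + 1 + rng.gauss(0, 0.1))
          for x in (rng.uniform(-1, 1) for _ in range(5000))]
w, b = sgd_online(stream)
```

The model never sees the full dataset at once, yet the parameters drift toward the generating values (w near 2, b near 1), which is the essential contrast with off-line learning.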
This document provides an introduction to an optimization course for data science. It discusses how optimization methods are important for data science applications like predictive analytics and prescriptive analytics. It provides an example of using a polynomial interpolation model and cross-validation to estimate daily energy production from an energy community. The goal is to optimally size a battery for the energy community based on energy predictions and price signals. The course will cover algorithms for solving optimization problems that arise in fitting machine learning models and other data science applications.
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ... - Srinath Perera
Large scale data processing analyses and makes sense of large amounts of data. Although the field itself is not new, it is finding many use cases under the theme "Big Data", with Google itself, IBM Watson, and Google's driverless car among the success stories. Spanning many fields, large scale data processing brings together technologies like distributed systems, machine learning, statistics, and the Internet of Things. It is a multi-billion-dollar industry, with use cases including targeted advertising, fraud detection, product recommendations, and market surveys. With new technologies like the Internet of Things (IoT), these use cases are expanding to scenarios like smart cities, smart health, and smart agriculture. Some use cases, like urban planning, can be slow and are handled in batch mode, while others, like stock markets, need results within milliseconds and are handled in streaming fashion. There are different technologies for each case: MapReduce for batch processing, and complex event processing and stream processing for real-time use cases. Furthermore, the types of analysis range from basic statistics, like the mean, to complicated prediction models based on machine learning. In this talk, we will discuss the data processing landscape: concepts, use cases, technologies, and open questions, while drawing examples from real-world scenarios.
http://icter.org/conference/invited_speeches
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search - Olivier Teytaud
The document discusses using simulations and algorithms to optimize power grids over different timescales:
1. Short term dispatching in real-time using human control
2. Combinatorial optimization over days/weeks
3. Stochastic hydroelectric optimization over years
4. Expensive multi-objective optimization of investment strategies over 50 years
It proposes using simulation-based optimization techniques like Upper Confidence Trees and Direct Policy Search to analyze simulations while allowing domain expertise to be incorporated through approximate policies. The goal is optimizing investments in power grids in Europe and North Africa over the next 50 years under different scenarios.
The document summarizes key points from presentations at the PAISS Prairie AI summer school in July 2018. It discusses several machine learning techniques:
1. Cordelia Schmid presented on action recognition from optical flow data and the importance of warping for optical flow estimation.
2. Julien Mairal discussed incremental gradient descent methods for large-scale optimization and machine learning.
3. Martial Hebert covered robotics applications for vision and planning, including techniques for failure prediction, reducing supervision across tasks, and avoiding early commitment.
This document provides an overview and introduction to the concepts taught in a data structures and algorithms course. It discusses the goals of reinforcing that every data structure has costs and benefits, learning commonly used data structures, and understanding how to analyze the efficiency of algorithms. Key topics covered include abstract data types, common data structures, algorithm analysis techniques like best/worst/average cases and asymptotic notation, and examples of analyzing the time complexity of various algorithms. The document emphasizes that problems can have multiple potential algorithms and that problems should be carefully defined in terms of inputs, outputs, and resource constraints.
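Best-case versus worst-case analysis becomes concrete when the comparisons are counted explicitly; for example, linear versus binary search on a sorted list (each probe of an element counts as one comparison).

```python
def linear_search(items, target):
    """O(n): best case 1 probe, worst case n probes."""
    probes = 0
    for i, v in enumerate(items):
        probes += 1
        if v == target:
            return i, probes
    return -1, probes

def binary_search(items, target):
    """O(log n) on a sorted list: at most about log2(n) + 1 probes."""
    probes, lo, hi = 0, 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

data = list(range(1024))
```

On 1024 sorted items, finding the first element takes linear search one probe, finding the last takes it 1024, while binary search needs only 11: the same problem, two algorithms, very different costs.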
The document discusses tools for artificial intelligence and their applications. It describes Olivier Teytaud's work at Tao, a research group in Paris focused on reservoir computing, optimal decision making under uncertainty, optimization, and machine learning. It then provides examples of applications for these tools in electricity generation, Urban Rivals, Pokémon, Minesweeper, and solving unsolved situations in the game of Go. Olivier suggests that breakthroughs in games can help open doors to applying these algorithms to more important real-world problems by building trust in the approaches.
This lecture provides an introduction to recurrent neural networks, which include a layer whose hidden state is aware of its values in a previous time-step.
These slides were used in the Master in Computer Vision Barcelona 2019/2020, in the Module 6 dedicated to Video Analysis.
http://pagines.uab.cat/mcv/
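The recurrence described in the summary reduces to one update per time-step, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). A pure-Python forward pass follows, with hypothetical small dimensions and random weights (training, i.e. backpropagation through time, is not shown).

```python
import math
import random

def rnn_forward(xs, dim_in, dim_h, rng=None):
    """Forward pass of a minimal recurrent layer over a sequence xs."""
    rng = rng or random.Random(0)
    W_xh = [[rng.gauss(0, 0.1) for _ in range(dim_in)] for _ in range(dim_h)]
    W_hh = [[rng.gauss(0, 0.1) for _ in range(dim_h)] for _ in range(dim_h)]
    b = [0.0] * dim_h
    h = [0.0] * dim_h                  # initial hidden state
    states = []
    for x in xs:
        # h_t depends on the input x_t AND on the previous hidden state.
        h = [math.tanh(sum(W_xh[i][j] * x[j] for j in range(dim_in))
                       + sum(W_hh[i][j] * h[j] for j in range(dim_h))
                       + b[i])
             for i in range(dim_h)]
        states.append(h)
    return states

states = rnn_forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dim_in=2, dim_h=4)
```

Because of the tanh nonlinearity every hidden activation stays in (-1, 1), and the same weight matrices are reused at every time-step, which is what lets the layer process sequences of arbitrary length.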
This document discusses how to save money by using open source software instead of proprietary software like Microsoft Office. It recommends downloading and using OpenOffice or LibreOffice instead, as they are free alternatives that work very well. It also recommends installing a free open source operating system like Linux, as this can save a lot of money on software costs over time. Open source is discussed as an economic model where the marginal cost of sharing and distributing code is very low, enabling new business models to earn money through services, support or customization rather than just software licenses. A variety of important open source software projects are listed across different domains like operating systems, office suites, web servers and more.
Ilab Metis: we optimize power systems and we are not afraid of direct policy ...Olivier Teytaud
Ilab METIS is a collaboration between TAO, a machine learning and optimization team within INRIA, and Artelys, an SME focused on optimization. They work on optimizing energy policies through simulations of power systems while taking into account uncertainties and stochastic variables. Their methodologies use a hybrid of reinforcement learning, mathematical programming, and direct policy search to optimize investments and operational decisions for power grids over multiple timescales while handling constraints. They have applied their approaches to problems involving interconnection planning, demand balancing, and renewable integration on scales from cities to entire continents.
Theory of games, with a short reminder of computational complexity and an independent appendix on human complexity and the game of Go
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {David, Auger and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) that it is in the jump 0', and that, even in the stochastic case, (iii) it becomes decidable if we add the requirement that the game halts almost surely whatever maybe the strategies of the players.}},
language = {Anglais},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scinet},
journal = {International Journal on Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press },
audience = {internationale },
year = {2012},
}
Noisy Optimization combining Bandits and Evolutionary AlgorithmsOlivier Teytaud
@inproceedings{rolet:inria-00437140,
hal_id = {inria-00437140},
url = {http://hal.inria.fr/inria-00437140},
title = {{Bandit-based Estimation of Distribution Algorithms for Noisy Optimization: Rigorous Runtime Analysis}},
author = {Rolet, Philippe and Teytaud, Olivier},
abstract = {{We show complexity bounds for noisy optimization, in frame- works in which noise is stronger than in previously published papers[19]. We also propose an algorithm based on bandits (variants of [16]) that reaches the bound within logarithmic factors. We emphasize the differ- ences with empirical derived published algorithms.}},
keywords = {noisy optimization evolutionary algorithms bandits},
language = {Anglais},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Futurs , TAO - INRIA Saclay - Ile de France},
booktitle = {{Lion4}},
address = {Venice, Italie},
audience = {internationale },
year = {2010},
pdf = {http://hal.inria.fr/inria-00437140/PDF/lion4long.pdf},
}
@inproceedings{coulom:hal-00517157,
hal_id = {hal-00517157},
url = {http://hal.archives-ouvertes.fr/hal-00517157},
title = {{Handling Expensive Optimization with Large Noise}},
author = {Coulom, R{\'e}mi and Rolet, Philippe and Sokolovska, Nataliya and Teytaud, Olivier},
abstract = {{This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of number of fitness evaluations. Fitnesses considered are monotonic transformations of the {\em sphere} function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum---it is nonetheless also valid for any monotonic polynomial of degree p>2. Upper bounds are derived via a bandit-based estimation of distribution algorithm that relies on Bernstein races called R-EDA. It is known that the algorithm is consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum, (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression based on surrogate models---although QLR requires a probabilistic prior on the fitness class.}},
keywords = {Noisy optimization, Bernstein races},
language = {English},
affiliation = {SEQUEL - INRIA Lille - Nord Europe , TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI},
booktitle = {{Foundations of Genetic Algorithms (FOGA 2011)}},
pages = {TBA},
address = {Austria},
editor = {ACM},
audience = {international},
year = {2011},
month = Jan,
pdf = {http://hal.archives-ouvertes.fr/hal-00517157/PDF/foga10noise.pdf},
}
Artificial Intelligence and Optimization with Parallelism (Olivier Teytaud)
This document discusses parallelism in artificial intelligence and evolutionary computation. It explains that comparison-based optimization algorithms, which include many evolutionary algorithms, can be naturally parallelized by speculatively running multiple branches in parallel with a branching factor of 3 or more. This allows theoretical logarithmic speedups to be achieved in practice through simple parallelization tricks.
A simple tutorial on Monte-Carlo Tree Search
Contains a description of dynamic programming and alpha-beta search, then MCTS. Special cases for simultaneous actions are discussed.
I should add comments so that it can be used without preliminary knowledge of MCTS; if there is at least one request for doing so, I'll do it.
@article{gelly:hal-00695370,
hal_id = {hal-00695370},
url = {http://hal.inria.fr/hal-00695370},
title = {{The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions}},
author = {Gelly, Sylvain and Kocsis, Levente and Schoenauer, Marc and Sebag, Mich{\`e}le and Silver, David and Szepesvari, Csaba and Teytaud, Olivier},
abstract = {{The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.}},
language = {English},
affiliation = {TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI , LPDS , Microsoft Research - Inria Joint Centre - MSR - INRIA , University of Alberta, Canada , Department of Computing Science},
publisher = {ACM},
pages = {106-113},
journal = {Communications of the ACM},
volume = {55},
number = {3},
audience = {international},
year = {2012},
pdf = {http://hal.inria.fr/hal-00695370/PDF/CACM-MCTS.pdf},
}
Don't believe what is written in these slides.
These statements are just provocative statements, most of them found on the internet, here for discussion and brainstorming.
- The document discusses games with simultaneous actions and hidden information. It presents games as directed graphs with actions, players, observations, rewards, and loops.
- Games with simultaneous actions and short-term hidden information can be represented as games with hidden information by removing intermediate turns.
- Questions about the existence of a sure-win strategy for one player (the "UD" question) are only relevant for games with full observability, not matrix games.
Dynamic Optimization without Markov Assumptions: application to power systems (Olivier Teytaud)
Ilab METIS is a collaboration between TAO, a machine learning and optimization team at INRIA, and Artelys, an SME focused on optimization. They work on optimizing energy policies through modeling power systems and simulating operational and investment decisions. Their methodologies hybridize reinforcement learning, mathematical programming, and direct policy search to optimize complex, constrained problems with uncertainties while minimizing model error. They have applied these techniques to problems involving European-scale power grids with stochastic renewables.
- Ilab METIS is a collaboration between Inria-Tao, a research team focused on optimization and machine learning problems, and Artelys, an SME focused on power systems modeling.
- They develop black-box planning tools for power systems that aim to minimize model error by using direct policy search techniques on high-fidelity simulations.
- These tools are applied to problems like optimizing investments in new power plants, transmission lines, and other infrastructure for power grids under uncertainty.
Ilab Metis works on optimizing energy policies through power system simulation and modeling. It uses principles of classical unit commitment modeling, but also accounts for reserves, recourse actions, and network constraints. While this basic model can determine short-term operations, it does not adequately address uncertainties like variable hydro inflows. More advanced techniques model the problem as a multi-stage decision process or reinforcement learning problem to compute optimal policies over long time horizons under uncertainty. Future work may integrate real-time control, game-theoretic approaches, or bilevel optimization to better represent the complex, dynamic nature of modern power systems.
- Stochastic dynamic programming (SDP) and stochastic dual dynamic programming (SDDP) are algorithms for solving sequential decision making problems under uncertainty.
- They represent the value function and controller as piecewise linear functions that can be encoded in linear programming formulations. This allows solving problems with up to around 100,000 decision variables per time step.
- However, solving the full problem using SDP/SDDP can be computationally expensive due to the "curse of dimensionality" as the number of states increases. The methods also require linear programming approximations and convex value functions.
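The piecewise-linear value functions mentioned above can be sketched as a maximum over affine "cuts"; a minimal toy illustration (the slopes and intercepts are hypothetical, not from the source):

```python
# Toy sketch: an SDDP-style value function stored as a set of affine cuts.
# V(x) is approximated from below by max_i (slope_i * x + intercept_i),
# which is exactly the piecewise-linear form that fits in a linear program.
cuts = [(-2.0, 10.0), (-0.5, 4.0), (0.0, 2.0)]  # hypothetical (slope, intercept) pairs

def value(x):
    """Evaluate the piecewise-linear approximation of the value function."""
    return max(slope * x + intercept for slope, intercept in cuts)

# The approximation is convex by construction (a max of affine functions):
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
vals = [value(x) for x in xs]
print(vals)
```

Because a maximum of affine functions is convex and piecewise linear, each cut becomes one inequality constraint when the value function is embedded in a linear program.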
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search (Olivier Teytaud)
The document discusses using simulations and algorithms to optimize power grids over different timescales:
1. Short term dispatching in real-time using human control
2. Combinatorial optimization over days/weeks
3. Stochastic hydroelectric optimization over years
4. Expensive multi-objective optimization of investment strategies over 50 years
It proposes using simulation-based optimization techniques like Upper Confidence Trees and Direct Policy Search to analyze simulations while allowing domain expertise to be incorporated through approximate policies. The goal is optimizing investments in power grids in Europe and North Africa over the next 50 years under different scenarios.
The document discusses tools for artificial intelligence and their applications. It describes Olivier Teytaud's work at Tao, a research group in Paris focused on reservoir computing, optimal decision making under uncertainty, optimization, and machine learning. It then provides examples of applications for these tools in electricity generation, Urban Rivals, Pokémon, Minesweeper, and solving unsolved situations in the game of Go. Olivier suggests that breakthroughs in games can help open doors to applying these algorithms to more important real-world problems by building trust in the approaches.
Optimization of power systems - old and new tools
1. I do not speak Chinese ! ! !
● And my English is extremely French (when native English speakers listen to my English, they sometimes believe that they suddenly, by miracle, understand French)
● For the moment, if I gave a talk in Chinese it would be boring, with only:
  ● hse-hse
  ● nirao
  ● pukachi
● Interrupt me as much as you want to facilitate understanding :-)
4. High-Scale Power Systems: Simulation & Optimization
Olivier Teytaud + Inria-Tao + Artelys
TAO project-team, INRIA Saclay Île-de-France
O. Teytaud, Research Fellow, olivier.teytaud@inria.fr, http://www.lri.fr/~teytaud/
5. Ilab METIS
www.lri.fr/~teytaud/metis.html
● Metis = Tao + Artelys
● TAO (tao.lri.fr): Machine Learning & Optimization
  ● Joint INRIA / CNRS / Univ. Paris-Sud team
  ● 12 researchers, 17 PhDs, 3 post-docs, 3 engineers
● Artelys (www.artelys.com): SME
  - France / US / Canada
  - 50 persons
==> collaboration through a common platform
● Activities
  ● Optimization (uncertainties, sequential)
  ● Application to power systems
6. Importantly, it is not a lie.
● It is a tradition, in research institutes, to claim some links with industry.
● I don't claim that having such links is necessary or always a great achievement in itself.
● But I do claim that in my case it is true that I have links with industry.
● My four students here in Taiwan, and others in France, all have real salaries based on industrial funding.
7. All in one slide
Consider an electric system.
Decisions =
● Strategic decisions (a few time steps):
  ● building a nuclear power plant
  ● building a Spain-Morocco connection
  ● building a wind farm
● Tactical decisions (many time steps):
  ● switching on hydroelectricity plant #7
  ● switching on thermal plant #4
  ● ....
Strategic decisions are based on simulations of the tactical level, which depends on the strategic level.
8. A bit more precisely: the strategic level
Brute force approach for the strategic level:
● I simulate
  ● each possible strategic decision (e.g. 20,000);
  ● 1000 times;
  ● each of them with optimal tactical decisions
  ==> 20,000 optimizations, 1000 simulations each
● I choose the best one.
Better: more simulations on the best strategic decisions.
However, this talk will not focus on that part.
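The brute-force loop above can be sketched as follows; `simulate_tactical` is a hypothetical stand-in for one simulation of the tactical level under a given strategic decision:

```python
import random

def simulate_tactical(decision, rng):
    """Hypothetical stand-in for one tactical-level simulation:
    returns a noisy cost for the given strategic decision."""
    return (decision - 3) ** 2 + rng.gauss(0.0, 1.0)

def best_strategic_decision(decisions, n_sims=1000, seed=0):
    """Brute force: average n_sims simulated costs per decision, keep the best."""
    rng = random.Random(seed)
    avg = {d: sum(simulate_tactical(d, rng) for _ in range(n_sims)) / n_sims
           for d in decisions}
    return min(avg, key=avg.get)

print(best_strategic_decision(range(7)))
```

With 1000 simulations per candidate, the Monte Carlo noise on each average is small enough to separate the candidates reliably, which is the point of the "more simulations on the best decisions" refinement.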
9. A bit more precisely: the tactical level
Brute force approach for the tactical level:
● Simplify
  ● Replace each random process by its expectation
  ● Optimize decisions deterministically
● But reality is stochastic:
  ● Water inflows
  ● Wind farms
Better: optimizing a policy (i.e. reactive, closed-loop)
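The gap between replacing randomness by its expectation and optimizing a reactive policy can be illustrated on a toy inflow process (all numbers hypothetical):

```python
import random

rng = random.Random(1)
inflows = [rng.choice([0.0, 2.0]) for _ in range(1000)]  # stochastic inflow, mean 1.0

# Open-loop: plan one release for the expected inflow (here 1.0),
# then pay for any mismatch with the realized inflow.
planned = 1.0
open_loop_cost = sum(abs(planned - w) for w in inflows) / len(inflows)

# Closed-loop policy: react to the observed inflow (trivially optimal here).
closed_loop_cost = sum(abs(w - w) for w in inflows) / len(inflows)

print(open_loop_cost, closed_loop_cost)  # the reactive policy dominates
```

The open-loop plan is optimal for the averaged (deterministic) problem, yet pays a constant mismatch cost on every realization; the closed-loop rule pays nothing, which is why the slide recommends optimizing a policy.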
10. Specialization on Power Systems
● Planning/control (tactical level)
  ● Pluriannual planning: evaluate marginal costs of hydroelectricity
  ● Taking into account stochasticity and uncertainties
  ==> IOMCA (ANR)
● High-scale investment studies (e.g. Europe + North Africa)
  ● Long term (2030 - 2050)
  ● Huge (non-stochastic) uncertainties
  ● Investments: interconnections, storage, smart grids, power plants...
  ==> POST (ADEME)
● Moderate scale (cities, factories) (tactical level simpler)
  ● Master plan optimization
  ● Stochastic uncertainties
  ==> Citines project (FP7)
12. The POST project – supergrids: simulation and optimization
Mature technology: HVDC links (high-voltage direct current)
European subregions:
- Case 1: electric corridor France / Spain / Morocco
- Case 2: south-west (France / Spain / Italy / Tunisia / Morocco)
- Case 3: Maghreb – Central West Europe
==> towards a European supergrid
Related ideas in Asia
13. Tactical level: unit commitment at the scale of a country: looks like a game
● Many time steps.
● Many power plants.
● Some of them have stocks (hydroelectricity).
● Many constraints (rules).
● Uncertainties (water inflows, temperature, ...)
==> make decisions:
● When should I switch on? (for each power plant)
● At which power?
14. Investment decisions through simulations
● Issues
  – Demand varying in time, limited previsibility
  – Transportation introduces constraints
  – Renewables ==> variability ++
● Methods
  – Markovian assumptions ==> wrong
  – Simplified models ==> model error >> optimization error
● Our approach
  ● Machine Learning on top of Mathematical Programming
15. Hybridization reinforcement learning / mathematical programming
● Math programming (mathematicians doing discrete-time control)
  – Nearly exact solutions for a simplified problem
  – High-dimensional constrained action space
  – But small state space & not anytime
● Reinforcement learning (artificially intelligent people doing discrete-time control :-) )
  – Unstable
  – Small model bias
  – Small / simple action space
  – But high-dimensional state space & anytime
17. Now the technical part
Model Predictive Control,
Stochastic Dynamic Programming,
Direct Policy Search,
and Direct Value Search (new),
combining Direct Policy Search and Stochastic Dynamic Programming.
(3/4 of this talk is about the state of the art, only 1/4 our work)
21. Many optimization tools (SDP, MPC):
● Strong constraints on forecasts
● Strong constraints on model structure.
Direct Policy Search:
● Arbitrary forecasts, arbitrary structure
● But not scalable in the number of decision variables.
→ merge: Direct Value Search
Jean-Joseph.Christophe@inria.fr
Jeremie.Decock@inria.fr
Pierre.Defreminville@artelys.com
Olivier.Teytaud@inria.fr
22. ● Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● Scalability issue
● The best of both worlds: Direct Value Search
23. Stochastic Control
(diagram: a random process produces random values driving the system state; a controller with memory receives observations, issues commands, and incurs a cost)
● For an optimal representation, you need access to the whole archive, or to forecasts (generative model / probabilistic forecasts) (Astrom 1965)
24. ● Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● Scalability issue
● The best of both worlds: Direct Value Search
25. ● Anticipative solutions:
  ● Maximum over strategic decisions
  ● Of average over random processes
  ● Of optimized decisions, given random processes & strategic decisions
● Pros/Cons
  ● Much simpler (deterministic optimization)
  ● But in real life you cannot guess November rains in January
  ● Rather optimistic decisions
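The "anticipative" scheme above (optimize inside each scenario, then average) is optimistic compared to any single non-anticipative decision; a toy illustration with hypothetical inflows and a quadratic cost:

```python
import random

rng = random.Random(0)
scenarios = [rng.uniform(0.0, 2.0) for _ in range(500)]  # hypothetical inflows

def cost(decision, inflow):
    """Toy convex cost: mismatch between decision and realized inflow."""
    return (decision - inflow) ** 2

# Anticipative (clairvoyant) value: optimize separately inside each scenario,
# then average -- an optimistic bound, since real decisions cannot see the future.
anticipative = sum(min(cost(d / 100, w) for d in range(201))
                   for w in scenarios) / len(scenarios)

# Non-anticipative: one decision fixed before observing the inflow.
nonanticipative = min(sum(cost(d / 100, w) for w in scenarios) / len(scenarios)
                      for d in range(201))

print(anticipative <= nonanticipative)  # clairvoyance can only help
```

The anticipative average is essentially zero here (each scenario is matched perfectly), while the best fixed decision still pays the variance of the inflows, which is why the slide calls these decisions "rather optimistic".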
26. MODEL PREDICTIVE CONTROL
● Anticipative solutions:
  ● Maximum over strategic decisions
  ● Of pessimistic forecasts (e.g. quantile)
  ● Of optimized decisions, given forecasts & strategic decisions
● Pros/Cons
  ● Much simpler (deterministic optimization)
  ● But in real life you cannot guess November rains in January
  ● Not so optimistic, convenient, simple
Ok, we have done one of the four targets: model predictive control.
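The scheme above (optimize deterministically against a pessimistic quantile forecast instead of the random future) can be sketched as follows; the stock, demand, and forecast samples are toy values:

```python
# Minimal sketch of MPC with a pessimistic forecast: pick a low quantile of the
# inflow scenarios, then plan deterministically against it -- all numbers toy.
def quantile(samples, q):
    s = sorted(samples)
    return s[int(q * (len(s) - 1))]

forecast_samples = [0.0, 0.5, 1.0, 1.5, 2.0]          # hypothetical inflow scenarios
pessimistic_inflow = quantile(forecast_samples, 0.2)  # plan for a dry outcome

stock, demand = 3.0, 1.0
plan = []
for t in range(4):  # deterministic optimization against the pessimistic forecast
    release = min(stock + pessimistic_inflow, demand)
    stock = stock + pessimistic_inflow - release
    plan.append(release)
print(plan)
```

In a full receding-horizon controller, only the first planned decision would be applied before re-forecasting and re-optimizing at the next time step.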
28. ● Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● Scalability issue
● The best of both worlds: Direct Value Search
29. Markov solution
Representation as a Markov process (a tree):
This is the representation of the random process.
Let us see how to represent the rest.
30. How to solve, simple case, binary stock, one day
It is December 30th and I have water.
● I use water (cost = 0) ==> No more water, December 31st.
● I do not use ==> I have water, December 31st.
31. How to solve, simple case, binary stock, one day
It is December 30th and I have water: Future Cost = 0.
● I use water (cost = 0) ==> No more water, December 31st.
● I do not use ==> I have water, December 31st.
32. How to solve, simple case, binary stock, 3 days, no random process
(diagram: lattice of stock states over 3 days with per-edge costs)
33. How to solve, simple case, binary stock, 3 days, no random process
(diagram: the same lattice with per-edge costs)
34. How to solve, simple case, binary stock, 3 days, no random process
(diagram: the lattice annotated with accumulated costs at each node)
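The backward induction illustrated on these lattice slides can be sketched explicitly; the day-by-day costs below are hypothetical, not read from the diagrams:

```python
# Backward induction on the toy binary-stock problem: state = water left (0 or 1),
# V[(t, s)] = best total cost from day t onward (deterministic case, toy costs).
T = 3
use_cost = [2.0, 1.0, 3.0]   # hypothetical cost when releasing water on day t
keep_cost = [3.0, 2.0, 1.0]  # hypothetical cost when keeping the water

def solve():
    V = {(T, 0): 0.0, (T, 1): 0.0}  # terminal values at the horizon
    for t in range(T - 1, -1, -1):
        V[(t, 0)] = keep_cost[t] + V[(t + 1, 0)]       # empty stock: forced choice
        V[(t, 1)] = min(use_cost[t] + V[(t + 1, 0)],   # use the water now
                        keep_cost[t] + V[(t + 1, 1)])  # or save it for later
    return V

V = solve()
print(V[(0, 1)])  # optimal cost starting with water
```

Each node's value is computed from the values of the next day's nodes, exactly the right-to-left sweep the slides walk through.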
36. How to solve, simple case, binary stock, 3 days, random parts
(diagram: copies of the cost lattice, one per scenario branch, with probability 1/3 and probability 2/3 on the branches)
37. Markov solution: ok you have understood stochastic dynamic programming (Bellman)
Representation as a Markov process (a tree):
This is the representation of the random process.
In each node, there are the state-nodes with decision-edges.
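A minimal sketch of the same Bellman backup with randomness, reusing the branch probabilities 1/3 and 2/3 of the scenario slide; the per-scenario stage costs are again illustrative:

```python
# Stochastic dynamic programming: at each stage the costs are random
# (two scenarios, with probabilities 1/3 and 2/3 as on the slide), and the
# Bellman backup takes the expectation over scenarios of the best action.
T = 3
SCENARIOS = [(1.0 / 3.0, {"use": 1.0, "keep": 2.0}),   # illustrative costs
             (2.0 / 3.0, {"use": 2.0, "keep": 3.0})]
ACTIONS = {1: ["use", "keep"], 0: ["keep"]}

def step(s, a):
    """Transition on the binary stock: using it empties it."""
    return 0 if a == "use" else s

V = {0: 0.0, 1: 0.0}                                   # terminal values
for t in reversed(range(T)):
    V = {s: sum(p * min(c[a] + V[step(s, a)] for a in ACTIONS[s])
                for p, c in SCENARIOS)
         for s in (0, 1)}
```

The only change from the deterministic version is the probability-weighted sum wrapped around the min.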
38. Markov solution: ok you have understood stochastic dynamic programming (Bellman)
Representation as a Markov process (a tree):
This is the representation of the random process.
In each node, there are the state-nodes with decision-edges.
Ok, we have done the 2nd of the four targets: stochastic dynamic programming.
39. Markov solution
Representation as a Markov process (a tree):
Optimize decisions for each state.
This means you are not cheating.
But difficult to use: the strategy is optimized for very specific forecasting models.
Might be ok for your problem?
40. Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● Scalability issue
● The best of both worlds: Direct Value Search
41. Overfitting
Representation as a Markov process (a tree):
● How do you actually make decisions when the random values are not exactly those observed? (heuristics...)
● Check on random realizations which have not been used for building the tree. Does it work correctly?
● Overfitting = when it works only on scenarios used in the optimization process.
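The held-out check described above can be sketched like this, with a hypothetical one-parameter threshold policy and synthetic inflow scenarios; a large train/test gap would indicate overfitting:

```python
import numpy as np

def simulate(policy_param, scenarios):
    """Hypothetical cost of a threshold policy over a batch of inflow scenarios."""
    release = np.minimum(scenarios, policy_param)   # release up to a threshold
    shortage = np.maximum(1.0 - release, 0.0)       # penalty for unmet demand of 1.0
    return float(np.mean(shortage + 0.1 * release)) # small cost for releasing water

rng = np.random.default_rng(1)
train = rng.uniform(0.0, 2.0, size=100)             # scenarios used to optimize
test = rng.uniform(0.0, 2.0, size=100)              # fresh, never-seen scenarios

# Fit the single parameter on the training scenarios only...
grid = np.linspace(0.0, 2.0, 41)
best = min(grid, key=lambda p: simulate(p, train))

# ...then evaluate on realizations not used for building the policy.
train_cost, test_cost = simulate(best, train), simulate(best, test)
```

With one parameter and 100 scenarios the gap stays small; it is tree-based policies with many degrees of freedom that fail this test.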
42. Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● Scalability issue
● The best of both worlds: Direct Value Search
43. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
44. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
● → ok for 100 000 decision variables per time step (tens of time steps, hundreds of plants, several decisions each)
45. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
● → ok for 100 000 decision variables per time step
● but solving by expensive SDP/SDDP (curse of dimensionality, exponential in state variables)
46. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
● → ok for 100 000 decision variables per time step
● but solving by expensive SDP/SDDP
Constraints
● Needs LP approximation: ok for you?
47. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
● → ok for 100 000 decision variables per time step
● but solving by expensive SDP/SDDP
Constraints
● Needs LP approximation: ok for you?
● SDDP requires convex Bellman values: ok for you?
48. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
● → ok for 100 000 decision variables per time step
● but solving by expensive SDP/SDDP
Constraints
● Needs LP approximation: ok for you?
● SDDP requires convex Bellman values: ok for you?
● Needs Markov random processes: ok for you? (possibly after some random process extension...)
49. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller with Linear Programming (value function as piecewise linear)
● → ok for 100 000 decision variables per time step
● but solving by expensive SDP/SDDP
Constraints
● Needs LP approximation: ok for you?
● SDDP requires convex Bellman values: ok for you?
● Needs Markov random processes: ok for you? (possibly after some random process extension...)
Goal: keep scalability, but get rid of SDP/SDDP solving
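A sketch of the piecewise-linear value representation that SDDP builds: the (convex) Bellman value is approximated from below by a maximum of linear "cuts". The (alpha, beta) pairs here are illustrative, not the output of a real SDDP run:

```python
import numpy as np

# Each cut is a linear lower bound alpha + beta * stock on the cost-to-go.
# SDDP adds cuts iteratively; here we just take three fixed, illustrative ones.
cuts = [(0.0, 0.0), (4.0, -2.0), (2.0, -0.5)]

def value(stock):
    """Piecewise-linear lower approximation of the cost-to-go: max over cuts."""
    return max(alpha + beta * stock for alpha, beta in cuts)

# The max of linear functions is convex, which is why SDDP needs convex
# Bellman values; adding a cut can only tighten the approximation.
stocks = np.linspace(0.0, 3.0, 7)
vals = [value(s) for s in stocks]
```

Inside the stage problem, "max over cuts" becomes a set of linear constraints, which is what keeps each time step solvable as an LP even with very many decision variables.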
50. Summary
● Most classical solution = SDP and variants
● Or MPC (model-predictive control), replacing the stochastic parts by deterministic pessimistic forecasts
● Statistical modeling is "cast" into a tree model & (probabilistic) forecasting modules are essentially lost
51. Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● But scalability issue
● The best of both worlds: Direct Value Search
52. Direct Policy Search
● Requires a parametric controller
● Principle: optimize the parameters on simulations
● Unusual in large-scale Power Systems (we will see why)
● Usual in other areas (finance, evolutionary robotics)
53. Stochastic Control
[Diagram: a random process feeds random values to the system; a controller with memory observes the state and issues commands; the system returns its state and a cost.]
Optimize the controller thanks to a simulator:
● Command = Controller(w, state, forecasts)
● Simulate(w) = stochastic loss with parameter w
● w* = argmin [Simulate(w)]
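The simulate-and-optimize loop above can be sketched as follows. The linear controller, the toy cost, and the random search (a stand-in for a real noisy black-box optimizer such as an evolution strategy) are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def controller(w, state):
    """Hypothetical linear controller: release an affine function of the stock,
    clipped to what is physically available."""
    return float(np.clip(w[0] + w[1] * state, 0.0, state))

def simulate(w, n_scenarios=64, horizon=5):
    """Simulate(w): average cost of controller(w, .) over random inflow scenarios.
    A fixed seed gives common random numbers, so comparisons between w's are fair."""
    local = np.random.default_rng(0)
    total = 0.0
    for _ in range(n_scenarios):
        stock, cost = 1.0, 0.0
        for _ in range(horizon):
            stock += local.uniform(0.0, 1.0)   # random inflow
            release = controller(w, stock)
            stock -= release
            cost += max(1.0 - release, 0.0)    # penalty for unmet demand of 1.0
        total += cost
    return total / n_scenarios

# w* = argmin Simulate(w), here by plain random search over candidate parameters.
candidates = rng.normal(size=(50, 2))
w_star = min(candidates, key=simulate)
```

This is the whole principle of DPS: the simulator is a black box, and any noisy optimizer can be plugged in.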
54. Stochastic Control
[Diagram: the same control loop as on the previous slide.]
Optimize the controller thanks to a simulator:
● Command = Controller(w, state, forecasts)
● Simulate(w) = stochastic loss with parameter w
● w* = argmin [Simulate(w)]
Ok, we have done the 3rd of the four targets: direct policy search.
56. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
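The parametric controller of the slide, written out with NumPy; the layer shapes below are illustrative:

```python
import numpy as np

def controller(W0, W1, W2, W3, x):
    """One-hidden-layer tanh network, exactly as on the slide:
    Controller(w, x) = W3 + W2 . tanh(W1 . x + W0)."""
    return W3 + W2 @ np.tanh(W1 @ x + W0)

# Illustrative shapes: 3 state inputs, 5 hidden units, 2 decision outputs.
rng = np.random.default_rng(3)
W1 = rng.normal(size=(5, 3)); W0 = rng.normal(size=5)
W2 = rng.normal(size=(2, 5)); W3 = rng.normal(size=2)
decision = controller(W0, W1, W2, W3, x=np.ones(3))
```

The parameter vector w optimized by DPS is simply the concatenation of W0..W3, which is why the parameter count grows quickly with the number of decision variables.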
57. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy Black-Box Optimization
58. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy Black-Box Optimization
● Advantages: non-linear ok, forecasts included
59. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy Black-Box Optimization
● Advantages: non-linear ok, forecasts included
● Issue: too slow (hundreds of parameters for even 20 decision variables; depends on structure)
60. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy Black-Box Optimization
● Advantages: non-linear ok, forecasts included
● Issue: too slow (hundreds of parameters for even 20 decision variables; depends on structure)
● Idea: a special structure for DPS (inspired from SDP)
61. Direct Policy Search (DPS)
● Requires a parametric controller, e.g. neural network:
  Controller(w,x) = W3 + W2.tanh(W1.x + W0)
● Noisy Black-Box Optimization
● Advantages: non-linear ok, forecasts included
  (strategy optimized given the real forecasting module you have: forecasts are inputs)
● Issue: too slow (hundreds of parameters for even 20 decision variables; depends on structure)
● Idea: a special structure for DPS (inspired from SDP)
62. Stochastic Dynamic Optimization
● Classical solutions: Bellman (old & new)
  ● Markov Chains
  ● Overfitting
  ● Anticipativity (dirty solution)
  ● SDP, SDDP
● Alternate solution: Direct Policy Search
  ● No problem with anticipativity
  ● Scalability issue
● The best of both worlds: Direct Value Search
63. Direct Value Search
SDP representation in DPS:
Controller(state) = argmin Cost(decision) + V(next state)   [an LP]
● V(nextState) = alpha x NextState (or a more sophisticated LP)
● alpha = NeuralNetwork(w, state)   [not an LP]
==> given w, decision making is solved as an LP
==> non-linear mapping for choosing the parameters of the LP from the current state
64. Direct Value Search
SDP representation in DPS:
Controller(state) = argmin Cost(decision) + V(next state)   [an LP]
● V(nextState) = alpha x NextState (or a more sophisticated LP)
● alpha = NeuralNetwork(w, state)   [not an LP]
Drawback: requires the optimization of w (= a noisy black-box optimization problem)
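A sketch of the Direct Value Search decision step: a tiny (hypothetical) tanh network maps the state to the value-function slope alpha, and the inner argmin is done here by grid search as a stand-in for the LP solve of the slide:

```python
import numpy as np

def alpha_net(w, state):
    """Hypothetical non-linear mapping state -> slope of the linear value term."""
    return float(np.tanh(w[0] * state + w[1]))

def controller(w, state, grid=np.linspace(0.0, 1.0, 101)):
    """Direct Value Search decision step:
    argmin over decisions of Cost(decision) + V(next state),
    with V(next) = alpha * next and alpha = alpha_net(w, state).
    A grid search over decisions stands in for the LP solve."""
    alpha = alpha_net(w, state)
    decisions = np.minimum(grid, state)        # cannot release more than the stock
    cost = np.maximum(1.0 - decisions, 0.0)    # immediate cost: unmet demand of 1.0
    next_state = state - decisions
    return float(decisions[np.argmin(cost + alpha * next_state)])

d = controller(w=np.array([0.5, 0.0]), state=2.0)
```

Only the two entries of w are touched by the outer black-box optimizer; each decision, given w, stays a cheap inner optimization.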
65. Summary: the best of both worlds
Controller(w, state): the structure of the controller (fast, scalable by structure)
● V(w, state, .) is non-linear
● Optimizing Cost(dec) + V(w, state, nextState) is an LP
Simul(w): a simulator (you can put anything you want in it, even if it is not linear, nothing Markovian...)
● Do a simulation with w
● Return the cost
DirectValueSearch: the optimization (will do its best, given the simulator and the structure)
● optimize w* = argmin simul(w)
● Return Controller with w*
66. Summary: the best of both worlds
Controller(w, state)
● V(w, state, .) is non-linear
● Optimizing Cost(dec) + V(w, state, nextState) is an LP
Simul(w)
● Do a simulation with w
● Return the cost
DirectValueSearch
● optimize w* = argmin simul(w)
● Return Controller with w*
3 optimizers:
● SAES
● Fabian:
  ● gradient descent
  ● redundant finite differences
● Newton version
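In the spirit of Fabian's optimizer (gradient descent with redundant finite differences on a noisy objective), here is a minimal sketch on a noisy quadratic; the step size, perturbation size, and repeat count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_loss(w):
    """Noisy black-box objective (illustrative): quadratic plus observation noise."""
    return float(np.sum((w - 1.0) ** 2) + 0.01 * rng.normal())

def fd_gradient(f, w, h=0.1, repeats=8):
    """Central finite-difference gradient, averaged over several repeats
    (the redundancy fights the evaluation noise)."""
    g = np.zeros_like(w)
    for _ in range(repeats):
        for i in range(len(w)):
            e = np.zeros_like(w)
            e[i] = h
            g[i] += (f(w + e) - f(w - e)) / (2.0 * h)
    return g / repeats

w = np.zeros(2)
for step in range(200):
    w -= 0.1 * fd_gradient(noisy_loss, w)   # plain gradient descent on the estimate
# w should now be close to the optimum at (1, 1).
```

A faithful Fabian scheme also decreases the perturbation and step sizes over iterations; the fixed sizes here keep the sketch short.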
67. [Figure slide, no text.]
68. Ok, we have done the 4th of the four targets: direct value search.
69. State of the art in discrete-time control, a few tools:
● Model Predictive Control: for making a decision in a given state,
  (i) do forecasts
  (ii) replace random processes by pessimistic forecasts
  (iii) optimize as if it were a deterministic problem
● Stochastic Dynamic Programming:
  ● Markov model
  ● Compute "cost to go" backwards
● Direct Policy Search:
  ● Parametric controller
  ● Optimized on simulations
70. Conclusion
● Still rather preliminary (less tested than MPC or SDDP) but promising:
  ● Forecasts naturally included in optimization
  ● Anytime algorithm (the user immediately gets approximate results)
  ● No convexity constraints
  ● Room for detailed simulations (e.g. with very small time scale, for volatility)
  ● No random process constraints (need not be Markov)
  ● Can handle large state spaces (as DPS)
  ● Can handle large action spaces (as SDP)
==> can work on the "real" problem, without "cast"
71. Bibliography
● Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC. D. Bertsekas, 2005. (MPC = deterministic forecasts)
● Astrom 1965
● Renewable energy forecasts ought to be probabilistic! P. Pinson, 2013 (wipfor talk)
● Training a neural network with a financial criterion rather than a prediction criterion. Y. Bengio, 1997 (quite practical application of direct policy search, convincing experiments)
74. SDP / SDDP
Stochastic (Dual) Dynamic Programming
● Representation of the controller:
  decision(current state) = argmin Cost(decision) + Bellman(next state)
● Linear programming (LP) if:
  – For a given current state, next state = LP(decision)
  – Cost(decision) = LP(decision)
● → 100 000 decision variables per time step