A simple tutorial on Monte-Carlo Tree Search
Contains a description of dynamic programming and alpha-beta search, then MCTS. Special cases for simultaneous actions are discussed.
If there is at least one request for it, I will add comments so that the tutorial can be used without prior knowledge of MCTS.
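As a taste of what the tutorial covers: the tree policy in MCTS is typically UCB1, which balances exploitation (mean reward) against exploration (visit counts). A minimal sketch, with illustrative names not taken from the slides:

```python
import math

def ucb1_select(children, exploration=1.4):
    """Pick the index of the child maximizing the UCB1 score:
    mean reward + exploration * sqrt(ln(parent visits) / child visits).
    `children` is a list of (total_reward, visit_count) pairs."""
    total_visits = sum(n for _, n in children)
    def score(child):
        w, n = child
        if n == 0:
            return float("inf")  # always try unvisited children first
        return w / n + exploration * math.sqrt(math.log(total_visits) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))
```

With equal visit counts the exploration terms cancel and the child with the best mean is chosen; a child that has never been visited is always selected first.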
@article{gelly:hal-00695370,
hal_id = {hal-00695370},
url = {http://hal.inria.fr/hal-00695370},
title = {{The Grand Challenge of Computer Go: Monte Carlo Tree Search and Extensions}},
author = {Gelly, Sylvain and Kocsis, Levente and Schoenauer, Marc and Sebag, Mich{\`e}le and Silver, David and Szepesvari, Csaba and Teytaud, Olivier},
abstract = {{The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.}},
language = {English},
affiliation = {TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI , LPDS , Microsoft Research - Inria Joint Centre - MSR - INRIA , University of Alberta, Canada , Department of Computing Science},
publisher = {ACM},
pages = {106-113},
journal = {Communications of the ACM},
volume = {55},
number = {3},
audience = {international},
year = {2012},
pdf = {http://hal.inria.fr/hal-00695370/PDF/CACM-MCTS.pdf},
}
The document discusses greedy algorithms and how they work. It provides an example of using a greedy algorithm to solve the fractional knapsack problem in 3 steps: (1) sorting items by value to weight ratio, (2) initializing a selection array, (3) iteratively selecting highest ratio items that fit in the knapsack until full. While fast, greedy algorithms may not always find the optimal solution. The document also covers using Huffman coding to create efficient variable-length codes.
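The three steps above can be sketched directly; this is a minimal illustration of the greedy fractional-knapsack procedure, not code from the document:

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs; returns the maximum achievable value.
    Greedy: take items in decreasing value/weight ratio, splitting the last one."""
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if capacity <= 0:
            break
        take = min(weight, capacity)   # whole item, or the fraction that still fits
        total += value * take / weight
        capacity -= take
    return total
```

For the *fractional* problem this greedy choice is provably optimal; it is the 0/1 variant where greedy can fail.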
"Sparse Binary Zero-Sum Games". David Auger, Jialin Liu, Sylvie Ruette, David L. St-Pierre and Olivier Teytaud. The 6th Asian Conference on Machine Learning (ACML), 2014.
This document discusses backpropagation in convolutional neural networks. It begins by explaining backpropagation for single neurons and multi-layer neural networks. It then discusses the specific operations involved in convolutional and pooling layers, and how backpropagation is applied to convolutional neural networks as a composite function with multiple differentiable operations. The key steps are decomposing the network into differentiable operations, propagating error signals backward using derivatives, and computing gradients to update weights.
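The single-neuron case mentioned first is small enough to write out; a minimal sketch of one gradient step for a sigmoid neuron under squared loss (names and learning rate are illustrative):

```python
import math

def neuron_backprop(x, w, b, target, lr=0.1):
    """One gradient step for a single sigmoid neuron with squared loss.
    Forward: y = sigmoid(w*x + b); loss = (y - target)**2 / 2.
    Backward (chain rule): dL/dw = (y - target) * y*(1 - y) * x."""
    y = 1.0 / (1.0 + math.exp(-(w * x + b)))
    delta = (y - target) * y * (1.0 - y)   # error signal propagated backward
    return w - lr * delta * x, b - lr * delta
```

Repeated calls drive the output toward the target; convolutional layers apply the same chain rule, just with weight sharing across spatial positions.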
The document provides an introduction to variational autoencoders (VAE). It discusses how VAEs can be used to learn the underlying distribution of data by introducing a latent variable z that follows a prior distribution like a standard normal. The document outlines two approaches - explicitly modeling the data distribution p(x), or using the latent variable z. It suggests using z and assuming the conditional distribution p(x|z) is a Gaussian with mean determined by a neural network gθ(z). The goal is to maximize the likelihood of the dataset by optimizing the evidence lower bound objective.
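The evidence lower bound mentioned above can be written in its standard form (notation mine, not taken verbatim from the document):

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

The first term rewards reconstruction through the decoder $g_\theta(z)$; the KL term keeps the approximate posterior close to the standard-normal prior.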
Lesson 27: Integration by Substitution (Section 041 slides) - Matthew Leingang
The document contains notes from a Calculus I class at New York University on December 13, 2010. It discusses using the substitution method for indefinite and definite integrals. Examples are provided to demonstrate how to use substitutions to evaluate integrals involving trigonometric, exponential, and polynomial functions. The key steps are to make a substitution for the variable in terms of a new variable, determine the differential of the substitution, and substitute into the integral to transform it into an integral involving only the new variable.
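A one-line worked example of the method described (my own, not from the notes):

```latex
\int 2x \cos(x^2)\,dx
\quad\text{with } u = x^2,\; du = 2x\,dx
\quad\Rightarrow\quad
\int \cos u \, du = \sin u + C = \sin(x^2) + C.
```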
This document discusses computing Nash equilibria in game theory. It begins by defining games, Nash equilibria, and different categories of games including normal form games, anonymous games, and graphical games. It then provides an overview of approximating Nash equilibria in these different game types. Key points include the document showing there are polynomial time approximation algorithms for anonymous games with a constant number of strategies, but the problem is PPAD-hard for games with more strategies. It also discusses reductions between computing Nash equilibria in different game types.
This document contains lecture notes on sparse autoencoders. It begins with an introduction describing the limitations of supervised learning and the need for algorithms that can automatically learn feature representations from unlabeled data. The notes then state that sparse autoencoders are one approach to learn features from unlabeled data, and describe the organization of the rest of the notes. The notes will cover feedforward neural networks, backpropagation for supervised learning, autoencoders for unsupervised learning, and how sparse autoencoders are derived from these concepts.
This document discusses attractors and dynamical systems through examples like number games and the Lorenz system. It begins by using a 3-digit number game to illustrate the concept of attractors, showing how different starting numbers eventually converge to the fixed point attractors of 495 and 000. The document then discusses how this idea can be applied in other contexts like numerical analysis, economics, and system identification. It also introduces different types of attractors like fixed points, periodic attractors, and strange attractors. Finally, it summarizes recent work extending the Lorenz system to a one-parameter family and exploring relationships between different 3D autonomous systems.
Tools for Discrete Time Control; Application to Power Systems - Olivier Teytaud
3 main algorithms from the state of the art:
- Model Predictive Control
- Stochastic Dynamic Programming
- Direct Policy Search
==> and our proposal, a modified Direct Policy Search termed Direct Value Search
- The document discusses games with simultaneous actions and hidden information. It presents games as directed graphs with actions, players, observations, rewards, and loops.
- Games with simultaneous actions and short-term hidden information can be represented as games with hidden information by removing intermediate turns.
- Questions about the existence of a sure-win strategy for one player (the "UD" question) are only relevant for games with full observability, not matrix games.
Don't believe what is written in these slides.
These are deliberately provocative statements, most of them found on the internet, presented here for discussion and brainstorming.
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s... - Olivier Teytaud
Here are a few suggestions on how to improve the Zermelo algorithm when it is too slow:
1. Add a depth limit. Stop recursion when a maximum search depth is reached. Return a heuristic evaluation instead of continuing search.
2. Use alpha-beta pruning. Track the best value found (alpha) and prune branches that cannot improve on it.
3. Iterative deepening. Run successive searches with increasing depth limits to get progressively better approximations.
4. Move ordering. Evaluate better moves earlier in the search tree. This prunes bad moves earlier.
5. Transposition tables. Store previously computed move evaluations to avoid re-expanding the same position.
6. Parallelize the search, e.g. by exploring independent subtrees on separate workers.
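Suggestions 1, 2, and 5 combine naturally; a minimal sketch of depth-limited alpha-beta with a transposition table, where the `game` interface (`terminal`, `evaluate`, `moves`, `play`) is a hypothetical stand-in for a real game implementation:

```python
def alphabeta(state, depth, alpha, beta, maximizing, game, table=None):
    """Depth-limited alpha-beta search with an optional transposition table.
    `game` must provide terminal(s), evaluate(s), moves(s), and play(s, m)."""
    if table is None:
        table = {}
    key = (state, depth, maximizing)
    if key in table:                      # transposition table: reuse past work
        return table[key]
    if depth == 0 or game.terminal(state):
        return game.evaluate(state)       # heuristic value at the depth limit
    if maximizing:
        value = float("-inf")
        for move in game.moves(state):
            value = max(value, alphabeta(game.play(state, move), depth - 1,
                                         alpha, beta, False, game, table))
            alpha = max(alpha, value)
            if alpha >= beta:             # prune: the opponent avoids this branch
                break
    else:
        value = float("inf")
        for move in game.moves(state):
            value = min(value, alphabeta(game.play(state, move), depth - 1,
                                         alpha, beta, True, game, table))
            beta = min(beta, value)
            if alpha >= beta:
                break
    table[key] = value
    return value
```

Move ordering (suggestion 4) slots in by sorting `game.moves(state)` with a cheap heuristic before the loop, which makes the `alpha >= beta` cutoffs fire earlier.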
This document provides an overview of distributed decision making in partially observable dynamic games and multiobjective policy optimization. It discusses applying these techniques to optimization problems in games like chess and Go, as well as industrial applications like managing groups of power plants involving renewable energy, nuclear power, coal, hydroelectric power, and interactions with electricity consumers and networks. The goal is to optimize strategies using parallel computing and test these approaches on games and energy systems.
Theory of games, with a short reminder of computational complexity and an independent appendix on human complexity and the game of Go
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {David, Auger and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) that it is in the jump 0', and that, even in the stochastic case, (iii) it becomes decidable if we add the requirement that the game halts almost surely whatever may be the strategies of the players.}},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scientific},
journal = {International Journal on Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press},
audience = {international},
year = {2012},
}
Noisy Optimization combining Bandits and Evolutionary Algorithms - Olivier Teytaud
@inproceedings{rolet:inria-00437140,
hal_id = {inria-00437140},
url = {http://hal.inria.fr/inria-00437140},
title = {{Bandit-based Estimation of Distribution Algorithms for Noisy Optimization: Rigorous Runtime Analysis}},
author = {Rolet, Philippe and Teytaud, Olivier},
abstract = {{We show complexity bounds for noisy optimization, in frameworks in which noise is stronger than in previously published papers[19]. We also propose an algorithm based on bandits (variants of [16]) that reaches the bound within logarithmic factors. We emphasize the differences with empirically derived published algorithms.}},
keywords = {noisy optimization evolutionary algorithms bandits},
language = {English},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Futurs , TAO - INRIA Saclay - Ile de France},
booktitle = {{Lion4}},
address = {Venice, Italy},
audience = {international},
year = {2010},
pdf = {http://hal.inria.fr/inria-00437140/PDF/lion4long.pdf},
}
@inproceedings{coulom:hal-00517157,
hal_id = {hal-00517157},
url = {http://hal.archives-ouvertes.fr/hal-00517157},
title = {{Handling Expensive Optimization with Large Noise}},
author = {Coulom, R{\'e}mi and Rolet, Philippe and Sokolovska, Nataliya and Teytaud, Olivier},
abstract = {{This paper exhibits lower and upper bounds on runtimes for expensive noisy optimization problems. Runtimes are expressed in terms of number of fitness evaluations. Fitnesses considered are monotonic transformations of the {\em sphere} function. The analysis focuses on the common case of fitness functions quadratic in the distance to the optimum in the neighborhood of this optimum---it is nonetheless also valid for any monotonic polynomial of degree p>2. Upper bounds are derived via a bandit-based estimation of distribution algorithm that relies on Bernstein races called R-EDA. It is known that the algorithm is consistent even in non-differentiable cases. Here we show that: (i) if the variance of the noise decreases to 0 around the optimum, it can perform optimally for quadratic transformations of the norm to the optimum, (ii) otherwise, it provides a slower convergence rate than the one exhibited empirically by an algorithm called Quadratic Logistic Regression based on surrogate models---although QLR requires a probabilistic prior on the fitness class.}},
keywords = {Noisy optimization, Bernstein races},
language = {English},
affiliation = {SEQUEL - INRIA Lille - Nord Europe , TAO - INRIA Saclay - Ile de France , Laboratoire de Recherche en Informatique - LRI},
booktitle = {{Foundations of Genetic Algorithms (FOGA 2011)}},
pages = {TBA},
address = {Austria},
editor = {ACM},
audience = {international},
year = {2011},
month = Jan,
pdf = {http://hal.archives-ouvertes.fr/hal-00517157/PDF/foga10noise.pdf},
}
Choosing between several options in uncertain environments - Olivier Teytaud
The document discusses bandit problems with strategic choices and small budgets. It defines bandit problems, strategic bandit problems, and compares the two. It presents algorithms for exploring options and making recommendations in both one-player and two-player settings. Experimental results on a Go positioning problem and an online card game show that TEXP3 outperforms other algorithms in two-player settings. The document concludes with discussions on extensions to structured bandits and using strategic bandits to model investment choices.
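EXP3, the adversarial-bandit baseline that TEXP3 modifies, can be sketched as follows; this is the standard exponential-weights formulation with illustrative parameter names, not the document's own code:

```python
import math
import random

def exp3(pull, n_arms, horizon, gamma=0.1, rng=random):
    """EXP3: exponential weights with uniform exploration, rewards in [0, 1].
    `pull(arm)` returns the reward of playing `arm` at the current round."""
    weights = [1.0] * n_arms
    total_reward = 0.0
    for _ in range(horizon):
        wsum = sum(weights)
        probs = [(1 - gamma) * w / wsum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = pull(arm)
        total_reward += reward
        # importance-weighted update: only the played arm's weight changes,
        # divided by its probability so the estimate stays unbiased
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * n_arms))
    return total_reward
```

Because the exploration floor `gamma / n_arms` never vanishes, every arm keeps being sampled, which is what makes the mixed strategy robust against an adversarial opponent.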
Hydroelectricity uses water to produce electricity and has advantages for electricity storage. It provides daily and yearly storage, and can even achieve negative net production by pumping water up to higher reservoirs. However, expanding hydroelectricity is challenging due to its large infrastructure requirements and local environmental impacts. New technologies may improve energy storage capabilities and grid stability in the future, but developing large-scale annual storage remains difficult given these constraints. Hydroelectricity will continue playing an important role in energy systems alongside other renewable technologies and efficiency strategies.
- The document discusses energy management in France and potential areas of research collaboration between France and Taiwan.
- Key areas discussed include optimizing long-term investment policies for electricity generation using tools like reinforcement learning and stochastic programming to account for uncertainties.
- Specific questions mentioned are around optimal connections between Europe and Africa, impacts of subsidizing solar power or switching off nuclear plants, and benefits of demand reduction contracts.
- The researcher proposes combining methods like direct policy search and Monte Carlo tree search to better optimize long-term planning while accounting for short-term effects. Plans are discussed to test new ideas, share data and codes, and potentially organize joint work between the two regions.
This document discusses how to save money by using open source software instead of proprietary software like Microsoft Office. It recommends downloading and using OpenOffice or LibreOffice instead, as they are free alternatives that work very well. It also recommends installing a free open source operating system like Linux, as this can save a lot of money on software costs over time. Open source is discussed as an economic model where the marginal cost of sharing and distributing code is very low, enabling new business models to earn money through services, support or customization rather than just software licenses. A variety of important open source software projects are listed across different domains like operating systems, office suites, web servers and more.
The document discusses the computational complexity of partially observable games. Some key points:
1. Two-player unobservable games are EXPSPACE-complete, as strategies are just sequences of actions with no observability.
2. Encoding a Turing machine as a game shows the hardness of the unobservable case. The tape configurations can be represented in a game state of size logarithmic in the tape size.
3. Two-player partially observable games or one-player partially observable games against randomness are 2EXPTIME-complete, even more complex than the unobservable case.
Ilab Metis: we optimize power systems and we are not afraid of direct policy ... - Olivier Teytaud
Ilab METIS is a collaboration between TAO, a machine learning and optimization team within INRIA, and Artelys, an SME focused on optimization. They work on optimizing energy policies through simulations of power systems while taking into account uncertainties and stochastic variables. Their methodologies use a hybrid of reinforcement learning, mathematical programming, and direct policy search to optimize investments and operational decisions for power grids over multiple timescales while handling constraints. They have applied their approaches to problems involving interconnection planning, demand balancing, and renewable integration on scales from cities to entire continents.
This document discusses blind Go, a variant of the game where players do not look at the board and must memorize positions. It explores strategies for blind Go, such as playing unusual moves that are harder for the opponent to remember. Experiments found that providing an empty board as a visual aid helped players. When playing against professionals in blind 9x9 Go, the computer won 2 of 3 games. In a 19x19 game against a top human player, the computer won through an unexpected, unusual move where the human made a rare mistake due to not seeing the board. Further research is needed, but playing unconventional moves seems beneficial in blind Go.
Artificial Intelligence and Optimization with Parallelism - Olivier Teytaud
This document discusses parallelism in artificial intelligence and evolutionary computation. It explains that comparison-based optimization algorithms, which include many evolutionary algorithms, can be naturally parallelized by speculatively running multiple branches in parallel with a branching factor of 3 or more. This allows theoretical logarithmic speedups to be achieved in practice through simple parallelization tricks.
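A simpler flavor of the same idea is that comparison-based evolutionary algorithms evaluate a whole population independently, so the evaluations parallelize trivially. A minimal sketch of a (1,λ) evolution strategy with concurrent fitness evaluation (parameters and the decay schedule are illustrative, and this is not the speculative branching scheme from the slides):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def one_lambda_es(f, x0, sigma=0.5, lam=8, iters=40, rng=None):
    """(1,lambda)-ES: each generation samples `lam` Gaussian perturbations of
    the current point, evaluates them in parallel, and keeps the best one.
    Only comparisons of f-values are used, never their magnitudes."""
    rng = rng or random.Random(0)
    x = list(x0)
    with ThreadPoolExecutor(max_workers=lam) as pool:
        for _ in range(iters):
            cands = [[xi + rng.gauss(0, sigma) for xi in x] for _ in range(lam)]
            scores = list(pool.map(f, cands))   # the lam evaluations run concurrently
            x = cands[min(range(lam), key=scores.__getitem__)]
            sigma *= 0.9                        # simple step-size decay
    return x
```

Threads suffice when `f` is an external simulation that releases the interpreter; for CPU-bound Python fitness functions a process pool would be the natural substitute.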
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
HCL Notes and Domino License Cost Reduction in the World of DLAU - panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Project Management Semester Long Project - Acuity - jpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx - SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Tools for Discrete Time Control; Application to Power SystemsOlivier Teytaud
3 main algorithms from the state of the art:
- Model Predictive Control
- Stochastic Dynamic Programming
- Direct Policy Search
==> and our proposal, a modified Direct Policy Search
termed Direct Value Search
- The document discusses games with simultaneous actions and hidden information. It presents games as directed graphs with actions, players, observations, rewards, and loops.
- Games with simultaneous actions and short-term hidden information can be represented as games with hidden information by removing intermediate turns.
- Questions about the existence of a sure-win strategy for one player (the "UD" question) are only relevant for games with full observability, not matrix games.
Don't believe what is written in these slides.
These statements are just provocative statements, most of them found on internet, here for discussion and for brain storming.
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...Olivier Teytaud
Here are a few suggestions on how to improve the Zermelo algorithm when it is too slow:
1. Add a depth limit. Stop recursion when a maximum search depth is reached. Return a heuristic evaluation instead of continuing search.
2. Use alpha-beta pruning. Track the best value found (alpha) and prune branches that cannot improve on it.
3. Iterative deepening. Run successive searches with increasing depth limits to get progressively better approximations.
4. Move ordering. Evaluate better moves earlier in the search tree. This prunes bad moves earlier.
5. Transposition tables. Store previously computed move evaluations to avoid re-expanding the same position.
6. Parallelize the
This document provides an overview of distributed decision making in partially observable dynamic games and multiobjective policy optimization. It discusses applying these techniques to optimization problems in games like chess and Go, as well as industrial applications like managing groups of power plants involving renewable energy, nuclear power, coal, hydroelectric power, and interactions with electricity consumers and networks. The goal is to optimize strategies using parallel computing and test these approaches on games and energy systems.
Theory of games, with a short reminder of computational complexity and an independent appendix on human complexity and the game of Go
@article{david:hal-00710073,
hal_id = {hal-00710073},
url = {http://hal.inria.fr/hal-00710073},
title = {{The Frontier of Decidability in Partially Observable Recursive Games}},
author = {David, Auger and Teytaud, Olivier},
abstract = {{The classical decision problem associated with a game is whether a given player has a winning strategy, i.e. some strategy that leads almost surely to a victory, regardless of the other players' strategies. While this problem is relevant for deterministic fully observable games, for a partially observable game the requirement of winning with probability 1 is too strong. In fact, as shown in this paper, a game might be decidable for the simple criterion of almost sure victory, whereas optimal play (even in an approximate sense) is not computable. We therefore propose another criterion, the decidability of which is equivalent to the computability of approximately optimal play. Then, we show that (i) this criterion is undecidable in the general case, even with deterministic games (no random part in the game), (ii) that it is in the jump 0', and that, even in the stochastic case, (iii) it becomes decidable if we add the requirement that the game halts almost surely whatever maybe the strategies of the players.}},
language = {Anglais},
affiliation = {Laboratoire de Recherche en Informatique - LRI , TAO - INRIA Saclay - Ile de France},
booktitle = {{Special Issue on "Frontier between Decidability and Undecidability"}},
publisher = {World Scinet},
journal = {International Journal on Foundations of Computer Science (IJFCS)},
volume = {Accepted},
note = {revised 2011, accepted 2011, in press },
audience = {internationale },
year = {2012},
}
Tutorial: Monte-Carlo Tree Search
1. Bandit-based Monte-Carlo planning: the game
of Go and beyond
Designing intelligent
agents with
Monte-Carlo Tree Search.
Olivier.Teytaud@inria.fr + F. Teytaud + H. Doghmen + others
TAO, Inria-Saclay IDF, Cnrs 8623,
Lri, Univ. Paris-Sud, Digiteo Labs,
Pascal Network of Excellence.
Keywords: UCB, EXP3, MCTS, UCT.
Paris
April 2011.
Games, games with hidden information,
games with simultaneous actions.
2. Key point
PLEASE INTERRUPT ME !
HAVE QUESTIONS !
LET'S HAVE A FRIENDLY SESSION !
ASK QUESTIONS NOW AND LATER BY MAIL!
olivier.teytaud@inria.fr
3. Outline
Introduction:
games / control / planning.
Standard approaches
Bandit-based Monte-Carlo Planning and
UCT.
Application to the
game of Go and
(far) beyond
7. A game is a directed graph with actions
and players and observations
[Figure: an example game as a directed graph; nodes are labeled with the player to move (White, Black), edges with actions, and numbered observations (Bob, Bear, Bee) are attached along the way.]
Games with simultaneous actions Paris 1st of February 7
8. A game is a directed graph with actions
and players and observations and rewards
[Figure: the same graph with rewards (+1, 0) attached. Rewards on leaves only!]
9. A game is a directed graph +actions
+players +observations +rewards +loops
[Figure: the same graph with loops added, so the same state can be revisited.]
10. More than games in this
formalism
A main application: the management of
many energy stocks under randomness:
At each time step we see random outcomes,
we have to make decisions (switching plants on or off),
and we have losses.
(ANR / NSC project)
11. Opening a reservoir produces energy
(and water goes to another reservoir)
[Figure: five connected reservoirs feeding electricity demand, alongside classical thermal plants and nuclear plants; water that overflows is lost.]
12. Outline
Introduction:
games / control / planning.
Standard approaches
Bandit-based Monte-Carlo Planning and
UCT.
Application to the
game of Go and
(far) beyond
13. What are the approaches ?
Dynamic programming (Massé – Bellman 50's)
(still the main approach in industry)
(minimax / alpha-beta in games)
Reinforcement learning (some promising results,
less used in industry)
Some tree exploration tools (less usual in
stochastic or continuous cases)
Bandit-Based Monte-Carlo planning
Scripts + tuning
14. What are the approaches ?
Dynamic programming (Massé – Bellman 50's)
(still the main approach in industry)
Where we are:
Done: Presentation of the problem.
Now: We briefly present dynamic
programming
Thereafter: We present MCTS / UCT.
15. Dynamic programming
V(x) = expectation of future loss if the optimal
strategy is followed after state x (well defined).
Choose u(x) such that the expectation of
V(f(x,u(x),A)) is minimal.
Computation by dynamic programming:
We compute V for all the final states x.
16. Dynamic programming
V(x) = expectation of C(xH) if the optimal
strategy is followed (well defined).
Choose u(x) such that the expectation of
V(f(x,u(x),A)) is minimal.
Computation by dynamic programming:
We compute V for all the final states x.
We compute V for all the “-1” states x.
17. Dynamic programming (DP)
V(x) = expectation of C(xH) if the optimal
strategy is followed (well defined).
Choose u(x) such that the expectation of
V(f(x,u(x),A)) is minimal.
Computation by dynamic programming:
We compute V for all the final states x.
We compute V for all the “-1” states x.
... ... ...
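The backward induction of slides 15-17 can be sketched as follows. This is a minimal illustration, not any industrial solver: the finite-horizon layout (transition f(x,u,a), outcome probabilities prob, terminal loss C, horizon H) and all names are assumptions for the example.

```python
def backward_induction(states, actions, outcomes, prob, f, C, H):
    """Finite-horizon dynamic programming (backward induction).

    Illustrative interface: states/actions/outcomes are finite sets,
    prob[a] = P(random outcome a), f(x, u, a) = next state,
    C(x) = terminal loss at horizon H.  Returns V[t][x], the expected
    loss-to-go under optimal play, and the optimal policy[t][x].
    """
    V = [dict() for _ in range(H + 1)]
    policy = [dict() for _ in range(H)]
    for x in states:                       # final states: V = terminal loss
        V[H][x] = C(x)
    for t in range(H - 1, -1, -1):         # then the "-1" states, "-2", ...
        for x in states:
            best_u, best_val = None, float("inf")
            for u in actions:
                # expectation of V(f(x, u, A)) over the random outcome A
                val = sum(prob[a] * V[t + 1][f(x, u, a)] for a in outcomes)
                if val < best_val:
                    best_u, best_val = u, val
            V[t][x] = best_val
            policy[t][x] = best_u
    return V, policy
```

Every (state, action, outcome) triple is touched once per time step, which is exactly why the approach breaks down on huge state spaces.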
20. Alpha-beta = DP + pruning
“Nevertheless, I believe that a world-champion-level
Go machine can be built within 10 years,
based upon the same method of intensive
analysis--brute force, basically--that
Deep Blue employed for chess.”
Hsu, IEEE Spectrum, 2007.
(==> I don't think so.)
21. Extensions of DP
Approximate dynamic programming (e.g.
for continuous domains)
Reinforcement learning
Case where f(...) or A is black-box
Huge state spaces
==> but lack of stability
Direct Policy Search,
Fitted-Q-Iteration...
==> there is room for improvements
22. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS;
2006)
Extensions
Weakness
Games as benchmarks ?
23. Monte-Carlo Tree Search
Monte-Carlo Tree Search (MCTS) appeared
in games.
R. Coulom. Efficient Selectivity and Backup Operators in
Monte-Carlo Tree Search. In Proceedings of the 5th
International Conference on Computers and Games, Turin, Italy,
2006.
Its most well-known variant is termed Upper
Confidence Tree (UCT).
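As a rough generic sketch (not the algorithm of any particular program), one UCT iteration selects moves inside the stored tree by an upper-confidence score, expands one new move, finishes the game with a random playout, and backs the result up. The game interface (legal, play, result) is an assumption, states are assumed hashable with terminal states having a result, and rewards are taken from a single maximizing player's viewpoint; a two-player version would flip the sign at opponent nodes.

```python
import math, random

def uct_search(root_state, legal, play, result, n_sims=1000, c=1.0):
    """Sketch of UCT.  Assumed game interface:
    legal(s) -> list of moves, play(s, m) -> next state,
    result(s) -> None while the game goes on, else a reward in [0, 1]."""
    stats = {}  # (state, move) -> (n_visits, total_reward)

    def score(s, m, total):
        n, w = stats.get((s, m), (0, 0.0))
        if n == 0:
            return float("inf")                    # try unvisited moves first
        return w / n + c * math.sqrt(math.log(total) / n)  # UCB compromise

    for _ in range(n_sims):
        s, path = root_state, []
        # selection: descend while every move of s is already in the tree
        while result(s) is None and all((s, m) in stats for m in legal(s)):
            total = sum(stats[(s, m)][0] for m in legal(s))
            m = max(legal(s), key=lambda m: score(s, m, total))
            path.append((s, m)); s = play(s, m)
        if result(s) is None:                      # expansion: one new move
            m = random.choice([m for m in legal(s) if (s, m) not in stats])
            path.append((s, m)); s = play(s, m)
        while result(s) is None:                   # random Monte-Carlo playout
            s = play(s, random.choice(legal(s)))
        r = result(s)
        for (ps, pm) in path:                      # backpropagation
            n, w = stats.get((ps, pm), (0, 0.0))
            stats[(ps, pm)] = (n + 1, w + r)
    # recommend the most simulated move at the root
    return max(legal(root_state),
               key=lambda m: stats.get((root_state, m), (0, 0.0))[0])
```

Recommending the most simulated move (rather than the highest-scoring one) is the usual choice, since visit counts are more stable than means.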
35. Parallelizing MCTS
On a parallel machine with shared memory: just many
simulations in parallel, the same memory for all.
On a parallel machine with no shared memory: one MCTS
per comp. node, and 3 times per second:
Select nodes with at least 5% of total sims (depth at most 3)
Average all statistics on these nodes
==> comp cost = log(nb comp nodes)
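The message-passing scheme above can be sketched as follows; the tree representation (one dict per compute node, mapping a path of moves to its statistics) is an illustrative assumption, not the layout of any specific program.

```python
def average_heavy_nodes(trees, min_share=0.05, max_depth=3):
    """Sketch of the no-shared-memory scheme: 'trees' is a list, one per
    compute node, of dicts mapping a tree path (tuple of moves, whose
    length is the depth) to [n_sims, n_wins].  Periodically, nodes that
    are heavily simulated somewhere (>= min_share of that node's root
    simulations, depth <= max_depth) get their statistics averaged."""
    totals = [sum(n for p, (n, w) in t.items() if len(p) == 1)
              for t in trees]
    heavy = {p for t, tot in zip(trees, totals) for p, (n, w) in t.items()
             if len(p) <= max_depth and n >= min_share * tot}
    for p in heavy:
        stats = [t[p] for t in trees if p in t]
        mean_n = sum(n for n, w in stats) / len(trees)
        mean_w = sum(w for n, w in stats) / len(trees)
        for t in trees:
            t[p] = [mean_n, mean_w]    # every compute node adopts the average
    return trees
```

Only a few near-root nodes are exchanged, which is what keeps the communication cost logarithmic in practice.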
45. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
Weakness
Games as benchmarks ?
46. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
More than UCT in MCTS
Infinite action spaces
Offline learning
Online learning
Expert knowledge
Hidden information
47. Why is UCT suboptimal for games?
There are better formulas than
mean + sqrt(log(...) / ...) (=UCT).
MCTS, under mild conditions on games
(including deterministic two-player zero-sum games), can be
consistent (→ converges to the best move);
frugal (if there is a good move, it does not
visit all of the tree infinitely often).
(==> not true for UCT)
48. Why is UCT suboptimal for games?
There are better formulas than
mean + sqrt(log(...) / ...) (=UCT);
for deterministic win/draw/loss games:
(sumRewards+K) / (nbTrials+2K).
MCTS, under mild conditions on games
(including deterministic two-player zero-sum games), can be
consistent (→ converges to the best move);
frugal (if there is a good move, it does not
visit all of the tree infinitely often).
(==> not true for UCT)
49. Go: from 29 to 6 stones
Formula for simulation:
argmax (nbWins + 1) / (nbLosses + 2)
Berthier, Doghmen, T., LION 2010
==> consistency
==> frugality
51. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
More than UCT in MCTS
Infinite action spaces
Offline learning
Online learning
Expert knowledge
Hidden information
52. Infinite action spaces:
progressive widening
UCB1: Choose u maximizing the compromise:
Empirical average for decision u
+ √( log(i) / number of trials with decision u )
==> argmax only on the first ⌈i^α⌉ arms, α ∈ [ 0.25, 0.5 ]
(Coulom, Chaslot et al, Wang et al)
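A sketch of this progressive widening rule, assuming the arms are kept in some fixed order (e.g. sorted by a heuristic prior) and taking the widening exponent to be 0.4, inside the [0.25, 0.5] range quoted above; both choices are illustrative.

```python
import math

def progressive_widening_choice(means, counts, i, alpha=0.4):
    """Pick an arm by UCB1 restricted to the first ceil(i**alpha) arms.

    means[k], counts[k]: empirical average and trial count of arm k,
    in a fixed ordering; i = current simulation index (i >= 2 here,
    so log(i) > 0).  All names are illustrative."""
    k_max = min(len(means), math.ceil(i ** alpha))   # widen slowly with i

    def ucb(k):
        if counts[k] == 0:
            return float("inf")          # an arm just made eligible: try it
        return means[k] + math.sqrt(math.log(i) / counts[k])

    return max(range(k_max), key=ucb)
```

New arms become eligible only as the simulation count grows, so an infinite (or huge) action space never swamps the exploration term.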
53. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
More than UCT in MCTS
Infinite action spaces
Offline learning
Online learning
Expert knowledge
Hidden information
54. Extensions
``Standard'' UCT:
score(situation,move) = compromise (in [0,1+] )
between
a) empirical quality
P ( win | nextMove(situation) = move )
estimated in simulations
b) exploration term
Remark: No offline learning
55. Extension: offline learning
(introducing imitation learning)
c) offline value (Bouzy et al) =
empirical estimate P ( played | pattern )
Pattern = ball of locations, each location either:
- this is black stone
- this is white stone
- this is empty
- this is not black stone
- this is not white stone
- this is not empty
- this is border
Support = frequency of “the center of this pattern is played”
Confidence = conditional frequency of play
Bias = confidence of pattern with max support
56. Extension: offline learning
(introducing imitation learning)
score(situation,move) = compromise between
a) empirical quality
b) exploration term
c) offline value (Bouzy et al, Coulom) =
empirical estimate P ( played | pattern )
for patterns with big support
==> estimated on database
At first, (c) is the most important; later, (a) dominates.
57. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
More than UCT in MCTS
Infinite action spaces
Offline learning
Online learning
Expert knowledge
Hidden information
58. Extensions
``Standard'' UCT:
score(situation,move) = compromise between
a) empirical quality
b) exploration term
Remark: No learning from one situation to another
59. Extension: transient values
score(situation,move) = compromise between
a) empirical quality
P' ( win | nextMove(situation) = move )
estimated in simulations
b) exploration term
c) offline value
d) ``transient'' value: (Gelly et al, 07)
P' (win | move ∈ laterMoves(situations) )
==> brings information from node N to ancestor node M
==> does not bring information from node N to
descendants or cousins (many people have tried...)
Brügmann, Gelly et al
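One common way to use these transient (RAVE, "all moves as first") statistics is to blend them with the move's own statistics, with a weight that fades as real simulations accumulate. The fading schedule below, beta = k/(k+n) with k = 1000, is one common illustrative choice, not the formula of any specific program.

```python
def rave_score(n, w, n_rave, w_rave, k=1000.0):
    """Blend per-move statistics (n simulations, w wins) with transient
    statistics (n_rave, w_rave: simulations through the ancestor in which
    the move was played later, and how many were wins).

    beta -> 1 with few real simulations (trust the transient value),
    beta -> 0 with many (trust the move's own empirical quality)."""
    q = w / n if n else 0.0
    q_rave = w_rave / n_rave if n_rave else 0.0
    beta = k / (k + n)                  # illustrative fading schedule
    return (1 - beta) * q + beta * q_rave
```

So a move with no simulations of its own is still ranked by what happened when it was played later in other simulations, which is exactly the information flow from node N to its ancestor M described above.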
60. Transient values = RAVE = very good
in many games
It works also in Havannah.
61. It works also in NoGo
NoGo = rules of Go,
except that capturing
==> losing
62. Counter-example to RAVE, B2
By M. Müller
B2 makes sense
only if it is played
immediately
(otherwise A5 kills).
63. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
More than UCT in MCTS
Infinite action spaces
Offline learning
Online learning
Expert knowledge
Hidden information
64. Extensions
``Standard'' UCT:
score(situation,move) = compromise between
a) empirical quality
b) exploration term
Remarks: No expert rules
65. Extension: expert rules
score(situation,move) = compromise between
a) empirical quality
b) exploration term
c) offline value
d) transient value
e) expert rules
==> empirically derived linear combination
Most important terms,
(e)+(c) first,
then (d) becomes stronger,
finally (a) only
66. Extension: expert rules
in the Monte-Carlo part
Decisive moves: play immediate wins.
Anti-decisive moves: don't play moves with immediate
winning reply.
Teytaud&Teytaud, CIG2010:
can be fast in connection games. E.g. Havannah:
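A sketch of this decisive / anti-decisive filtering inside the Monte-Carlo playout; the game interface (legal, play, wins_now) is an assumed, illustrative one.

```python
import random

def playout_move(state, legal, play, wins_now):
    """Choose a playout move with decisive / anti-decisive filtering.

    Assumed interface: legal(s) -> list of moves, play(s, m) -> next
    state, wins_now(s, m) -> True iff m immediately wins for the player
    to move in s."""
    moves = legal(state)
    for m in moves:                        # decisive: take an immediate win
        if wins_now(state, m):
            return m
    safe = [m for m in moves               # anti-decisive: avoid moves that
            if not any(wins_now(play(state, m), r)    # leave a winning reply
                       for r in legal(play(state, m)))]
    return random.choice(safe or moves)    # else fall back to pure random
```

In connection games such as Havannah, immediate wins can be detected very cheaply, which is why this filter is affordable inside playouts.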
67. Go: from 29 to 6 stones
1998: loss against amateur (6d) 19x19 H29
2008: win against a pro (8p) 19x19, H9 MoGo
2008: win against a pro (4p) 19x19, H8 CrazyStone
2008: win against a pro (4p) 19x19, H7 CrazyStone
2009: win against a pro (9p) 19x19, H7 MoGo
2009: win against a pro (1p) 19x19, H6 MoGo
2010: win against a pro (4p) 19x19, H6 Zen
2010: win against a pro (5p) 19x19, H6 Zen
2007: win against a pro (5p) 9x9 (blitz) MoGo
2008: win against a pro (5p) 9x9 white MoGo
2009: win against a pro (5p) 9x9 black MoGo
2009: win against a pro (9p) 9x9 white Fuego
2009: win against a pro (9p) 9x9 black MoGoTW
==> still 6 stones at least!
68. Go: from 29 to 6 stones
(same results as slide 67)
Wins with H6 / H7 are lucky (rare) wins.
69. Go: from 29 to 6 stones
(same results as slide 67)
Wins achieved even with the disadvantageous side.
==> still 6 stones at least!
70. 13x13 Go: new results!
9x9 Go: computers are at the best human level.
- Fuego won against a top-level human as white
- MoGoTW did it both as black and as white, and regularly wins
some games against the top players.
- MoGoTW won 3/4 yesterday in blind go (blind go = go in 9x9,
according to the pros)
19x19 Go: the best humans still (almost always) win easily with
7 handicap stones.
In WCCI 2010, experiments in 13x13 Go:
- MoGo won 2/2 against 6D players with handicap 2
- MFoG won 1/2 against 6D players with handicap 2
- Fuego won 0/2 against 6D players with handicap 2
And yesterday MoGoTW won one game with handicap 2.5!
71. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
More than UCT in MCTS
Infinite action spaces
Offline learning
Online learning
Expert knowledge
Hidden information
72. Bandits
We have seen UCB:
choose the action with maximal score
Q(action, state) = empirical_reward(action, state)
                 + sqrt( log(nbSims(state)) / nbSims(action, state) )
EXP3 is another bandit:
- for adversarial cases
- based on a stochastic formula
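The UCB score above can be sketched in a few lines of Python. This is a toy sketch: the arm names and reward values are illustrative, and in practice a tuned constant usually multiplies the exploration term.

```python
import math

def ucb_choice(actions, reward_sum, n_sims, n_state):
    """Pick the action maximizing the slide's score:
    empirical reward + sqrt(log(nbSims(state)) / nbSims(action, state))."""
    def score(a):
        if n_sims[a] == 0:
            return float("inf")  # try every action at least once
        return reward_sum[a] / n_sims[a] + math.sqrt(math.log(n_state) / n_sims[a])
    return max(actions, key=score)

# Toy two-armed bandit with deterministic rewards (illustrative values):
rewards = {"a": 0.9, "b": 0.1}
reward_sum = {"a": 0.0, "b": 0.0}
n_sims = {"a": 0, "b": 0}
for t in range(100):
    arm = ucb_choice(["a", "b"], reward_sum, n_sims, t)
    reward_sum[arm] += rewards[arm]
    n_sims[arm] += 1
```

After 100 rounds almost all pulls go to the better arm, while the logarithmic exploration term keeps the worse arm from being abandoned entirely.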
73. EXP3 in one slide
(Grigoriadis et al., Auer et al., Audibert & Bubeck, COLT 2009)
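The EXP3 formula itself is on the slide's figure, which this transcript does not reproduce; below is a minimal Python sketch of the standard formulation (Auer et al.): play arm i with probability (1-gamma)*w_i/sum(w) + gamma/K, then multiply w_i by exp(gamma*xhat/K), where xhat is the importance-weighted reward. The gamma default is an illustrative choice, not a value from the talk.

```python
import math, random

class Exp3:
    """Minimal EXP3 sketch (standard Auer et al. formulation)."""
    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n, self.gamma = n_arms, gamma
        self.w = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probs(self):
        # mixture of the weight distribution with uniform exploration
        total = sum(self.w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n
                for wi in self.w]

    def draw(self):
        r, acc = self.rng.random(), 0.0
        for i, p in enumerate(self.probs()):
            acc += p
            if r <= acc:
                return i
        return self.n - 1

    def update(self, arm, reward):
        # importance weighting keeps the reward estimate unbiased
        xhat = reward / self.probs()[arm]
        self.w[arm] *= math.exp(self.gamma * xhat / self.n)

# Usage: the bandit concentrates on a consistently rewarding arm.
b = Exp3(2)
for _ in range(1000):
    arm = b.draw()
    b.update(arm, 1.0 if arm == 0 else 0.0)
```

Unlike UCB, the policy stays stochastic, which is exactly what makes it suitable for the adversarial / simultaneous-move settings of the next slides.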
74. MCTS for simultaneous actions
(Tree diagram: the levels alternate between "Player 1 plays",
"Player 2 plays", "both players play", and so on.)
75. MCTS for simultaneous actions
(Flory & Teytaud, EvoStar 2011)
Same tree, with one selection rule per node type:
- "Player 1 plays" = maxUCB node
- "Player 2 plays" = minUCB node
- "Both players play" = EXP3 node
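In code, the three node types can dispatch to their selection rules roughly as follows. This is only a sketch of the dispatch, not the paper's implementation: the function names, the exploration constant, and the statistics layout are all illustrative.

```python
import math, random

rng = random.Random(0)

def select_child(kind, values=None, counts=None, weights=None):
    """Choose a child index, following the slide's scheme
    (Flory & Teytaud, EvoStar 2011): UCB maximization at Player 1's
    nodes, UCB minimization at Player 2's nodes, EXP3 sampling where
    both players act simultaneously."""
    if kind in ("player1", "player2"):
        n = sum(counts)
        sign = 1.0 if kind == "player1" else -1.0  # max for P1, min for P2
        def ucb(i):
            if counts[i] == 0:
                return float("inf")  # unvisited children first
            return sign * values[i] / counts[i] + math.sqrt(
                math.log(max(n, 2)) / counts[i])
        return max(range(len(counts)), key=ucb)
    # simultaneous-move node: sample from the EXP3 weight distribution
    total = sum(weights)
    r, acc = rng.random() * total, 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1
```

The key design point is that simultaneous-move nodes must remain stochastic (EXP3), while alternate-move nodes can use the usual deterministic UCB argmax.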
76. MCTS for hidden information
(Tree diagram: for each player, the nodes are observation sets, each
carrying an EXP3 node; here each player's possible sequences of
observations are partitioned into 3 observation sets.)
"Observation set" = set of sequences of observations.
Use EXP3: consistent even in an adversarial setting.
(Incremental version + application to phantom tic-tac-toe:
see D. Auger 2011.)
81. MCTS with hidden information
While (there is time for thinking)
{
  s = initial state
  os1 = observationSet1 = (); os2 = ()  (reset once per simulation)
  while (s not terminal)
  {
    b1 = bandit1(os1); b2 = bandit2(os2)
    d1 = b1.makeDecision; d2 = b2.makeDecision
    (s, o1, o2) = transition(s, d1, d2)
    os1 = os1.o1; os2 = os2.o2  (append the new observations)
  }
  send reward to all bandits in the simulation
}
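A self-contained toy version of this loop in Python. Everything game-specific here is a stand-in (a one-step matching-pennies game in which neither player observes anything before moving), but the structure follows the slide: one EXP3 bandit per observation sequence, created lazily, decisions from both bandits, a transition, and rewards sent back to the bandits used in the simulation.

```python
import math, random

rng = random.Random(0)

class Exp3Bandit:
    """Minimal EXP3 bandit (standard Auer et al. formulation)."""
    def __init__(self, k, gamma=0.2):
        self.k, self.gamma, self.w = k, gamma, [1.0] * k
    def probs(self):
        s = sum(self.w)
        return [(1 - self.gamma) * wi / s + self.gamma / self.k
                for wi in self.w]
    def make_decision(self):
        r, acc = rng.random(), 0.0
        for i, p in enumerate(self.probs()):
            acc += p
            if r <= acc:
                return i
        return self.k - 1
    def send_reward(self, arm, x):
        # importance-weighted estimate keeps the update unbiased
        self.w[arm] *= math.exp(self.gamma * (x / self.probs()[arm]) / self.k)

# One bandit per observation sequence, as in the slide's pseudocode.
bandits1, bandits2 = {}, {}
def bandit(table, obs_seq):
    return table.setdefault(obs_seq, Exp3Bandit(2))

# Toy game: a single simultaneous move, so the only observation
# sequence is the empty one and the inner loop runs once.
for _ in range(3000):              # "while there is time for thinking"
    os1, os2 = (), ()
    b1, b2 = bandit(bandits1, os1), bandit(bandits2, os2)
    d1, d2 = b1.make_decision(), b2.make_decision()
    r = 1.0 if d1 == d2 else 0.0   # P1 wants to match, P2 to mismatch
    b1.send_reward(d1, r)          # send reward to the bandits used
    b2.send_reward(d2, 1.0 - r)
```

In this zero-sum game the time-averaged EXP3 play approaches the uniform Nash equilibrium, which is the consistency property the slides rely on.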
90. MCTS with hidden information: incremental version
Possibly refine the family of bandits as simulations accumulate.
While (there is time for thinking)
{
  s = initial state
  os1 = observationSet1 = (); os2 = ()  (reset once per simulation)
  while (s not terminal)
  {
    b1 = bandit1(os1); b2 = bandit2(os2)
    d1 = b1.makeDecision; d2 = b2.makeDecision
    (s, o1, o2) = transition(s, d1, d2)
    os1 = os1.o1; os2 = os2.o2  (append the new observations)
  }
  send reward to all bandits in the simulation
}
92. Let's have fun with Urban Rivals (4 cards)
Each player has
- four cards (each one can be used once)
- 12 pilz (each one can be used once)
- 12 life points
Each card has:
- one attack level
- one damage
- special effects (forget them for the moment)
Four turns:
- P1 attacks P2
- P2 attacks P1
- P1 attacks P2
- P2 attacks P1
Games with simultaneous actions Paris 1st of February 92
93. Let's have fun with Urban Rivals
First, attacker plays:
- chooses a card
- chooses ( PRIVATELY ) a number of pilz
Attack level = attack(card) x (1+nb of pilz)
Then, defender plays:
- chooses a card
- chooses a number of pilz
Defense level = attack(card) x (1+nb of pilz)
Result:
If attack > defense
    the defender loses Power(attacker's card) life points
Else
    the attacker loses Power(defender's card) life points
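The resolution rule above as a small Python function. This is a sketch: I read the slide's Power(card) as the card's damage value from the previous slide, and the function and argument names are mine.

```python
def resolve_round(att_attack, att_damage, att_pillz,
                  def_attack, def_damage, def_pillz):
    """One attack per the slide: level = attack(card) * (1 + pilz spent);
    the loser takes the winner's damage in life points.
    Returns (damage to attacker, damage to defender)."""
    attack_level = att_attack * (1 + att_pillz)
    defense_level = def_attack * (1 + def_pillz)
    if attack_level > defense_level:
        return (0, att_damage)
    # ties go to the defender (the "Else" branch on the slide)
    return (def_damage, 0)

# Example: attack 6 with 3 pilz (level 24) beats defense 7 with 1 pilz
# (level 14), so the defender takes the attacker's 4 damage.
resolve_round(6, 4, 3, 7, 5, 1)
```

The strategic depth comes from the pilz being chosen privately: the attacker's pilz count is hidden when the defender picks, which is what makes this a simultaneous/hidden-information game for MCTS.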
94. Let's have fun with Urban Rivals
==> The MCTS-based AI is now at the best human level.
Experimental (only) remarks on EXP3:
- discarding strategies with a small number of sims gives a
better approximation of the Nash equilibrium
- also an improvement by taking into
account the other bandit
- not yet compared to INF
- virtual simulations (inspired by Kummer)
95. Let's have fun with a nice application
We are now at the best human level in Urban Rivals.
96. Outline
Discrete time control: various approaches
Monte-Carlo Tree Search (UCT, MCTS; 2006)
Extensions
Weakness
Games as benchmarks ?
104. Game of Go: counting territories
(white has 7.5 “bonus” as black starts)
105. Game of Go: the rules
Black plays at the blue circle: the
white group dies (it is removed)
It's impossible to kill white (two “eyes”).
“Ko” rules: we don't come back to the same situation.
(without ko: “PSPACE hard”
with ko: “EXPTIME-complete”)
At the end, we count territories
==> black starts, so +7.5 for white.
108. Key point in Go: there are human-easy
situations which are computer-hard.
We'll see much easier situations that are poorly understood.
(komi 7.5)
109. Difficult for computers (win for black, playing A)
We'll see much easier situations that are poorly understood;
but let's first see an easier case.
(komi 7.5)
110. A trivial semeai
Plenty of equivalent situations!
They are randomly sampled, with no generalization.
50% estimated win probability!
121. It does not work. Why?
50% estimated win probability!
In the first node:
- the first simulations give ~ 50%;
- the next simulations go to 100% or 0% (depending
on the chosen move);
- but then we switch to another node
(there are ~ 8! x 8! such nodes).
122. And the humans?
In the first node:
- the first simulations give ~ 50%;
- the next simulations go to 100% or 0% (depending
on the chosen move);
- but then we DON'T switch to another node.
135. What else? Games with
simultaneous actions or hidden information
(Flory & Teytaud, EvoStar 2011)
Games with hidden information.
Games with simultaneous actions.
UrbanRivals = internet card game;
11 million registered users.
A game with hidden information.
Frédéric Lemoine MIG 11/07/2008
138. “Real” games
Assumption: if a computer understands and guesses spins, then
this robot will be efficient for something else than just games.
(holds true for Go)
139. “Real” games
Assumption: if a computer understands and guesses spins, then
this robot will be efficient for something else than just games.
VS
142. When is MCTS relevant?
Robust in front of:
- high dimension;
- non-convexity of Bellman values;
- complex models;
- delayed reward.
More difficult for:
- high values of H;
- highly unobservable cases (Monte-Carlo, but not
Monte-Carlo Tree Search: see Cazenave et al.);
- lack of a reasonable baseline for the MC.
143. When is MCTS relevant?
Robust in front of:
- high dimension;
- non-convexity of Bellman values;
- complex models;
- delayed reward.
For Go: H ~ 300, dimension = 361, fully observable, fully delayed reward.
More difficult for:
- high values of H;
- highly unobservable cases;
- lack of a reasonable baseline for the MC.
145. When is MCTS relevant?
How to apply it:
- Implement the transition
(a function action x state → state)
- Design a Monte-Carlo part (a random simulation):
a heuristic in one-player games;
difficult with two opponents
==> at this point you can simulate...
- Implement UCT (just a bias in the simulator – no real optimizer)
Possibly add:
- RAVE values (Gelly et al.)
- parallelization: multicore + MPI (Cazenave et al., Gelly et al.)
- decisive moves + anti-decisive moves (Teytaud et al.)
- patterns (Bouzy et al.)
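The three steps above (transition, Monte-Carlo part, UCT bias) can be sketched end-to-end on a toy one-player problem. Everything here is illustrative, not from the talk: the game, the names, and the exploration constant.

```python
import math, random

rng = random.Random(42)

# Toy one-player game: choose 0 or 1 at each of H steps; the reward is
# the fraction of 1s, so always playing 1 is optimal.
H = 6

def transition(state, action):        # the required action x state -> state
    return state + (action,)

def is_terminal(state):
    return len(state) == H

def reward(state):
    return sum(state) / H

def rollout(state):                   # the Monte-Carlo part: random playout
    while not is_terminal(state):
        state = transition(state, rng.randint(0, 1))
    return reward(state)

# UCT: "just a bias in the simulator" over (state, action) statistics.
sums, counts = {}, {}

def uct_action(state):
    n_state = sum(counts.get((state, a), 0) for a in (0, 1))
    def score(a):
        n = counts.get((state, a), 0)
        if n == 0:
            return float("inf")       # try each action once first
        return sums[(state, a)] / n + math.sqrt(2 * math.log(n_state) / n)
    return max((0, 1), key=score)

def simulate(state):
    if is_terminal(state):
        return reward(state)
    a = uct_action(state)
    seen = (state, a) in counts       # expand one node, then roll out
    nxt = transition(state, a)
    r = simulate(nxt) if seen else rollout(nxt)
    sums[(state, a)] = sums.get((state, a), 0.0) + r
    counts[(state, a)] = counts.get((state, a), 0) + 1
    return r

for _ in range(2000):
    simulate(())
```

After 2000 simulations the root statistics concentrate on the better action, which is the whole point of the bias: most of the simulation budget goes where the Monte-Carlo estimates look promising.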
146. Advantages of MCTS:
easy + visible
- Many indicators (not only the expectation;
simulation-based; visible; easy to check)
- The algorithm is indeed simpler than DP (unless
deeply optimized, as for Go competitions...)
- Anytime (you stop when you want)
148. Drawbacks of MCTS
- Recent method
- Impact of H not clearly known (?)
- No free lunch: a model of the transition /
uncertainties is required (but, as an advantage:
no constraint on that model)
(see however Fonteneau et al., model-free MC)
149. Conclusion
Essentially proved only asymptotically.
Empirically good for:
- the game of Go
- some other (difficult) games
- non-linear expensive optimization
- active learning
Tested industrially (Spiral library – architecture-specific code).
There are understood (but not solved) weaknesses.
Next challenges:
- solve these weaknesses (introducing learning? refutation tables? Drake et al.)
- more industrial applications
- partially observable cases (Cazenave et al., Rolet et al., Auger)
- large H: truncating (Lorentz)
- scalability (Doghmen et al.)