Monte-Carlo Tree Search (MCTS) is an approach for computer Go that uses Monte Carlo simulations to evaluate positions and build a search tree. The MCTS approach selects moves using the UCT algorithm, which balances exploitation of promising child nodes based on past results and exploration of lesser-visited nodes. Simulations are conducted by randomly playing out moves until the end of the game, then updating the search tree with the outcome. This allows MCTS to gradually improve its evaluations and identify stronger moves without relying on expert knowledge or complex position analysis.
Alpha-beta pruning is a modification of the minimax algorithm that prunes portions of the search tree that cannot affect the final decision. It maintains two thresholds: alpha, the best value found so far for the maximizing player, and beta, the best value found so far for the minimizing player. Whenever the value at a node proves that a subtree's result must fall outside the (alpha, beta) window, the remaining branches of that subtree can be skipped, since they can no longer influence the choice at the root. This often lets large parts of the tree go unevaluated, improving the algorithm's efficiency.
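To make the pruning rule concrete, here is a minimal Python sketch (not from the document; the state interface with is_terminal(), value(), and children() is a hypothetical stand-in):

    def alphabeta(state, depth, alpha, beta, maximizing):
        # Returns the minimax value of `state`, skipping subtrees that
        # cannot change the decision at the root.
        if depth == 0 or state.is_terminal():
            return state.value()
        if maximizing:
            best = float('-inf')
            for child in state.children():
                best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:   # beta cutoff: the minimizer avoids this line
                    break
            return best
        best = float('inf')
        for child in state.children():
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:       # alpha cutoff: the maximizer avoids this line
                break
        return best

The root call is alphabeta(start, max_depth, float('-inf'), float('inf'), True); with the two cutoff tests removed, the function degenerates to plain minimax.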
This document provides an overview of an Artificial Intelligence course. The course aims to introduce students to the broad field of AI and prepare them for opportunities in the AI field. It will cover topics like searching, knowledge representation, learning, neural networks, and applications of AI. The course objectives are to provide a basic foundation of concepts in searching and knowledge representation. Students will complete laboratory tasks involving predicate calculus, searching, and neural networks to apply their learning.
The document discusses constraint satisfaction problems (CSPs). It defines a CSP as having variables with domains of possible values and constraints limiting the values variables can take. A solution assigns values to all variables while satisfying constraints. The document outlines backtracking search and constraint propagation techniques for solving CSPs, including variable and value ordering heuristics, forward checking, and arc consistency. Arc consistency is more effective than forward checking at detecting inconsistencies and pruning the search space. The document provides examples of CSP formulations for map coloring, Sudoku, and N-Queens problems.
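As a concrete illustration of backtracking with forward checking, here is a small map-coloring sketch in Python (illustrative only; the Australia-style variables and domains are assumptions, not taken from the document):

    def backtrack(assignment, domains, neighbors):
        # Backtracking search with the most-constrained-variable heuristic
        # and forward checking on the neighbors of each new assignment.
        if len(assignment) == len(domains):
            return assignment
        var = min((v for v in domains if v not in assignment),
                  key=lambda v: len(domains[v]))
        for value in sorted(domains[var]):
            pruned = [(n, value) for n in neighbors[var]
                      if n not in assignment and value in domains[n]]
            assignment[var] = value
            for n, v in pruned:
                domains[n].remove(v)        # forward checking
            if all(domains[n] for n in domains if n not in assignment):
                result = backtrack(assignment, domains, neighbors)
                if result is not None:
                    return result
            for n, v in pruned:             # undo pruning on backtrack
                domains[n].add(v)
            del assignment[var]
        return None

    neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
                 'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'Q': ['NT', 'SA', 'NSW'],
                 'NSW': ['Q', 'SA', 'V'], 'V': ['SA', 'NSW'], 'T': []}
    domains = {v: {'red', 'green', 'blue'} for v in neighbors}
    print(backtrack({}, domains, neighbors))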
Here are the steps to solve an 8-puzzle problem using BFS or A*:
1. Represent the start and goal states as 3x3 matrices with the numbers 1-8 and a blank space.
2. For BFS:
- Create a queue and add the start state to it
- Repeatedly dequeue the first state and enqueue its successors
- A successor is obtained by swapping the blank space with an adjacent number
- Continue until the goal state is found or the queue is empty
3. For A*:
- Create a priority queue ordered by f(n) = g(n) + h(n)
- Where g(n) is the cost to reach state n and h(n) is a heuristic estimate of the remaining cost (e.g., misplaced tiles or Manhattan distance)
- Repeatedly expand the state with the lowest f(n) until the goal is reached (a minimal sketch follows these steps)
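A minimal A* sketch for the 8-puzzle in Python (an illustration of the steps above, not code from the document; states are 9-tuples with 0 for the blank):

    import heapq

    GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

    def manhattan(state):
        # h(n): sum of Manhattan distances of the tiles to their goal squares
        return sum(abs(i // 3 - (v - 1) // 3) + abs(i % 3 - (v - 1) % 3)
                   for i, v in enumerate(state) if v)

    def successors(state):
        # A successor swaps the blank with an adjacent tile
        b = state.index(0)
        r, c = divmod(b, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= r + dr < 3 and 0 <= c + dc < 3:
                n = (r + dr) * 3 + (c + dc)
                s = list(state)
                s[b], s[n] = s[n], s[b]
                yield tuple(s)

    def astar(start):
        frontier = [(manhattan(start), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            f, g, state, path = heapq.heappop(frontier)
            if state == GOAL:
                return path
            for nxt in successors(state):
                if g + 1 < best_g.get(nxt, float('inf')):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier,
                                   (g + 1 + manhattan(nxt), g + 1, nxt, path + [nxt]))
        return None

    print(len(astar((1, 2, 3, 4, 5, 6, 0, 7, 8))) - 1, "moves")

Replacing h(n) with 0 reduces the search to uniform-cost behavior, which for unit move costs expands states in the same order as BFS.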
The document discusses the Minimax algorithm and its application to game trees. It explains that Minimax is an optimal decision-making procedure for two-player zero-sum games where one player tries to maximize their score and the other tries to minimize it. It provides examples of how Minimax can be applied to games like Tic-Tac-Toe, Chess, Poker, and Monopoly to find the best move assuming the opponent plays optimally.
Monte Carlo Tree Search for the Super Mario Bros, by Chih-Sheng Lin
This document discusses using Monte Carlo Tree Search (MCTS) to design a controller for the game Super Mario Bros. It begins with an introduction to MCTS and its basic algorithm. It then covers applying MCTS to the problem by formulating the game states and actions. The document proposes a MCTS-based controller and improvements to the UCT selection method, including using a best-of-N simulation strategy to evaluate candidate actions instead of random simulations.
Dr. Subrat Panda gave an introduction to reinforcement learning. He defined reinforcement learning as dealing with agents that must sense and act upon their environment to receive delayed scalar feedback in the form of rewards. He described key concepts like the Markov decision process framework, value functions, Q-functions, exploration vs exploitation, and extensions like deep reinforcement learning. He listed several real-world applications of reinforcement learning and resources for learning more.
Adversarial search is a technique used in game playing to determine the best move when facing an opponent who is also trying to maximize their score. It involves searching through possible future game states called a game tree to evaluate the best outcome. The minimax algorithm searches the entire game tree to determine the optimal move by assuming the opponent will make the best counter-move. Alpha-beta pruning improves on minimax by pruning branches that cannot affect the choice of move. Modern game programs use techniques like precomputed databases, sophisticated evaluation functions, and extensive search to defeat human champions at games like checkers, chess, and Othello.
Alpha-beta pruning is a technique used in game tree search to prune branches that cannot possibly change the outcome. It uses two values - alpha, the highest value for the maximizing player, and beta, the lowest value for the minimizing player. The algorithm traverses the game tree recursively, pruning branches where the value at a node exceeds beta (for maximizing) or falls below alpha (for minimizing). This allows portions of the tree to be skipped over, improving search efficiency. The example shows how alpha and beta values are updated during traversal and used to prune subtrees without affecting the optimal solution.
The document discusses knowledge-based agents and how they use inference to derive new representations of the world from their knowledge base in order to determine what actions to take. It provides the example of an agent exploring a cave, or "Wumpus world", where the goal is to locate gold and exit without being killed by the Wumpus monster or falling into pits. The agent uses its percepts and knowledge base along with inference rules to deduce its next action at each step.
In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team in DOTA, a game considered to involve a great amount of complexity and strategy. In this talk, we'll evaluate the role Reinforcement Learning plays in the world of games, taking a look at some of the main achievements and what they look like in terms of implementation. We'll also take a look at some of the history of AI applied to games and how things evolved over time.
The document discusses game playing in artificial intelligence. It describes how general game playing (GGP) involves designing AI that can play multiple games by learning the rules, rather than being programmed for a specific game. The document outlines how the minimax algorithm is commonly used for game playing, involving move generation and static evaluation functions to search game trees and determine the best move by maximizing or minimizing values at each level.
This document discusses adversarial search techniques used in artificial intelligence to model games as search problems. It introduces the minimax algorithm and alpha-beta pruning to determine optimal strategies by looking ahead in the game tree. These techniques allow computers to search deeper and play games like chess and Go at a world-champion level by evaluating board positions and pruning unfavorable branches in the search.
1. The document describes an artificial intelligence implementation of the tic-tac-toe game using the minimax algorithm.
2. It provides details on the game rules, initial and goal states, and the state space tree and winning conditions.
3. The minimax approach is then explained as a recursive algorithm that evaluates all possible future moves from the current state and assumes the opponent will make the choice that results in the least preferred outcome.
This document discusses adversarial search in artificial intelligence. It provides an overview of games and introduces the minimax algorithm. The minimax algorithm is used to determine optimal strategies in two-player adversarial games by recursively considering all possible moves by both players. Tic-tac-toe is given as an example game where minimax can be applied to choose the best first move. The properties and limitations of the minimax algorithm are also summarized.
The document discusses how AlphaGo, a computer program developed by DeepMind, was able to defeat world champion Lee Sedol at the game of Go. It achieved this through a combination of deep learning and tree search techniques. Four deep neural networks were used: three convolutional networks to reduce the action space and search depth through imitation learning, self-play reinforcement learning, and value prediction; and a smaller network for faster simulations. This combination of deep learning and search allowed AlphaGo to master the complex game of Go, demonstrating the capabilities of modern AI.
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search, by Karel Ha
The presentation of the article "Mastering the game of Go with deep neural networks and tree search", given at the Optimization Seminar 2015/2016.
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/p4rnlhoewbedkjg/AlphaGo.pdf?dl=0
- The corresponding leaflet is available at http://www.slideshare.net/KarelHa1/leaflet-for-the-talk-on-alphago
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
This document provides an overview of deep deterministic policy gradient (DDPG), which combines aspects of DQN and policy gradient methods to enable deep reinforcement learning with continuous action spaces. It summarizes DQN and its limitations for continuous domains. It then explains policy gradient methods like REINFORCE, actor-critic, and deterministic policy gradient (DPG) that can handle continuous action spaces. DDPG adopts key elements of DQN like experience replay and target networks, and models the policy as a deterministic function like DPG, to apply deep reinforcement learning to complex continuous control tasks.
Artificial Intelligence: Introduction, Typical Applications. State Space Search: Depth Bounded DFS, Depth First Iterative Deepening. Heuristic Search: Heuristic Functions, Best First Search, Hill Climbing, Variable Neighborhood Descent, Beam Search, Tabu Search. Optimal Search: A* algorithm, Iterative Deepening A*, Recursive Best First Search, Pruning the CLOSED and OPEN Lists.
This presentation contains an introduction to reinforcement learning, a comparison with other learning approaches, an introduction to Q-Learning, and some applications of reinforcement learning in video games.
Reinforcement learning is a machine learning technique that involves trial-and-error learning. The agent learns to map situations to actions by trial interactions with an environment in order to maximize a reward signal. Deep Q-networks use reinforcement learning and deep learning to allow agents to learn complex behaviors directly from high-dimensional sensory inputs like pixels. DQN uses experience replay and target networks to stabilize learning from experiences. DQN has achieved human-level performance on many Atari 2600 games.
AI Greedy & A* Informed Search Strategies by Example, by Ahmed Gad
Explaining how informed search strategies in Artificial Intelligence (AI) work, by example.
Two informed search strategies are explained by an example:
Greedy Best-First Search.
A* Search.
The document discusses reinforcement learning, including Q-learning. It provides an overview of reinforcement learning, describing what it is, important machine learning algorithms for it like Q-learning, and how Q-learning works in theory and practice. It also discusses challenges of reinforcement learning, potential applications, and links between reinforcement learning algorithms and human psychology.
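The Q-learning update the summary refers to can be written in a few lines. This is a generic tabular sketch (the env object with reset(), step(), and actions() is a hypothetical interface, not from the document):

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = defaultdict(float)                # Q[(state, action)] -> estimated return
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # epsilon-greedy: explore occasionally, otherwise exploit Q
                if random.random() < epsilon:
                    action = random.choice(env.actions(state))
                else:
                    action = max(env.actions(state), key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                best_next = max((Q[(next_state, a)] for a in env.actions(next_state)),
                                default=0.0)
                # temporal-difference update toward r + gamma * max_a' Q(s', a')
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = next_state
        return Q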
This document discusses various heuristic search algorithms including generate-and-test, hill climbing, best-first search, problem reduction, and constraint satisfaction. Generate-and-test involves generating possible solutions and testing if they are correct. Hill climbing involves moving in the direction that improves the state based on a heuristic evaluation function. Best-first search evaluates nodes and expands the most promising node first. Problem reduction breaks problems into subproblems. Constraint satisfaction views problems as sets of constraints and aims to constrain the problem space as much as possible.
AI_Session 11: searching with Non-Deterministic Actions and partial observati..., by Asst.prof M.Gokilavani
This document summarizes a session on problem solving by search in artificial intelligence. It discusses uninformed and informed search strategies like breadth-first search, uniform cost search, depth-first search, greedy best-first search, and A* search. It also covers searching with non-deterministic actions, partial observations, and online search agents operating in unknown environments. Examples discussed include the vacuum world problem and how search trees are used to handle non-determinism through contingency planning. The next session will cover online search agents operating in unknown environments.
Game playing in artificial intelligence techniques, by Syeda Zoya Mehdi
The document discusses game artificial intelligence and techniques used to generate intelligent behavior in non-player characters in computer and video games. It covers topics like machine learning, reinforcement learning, pathfinding algorithms, and different data structures used to represent game boards and chess positions. Game AI aims to create behavior that feels natural to the player while obeying the rules of the game. Various computer science disciplines are required to develop effective game AI, and different types of games require different AI techniques.
Application of Monte Carlo Tree Search in a Fighting Game AI (GCCE 2016), by ftgaic
These are the presentation slides for the following paper:
Shubu Yoshida, Makoto Ishihara, Taichi Miyazaki, Yuto Nakagawa, Tomohiro Harada, and Ruck Thawonmas, "Application of Monte-Carlo Tree Search in a Fighting Game AI," accepted for presentation at the 5th IEEE Global Conference on Consumer Electronics (GCCE 2016), Kyoto, Japan, Oct. 11-14, 2016.
This document outlines an AI agent called MctsAi that uses Monte Carlo tree search with upper confidence bounds (UCT) for the FightingICE fighting game platform. It repeats the process of selection, expansion, playout, and backpropagation of nodes in the search tree until time runs out. The UCB1 formula is used to balance exploration of less visited nodes with high potential value during selection. Actions are selected based on the most visited child node of the root at the end.
What did AlphaGo do to beat the strongest human Go player? by Tobias Pfeiffer
This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol. An accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.
AlphaGo is a Go-playing program developed by DeepMind that uses a combination of Monte Carlo tree search and deep neural networks to defeat human professionals. It uses policy networks trained via supervised and reinforcement learning to guide the search by providing prior probabilities over moves, and value networks trained via reinforcement learning to evaluate board positions. By integrating neural network guidance into the tree search process, AlphaGo was able to defeat other Go programs and the European Go champion without relying solely on brute force search of the enormous game tree.
A Markov Chain Monte Carlo approach to the Steiner Tree Problem in water netw..., by Carlo Lancia
The document describes using a Metropolis algorithm and Markov chain Monte Carlo (MCMC) approach to solve the minimal Steiner tree problem, which models finding the shortest water distribution network for a farm district. It formulates the problem on a graph and defines the optimization problem and statistical mechanics analogy. It then describes designing a Markov chain with transition probabilities following the Metropolis rule and a Hamiltonian function incorporating edge costs and penalties for configurations violating constraints.
2016 Fighting Game Artificial Intelligence Competition, by ftgaic
These are the slides about the 2016 Fighting Game Artificial Intelligence Competition presented at the 2016 IEEE Conference on Computational Intelligence and Games (CIG 2016) on September 22, 2016 in Santorini, Greece.
2013 Fighting Game Artificial Intelligence Competition, by ftgaic
These are the slides about the 2013 Fighting Game Artificial Intelligence Competition presented at The 2nd IEEE Global Conference on Consumer Electronics (GCCE 2013) on October 3, 2013.
The document discusses a lecture on machine learning algorithms. It covers recapping the ID3 algorithm, machine learning biases including language bias and preference bias, and decision tree learning. It also compares the ID3 and CANDIDATE-ELIMINATION algorithms, noting that ID3 has a preference bias while CANDIDATE-ELIMINATION has a restriction bias.
This document provides an introduction to Bayesian methods. It discusses key Bayesian concepts like priors, likelihoods, and Bayes' theorem. Bayes' theorem states that the posterior probability of a measure is proportional to the prior probability times the likelihood function. The document uses examples to illustrate Bayesian analysis and key principles like the likelihood principle and exchangeability. It also briefly discusses Bayesian pioneers like Bayes, Laplace, and Gauss and computational Bayesian methods.
This document provides an introduction to Bayesian statistics using R. It discusses key Bayesian concepts like the prior, likelihood, and posterior distributions. It assumes familiarity with basic probability and probability distributions. Examples are provided to demonstrate Bayesian estimation and inference for binomial and normal distributions. Specifically, it shows how to estimate the probability of success θ in a binomial model and the mean μ in a normal model using different prior distributions and calculating the resulting posterior distributions in R.
Mastering the game of Go with deep neural networks and tree search: Presentation, by Karel Ha
The presentation of the article "Mastering the game of Go with deep neural networks and tree search", given at the Spring School of Combinatorics 2016.
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/4njuiaaou1po0y4/AlphaGo.pdf?dl=0
- The corresponding handout is available at http://www.slideshare.net/KarelHa1/mastering-the-game-of-go-with-deep-neural-networks-and-tree-search-handout
- The video is available at https://youtu.be/Lso2kE58JrI
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
The document outlines key concepts in algorithmic game theory, including solution concepts like Nash equilibrium, dominant strategies, and correlated equilibrium. It also discusses different representations of games and examples like the prisoner's dilemma. The document provides definitions for fundamental game theory topics and outlines the structure of simultaneous move games involving multiple players with their own strategy sets.
AlphaGo uses a novel combination of Monte Carlo tree search and neural networks to master the game of Go. It trains two neural networks - a policy network to predict expert moves and a value network to evaluate board positions. During gameplay, AlphaGo runs multiple Monte Carlo tree simulations that use the neural networks to guide search and evaluate positions. The move selected is the one most frequently visited after all simulations. This approach allowed AlphaGo to defeat world champion Lee Sedol 4-1, achieving a milestone in artificial intelligence.
An introduction to Bayesian statistics, by John Tyndall
This document provides an introduction to Bayesian statistics. It discusses that Bayesian statistics takes a fundamentally different approach to probability than frequentist statistics by viewing parameters as random variables rather than fixed values. It also uses mathematical tools like Bayes' theorem, priors, posteriors, and Markov chain Monte Carlo simulations. The document explains Bayesian concepts and compares the Bayesian and frequentist perspectives. It argues that Bayesian methods are particularly useful for complex models with many interacting parameters.
This document provides an introduction to Bayesian methods for theory, computation, inference and prediction. It discusses key concepts in Bayesian statistics including the likelihood principle, the likelihood function, Bayes' theorem, and using Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm to perform posterior integration when closed-form solutions are not possible. Examples are provided on using Bayesian regression to model the relationship between salmon body length and egg mass while incorporating prior information. The summary concludes that the Bayesian approach provides a coherent way to quantify uncertainty and make predictions accounting for both aleatory and epistemic sources of variation.
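Since the summary mentions Metropolis-Hastings for posterior integration, here is a minimal random-walk Metropolis sketch in Python (illustrative; the standard-normal log-target is a stand-in, not an example from the document):

    import math, random

    def metropolis(log_target, x0, steps=10000, scale=1.0):
        # Propose x' ~ Normal(x, scale^2); accept with probability
        # min(1, target(x') / target(x)), otherwise keep x.
        x, samples = x0, []
        for _ in range(steps):
            proposal = x + random.gauss(0.0, scale)
            delta = log_target(proposal) - log_target(x)
            if delta >= 0 or random.random() < math.exp(delta):
                x = proposal
            samples.append(x)
        return samples

    draws = metropolis(lambda v: -0.5 * v * v, x0=0.0)   # stand-in target
    print(sum(draws) / len(draws))                       # should be near 0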
Using Deep Learning to do Real-Time Scoring in Practical Applications, by Greg Makowski
http://www.meetup.com/SF-Bay-ACM/events/227480571/
(see also YouTube for a recording of the presentation)
The talk will cover a brief review of neural network basics and the following types of neural network deep learning:
* autocorrelational - unsupervised learning for extracting features. He will describe how additional layers build complexity in the feature extraction.
* convolutional - how to detect shift invariant patterns in various data sources. Horizontal shift invariant detection applies to signals like speech recognition or IoT data. Horizontal and vertical shift invariance applies to images or videos, for faces or self driving cars
* discuss details of applying deep net systems for continuous or real time scoring
* reinforcement learning or Q Learning - such as learning how to play Atari video games
* continuous space word models - such as word2vec, skipgram training, NLP understanding and translation
This follow up post on the subject of Artificial Intelligence focuses on Expert Systems and the role of traditional experts in their design and development. It explores four main themes:
What do we mean by Expert?
How do experts work?
Expert Systems Application Domains, and
Features of rule based Expert (KB) Systems
Anisotropic Metropolis Adjusted Langevin Algorithm: convergence and utility i..., by BigMC
The document describes the anisotropic Metropolis adjusted Langevin algorithm (AMALA) for simulation of random variables in high dimensions. AMALA addresses limitations of the isotropic Metropolis adjusted Langevin algorithm (MALA) by using an anisotropic covariance matrix based on the local gradient of the target distribution. The algorithm is shown to have geometric ergodicity under certain conditions on the target distribution. AMALA is then proposed for use within a stochastic algorithm for maximum likelihood estimation of incomplete data models.
Dealing with intractability: Recent Bayesian Monte Carlo methods for dealing ..., by BigMC
talk by Nicolas Chopin at CREST Statistics Seminar, 16/01/2011.
This is partly a review, partly a talk on recent research such as
http://arxiv.org/abs/1101.1528
Stability of adaptive random-walk Metropolis algorithms, by BigMC
The document discusses adaptive MCMC algorithms and their stability. It introduces the stochastic approximation framework that is commonly used to construct adaptive MCMC algorithms. It then discusses issues with stability as the adaptive parameters are updated, and how enforced stability or adaptive reprojections can help address this. Finally, it provides examples of the adaptive Metropolis algorithm and adaptive scaling Metropolis algorithm, which aim to automatically tune the proposal distribution scale parameter.
The document summarizes the Metropolis-adjusted Langevin algorithm (MALA) for sampling from log-concave probability measures in high dimensions. It introduces MALA and different proposal distributions, including random walk, Ornstein-Uhlenbeck, and Euler proposals. It discusses known results on optimal scaling, diffusion limits, ergodicity, and mixing time bounds. The main result is a contraction property for the MALA transition kernel under appropriate assumptions, implying dimension-independent bounds on mixing times.
This document discusses an online EM algorithm and some extensions. It begins by outlining the goals of maximum likelihood estimation, good scaling, processing data incrementally without storage, and simple implementation. It then provides an overview of the topics covered, which include the EM algorithm in exponential families, the limiting EM recursion, the online EM algorithm, using online EM for batch maximum likelihood estimation, and extensions. The document uses a Poisson mixture model as a running example to illustrate the E and M steps of the EM algorithm.
1. The document discusses learning spline-based curve models from a set of contour data using a probabilistic approach.
2. It proposes representing the curves using spline curves defined by a small number of parameters in order to achieve a simple representation that is adaptive to the data.
3. An expectation-maximization approach is described for estimating the model parameters from the contour data in order to characterize the group or class of curves.
Omiros' talk on the Bernoulli factory problem, by BigMC
This document summarizes previous work on simulating events of unknown probability using reverse time martingales. It discusses von Neumann's solution to the Bernoulli factory problem where f(p)=1/2. It also summarizes the Keane-O'Brien existence result, the Nacu-Peres Bernstein polynomial approach, and issues with implementing the Nacu-Peres algorithm at large n due to the large number of strings involved. It proposes developing a reverse time martingale approach to address these issues.
1. Monte-Carlo Tree Search (MCTS)
for Computer Go
Bruno Bouzy
bruno.bouzy@parisdescartes.fr
Université Paris Descartes
BigMC Seminar
May 5, 2011
2. Outline
● The game of Go: a 9x9 game
● The « old » approach (*-2002)
● The Monte-Carlo approach (2002-2005)
● The MCTS approach (2006-today)
● Conclusion
MCTS for Computer Go 2
4. The game of Go
● 4000 years
● Originated from China
● Developed in Japan (20th century)
● Best players in Korea, Japan, China
● 19x19: official board size
● 9x9: beginners' board size
MCTS for Computer Go 4
5. A 9x9 game
● The board has 81 « intersections ». Initially, it is empty.
MCTS for Computer Go 5
6. A 9x9 game
● Black moves first. A « stone » is played on
an intersection.
MCTS for Computer Go 6
7. A 9x9 game
● White moves second.
MCTS for Computer Go 7
8. A 9x9 game
● Moves alternate between Black and White.
MCTS for Computer Go 8
9. A 9x9 game
● Two adjacent stones of the same color build a « string » with « liberties ».
● 4-adjacency
MCTS for Computer Go 9
10. A 9x9 game
● Strings are created.
MCTS for Computer Go 10
11. A 9x9 game
● A white stone is in « atari » (one liberty).
MCTS for Computer Go 11
12. A 9x9 game
● The white string has five liberties.
MCTS for Computer Go 12
13. A 9x9 game
● The black stone is in « atari ».
MCTS for Computer Go 13
14. A 9x9 game
● White « captures » the black stone.
MCTS for Computer Go 14
15. A 9x9 game
● For human players, the game is over.
– Huh?
– Why?
MCTS for Computer Go 15
16. A 9x9 game
● What happens if White contests black
« territory »?
MCTS for Computer Go 16
17. A 9x9 game
● White has invaded. Two strings are atari!
MCTS for Computer Go 17
18. A 9x9 game
● Black captures !
MCTS for Computer Go 18
19. A 9x9 game
● White insists but its string is atari...
MCTS for Computer Go 19
20. A 9x9 game
● Black has proved its « territory ».
MCTS for Computer Go 20
21. A 9x9 game
● Black may contest white territory too.
MCTS for Computer Go 21
22. A 9x9 terminal position
● The game is over for computers.
– Huh?
– Who won?
MCTS for Computer Go 22
23. A 9x9 game
● The game ends when both players pass.
● One black (resp. white) point for each black
(resp. white) stone and each black (resp.
white) « eye » on the board.
● One black (resp. white) eye = an empty
intersection surrounded by black (resp.
white) stones.
MCTS for Computer Go 23
24. A 9x9 game
● Scoring:
– Black = 44
– White = 37
– Komi = 7.5
– Score = -0.5
● White wins!
MCTS for Computer Go 24
25. Go ranking: « kyu » and « dan »
– Top professional players: 9 dan (pro), 9 dan (amateur)
– Very strong players: 1 dan (pro), 1-6 dan (amateur)
– Strong players: 1 kyu (amateur)
– Average players: 10 kyu (amateur)
– Beginners: 20 kyu (amateur)
– Very beginners: 30 kyu (amateur)
MCTS for Computer Go 25
26. Computer Go (old history)
● First go program (Lefkovitz 1960)
● Zobrist hashing (Zobrist 1969)
● Interim2 (Wilcox 1979)
● Life and death model (Benson 1988)
● Patterns: Goliath (Boon 1990)
● Mathematical Go (Berlekamp 1991)
● Handtalk (Chen 1995)
MCTS for Computer Go 26
27. The old approach
● Evaluation of non terminal positions
– Knowledge-based
– Breaking down a position into sub-positions
● Fixed-depth global tree search
– Depth = 0 : action with the best value
– Depth = 1: action leading to the position
with the best evaluation
– Depth > 1: alpha-beta or minimax
MCTS for Computer Go 27
28. The old approach
[diagram: a tree search from the current position, bounded to depth 2 or 3, evaluates non-terminal positions; each node has up to 361 moves, and terminal positions lie far beyond the search horizon]
MCTS for Computer Go 28
29. Position evaluation
● Break-down
– Whole game (win/loss or score)
– Goal-oriented sub-game
● String capture
● Connections, dividers, eyes, life and death
● Local searches
– Alpha-beta and enhancements
– Proof-number search
MCTS for Computer Go 29
33. Possible local evaluations (1)
[diagram: local groups labelled alive, dead, unstable, or not important; an alive group holds territory]
MCTS for Computer Go 33
34. Possible local evaluations (2)
[diagram: alive and unstable groups; one alive group with a big territory]
MCTS for Computer Go 34
35. Position evaluation
● Local results
– Obtained with local tree search
– Result if white plays first (resp. black)
– Combinatorial game theory (Conway)
– Switches {a|b}, >, <, *, 0
● Global recomposition
– move generation and evaluation
– position evaluation
MCTS for Computer Go 35
37. Drawbacks (1/2)
● The break-down is not unique
● Performing a (wrong) local tree search on a
(possibly irrelevant) local position
● Misevaluating the size of the local position
● Different kinds of local information
– Symbolic (group: dead, alive, unstable)
– Numerical (territory size, reduction, increase)
MCTS for Computer Go 37
38. Drawbacks (2/2)
● Local positions interact
● Complicated
● Domain-dependent knowledge
● Need of human expertise
● Difficult to program and maintain
● Holes of knowledge
● Erratic behaviour
MCTS for Computer Go 38
39. Upsides
● Feasible on 1990's computers
● Execution is fast
● Some specific local tree searches are
accurate and fast
MCTS for Computer Go 39
40. The old approach
● The same kyu/dan ladder as slide 25, with old-approach programs marked at the beginner end of the amateur scale (roughly 10-30 kyu)
MCTS for Computer Go 40
41. End of part one!
● Next: the Monte-Carlo approach...
MCTS for Computer Go 41
42. The Monte-Carlo (MC) approach
● Games containing chance
– Backgammon (Tesauro 1989)
● Games with hidden information
– Bridge (Ginsberg 2001)
– Poker (Billings & al. 2002)
– Scrabble (Sheppard 2002)
MCTS for Computer Go 42
43. The Monte-Carlo approach
● Games with complete information
– A general model (Abramson 1990)
● Simulated annealing Go
– (Brügmann 1993)
– 2 sequences of moves
– « all moves as first » heuristic
– Gobble on 9x9
MCTS for Computer Go 43
44. The Monte-Carlo approach
● Position evaluation:
Launch N random games
Evaluation = mean value of outcomes
● Depth-one MC algorithm:
For each move m {
Play m on the ref position
Launch N random games
Move value (m) = mean value
}
MCTS for Computer Go 44
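A sketch of this depth-one algorithm in Python (illustrative only; the position interface with legal_moves(), play(), is_over(), and outcome() is assumed, not Bouzy's actual code):

    import random

    def random_playout(position, color):
        # Play uniformly random legal moves to the end of the game and
        # return the outcome from `color`'s point of view (e.g. +1/-1 or score).
        while not position.is_over():
            position = position.play(random.choice(position.legal_moves()))
        return position.outcome(color)

    def depth_one_mc(position, color, n=1000):
        # Evaluate each move by the mean outcome of n random games
        best_move, best_value = None, float('-inf')
        for move in position.legal_moves():
            after = position.play(move)
            value = sum(random_playout(after, color) for _ in range(n)) / n
            if value > best_value:
                best_move, best_value = move, value
        return best_move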
46. Progressive pruning
● (Billings 2002, Sheppard 2002, Bouzy & Helmstetter 2003)
[diagram: the current best and second-best moves are still explored, while moves shown to be statistically inferior are pruned]
MCTS for Computer Go 46
47. Upper bound
● Optimism in the face of uncertainty
– Intestim (Kaelbling 1993)
– UCB multi-armed bandit (Auer & al 2002)
[diagram: moves are compared through optimistic upper bounds; the current best promising move is explored ahead of the second-best promising move and the current best proven move]
MCTS for Computer Go 47
50. All-moves-as-first heuristic (3/3)
[diagram: from an actual simulation playing the sequence A, B, C, D, a virtual simulation is obtained by assuming a later move such as C had been played « as first », so a single playout updates the statistics of several moves]
MCTS for Computer Go 50
51. The Monte-Carlo approach
● Upsides
– Robust evaluation
– Global search
– Move quality increases with computing power
● Way of playing
– Good strategical sense but weak tactically
● Easy to program
– Follow the rules of the game
– No break-down problem
MCTS for Computer Go 51
52. Monte-Carlo and knowledge
● Pseudo-random simulations using Go
knowledge (Bouzy 2003)
– Moves played with a probability depending on
specific domain-dependent knowledge
● 2 basic concepts
– string capture
– 3x3 shapes
MCTS for Computer Go 52
53. Monte-Carlo and knowledge
● Results are impressive
– MC(random) << MC(pseudo-random)
– % wins of MC(pseudo-random): 68 on 9x9, 93 on 13x13, 98 on 19x19
● Other works on simulations
– Patterns in MoGo, proximity rule (Wang & al
2006)
– Simulation balancing (Silver & Tesauro 2009)
MCTS for Computer Go 53
54. Monte-Carlo and knowledge
● Pseudo-random player
– 3x3 pattern urgency table with 3^8 entries
– only a few dozen relevant patterns
– Patterns gathered by
● Human expertise
● Reinforcement Learning (Bouzy & Chaslot
2006)
● Warning
– p1 better than p2 does not mean MC(p1)
better than MC(p2)
MCTS for Computer Go 54
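The urgency-based move choice can be sketched in one function (a hypothetical illustration; `urgency` stands in for the capture/3x3-shape knowledge above):

    import random

    def sample_move(moves, urgency):
        # Each legal move is drawn with probability proportional to its urgency,
        # so the playouts stay random but favour plausible moves.
        return random.choices(moves, weights=[urgency(m) for m in moves])[0]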
55. Monte-Carlo Tree Search (MCTS)
● How to integrate MC and TS ?
● UCT = UCB for Trees
– (Kocsis & Szepesvari 2006)
– Superposition of UCB (Auer & al 2002)
● MCTS
– Selection, expansion, updating (Chaslot & al)
(Coulom 2006)
– Simulation (Bouzy 2003) (Wang & Gelly
2006)
MCTS for Computer Go 55
56. MCTS (1/2)
while (hasTime) {
  playOutTreeBasedGame()          // selection: descend the tree with UCT
  expandTree()                    // expansion: add a node for the new position
  outcome = playOutRandomGame()   // simulation: random playout to the end
  updateNodes(outcome)            // backpropagation of the outcome
}
then choose the node with...
... the best mean value
... or the highest visit number
MCTS for Computer Go 56
57. MCTS (2/2)
PlayOutTreeBasedGame() {
  node = getNode(position)
  while (node) {                  // while the position is still in the tree
    move = selectMove(node)       // UCT selection rule (next slide)
    play(move)
    node = getNode(position)
  }
}
MCTS for Computer Go 57
58. UCT move selection
● Move selection rule to browse the tree:
move = argmax_m ( s * mean(m) + C * sqrt( log(t) / n(m) ) )
● Mean value for exploitation
– s (=+-1): color to move
● UCT bias for exploration
– C: constant term set up by experiments
– t: number of visits of the parent node
– n: number of visits of the current node
MCTS for Computer Go 58
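In code, the selection rule above might look as follows (a sketch; the node fields and the value of C are assumptions, with names mirroring the slide):

    import math

    def select_move(node, s, C=1.0):
        # s = +/-1 for the color to move; t = parent visits; n = child visits
        def uct(child):
            if child.visits == 0:
                return float('inf')      # unvisited children are tried first
            return (s * child.mean
                    + C * math.sqrt(math.log(node.visits) / child.visits))
        return max(node.children, key=uct)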
59.-70. Example
[twelve diagram slides stepping through UCT iterations on a small tree; each node is labelled wins/visits, the root statistics growing from 1/1 after one iteration to 7/12 after twelve, as visits progressively concentrate on the child with the best mean value]
71. Example
● Clarity
– the example above uses C = 0
● Notice
– with C != 0 a node cannot stay unvisited
– a min or max rule applies according to the node depth
– unvisited children have an infinite mean
● Practice
– the mean is initialized optimistically
MCTS for Computer Go 71
72. MCTS enhancements
● The raw version can be enhanced
– Tuning UCT C value
– Outcome = score or win loss info (+1/-1)
– Doubling the simulation number
– RAVE
– Using Go knowledge
● In the tree or in the simulations
– Speed-up
● Optimizing, pondering, parallelizing
MCTS for Computer Go 72
73. Assessing an enhancement
● Self-play
– The new version vs the reference version
– % wins with few hundred games
– 9x9 (or 19x19 boards)
● Against differently designed programs
– GTP (Go Text Protocol)
– CGOS (Computer Go Operating System)
● Competitions
MCTS for Computer Go 73
74. Move selection formula tuning
● Using UCB
– Best value for C ?
– 60-40%
● Using « UCB-tuned » (Auer & al 2002)
– C replaced by min(1/4,variance)
– 55-45%
MCTS for Computer Go 74
75. Exploration vs exploitation
● General idea: explore at the beginning and
exploit in the end of thinking time
● Diminishing C linearly in the remaining time
– (Vermorel & al 2005)
– 55-45%
● At the end:
– Argmax over the mean value or over the
number of visits ?
– 55-45%
MCTS for Computer Go 75
76. Kind of outcome
● 2 kinds of outcomes
– Score (S) or win loss information (WLI) ?
– Probability of winning or expected score ?
– Combining both (S+WLI) (score +45 if win)
● Results
– WLI vs S 65-35%
– S+WLI vs S 65-35%
MCTS for Computer Go 76
77. Doubling the number of simulations
● N = 100,000
● Results
– 2N vs N 60-40%
– 4N vs 2N 58-42%
MCTS for Computer Go 77
78. Tree management
● Transposition tables
– Tree -> Directed Acyclic Graph (DAG)
– Different sequences of moves may lead to
the same position
– Interest for MC Go: merge the results
– Result: 60-40%
● Keeping the tree from one move to the next
– Result: 65-35%
MCTS for Computer Go 78
79. RAVE (1/3)
● Rapid Action Value Estimation
– Mogo 2007
– Use the AMAF heuristic (Brugmann 1993)
– There are « many » virtual sequences that
are transposed from the actually played
sequence
● Result:
– 70-30%
MCTS for Computer Go 79
80. RAVE (2/3)
● AMAF heuristic: which nodes to update?
[diagram: for the actual sequence ACBD, the transposed virtual sequences BCAD, ADBC, and BDAC are also credited, so the nodes of moves played later in the sequence are updated as if those moves had been played first]
MCTS for Computer Go 80
81. RAVE (3/3)
● 3 variables
– usual mean value M_u
– AMAF mean value M_amaf
– M = β * M_amaf + (1 - β) * M_u
– β = sqrt( k / (k + 3N) )
– k set up experimentally
● M varies from M_amaf (small N) to M_u (large N)
MCTS for Computer Go 81
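As a sketch, the blend on this slide is one line of code (illustrative; k is the experimentally tuned constant mentioned above):

    import math

    def rave_value(mean_u, mean_amaf, N, k=1000):
        # beta -> 1 when N is small (trust AMAF), beta -> 0 as N grows (trust the mean)
        beta = math.sqrt(k / (k + 3.0 * N))
        return beta * mean_amaf + (1.0 - beta) * mean_u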
82. Knowledge in the simulations
● High urgency for...
– capture/escape 55-45%
– 3x3 patterns 60-40%
– Proximity rule 60-40%
● Mercy rule
– Interrupt the game when the difference of
captured stones is greater than a
threshold (Hillis 2006)
– 51-49%
MCTS for Computer Go 82
83. Knowledge in the tree
● Virtual wins for good looking moves
● Automatic acquisition of patterns of pro
games (Coulom 2007) (Bouzy & Chaslot 2005)
● Matching has a high cost
● Progressive widening (Chaslot & al 2008)
● Interesting under strong time constraints
● Result: 60-40%
MCTS for Computer Go 83
84. Speeding up the simulations
● Fully random simulations (2007)
– 50,000 game/second (Lew 2006)
– 20,000 (commonly heard)
– 10,000 (my program)
● Pseudo-random
– 5,000 (my program in 2007)
● Rough optimization is worthwhile
MCTS for Computer Go 84
85. Pondering
● Think during the opponent's time
– 55-45%
– Possible doubling of thinking time
– The opponent's actual move may not be the planned move you were thinking on
– Side effect: play quickly so as to think on the opponent's time
MCTS for Computer Go 85
86. Summing up the enhancements
● MCTS with all enhancements vs raw MCTS
– Exploration and exploitation: 60-40%
– Win/loss outcome: 65-35%
– Rough optimization of simulations 60-40%
– Transposition table 60-40%
– RAVE 70-30%
– Knowledge in the simulations 70-30%
– Knowledge in the tree 60-40%
– Pondering 55-45%
– Parallelization 70-30%
● Result: 99-1%
MCTS for Computer Go 86
87. Parallelization
● Computer Chess: Deep Blue
● Multi-core computer
– Symmetric MultiProcessor (SMP)
– one thread per processor
– shared memory, low latency
– mutual exclusion (mutex) mechanism
● Cluster of computers
– Message Passing Interface (MPI)
MCTS for Computer Go 87
88. Parallelization
while (hasTime) {
playOutTreeBasedGame()
expandTree()
outcome = playOutRandomGame()
updateNodes(outcome)
}
MCTS for Computer Go 88
90. Leaf parallelization
● (Cazenave Jouandeau 2007)
● Easy to program
● Drawbacks
– Wait for the longest simulation
– When some of the simulation outcomes are already losses, completing the remaining ones may not be a relevant strategy
MCTS for Computer Go 90
92. Root parallelization
● (Cazenave Jouandeau 2007)
● Easy to program
● No communication
● At completion, merge the trees
● 4 MCTS for 1sec > 1 MCTS for 4 sec
● Good way for low time settings and a small
number of threads
MCTS for Computer Go 92
95. Tree parallelization
● One shared tree, several threads
● Mutex
– Global: the whole tree has a mutex
– Local: each node has a mutex
● « Virtual loss »
– Given to a node browsed by a thread
– Removed at update stage
– Preventing threads from similar simulations
MCTS for Computer Go 95
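A minimal sketch of the virtual-loss idea with local mutexes (illustrative Python; the node layout is an assumption):

    import threading

    class Node:
        def __init__(self):
            self.wins, self.visits = 0, 0
            self.lock = threading.Lock()     # local mutex: one per node

        def add_virtual_loss(self):
            # Counted while a thread descends through this node: the extra
            # visit with no win makes the node look worse to concurrent threads.
            with self.lock:
                self.visits += 1

        def update(self, outcome):
            # At the update stage the real outcome replaces the virtual loss
            # (the visit was already counted during the descent).
            with self.lock:
                self.wins += outcome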
96. Computer-computer results
● Computer Olympiads (19x19 top three; 9x9 winner)
– 2010: Erica, Zen, MFGo; MyGoFriend
– 2009: Zen, Fuego, Mogo; Fuego
– 2008: MFGo, Mogo, Leela; MFGo
– 2007: Mogo, CrazyStone, GNU Go; Steenvreter
– 2006: GNU Go, Go Intellect, Indigo; CrazyStone
– 2005: Handtalk, Go Intellect, Aya; Go Intellect
– 2004: Go Intellect, MFGo, Indigo; Go Intellect
MCTS for Computer Go 96
97. Human-computer results
● 9x9
– 2009: Mogo beat a pro with Black
– 2009: Fuego beat a pro with White
● 19x19:
– 2008: Mogo beat a pro at 9 handicap stones;
Crazy Stone beat a pro at 8 stones,
then at 7 stones
– 2009: Mogo beat a pro at 6 stones
MCTS for Computer Go 97
98. MCTS and the old approach
● The ladder from slide 25 again: MCTS programs reach top professional level on 9x9 and about amateur 1 dan on 19x19, while old-approach programs remain at the beginner levels (20-30 kyu)
MCTS for Computer Go 98
99. Computer Go (MC history)
● Monte-Carlo Go (Brugmann 1993)
● MCGo devel. (Bouzy & Helmstetter 2003)
● MC+knowledge (Bouzy 2003)
● UCT (Kocsis & Szepesvari 2006)
● Crazy Stone (Coulom 2006)
● Mogo (Wang & Gelly 2006)
MCTS for Computer Go 99
100. Conclusion
● Monte-Carlo brought a big improvement in Computer Go over the last decade!
– No program based on the old approach remains!
– All go programs are MCTS based!
– Professional level on 9x9!
– Dan level on 19x19!
● Unbelievable 10 years ago!
MCTS for Computer Go 100
101. Some references
● PhD, MCTS and Go (Chaslot 2010)
● PhD, Reinf. Learning and Go (Silver 2010)
● PhD, R. Learning: applic. to Go (Gelly 2007)
● UCT (Kocsis & Szepesvari 2006)
● 1st MCTS Go program (Coulom 2006)
MCTS for Computer Go 101
102. Web links
● http://www.grappa.univ-lille3.fr/icga/
● http://cgos.boardspace.net/
● http://www.gokgs.com/
● http://www.lri.fr/~gelly/MoGo.htm
● http://remi.coulom.free.fr/CrazyStone/
● http://fuego.sourceforge.net/
● ...
MCTS for Computer Go 102
103. Thank you for your attention!
MCTS for Computer Go 103