Machine Learning applied to Go

         Dmitry Kamenetsky

  1. Machine Learning applied to Go
       Dmitry Kamenetsky
       Supervisor: Nic Schraudolph
       March 2007
  2. Outline
       1 Introduction
       2 Monte Carlo Go
       3 My Work
  3. What is Go?
       • Two-player deterministic board game
       • Originated in ancient China; today very popular in China, Japan and Korea
       • Played on a 19x19 grid; a 9x9 grid is also used by beginners
       • Simple rules, but very complex strategy
  4. Why study Go?
       • "If there are sentient beings on other planets, then they play Go" (Emanuel Lasker, former chess world champion)
       • "Go is one of the grand challenges of AI" (Ron Rivest, professor of Computer Science at MIT)
       • "Go is like life and life is like Go" (Chinese proverb)
  7. Monte Carlo
       [Section: Monte Carlo Go. Subsections: Monte Carlo, Bandit Problem, MoGo]
       • Often used for problems that have no closed-form solution, e.g. in computational physics
       • Sample instances from some large population
       • Use the samples to approximate some common property of the population
       • In search, select the next state according to some fixed distribution (usually uniform)
       • Recently, Monte Carlo methods have been very successful in Go, causing a mini-revolution
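The sampling idea on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: the example problem (estimating pi from uniform random points) and the name `estimate_pi` are assumptions chosen only to show "sample, then approximate a property of the population".

```python
import random


def estimate_pi(n_samples, seed=0):
    """Estimate pi by uniform sampling (illustrative example, not from the slides).

    Points are drawn uniformly from the unit square; the fraction that falls
    inside the quarter circle of radius 1 approximates pi / 4.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples
```

Calling `estimate_pi(200_000)` gives a rough approximation of pi; the error shrinks roughly with the square root of the number of samples.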
  12. K-armed bandit problem
       • Slot machine with K arms; each arm provides a reward drawn from some unknown but fixed distribution
       • Goal: choose which arms to play so that the total reward is maximized
       • How should the gambler play at any given moment?
           • Choose the arm with the highest average reward seen so far (exploitation)?
           • Choose a seemingly sub-optimal arm in the hope that it will lead to a greater reward (exploration)?
           • Neither on its own: a combination of exploitation and exploration is needed
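To make the exploitation/exploration trade-off concrete, here is a small Python sketch of a Bernoulli bandit played with an epsilon-greedy rule. Epsilon-greedy is not mentioned on the slide; it is swapped in here only as the simplest way to mix the two behaviours. The function name, the Bernoulli rewards and the tie-breaking rule are all assumptions.

```python
import random


def epsilon_greedy(arm_probs, n_plays, epsilon, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule (illustrative only).

    arm_probs[i] is the hidden success probability of arm i.
    With probability epsilon a random arm is played (exploration); otherwise
    the arm with the highest average reward so far is played (exploitation).
    Untried arms count as average 0, and ties go to the lowest index.
    Returns (total_reward, counts).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for _ in range(n_plays):
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            averages = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
            arm = max(range(k), key=lambda i: averages[i])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

With `epsilon=0.0` this sketch never leaves the first arm, because no other arm is ever sampled: pure exploitation can lock onto a poor arm. A small positive `epsilon` lets the better arm be discovered, at the cost of some wasted plays.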
  18. Upper Confidence Bounds
       • Successive plays of arm i give rewards $X_{i,1}, X_{i,2}, \ldots$ which are i.i.d. with unknown mean $E(X_i) = \mu_i$
       • Let $T_i(n)$ be the number of times arm i has been played during the first n plays of the machine
       • Upper Confidence Bounds, UCB (Auer et al. 2002):
           Initialization: play each arm once
           Loop: play the arm i that maximizes $\bar{\mu}_i + \sqrt{\frac{2 \log n}{T_i(n)}}$, where $\bar{\mu}_i$ is the average reward observed from arm i so far
       • Proven to achieve regret that grows only logarithmically in n, which is optimal up to a constant factor
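The UCB rule above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: rewards are Bernoulli (0 or 1), the function name `ucb1` and its parameters are hypothetical, and n is taken as the number of plays made so far.

```python
import math
import random


def ucb1(arm_probs, n_plays, seed=0):
    """Run the UCB rule on a Bernoulli bandit and return per-arm play counts.

    arm_probs[i] is the hidden success probability of arm i (an assumption of
    this sketch; the rule itself only sees the sampled rewards).
    Requires n_plays >= len(arm_probs).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k   # T_i(n): number of plays of arm i
    sums = [0.0] * k   # total reward collected from arm i

    def play(i):
        reward = 1.0 if rng.random() < arm_probs[i] else 0.0
        counts[i] += 1
        sums[i] += reward

    # Initialization: play each arm once.
    for i in range(k):
        play(i)

    # Loop: play the arm with the largest upper confidence bound.
    for _ in range(n_plays - k):
        n = sum(counts)
        best = max(
            range(k),
            key=lambda i: sums[i] / counts[i] + math.sqrt(2.0 * math.log(n) / counts[i]),
        )
        play(best)
    return counts
```

For example, `ucb1([0.2, 0.5, 0.8], 5000)` returns how often each arm was played. The exploration bonus shrinks as an arm is played more, so plays concentrate on the best arm while the others are still revisited occasionally.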
  24. UCT
       • UCB for minimax tree search (Kocsis and Szepesvari 2006)
       • Start at the current board position p
       • For i = 1 to 100,000 (number of simulations):
           p' ← p
           until the stopping criterion is reached (e.g. end of game):
               p' ← p' + move given by UCB
               create node p'
           Evaluate leaf: value ← winner of p'
           Update all the visited nodes with value
       • At p, play the move with the highest winning percentage
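The UCT loop above can be sketched in Python on a toy game. Everything specific here is an assumption made to keep the example self-contained: the game is a small Nim variant (remove 1 to 3 stones, whoever takes the last stone wins) rather than Go, the descent stops at the first node not yet in the tree and finishes with a uniformly random playout, and all names (`Node`, `simulate`, `uct_best_move`) are hypothetical.

```python
import math
import random


def legal_moves(stones):
    """Toy game (Nim variant): remove 1-3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


class Node:
    def __init__(self):
        self.visits = 0
        self.wins = 0.0      # wins for the player who moved INTO this node
        self.children = {}   # move -> Node


def ucb_value(parent, child):
    """Average win rate of the child plus the UCB exploration bonus."""
    return child.wins / child.visits + math.sqrt(
        2.0 * math.log(parent.visits) / child.visits
    )


def simulate(root, stones, rng):
    """One simulation: descend with UCB, add one node, play out, back up."""
    node, player = root, 0           # player 0 is to move at the root
    path = [(root, None)]            # (node, player who moved into it)
    while stones > 0:
        moves = legal_moves(stones)
        untried = [m for m in moves if m not in node.children]
        if untried:
            move = rng.choice(untried)
            node.children[move] = Node()   # create the new node
        else:
            move = max(moves, key=lambda m: ucb_value(node, node.children[m]))
        node = node.children[move]
        stones -= move
        path.append((node, player))
        player = 1 - player
        if untried:
            break                    # stopping criterion: node was not in the tree
    # Evaluate the leaf with a uniformly random playout.
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        player = 1 - player
    winner = 1 - player              # the player who made the last move
    # Update all visited nodes with the result.
    for visited, mover in path:
        visited.visits += 1
        if mover == winner:
            visited.wins += 1.0


def uct_best_move(stones, n_simulations=5000, seed=0):
    """Return the root move with the highest winning percentage (stones > 0)."""
    rng = random.Random(seed)
    root = Node()
    for _ in range(n_simulations):
        simulate(root, stones, rng)
    return max(
        root.children,
        key=lambda m: root.children[m].wins / root.children[m].visits,
    )
```

Because each node stores wins from the point of view of the player who moved into it, maximizing the UCB value at every level amounts to a minimax-style search: each side picks the move that looks best for itself.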
  34. MoGo
       • First Go program to use UCT (Gelly et al. 2006)
       • Stores nodes and their statistics in a tree data structure
       • The stopping criterion is reaching a node that is not yet in the tree
       • Leaf node evaluation: play out the position randomly until no moves remain; the final position is trivial to score
       • Playouts are enhanced through the use of patterns and by playing near the previous move
       • Pruning techniques and smart ordering of unexplored moves
  41. MoGo's success
       • Ranked first on the 9x9 Computer Go Server since August 2006
       • Won the two most recent tournaments, on 9x9 and 13x13
       • Expected to reach the level of a human professional on the 9x9 board
  44. Replacing UCB
       [Section: My Work. Subsections: Beta Distributions, Cooperative Scorer, Zobrist Hash, Conditional Random Fields]
       • UCT is ad hoc: it lacks a theoretical analysis, because the random variables (rewards $X_i$) are not i.i.d.
       • Instead, use Beta distributions to model the random variables
       • The Beta distribution is the conjugate prior of the binomial distribution (game result)
       • Here α = wins from the node, β = losses from the node
       • Let p be the parent's winning percentage and 0 < a < 1 a parameter
       • Pick the move that is most likely to have a winning percentage greater than (1 − a)p + a
  50. Improving node evaluation
      - MoGo's node evaluation is fast, but not very meaningful.
      - Instead, use our cooperative scorer:
          - Initialization: statically fill neutral territory with stones.
          - Loop: players cooperate to make moves that do not affect the score.
      - Accurately predicts the score: 96.3% on 9x9 and 89.2% on 19x19.
      - Only 15 times slower than pure random.
  56. Scorer Example
  59. Improving memory management
      - The tree data structure is memory-inefficient.
      - Instead, use a hash table:
          - For each visited position p, key = ZobristHash(p).
          - Store the statistics of p: hashTable[key] = (#wins, #runs, depth).
          - Collision handling.
      - Use a small hash table with more information for frequently visited nodes.
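A minimal sketch of the hash-table idea, under stated assumptions: the board is a flat list of point colours, every (point, colour) pair gets a random 64-bit key, and a position's hash is the XOR of the keys of its stones. All names here (`ZOBRIST`, `zobrist_hash`, `toggle`, `record_playout`) are hypothetical. The sketch ignores side to move and ko, and it does not implement the collision handling or the second, smaller table mentioned on the slide.

```python
import random

BOARD_SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2

# One random 64-bit key per (point, colour). The seed is fixed only so that
# the example is reproducible.
_rng = random.Random(0)
ZOBRIST = [
    {BLACK: _rng.getrandbits(64), WHITE: _rng.getrandbits(64)}
    for _ in range(BOARD_SIZE * BOARD_SIZE)
]


def zobrist_hash(board):
    """Hash a full position: XOR the keys of all occupied points."""
    key = 0
    for point, colour in enumerate(board):
        if colour != EMPTY:
            key ^= ZOBRIST[point][colour]
    return key


def toggle(key, point, colour):
    """Incrementally add or remove one stone; XOR is its own inverse."""
    return key ^ ZOBRIST[point][colour]


# hashTable[key] = (#wins, #runs, depth), as on the slide.
hash_table = {}


def record_playout(key, won, depth):
    """Update the statistics stored for the position with this key."""
    wins, runs, first_depth = hash_table.get(key, (0, 0, depth))
    hash_table[key] = (wins + int(won), runs + 1, first_depth)


if __name__ == "__main__":
    board = [EMPTY] * (BOARD_SIZE * BOARD_SIZE)
    board[40] = BLACK
    key = zobrist_hash(board)
    record_playout(key, won=True, depth=1)
    print(hash_table[key])
```

Because XOR is commutative, two move orders that reach the same stones produce the same key, so transpositions share one entry instead of occupying separate tree nodes.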
  65. Learning evaluation function
      - Convert the board position into a graph:
          - Collapse regions of the same colour into one node.
          - Create edges between adjacent regions.
      - Use Conditional Random Fields (CRF) to learn from this graph:
          - Learn the final territory assignment.
          - Predict the next move.
      - Can be used with the scorer or for move generation.
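The graph conversion step can be sketched with a flood fill. This is an assumed reading of the slide rather than the author's code: the board is given as a list of strings with 'X' for black, 'O' for white and '.' for empty, empty regions are treated as a third "colour", and adjacency means sharing an edge on the grid. The function name `board_to_graph` is hypothetical, and the CRF itself is not shown.

```python
def board_to_graph(board):
    """Collapse same-coloured connected regions into nodes.

    Returns (colours, edges): colours[i] is the colour of region i, and
    edges is a set of (i, j) pairs with i < j for adjacent regions.
    """
    rows, cols = len(board), len(board[0])
    region = [[None] * cols for _ in range(rows)]
    colours = []

    # Label every connected same-colour region with a flood fill.
    for r in range(rows):
        for c in range(cols):
            if region[r][c] is not None:
                continue
            rid = len(colours)
            colours.append(board[r][c])
            region[r][c] = rid
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and region[ny][nx] is None
                        and board[ny][nx] == board[r][c]
                    ):
                        region[ny][nx] = rid
                        stack.append((ny, nx))

    # Connect regions that touch along a grid edge.
    edges = set()
    for r in range(rows):
        for c in range(cols):
            for ny, nx in ((r + 1, c), (r, c + 1)):
                if ny < rows and nx < cols:
                    a, b = region[r][c], region[ny][nx]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return colours, edges


if __name__ == "__main__":
    colours, edges = board_to_graph(["XX.", ".OO", "..O"])
    print(colours)
    print(sorted(edges))
```

The resulting graph is typically much smaller than the board, which is what makes learning per-region labels such as final territory ownership tractable.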
  72. Graph conversion example
  74. Questions?
      You never ever know if you never ever GO!
