"Monte-Carlo Tree Search for the game of Go"

5,461 views
5,188 views

Published on

"Monte-Carlo Tree Search for the game of Go",
Bruno Bouzy's talk at BigMC, 5th May 2011

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
5,461
On SlideShare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
Downloads
130
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

"Monte-Carlo Tree Search for the game of Go"

  1. 1. Monte-Carlo Tree Search (MCTS) for Computer Go Bruno Bouzy bruno.bouzy@parisdescartes.fr Université Paris Descartes Séminaire BigMC 5 mai 2011
  2. 2. Outline● The game of Go: a 9x9 game● The « old » approach (*-2002)● The Monte-Carlo approach (2002-2005)● The MCTS approach (2006-today)● Conclusion MCTS for Computer Go 2
  3. 3. The game of Go MCTS for Computer Go 3
  4. 4. The game of Go● 4000 years● Originated from China● Developed by Japan (20th century)● Best players in Korea, Japan, China● 19x19: official board size● 9x9: beginners board size MCTS for Computer Go 4
  5. 5. A 9x9 game● The board has 81 « intersections ». Initially,it is empty. MCTS for Computer Go 5
  6. 6. A 9x9 game● Black moves first. A « stone » is played on an intersection. MCTS for Computer Go 6
  7. 7. A 9x9 game● White moves second. MCTS for Computer Go 7
  8. 8. A 9x9 game● Moves alternate between Black and White. MCTS for Computer Go 8
  9. 9. A 9x9 game● Two adjacent stones of the same color builds a « string » with « liberties ».● 4-adjacency MCTS for Computer Go 9
  10. 10. A 9x9 game● Strings are created. MCTS for Computer Go 10
  11. 11. A 9x9 game● A white stone is in « atari » (one liberty). MCTS for Computer Go 11
  12. 12. A 9x9 game● The white string has five liberties. MCTS for Computer Go 12
  13. 13. A 9x9 game● The black stone is « atari ». MCTS for Computer Go 13
  14. 14. A 9x9 game● White « captures » the black stone. MCTS for Computer Go 14
  15. 15. A 9x9 game● For human players, the game is over. – Hu? – Why? MCTS for Computer Go 15
  16. 16. A 9x9 game● What happens if White contests black « territory »? MCTS for Computer Go 16
  17. 17. A 9x9 game● White has invaded. Two strings are atari! MCTS for Computer Go 17
  18. 18. A 9x9 game● Black captures ! MCTS for Computer Go 18
  19. 19. A 9x9 game● White insists but its string is atari... MCTS for Computer Go 19
  20. 20. A 9x9 game● Black has proved is « territory ». MCTS for Computer Go 20
  21. 21. A 9x9 game● Black may contest white territory too. MCTS for Computer Go 21
  22. 22. A 9x9 terminal position● The game is over for computers. – Hu? – Who won ? MCTS for Computer Go 22
  23. 23. A 9x9 game● The game ends when both players pass.● One black (resp. white) point for each black (resp. white) stone and each black (resp. white) « eye » on the board.● One black (resp. white) eye = an empty intersection surrounded by black (resp. white) stones. MCTS for Computer Go 23
  24. 24. A 9x9 game● Scoring: – Black = 44 – White = 37 – Komi = 7.5 – Score = -0.5● White wins! MCTS for Computer Go 24
  25. 25. Go ranking: « kyu » and « dan » Pro ranking Amateur ranking Top professional players 9 dan 9 dan Very strong players 1 dan 6 dan 1 dan Strong players 1 kyu Average playersl 10 kyu Beginners 20 kyu Very beginners 30 kyu MCTS for Computer Go 25
  26. 26. Computer Go (old history)● First go program (Lefkovitz 1960)● Zobrist hashing (Zobrist 1969)● Interim2 (Wilcox 1979)● Life and death model (Benson 1988)● Patterns: Goliath (Boon 1990)● Mathematical Go (Berlekamp 1991)● Handtalk (Chen 1995) MCTS for Computer Go 26
  27. 27. The old approach● Evaluation of non terminal positions – Knowledge-based – Breaking-down of a position into sub- positions● Fixed-depth global tree search – Depth = 0 : action with the best value – Depth = 1: action leading to the position with the best evaluation – Depth > 1: alfa-beta or minmax MCTS for Computer Go 27
  28. 28. The old approach Current position2 or 3Bounded depth Tree search Evaluation of non terminal positions Huhu?361 MCTS for Computer Go Terminal positions 28
  29. 29. Position evaluation● Break-down – Whole game (win/loss or score) – Goal-oriented sub-game ● String capture ● Connections, dividers, eyes, life and death● Local searches – Alpha-beta and enhancements – Proof-number search MCTS for Computer Go 29
  30. 30. A 19x19 middle-game position MCTS for Computer Go 30
  31. 31. A possible black break-down MCTS for Computer Go 31
  32. 32. A possible white break-down MCTS for Computer Go 32
  33. 33. Possible local evaluations (1) unstable Alive and territory Not important alive dead alive MCTS for Computer Go 33
  34. 34. Possible local evaluations (2) alive unstable unstable alive + big territory unstable MCTS for Computer Go 34
  35. 35. Position evaluation● Local results – Obtained with local tree search – Result if white plays first (resp. black) – Combinatorial game theory (Conway) – Switches {a|b}, >, <, *, 0● Global recomposition – move generation and evaluation – position evaluation MCTS for Computer Go 35
  36. 36. Position evaluation MCTS for Computer Go 36
  37. 37. Drawbacks (1/2)● The break-down is not unique● Performing a (wrong) local tree search on a (possibly irrelevant) local position● Misevaluating the size of the local position● Different kinds of local information – Symbolic (group: dead alive unstable) – Numerical (territory size, reduction, increase) MCTS for Computer Go 37
  38. 38. Drawbacks (2/2)● Local positions interact● Complicated● Domain-dependent knowledge● Need of human expertise● Difficult to program and maintain● Holes of knowledge● Erratic behaviour MCTS for Computer Go 38
  39. 39. Upsides● Feasible on 1990s computers● Execution is fast● Some specific local tree searches are accurate and fast MCTS for Computer Go 39
  40. 40. The old approach Pro ranking Amateur ranking Top professional players 9 dan 9 dan Very strong players 1 dan 6 dan 1 dan Strong players 1 kyu Average playersl 10 kyu Beginners 20 kyuOldapproach Very beginners 30 kyu MCTS for Computer Go 40
  41. 41. End of part one!● Next: the Monte-Carlo approach... MCTS for Computer Go 41
  42. 42. The Monte-Carlo (MC) approach● Games containing chance – Backgammon (Tesauro 1989)● Games with hidden information – Bridge (Ginsberg 2001) – Poker (Billings & al. 2002) – Scrabble (Sheppard 2002) MCTS for Computer Go 42
  43. 43. The Monte-Carlo approach● Games with complete information – A general model (Abramson 1990)● Simulated annealing Go – (Brügmann 1993) – 2 sequences of moves – « all moves as first » heuristic – Gobble on 9x9 MCTS for Computer Go 43
  44. 44. The Monte-Carlo approach● Position evaluation: Launch N random games Evaluation = mean value of outcomes● Depth-one MC algorithm: For each move m { Play m on the ref position Launch N random games Move value (m) = mean value } MCTS for Computer Go 44
  45. 45. Depth-one Monte-Carlo ...... ... ... MCTS for Computer Go 45
  46. 46. Progressive pruning● (Billings 2002, Sheppard 2002, Bouzy & Helmstetter 2003) Second best Current best move Still explored Pruned MCTS for Computer Go 46
  47. 47. Upper bound● Optimism in face of uncertainty – Intestim (Kaelbling 1993), – UCB multi-armed bandit (Auer & al 2002) Second best promising Current best proven move Current best promising move MCTS for Computer Go 47
  48. 48. All-moves-as-first heuristic (1/3) B C r A D A B C D MCTS for Computer Go 48
  49. 49. All-moves-as-first heuristic (2/3) B C r A D A A B C D B C Actual simulation D B C o A D MCTS for Computer Go 49
  50. 50. All-moves-as-first heuristic (3/3) B C r A D A C A B C D B B C Actual A Virtual simulation = simulation actual simulation assuming c is played D D B « as first » C o A D o MCTS for Computer Go 50
  51. 51. The Monte-Carlo approach● Upsides – Robust evaluation – Global search – Move quality increases with computing power● Way of playing – Good strategical sense but weak tactically● Easy to program – Follow the rules of the game – No break-down problem MCTS for Computer Go 51
  52. 52. Monte-Carlo and knowledge● Pseudo-random simulations using Go knowledge (Bouzy 2003) – Moves played with a probability depending on specific domain-dependent knowledge● 2 basic concepts – string capture 3x3 shapes MCTS for Computer Go 52
  53. 53. Monte-Carlo and knowledge● Results are impressive – MC(random) << MC(pseudo random) – Size 9x9 13x13 19x19 – % wins 68 93 98● Other works on simulations – Patterns in MoGo, proximity rule (Wang & al 2006) – Simulation balancing (Silver & Tesauro 2009) MCTS for Computer Go 53
  54. 54. Monte-Carlo and knowledge● Pseudo-random player – 3x3 pattern urgency table with 38 patterns – Few dizains of relevant patterns only – Patterns gathered by ● Human expertise ● Reinforcement Learning (Bouzy & Chaslot 2006)● Warning – p1 better than p2 does not mean MC(p1) better than MC(p2) MCTS for Computer Go 54
  55. 55. Monte-Carlo Tree Search (MCTS)● How to integrate MC and TS ?● UCT = UCB for Trees – (Kocsis & Szepesvari 2006) – Superposition of UCB (Auer & al 2002)● MCTS – Selection, expansion, updating (Chaslot & al) (Coulom 2006) – Simulation (Bouzy 2003) (Wang & Gelly 2006) MCTS for Computer Go 55
  56. 56. MCTS (1/2)while (hasTime) { playOutTreeBasedGame() expandTree() outcome = playOutRandomGame() updateNodes(outcome)}then choose the node with... ... the best mean value ... the highest visit number MCTS for Computer Go 56
  57. 57. MCTS (2/2)PlayOutTreeBasedGame() { node = getNode(position) while (node) { move=selectMove(node) play(move) node = getNode(position) }} MCTS for Computer Go 57
  58. 58. UCT move selection● Move selection rule to browse the tree: move=argmax (s*mean + C*sqrt(log(t)/n)● Mean value for exploitation – s (=+-1): color to move● UCT bias for exploration – C: constant term set up by experiments – t: number of visits of the parent node – n: number of visits of the current node MCTS for Computer Go 58
  59. 59. Example● 1 iteration 1/1 1 MCTS for Computer Go 59
  60. 60. Example● 2 iterations 1/2 0/1 0 MCTS for Computer Go 60
  61. 61. Example● 3 iterations 2/3 1/1 0/1 1 MCTS for Computer Go 61
  62. 62. Example 2/4● 4 iterations 0/1 1/1 0/1 MCTS for Computer Go 62
  63. 63. Example 3/5● 5 iterations 0/1 1/1 0/1 1/1 MCTS for Computer Go 63
  64. 64. Example● 6 iterations 3/6 0/1 1/2 0/1 1/1 0/1 MCTS for Computer Go 64
  65. 65. Example● 7 iterations 3/7 0/1 1/2 0/1 1/2 0/1 0/1 MCTS for Computer Go 65
  66. 66. Example● 8 iterations 4/8 0/1 2/3 0/1 1/2 1/1 0/1 0/1 MCTS for Computer Go 66
  67. 67. Example● 9 iterations 4/9 0/1 2/4 0/1 1/2 0/1 1/1 0/1 0/1 MCTS for Computer Go 67
  68. 68. Example● 10 iterations 5/10 0/1 2/4 0/1 2/3 0/1 1/1 0/1 1/1 0/1 MCTS for Computer Go 68
  69. 69. Example● 11 iterations 6/11 0/1 2/4 0/1 3/4 0/1 1/1 0/1 1/1 0/1 1/1 MCTS for Computer Go 69
  70. 70. Example● 12 iterations 7/12 0/1 2/4 0/1 4/5 0/1 1/1 0/1 1/1 1/2 1/1 1/1 MCTS for Computer Go 70
  71. 71. Example● Clarity – C=0● Notice – with C != 0 a node cannot stay unvisited – min or max rule according to the node depth – not visited children have an infinite mean● Practice – Mean initialized optimistically MCTS for Computer Go 71
  72. 72. MCTS enhancements● The raw version can be enhanced – Tuning UCT C value – Outcome = score or win loss info (+1/-1) – Doubling the simulation number – RAVE – Using Go knowledge ● In the tree or in the simulations – Speed-up ● Optimizing, pondering, parallelizing MCTS for Computer Go 72
  73. 73. Assessing an enhancement● Self-play – The new version vs the reference version – % wins with few hundred games – 9x9 (or 19x19 boards)● Against differently designed programs – GTP (Go Text Protocol) – CGOS (Computer Go Operating System)● Competitions MCTS for Computer Go 73
  74. 74. Move selection formula tuning● Using UCB – Best value for C ? – 60-40%● Using « UCB-tuned » (Auer & al 2002) – C replaced by min(1/4,variance) – 55-45% MCTS for Computer Go 74
  75. 75. Exploration vs exploitation● General idea: explore at the beginning and exploit in the end of thinking time● Diminishing C linearly in the remaining time – (Vermorel & al 2005) – 55-45%● At the end: – Argmax over the mean value or over the number of visits ? – 55-45% MCTS for Computer Go 75
  76. 76. Kind of outcome● 2 kinds of outcomes – Score (S) or win loss information (WLI) ? – Probability of winning or expected score ? – Combining both (S+WLI) (score +45 if win)● Results – WLI vs S 65-35% – S+WLI vs S 65-35% MCTS for Computer Go 76
  77. 77. Doubling the number of simulations● N = 100,000● Results – 2N vs N 60-40% – 4N vs 2N 58-42% MCTS for Computer Go 77
  78. 78. Tree management● Transposition tables – Tree -> Directed Acyclic Graph (DAG) – Different sequences of moves may lead to the same position – Interest for MC Go: merge the results – Result: 60-40%● Keeping the tree from one move to the next – Result: 65-35% MCTS for Computer Go 78
  79. 79. RAVE (1/3)● Rapid Action Value Estimation – Mogo 2007 – Use the AMAF heuristic (Brugmann 1993) – There are « many » virtual sequences that are transposed from the actually played sequence● Result: – 70-30% MCTS for Computer Go 79
  80. 80. RAVE (2/3)● AMAF heuristic● Which nodes to update? A B● Actual C C D D – Sequence ACBD – Nodes B A A B● Virtual – BCAD, ADBC, BDAC D C – Nodes MCTS for Computer Go 80
  81. 81. RAVE (3/3)● 3 variables – Usual mean value Mu – AMAF mean value Mamaf – M = β Mamaf + (1-β) Mu – β = sqrt(k/(k+3N)) – K set up experimentally● M varies from Mamaf to Mu MCTS for Computer Go 81
  82. 82. Knowledge in the simulations● High urgency for... – capture/escape 55-45% – 3x3 patterns 60-40% – Proximity rule 60-40%● Mercy rule – Interrupt the game when the difference of captured stones is greater than a threshold (Hillis 2006) – 51-49% MCTS for Computer Go 82
  83. 83. Knowledge in the tree● Virtual wins for good looking moves● Automatic acquisition of patterns of pro games (Coulom 2007) (Bouzy & Chaslot 2005)● Matching has a high cost● Progressive widening (Chaslot & al 2008)● Interesting under strong time constraints● Result: 60-40% MCTS for Computer Go 83
  84. 84. Speeding up the simulations● Fully random simulations (2007) – 50,000 game/second (Lew 2006) – 20,000 (commonly eared) – 10,000 (my program)● Pseudo-random – 5,000 (my program in 2007)● Rough optimization is worthwhile MCTS for Computer Go 84
  85. 85. Pondering● Think on the opponent time – 55-45% – Possible doubling of thinking time – The move of the opponent may not be the planned move on which you think – Side effect: play quickly to think on the opponent time MCTS for Computer Go 85
  86. 86. Summing up the enhancements● MCTS with all enhancements vs raw MCTS – Exploration and exploitation: 60-40% – Win/loss outcome: 65-35% – Rough optimization of simulations 60-40% – Transposition table 60-40% – RAVE 70-30% – Knowledge in the simulations 70-30% – Knowledge in the tree 60-40% – Pondering 55-45% – Parallelization 70-30%● Result: 99-1% MCTS for Computer Go 86
  87. 87. Parallelization● Computer Chess: Deep Blue● Multi-core computer – Symmetric MultiProcessor (SMP) – one thread per processor – shared memory, low latency – mutual exclusion (mutex) mechanism● Cluster of computers – Message Passing Information (MPI) MCTS for Computer Go 87
  88. 88. Parallelizationwhile (hasTime) { playOutTreeBasedGame() expandTree() outcome = playOutRandomGame() updateNodes(outcome)} MCTS for Computer Go 88
  89. 89. Leaf parallelization MCTS for Computer Go 89
  90. 90. Leaf parallelization● (Cazenave Jouandeau 2007)● Easy to program● Drawbacks – Wait for the longest simulation – When part of the simulation outcomes is a loss, performing the remaining may not be a relevant strategy. MCTS for Computer Go 90
  91. 91. Root parallelization MCTS for Computer Go 91
  92. 92. Root parallelization● (Cazenave Jouandeau 2007)● Easy to program● No communication● At completion, merge the trees● 4 MCTS for 1sec > 1 MCTS for 4 sec● Good way for low time settings and a small number of threads MCTS for Computer Go 92
  93. 93. Tree parallelization – global mutex MCTS for Computer Go 93
  94. 94. Tree parallelization – local mutex MCTS for Computer Go 94
  95. 95. Tree parallelization● One shared tree, several threads● Mutex – Global: the whole tree has a mutex – Local: each node has a mutex● « Virtual loss » – Given to a node browsed by a thread – Removed at update stage – Preventing threads from similar simulations MCTS for Computer Go 95
  96. 96. Computer-computer results● Computer Olympiads 19x19 9x9 – 2010 Erica, Zen, MFGo MyGoFriend – 2009 Zen, Fuego, Mogo Fuego – 2008 MFGo, Mogo, Leela MFGo – 2007 Mogo, CrazyStone, GNU Go Steenvreter – 2006 GNU Go, Go Intellect, Indigo CrazyStone – 2005 Handtalk, Go Intellect, Aya Go Intellect – 2004 Go Intellect, MFGo, Indigo Go Intellect MCTS for Computer Go 96
  97. 97. Human-computer results● 9x9 – 2009: Mogo won a pro with black – 2009: Fuego won a pro with white● 19x19: – 2008: Mogo won a pro with 9 stones Crazy Stone won a pro with 8 stones Crazy Stone won a pro with 7 stones – 2009: Mogo won a pro with 6 stones MCTS for Computer Go 97
  98. 98. MCTS and the old approach Pro ranking Amateur ranking Top professional players 9 dan 9 dan 9x9 go Very strong players 1 dan 6 dan 1 dan 19x19 go Strong players 1 kyu Average players 10 kyuMCTS Beginners 20 kyuOld Very beginners 30 kyuapproach MCTS for Computer Go 98
  99. 99. Computer Go (MC history)● Monte-Carlo Go (Brugmann 1993)● MCGo devel. (Bouzy & Helmstetter 2003)● MC+knowledge (Bouzy 2003)● UCT (Kocsis & Szepesvari 2006)● Crazy Stone (Coulom 2006)● Mogo (Wang & Gelly 2006) MCTS for Computer Go 99
  100. 100. Conclusion● Monte-Carlo brought a Big improvement in Computer Go over the last decade! – No old approach based program anymore! – All go programs are MCTS based! – Professional level on 9x9! – Dan level on 19x19!● Unbelievable 10 years ago! MCTS for Computer Go 100
  101. 101. Some references● PhD, MCTS and Go (Chaslot 2010)● PhD, Reinf. Learning and Go (Silver 2010)● PhD, R. Learning: applic. to Go (Gelly 2007)● UCT (Kocsis & Szepesvari 2006)● 1st MCTS go program (Coulom 2006) MCTS for Computer Go 101
  102. 102. Web links● http://www.grappa.univ-lille3.fr/icga/● http://cgos.boardspace.net/● http://www.gokgs.com/● http://www.lri.fr/~gelly/MoGo.htm● http://remi.coulom.free.fr/CrazyStone/● http://fuego.sourceforge.net/● ... MCTS for Computer Go 102
  103. 103. Thank you for your attention! MCTS for Computer Go 103

×