What did AlphaGo do to beat the strongest human Go player?

This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.


What did AlphaGo do to beat the strongest human Go player?

  1. 1. March 2016
  2. 2. Mainstream Media
  3. 3. 1997
  4. 4. Ing Cup 1985–2000 (prize up to $1,400,000)
  5. 5. 5d win 1998
  6. 6. October 2015
  7. 7. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away. Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489. January 2016
  8. 8. What did AlphaGo do to beat the strongest human Go player? Tobias Pfeiffer @PragTob pragtob.info
  9. 9. Go
  10. 10. Computational Challenge
  11. 11. Monte Carlo Method
  12. 12. Neural Networks
  13. 13. Revolution with Neural Networks
  14. 14. What did we learn?
  15. 15. Go
  16. 16. Computational Challenge
  17. 17. Go vs. Chess
  18. 18. Complex vs. Complicated
  19. 19. “While the Baroque rules of chess could only have been created by humans, the rules of go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play go.” Edward Lasker (chess grandmaster)
  20. 20. Larger board 19x19 vs. 8x8
  21. 21. Almost every move is legal
  22. 22. Average branching factor: 250 vs 35
  23. 23. State Space Complexity: 10^171 vs 10^47
  24. 24. 10^80
  25. 25. Global impact of moves
  26. 26. (minimax tree diagram: alternating MAX/MIN levels with leaf values) Traditional Search
  27. 27. (minimax tree diagram: alternating MAX/MIN levels with leaf values) Evaluation Function
  28. 28. Monte Carlo Method
  29. 29. What is Pi?
  30. 30. How do you determine Pi?
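Slides 29–30 motivate the Monte Carlo method with Pi: sample random points in the unit square and count how many land inside the quarter circle; that fraction approaches π/4. A minimal sketch, not from the talk:

```python
import random

def estimate_pi(samples=1_000_000):
    """Estimate Pi by sampling random points in the unit square."""
    inside = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:  # point lands inside the quarter circle
            inside += 1
    # (quarter-circle area) / (square area) = pi/4, so scale back up by 4
    return 4 * inside / samples

print(estimate_pi())  # approaches 3.14159... as samples grow
```

More samples give a better estimate, which is exactly the "anytime" property MCTS inherits: stop whenever you like and use the best answer so far.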
  31. 31. 2006
  32. 32. Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), p.1-49.
  33. 33. (MCTS tree diagram: root 2/4; children A1, D5, F13, C7 with win/visit counts 1/1, 0/1, 1/1, 0/1)
  34. 34. (MCTS tree diagram as before) Selection
  35. 35. (MCTS tree diagram, new node B5 0/0 added) Expansion
  36. 36. (MCTS tree diagram with B5 0/0) Simulation
  37. 37. Random
  38. 38. Not Human like?
  39. 39. (MCTS tree diagram with updated counts: root 3/5, nodes on the winning path incremented, B5 now 1/1) Backpropagation
  40. 40. (same updated tree) Perspective
  41. 41. (tree with counts flipped to the opponent's perspective: root 2/5, B5 1/1) Perspective
  42. 42. Multi Armed Bandit
  43. 43. Multi Armed Bandit Exploitation vs Exploration
  44. 44. wins/visits + explorationFactor × √(ln(totalVisits) / visits)
  45. 45. (large MCTS tree after 15042 simulations, with uneven visit counts such as 86/193 and 58/151 next to barely explored 0/1 nodes)
  46. 46. (same tree diagram)
  47. 47. (same tree diagram)
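Slide 44's formula is the UCB1 bandit rule: a move's average win rate plus an exploration bonus that shrinks as the move gets visited. A numeric sketch (the exploration factor 1.4 ≈ √2 is a common default; the counts echo the tree on slides 45–47):

```python
import math

def ucb1(wins, visits, total_visits, exploration_factor=1.4):
    """UCB1: exploitation term (win rate) plus an exploration bonus
    that decays as this move accumulates visits."""
    return wins / visits + exploration_factor * math.sqrt(
        math.log(total_visits) / visits)

# After 15042 simulations, a barely tried move still outscores a
# well-explored strong one, so it gets a chance (exploration)...
print(ucb1(0, 1, 15042))      # huge bonus despite zero wins
print(ucb1(58, 151, 15042))   # solid win rate, small bonus
# ...but once it has been tried enough without wins, it falls behind:
print(ucb1(0, 50, 15042))
```

This is the exploitation-vs-exploration trade-off of the multi-armed bandit slides: promising moves get most of the visits, yet no move is ever starved completely.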
  48. 48. Generate a valid random move
  49. 49. Who has won?
  50. 50. General Game Playing
  51. 51. Anytime
  52. 52. Lazy
  53. 53. Expert Knowledge
  54. 54. Neural Networks
  55. 55. 2014
  56. 56. What does this even mean?
  57. 57. Neural Networks
  58. 58. (diagram: Input → “Hidden” Layer → Output) Neural Networks
  59. 59. Weights
  60. 60. Bias/Threshold
  61. 61. (neuron diagram: three inputs with weights 4, 2, -3; threshold 3.2) Activation
  62. 62. (weighted sum 5.2 ≥ threshold 3.2: the neuron fires) Activation
  63. 63. (weighted sum 2.2 < threshold 3.2: the neuron stays inactive) Activation
  64. 64. Activation
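Slides 59–64 describe a threshold neuron: multiply each input by its weight, sum, and compare against the bias/threshold. A sketch using the weights 4, 2, -3 and threshold 3.2 from the slides (the input values are invented to reproduce the sums 5.2 and 2.2):

```python
def neuron_fires(inputs, weights, threshold):
    """A simple threshold neuron: it activates iff the weighted
    sum of its inputs reaches the threshold (bias)."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold

weights = [4, 2, -3]   # weights from slides 61-63
threshold = 3.2

print(neuron_fires([1, 0.6, 0], weights, threshold))  # 4 + 1.2 = 5.2 >= 3.2 -> True
print(neuron_fires([1, 0, 0.6], weights, threshold))  # 4 - 1.8 = 2.2 <  3.2 -> False
```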
  65. 65. Training
  66. 66. Adjust parameters
  67. 67. Supervised Learning Input Expected Output
  68. 68. Backpropagation
  69. 69. Data set
  70. 70. Training data + test data
  71. 71. Training
  72. 72. Verify
  73. 73. Overfitting
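Slides 65–73 outline supervised learning: feed inputs with expected outputs, adjust the parameters to shrink the error, and hold back test data to verify the model generalizes rather than overfits. A minimal sketch with a two-parameter linear model and an invented toy data set (everything here is illustrative, not AlphaGo's training setup):

```python
import random

def predict(w, b, x):
    return w * x + b

def train(data, epochs=200, lr=0.01):
    """Supervised learning in miniature: nudge each parameter in the
    direction that reduces the squared error (gradient descent)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in data:
            error = predict(w, b, x) - target
            w -= lr * error * x   # gradient of (error^2)/2 w.r.t. w
            b -= lr * error       # ... and w.r.t. b
    return w, b

random.seed(42)
# toy data set: y = 2x + 1 plus noise, split into training and test data
data = [(x, 2 * x + 1 + random.gauss(0, 0.1))
        for x in (random.uniform(0, 1) for _ in range(100))]
train_set, test_set = data[:80], data[80:]

w, b = train(train_set)
test_error = sum((predict(w, b, x) - y) ** 2 for x, y in test_set) / len(test_set)
print(f"learned w={w:.2f}, b={b:.2f}, test MSE={test_error:.4f}")
```

Measuring the error on the held-out test set is the "Verify" step: if training error keeps dropping while test error rises, the model is overfitting.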
  74. 74. Deep Neural Networks
  75. 75. Convolutional Neural Networks
  76. 76. Local Receptive Field
  77. 77. Feature Map
  78. 78. Stride
  79. 79. Shared weights and biases
  80. 80. (19 × 19 input → three 17 × 17 feature maps) Multiple feature maps/filters
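Slides 75–80 cover the convolutional ideas: a local receptive field slides across the board with a given stride, and every position shares the same weights and bias, producing a feature map. Slide 80's arithmetic checks out: a 3 × 3 field with stride 1 on a 19 × 19 board yields a 17 × 17 map. A dependency-free sketch (the kernel values are made up):

```python
def feature_map(board, kernel, bias=0.0):
    """Slide one shared kernel (shared weights + bias) over the board
    with stride 1, producing one feature map."""
    k = len(kernel)
    size = len(board) - k + 1  # 19 - 3 + 1 = 17
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            s = sum(board[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k)) + bias
            row.append(s)
        out.append(row)
    return out

board = [[0.0] * 19 for _ in range(19)]   # one empty 19x19 input plane
kernel = [[0.1] * 3 for _ in range(3)]    # one shared 3x3 filter
fmap = feature_map(board, kernel)
print(len(fmap), len(fmap[0]))  # 17 17
```

Running three different kernels over the same input gives the three 17 × 17 feature maps of slide 80; AlphaGo's network stacks a dozen such layers with 64–192 filters each.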
  81. 81. Architecture ... Input Features 12 layers with 64 – 192 filters Output
  82. 82. Architecture ... Input Features 12 layers with 64 – 192 filters Output
  83. 83. Architecture ... Input Features 12 layers with 64 – 192 filters Output
  84. 84. 2.3 million parameters 630 million connections
  85. 85. ● Stone Colour x 3 ● Liberties x 4 ● Liberties after move played x 6 ● Legal Move x 1 ● Turns since x 5 ● Capture Size x 7 ● Ladder Move x 1 ● KGS Rank x 9 Input Features
  86. 86. Training on game data predicting the next move
  87. 87. 55% Accuracy
  88. 88. Mostly beats GnuGo
  89. 89. Combined with MCTS in the Selection
  90. 90. Asynchronous GPU Power
  91. 91. Revolution
  92. 92. Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489. Networks in Training
  93. 93. Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489. Networks in Training
  94. 94. Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489. AlphaGo Search
  95. 95. Selection
  96. 96. Action Value Prior Probability Visit Count Selection
  97. 97. Action Value Prior Probability Visit Count
  98. 98. Action Value Prior Probability Visit Count Selection
  99. 99. Action Value Prior Probability Visit Count Selection
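Slides 95–99 name the three ingredients of AlphaGo's selection rule: the action value, the policy network's prior probability, and the visit count. A schematic sketch of a PUCT-style score in that spirit (the constant and exact scaling here are illustrative, not the paper's precise formula):

```python
import math

def alphago_score(q, prior, visits, parent_visits, c_puct=5.0):
    """Action value Q plus a bonus u driven by the policy prior P:
    u grows with the prior and the parent's visits, and decays as
    this edge's own visit count N rises."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u

# An edge the policy network likes but that is barely explored can
# outrank a well-visited edge with a slightly higher action value:
print(alphago_score(q=0.5, prior=0.4, visits=2, parent_visits=100))
print(alphago_score(q=0.6, prior=0.1, visits=40, parent_visits=100))
```

This is how the "human instinct" of the policy network steers the search: strong priors pull simulations toward plausible moves first, while the decaying bonus hands control back to the measured action values as evidence accumulates.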
  100. 100. (tree with edge scores 0.8, 1.2, 0.5, 1.1, 0.9) Action Value + Bonuses
  101. 101. (same tree) Expansion
  102. 102. (same tree) Expansion
  103. 103. (same tree) Prior Probability
  104. 104. (same tree) Evaluation
  105. 105. (same tree) Evaluation
  106. 106. (same tree) Rollout
  107. 107. (same tree) Value Network
  108. 108. (tree after backup: scores updated to 0.81 and 1.3, leaf evaluation 1.6) Backup
  109. 109. 1202 CPUs, 176 GPUs (same tree diagram)
  110. 110. Tensor
  111. 111. Human Instinct Policy Network Reading Capability Search Positional Judgement Value Network 3 Strengths of AlphaGo
  112. 112. Human Instinct Policy Network Reading Capability Search Positional Judgement Value Network Most Important Strength
  113. 113. More Natural
  114. 114. Lee Sedol match
  115. 115. Style
  116. 116. So when AlphaGo plays a slack looking move, we may regard it as a mistake, but perhaps it should more accurately be viewed as a declaration of victory? An Younggil 8p
  117. 117. Game 2
  118. 118. Game 4
  119. 119. Game 4
  120. 120. Game 4
  121. 121. What can we learn?
  122. 122. Making X faster vs Doing less of X
  123. 123. Benchmark everything
  124. 124. Solving problems the human way vs Solving problems the computer way
  125. 125. Don't blindly dismiss approaches as infeasible
  126. 126. One Approach vs Combination of Approaches
  127. 127. Joy of Creation
  128. 128. PragTob/Rubykon
  129. 129. pasky/michi
  130. 130. What did AlphaGo do to beat the strongest human Go player? Tobias Pfeiffer @PragTob pragtob.info
  131. 131. Sources ● Maddison, C.J. et al., 2014. Move Evaluation in Go Using Deep Convolutional Neural Networks. ● Silver, D. et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), p.484-489. ● Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015 http://neuralnetworksanddeeplearning.com ● Gelly, S. & Silver, D., 2011. Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence, 175(11), p.1856-1876. ● I. Althöfer, “On the Laziness of Monte-Carlo Game Tree Search In Non-tight Situations,” Friedrich-Schiller Univ., Jena, Tech. Rep., 2008. ● Browne, C. & Powley, E., 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), p.1-49. ● Gelly, S. & Silver, D., 2007. Combining online and offline knowledge in UCT. Machine Learning, p.273-280. ● https://www.youtube.com/watch?v=LX8Knl0g0LE&index=9&list=WL
  132. 132. Photo Credit ● http://www.computer-go.info/events/ing/2000/images/bigcup.jpg ● https://en.wikipedia.org/wiki/File:Kasparov-29.jpg ● http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/product-images ● http://giphy.com/gifs/dark-thread-after-lCP95tGSbMmWI ● https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chi p.html ● https://gogameguru.com/i/2016/01/Fan-Hui-vs-AlphaGo-550x364.jpg ● http://makeitstranger.com/ ● CC BY 2.0 – https://en.wikipedia.org/wiki/File:Deep_Blue.jpg – https://www.flickr.com/photos/luisbg/2094497611/ ● CC BY-SA 3.0 – https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning#/media/File:AB_pruning.svg ● CC BY-SA 2.0 – https://flic.kr/p/cPUtny – https://flic.kr/p/dLSKTQ – https://www.flickr.com/photos/83633410@N07/7658272558/
  133. 133. Photo Credit ● CC BY-NC-ND 2.0 – https://flic.kr/p/q15pzb – https://flic.kr/p/bHSj7D – https://flic.kr/p/ixSsfM – https://www.flickr.com/photos/waxorian/4228645447/ – https://www.flickr.com/photos/pennstatelive/8972110324/ – https://www.flickr.com/photos/dylanstraub/6428496139/ ● https://en.wikipedia.org/wiki/Alphabet_Inc.#/media/File:Alphabet_Inc_Logo_2015.svg ● CC BY 3.0 – https://en.wikipedia.org/wiki/File:Pi_30K.gif
