Mastering the game of Go
with deep neural networks and tree search
Karel Ha
article by Google DeepMind
Spring School of Combinatorics 2016
Why AI?
Applications of AI
spam filters
recommender systems (Netflix, YouTube)
predictive text (Swiftkey)
audio recognition (Shazam, SoundHound)
music generation (DeepHear - Composing and harmonizing
music with neural networks)
self-driving cars
1
Auto Reply Feature of Google Inbox
Corrado 2015 2
Artistic-style Painting
[1] Gatys, Ecker, and Bethge 2015 [2] Li and Wand 2016 3
Baby Names Generated Character by Character
Baby Killiel Saddie Char Ahbort With
Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy
Marylen Hammine Janye Marlise Jacacrie Hendred Romand
Charienna Nenotto Ette Dorane Wallen Marly Darine Salina
Elvyn Ersia Maralena Minoria Ellia Charmin Antley Nerille
Chelon Walmor Evena Jeryly Stachon Charisa Allisa Anatha
Cathanie Geetra Alexie Jerin Cassen Herbett Cossie Velen
Daurenge Robester Shermond Terisa Licia Roselen Ferine Jayn
Lusine Charyanne Sales Sanny Resa Wallon Martine Merus
Jelen Candica Wallin Tel Rachene Tarine Ozila Ketia Shanne
Arnande Karella Roselina Alessia Chasty Deland Berther
Geamar Jackein Mellisand Sagdy Nenc Lessie Rasemy Guen
Karpathy 2015 4
C code Generated Character by Character
Karpathy 2015 5
Algebraic Geometry Generated Character by Character
Karpathy 2015 6
DeepDrumpf
https://twitter.com/deepdrumpf = a Twitter bot that has
learned the language of Donald Trump from his speeches
Hayes 2016 7
Atari Player by Google DeepMind
https://youtu.be/0X-NdPtFKq0?t=21m13s
Mnih et al. 2015 8
Heads-up Limit Hold’em Poker Is Solved!
Cepheus http://poker.srv.ualberta.ca/
Bowling et al. 2015 9
Basics of Machine Learning
Supervised versus Unsupervised Learning
Supervised learning:
data set must be labelled
e.g. which e-mail is regular/spam, which image is duck/face,
...
Unsupervised learning:
data set is not labelled
it can try to cluster the data into different groups
e.g. grouping similar news, ...
10
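To make the distinction concrete, here is a minimal Python sketch using scikit-learn (the library and the tiny toy data are assumptions of this illustration, not part of the talk): the same feature vectors are classified when labels are available and merely clustered when they are not.

# Hypothetical toy data: 2-dimensional "e-mail features", labels 0 = regular, 1 = spam.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[0, 1], [1, 1], [8, 9], [9, 8]]
y = [0, 0, 1, 1]

clf = LogisticRegression().fit(X, y)          # supervised: labels guide the fit
print(clf.predict([[7, 8]]))                  # -> [1], i.e. spam

km = KMeans(n_clusters=2, n_init=10).fit(X)   # unsupervised: only grouping into clusters
print(km.labels_)                             # two groups with arbitrary cluster ids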
Supervised Learning
1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go
Server...
2. training on training set
3. testing on testing set
4. deployment
http://www.nickgillian.com/ 11
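A hedged sketch of steps 2 and 3 in Python with scikit-learn (assumed here purely for illustration; any labelled data set and model would do):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)            # 1. collected, labelled data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # 2. training on the training set
print("test accuracy:", model.score(X_test, y_test))              # 3. testing on the testing set
# 4. deployment: persist the trained model (e.g. joblib.dump) and serve its predictions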
Regression
12
Mathematical Regression
https://thermanuals.wordpress.com/descriptive-analysis/sampling-and-regression/
13
Classification
https://kevinbinz.files.wordpress.com/2014/08/ml-svm-after-comparison.png 14
Underfitting and Overfitting
Beware of overfitting!
It is like learning for a math exam by memorizing proofs.
https://www.researchgate.net/post/How_to_Avoid_Overfitting 15
Reinforcement Learning
Especially: games of self-play
https://youtu.be/0X-NdPtFKq0?t=16m57s 16
Monte Carlo Tree Search
Tree Search
Optimal value v∗(s) determines the outcome of the game:
from every board position or state s
under perfect play by all players.
It is computed by recursively traversing a search tree containing
approximately b^d possible sequences of moves, where
b is the game’s breadth (number of legal moves per position)
d is its depth (game length)
Silver et al. 2016 17
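What “recursively traversing the search tree” means can be sketched in a few lines of Python; the Game interface below (is_terminal, outcome, legal_moves, play) is a hypothetical placeholder, not code from the paper:

def optimal_value(state, game):
    """Exhaustive minimax in negamax form: the value v*(s) under perfect play by all players.

    The recursion visits on the order of b^d nodes -- fine for tiny games,
    utterly infeasible for Go.
    """
    if game.is_terminal(state):
        return game.outcome(state)           # +1 win, -1 loss, 0 draw, for the player to move
    return max(-optimal_value(game.play(state, move), game)
               for move in game.legal_moves(state))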
Game tree of Go
Sizes of trees for various games:
chess: b ≈ 35, d ≈ 80
Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the
universe!
That makes Go a googol
times more complex than
chess.
https://deepmind.com/alpha-go.html
How to handle the size of the game tree?
for the breadth: a neural network to select moves
for the depth: a neural network to evaluate current position
for the tree traverse: Monte Carlo tree search (MCTS)
Allis et al. 1994 18
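These sizes are easy to sanity-check; a quick Python back-of-the-envelope computation with the approximate b and d above (the atom count is the commonly quoted ~10^80 estimate):

chess = 35 ** 80       # ~10^123 possible move sequences
go    = 250 ** 150     # ~10^359 possible move sequences
atoms = 10 ** 80       # rough estimate of atoms in the observable universe

print(f"chess ~ 10^{len(str(chess)) - 1}")
print(f"go    ~ 10^{len(str(go)) - 1}")
print(go > atoms)      # True: far more move sequences than atoms in the universe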
Monte Carlo tree search
19
Neural networks
Neural Network: Inspiration
inspired by the neuronal structure of the mammalian cerebral
cortex
but on much smaller scales
suitable to model systems with a high tolerance to error
e.g. audio or image recognition
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
Neural Network: Modes
Two modes
feedforward for making predictions
backpropagation for learning
Dieterle 2003 21
Neural Network: an example of feedforward
http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/ 22
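A minimal numpy sketch of one feedforward pass, with a 2-3-1 architecture and made-up weights (not the weights from the linked tutorial):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x  = np.array([1.0, 0.0])                              # input vector
W1 = np.array([[0.8, 0.2], [0.4, 0.9], [0.3, 0.5]])    # hidden layer: 3 neurons, 2 inputs each
W2 = np.array([0.3, 0.5, 0.9])                         # output neuron: 3 inputs

hidden = sigmoid(W1 @ x)        # activations of the hidden layer
output = sigmoid(W2 @ hidden)   # the network's prediction
print(output)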
Gradient Descent in Neural Networks
Motto: “Learn by mistakes!”
However, error functions are not necessarily convex or so “smooth”.
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
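Gradient descent itself fits in a few lines; a sketch on a simple convex function (real error surfaces, as noted above, are far less well behaved):

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient -- 'learn by mistakes'."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3):
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # converges towards 3.0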
Deep Neural Network: Inspiration
The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”)
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
Convolutional Neural Network
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 25
Rules of Go
Classic games (1/2)
Backgammon: Man vs. Fate
Chess: Man vs. Man
26
Classic games (2/2)
Go: Man vs. Self
Robert Šámal (White) versus Karel Král (Black), Spring School of Combinatorics 2016 27
Rules of Go
Black versus White. Black starts the game.
the rule of liberty
the “ko” rule
Handicap for difference in ranks: Black can place 1 or more stones
in advance (compensation for White’s greater strength). 28
Scoring Rules: Area Scoring
A player’s score is:
the number of stones that the player has on the board
plus the number of empty intersections surrounded by that
player’s stones
plus komi(dashi) points for the White player
which is a compensation for the first move advantage of the Black player
https://en.wikipedia.org/wiki/Go_(game) 29
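The arithmetic of area scoring is simple enough to write down; a sketch that assumes the stone and territory counts are already known (the komi value of 7.5 is one common convention, not a fixed rule):

def area_score(stones_on_board, surrounded_empty_points, komi=0.0):
    """Area score of one player: stones + surrounded empty intersections (+ komi for White)."""
    return stones_on_board + surrounded_empty_points + komi

black = area_score(87, 95)
white = area_score(84, 88, komi=7.5)
print("Black wins" if black > white else "White wins")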
Ranks of Players
Kyu and Dan ranks
or alternatively, ELO ratings
https://en.wikipedia.org/wiki/Go_(game) 30
Chocolate micro-break
30
AlphaGo: Inside Out
Policy and Value Networks
Silver et al. 2016 31
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 32
SL Policy Networks (1/3)
13-layer deep convolutional neural network
goal: to predict expert human moves
task of classification
trained on 30 million positions from the KGS Go Server
stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ
(to maximize the likelihood of the human move a selected in state s)
Results:
44.4% accuracy (the state-of-the-art from other groups)
55.7% accuracy (raw board position + move history as input)
57.0% accuracy (all input features)
Silver et al. 2016 33
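A toy numpy sketch of this update rule for a linear softmax policy over a handful of moves (the feature and move counts are made up; AlphaGo’s pσ is the 13-layer CNN described above):

import numpy as np

rng = np.random.default_rng(0)
n_features, n_moves = 8, 4
sigma = rng.normal(scale=0.1, size=(n_moves, n_features))   # policy parameters

def policy(s):
    logits = sigma @ s
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                                   # p_sigma(. | s)

def sl_update(s, a, learning_rate=0.01):
    """One step of stochastic gradient ascent on log p_sigma(a | s)."""
    p = policy(s)
    grad = -np.outer(p, s)       # d log p(a|s) / d sigma: -p_k * s for every move k ...
    grad[a] += s                 # ... plus s for the move a the human expert actually played
    sigma[:] += learning_rate * grad

s, a = rng.normal(size=n_features), 2
before = policy(s)[a]
sl_update(s, a)
print(before, "->", policy(s)[a])   # likelihood of the expert move goes up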
SL Policy Networks (2/3)
Small improvements in accuracy led to large improvements
in playing strength (see the next slide)
Silver et al. 2016 34
SL Policy Networks (3/3)
move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%).
Silver et al. 2016 35
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 36
Rollout Policy
Rollout policy pπ(a|s) is faster but less accurate than SL
policy network.
accuracy of 24.2%
It takes 2 µs to select an action, compared to 3 ms in the case
of the SL policy network.
Silver et al. 2016 37
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 38
RL Policy Networks (1/2)
identical in structure to the SL policy network
goal: to win in the games of self-play
task of classification
weights ρ initialized to the same values, ρ := σ
games of self-play
between the current RL policy network and a randomly
selected previous iteration
to prevent overfitting to the current policy
stochastic gradient ascent: ∆ρ ∝ ∂ log pρ(at|st) / ∂ρ · zt
at time step t, where the reward zt is +1 for winning and −1 for losing.
Silver et al. 2016 39
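The REINFORCE-style update differs from the supervised one only by the factor zt; a toy sketch in the same linear-softmax setting as before (again an illustration, not the real network):

import numpy as np

def rl_update(rho, s, a, z, learning_rate=0.01):
    """One step of Delta(rho) ~ (d log p_rho(a_t|s_t) / d rho) * z_t.

    z = +1 if the self-play game was eventually won, -1 if it was lost,
    so moves from won games are reinforced and moves from lost games discouraged.
    """
    logits = rho @ s
    exp = np.exp(logits - logits.max())
    p = exp / exp.sum()
    grad = -np.outer(p, s)
    grad[a] += s
    return rho + learning_rate * z * grad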
RL Policy Networks (2/2)
Results (sampling each move at ∼ pρ(·|st)):
80% win rate against the SL policy network
85% win rate against the strongest open-source Go
program, Pachi (Baudiš and Gailly 2011)
The previous state of the art, based only on SL of CNNs:
11% “win” rate against Pachi
Silver et al. 2016 40
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 41
Value Network (1/2)
similar architecture to the policy network, but outputs a single
prediction instead of a probability distribution
goal: to estimate a value function
vp(s) = E[zt | st = s, at...T ∼ p]
that predicts the outcome from position s (of games played by using policy pρ)
Specifically, vθ(s) ≈ vpρ (s) ≈ v∗(s).
task of regression
stochastic gradient descent: ∆θ ∝ ∂vθ(s) / ∂θ · (z − vθ(s))
(to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z)
Silver et al. 2016 42
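A toy sketch of the regression update, for a linear value function squashed by tanh so that vθ(s) ∈ (−1, 1) (an assumption of the sketch; the real vθ is a deep CNN):

import numpy as np

def value(theta, s):
    return np.tanh(theta @ s)                 # v_theta(s) in (-1, 1)

def value_update(theta, s, z, learning_rate=0.01):
    """One SGD step on the squared error (z - v_theta(s))^2 / 2."""
    v = value(theta, s)
    grad_v = (1.0 - v ** 2) * s               # d v_theta(s) / d theta for tanh(theta . s)
    return theta + learning_rate * (z - v) * grad_v

theta = np.zeros(8)
s, z = np.random.default_rng(1).normal(size=8), +1.0   # a position whose game was eventually won
theta = value_update(theta, s, z)
print(value(theta, s))                        # nudged towards +1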
Value Network (2/2)
Beware of overfitting!
Successive positions are strongly correlated.
Value network memorized the game outcomes, rather than
generalizing to new positions.
Solution: generate 30 million (new) positions, each sampled
from a separate game
almost the accuracy of Monte Carlo rollouts (using pρ), but
15000 times less computation!
Silver et al. 2016 43
Selection of Moves by the Value Network
evaluation of all successors s′ of the root position s, using vθ(s′)
Silver et al. 2016 44
Evaluation accuracy in various stages of a game
Move number is the number of moves that had been played in the given position.
Each position evaluated by:
forward pass of the value network vθ
100 rollouts, played out using the corresponding policy
Silver et al. 2016 45
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 46
ELO Ratings for Various Combinations of Networks
Silver et al. 2016 47
MCTS Algorithm
The next action is selected by lookahead search, using simulation:
1. selection phase
2. expansion phase
3. evaluation phase
4. backup phase (at end of simulation)
Each edge (s, a) keeps:
action value Q(s, a)
visit count N(s, a)
prior probability P(s, a) (from SL policy network pσ)
The tree is traversed by simulation (descending the tree) from the
root state.
Silver et al. 2016 48
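A hedged Python sketch of the per-edge bookkeeping (a simplified stand-in for the paper’s implementation; the selection, evaluation and backup sketches that follow the corresponding slides below read and update these fields):

from dataclasses import dataclass, field

@dataclass
class Edge:
    """Statistics kept for a tree edge (s, a)."""
    P: float = 0.0     # prior probability P(s, a), from the SL policy network p_sigma
    N: int = 0         # visit count N(s, a)
    W: float = 0.0     # accumulated leaf evaluations, so that Q(s, a) = W / N

    @property
    def Q(self) -> float:
        return self.W / self.N if self.N else 0.0

@dataclass
class Node:
    edges: dict = field(default_factory=dict)   # move -> Edge, filled in when the node is expanded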
MCTS Algorithm: Selection
At each time step t, an action at is selected from state st
at = argmax_a ( Q(st, a) + u(st, a) )
where bonus
u(st, a) ∝ P(s, a) / (1 + N(s, a))
Silver et al. 2016 49
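As a sketch over the Edge objects above, with c_puct as an assumed exploration constant (the paper tunes such a constant; the exact functional form here is simplified to match the slide):

def select(node, c_puct=5.0):
    """Pick a_t = argmax_a ( Q(s_t, a) + u(s_t, a) ), with u ~ P(s, a) / (1 + N(s, a))."""
    def score(item):
        move, edge = item
        u = c_puct * edge.P / (1 + edge.N)   # bonus favours high prior and low visit count
        return edge.Q + u
    return max(node.edges.items(), key=score)[0]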
MCTS Algorithm: Expansion
A leaf position may be expanded (just once) by the SL policy network pσ.
The output probabilities are stored as priors P(s, a) := pσ(a|s).
Silver et al. 2016 50
MCTS: Evaluation
evaluation from the value network vθ(s)
evaluation by the outcome z, using the fast rollout policy pπ until the end of the game
Using a mixing parameter λ, the final leaf evaluation V(s) is
V(s) = (1 − λ)vθ(s) + λz
Silver et al. 2016 51
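The mixed leaf evaluation is a one-liner; λ = 0.5 is used below only as an illustrative default (the paper reports that an even mix worked best):

def leaf_value(v_theta, z, lam=0.5):
    """V(s) = (1 - lambda) * v_theta(s) + lambda * z."""
    return (1.0 - lam) * v_theta + lam * z

print(leaf_value(v_theta=0.3, z=+1))   # 0.65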
Tree Evaluation from Value Network
action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only)
Silver et al. 2016 52
Tree Evaluation from Rollouts
action values Q(s, a), averaged over rollout evaluations only
Silver et al. 2016 53
MCTS: Backup
At the end of simulation, each traversed edge is updated by accumulating:
the action values Q
visit counts N
Silver et al. 2016 54
Once the search is complete, the algorithm
chooses the most visited move from the root
position.
Silver et al. 2016 54
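A sketch of the backup over the Edge objects introduced earlier, followed by the final choice of the most visited move:

def backup(path, value):
    """Accumulate the leaf evaluation into every traversed edge (Q is then the mean W / N)."""
    for edge in path:
        edge.N += 1
        edge.W += value

def best_move(root):
    """Once the search is complete, play the most visited move from the root position."""
    return max(root.edges.items(), key=lambda item: item[1].N)[0]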
Percentage of Simulations
percentage frequency with which actions were selected from the root during simulations
Silver et al. 2016 55
Principal Variation (Path with Maximum Visit Count)
The moves are presented in a numbered sequence.
AlphaGo selected the move indicated by the red circle;
Fan Hui responded with the move indicated by the white square;
in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo.
Silver et al. 2016 56
Scalability
asynchronous multi-threaded search
simulations on CPUs
computation of neural networks on GPUs
AlphaGo:
40 search threads
40 CPUs
8 GPUs
Distributed version of AlphaGo (on multiple machines):
40 search threads
1202 CPUs
176 GPUs
Silver et al. 2016 57
ELO Ratings for Various Combinations of Threads
Silver et al. 2016 58
Results: the strength of AlphaGo
Tournament with Other Go Programs
Silver et al. 2016 59
Fan Hui
professional 2 dan
European Go Champion in 2013, 2014 and 2015
European Professional Go Champion in 2016
biological neural network:
100 billion neurons
100 to 1,000 trillion neuronal connections
https://en.wikipedia.org/wiki/Fan_Hui 60
AlphaGo versus Fan Hui
AlphaGo won 5-0 in a formal match in October 2015.
[AlphaGo] is very strong and stable, it seems
like a wall. ... I know AlphaGo is a computer,
but if no one told me, maybe I would think
the player was a little strange, but a very
strong player, a real person.
Fan Hui 61
Lee Sedol “The Strong Stone”
professional 9 dan
the 2nd in international titles
the 5th youngest (12 years 4 months) to become
a professional Go player in South Korean history
Lee Sedol would win 97 out of 100 games against Fan Hui.
biological neural network, comparable to Fan Hui’s (in number
of neurons and connections)
https://en.wikipedia.org/wiki/Lee_Sedol 62
I heard Google DeepMind’s AI is surprisingly
strong and getting stronger, but I am
confident that I can win, at least this time.
Lee Sedol
...even beating AlphaGo by 4-1 may allow
the Google DeepMind team to claim its de
facto victory and the defeat of him
[Lee Sedol], or even humankind.
interview in JTBC
Newsroom
62
AlphaGo versus Lee Sedol
In March 2016 AlphaGo won 4-1 against the legendary Lee Sedol.
AlphaGo won all but the 4th game; all games were won
by resignation.
The winner of the match was slated to win $1 million.
Since AlphaGo won, Google DeepMind stated that the prize would be
donated to charities, including UNICEF, and Go organisations.
Lee received $170,000 ($150,000 for participating in all the five
games, and an additional $20,000 for each game won).
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 63
Conclusion
Difficulties of Go
challenging decision-making
intractable search space
complex optimal solution
It appears infeasible to approximate it directly using a policy or value function!
Silver et al. 2016 64
AlphaGo: summary
Monte Carlo tree search
effective move selection and position evaluation
through deep convolutional neural networks
trained by a novel combination of supervised and reinforcement
learning
new search algorithm combining
neural network evaluation
Monte Carlo rollouts
scalable implementation
multi-threaded simulations on CPUs
parallel GPU computations
distributed version over multiple machines
Silver et al. 2016 65
Novel approach
During the match against Fan Hui, AlphaGo evaluated thousands
of times fewer positions than Deep Blue did against Kasparov.
It compensated for this by:
selecting those positions more intelligently (policy network)
evaluating them more precisely (value network)
Deep Blue relied on a handcrafted evaluation function.
AlphaGo was trained directly and automatically from gameplay.
It used general-purpose learning.
This approach is not specific to the game of Go. The algorithm
can be used for a much wider class of (so far seemingly)
intractable problems in AI!
Silver et al. 2016 66
Thank you!
Questions?
66
Backup slides
Input features for rollout and tree policy
Silver et al. 2016
Results of a tournament between different Go programs
Silver et al. 2016
Results of a tournament between AlphaGo and distributed Al-
phaGo, testing scalability with hardware
Silver et al. 2016
AlphaGo versus Fan Hui: Game 1
Silver et al. 2016
AlphaGo versus Fan Hui: Game 2
Silver et al. 2016
AlphaGo versus Fan Hui: Game 3
Silver et al. 2016
AlphaGo versus Fan Hui: Game 4
Silver et al. 2016
AlphaGo versus Fan Hui: Game 5
Silver et al. 2016
AlphaGo versus Lee Sedol: Game 1
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 2 (1/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 2 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 3
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 4
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 5 (1/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 5 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
Further Reading I
AlphaGo:
Google Research Blog
http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
an article in Nature
http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
a reddit article claiming that AlphaGo is even stronger than it appears to be:
“AlphaGo would rather win by less points, but with higher probability.”
https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
Articles by Google DeepMind:
Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih
et al. 2015)
Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
Artificial Intelligence course at MIT
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/
6-034-artificial-intelligence-fall-2010/index.htm
Introduction to Artificial Intelligence at Udacity
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
Further Reading II
General Game Playing course https://www.coursera.org/course/ggp
Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
On Numbers and Games (Conway 1976)
Machine Learning:
Machine Learning course
https://www.coursera.org/learn/machine-learning/
Reinforcement Learning http://reinforcementlearning.ai-depot.com/
Deep Learning (LeCun, Bengio, and Hinton 2015)
Deep Learning course https://www.udacity.com/course/deep-learning--ud730
Two Minute Papers https://www.youtube.com/user/keeroyz
Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience:
http://www.brainfacts.org/
References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in
Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold’em poker is solved”. In: Science 347.6218, pp. 145–149. url:
http://poker.cs.ualberta.ca/15science.html.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Corrado, Greg (2015). Computer, respond to this email. url:
http://googleresearch.blogspot.cz/2015/11/computer-respond-to-this-email.html#1 (visited on
03/31/2016).
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks,
genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In:
CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural turing machines”. In: arXiv preprint
arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
References II
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for
Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540,
pp. 529–533. url:
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Munroe, Randall. Game AIs. url: https://xkcd.com/1002/ (visited on 04/02/2016).
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature
529.7587, pp. 484–489.
Sun, Felix. DeepHear - Composing and harmonizing music with neural networks. url:
http://web.mit.edu/felixsun/www/neural-music.html (visited on 04/02/2016).

Devoxx 2017 - AI Self-learning Game PlayingDevoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game Playing
Richard Abbuhl
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
Thomas da Silva Paula
 
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and ChallengesYuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
AI Frontiers
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
Richard Abbuhl
 
Last.fm API workshop - Stockholm
Last.fm API workshop - StockholmLast.fm API workshop - Stockholm
Last.fm API workshop - Stockholm
Matthew Ogle
 
Infoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz FinalsInfoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz Finals
In-X
 
Why AI is shaping our games
Why AI is shaping our gamesWhy AI is shaping our games
Why AI is shaping our games
Förderverein Technische Fakultät
 
Using (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next gameUsing (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next game
Eric Seufert
 
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric SeufertUsing (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Jessica Tams
 

Similar to Mastering the game of Go with deep neural networks and tree search: Presentation (20)

Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...
Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...
Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...
 
Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...
Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...
Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...
 
Adversarial search
Adversarial searchAdversarial search
Adversarial search
 
Showcase of My Research on Games & AI "till the end of Oct. 2014"
Showcase of My Research on Games & AI "till the end of Oct. 2014"Showcase of My Research on Games & AI "till the end of Oct. 2014"
Showcase of My Research on Games & AI "till the end of Oct. 2014"
 
Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...
Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...
Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...
 
20151020 Metis
20151020 Metis20151020 Metis
20151020 Metis
 
How games are driving advances in AI research- Unite Copenhagen 2019
How games are driving advances in AI research- Unite Copenhagen 2019 How games are driving advances in AI research- Unite Copenhagen 2019
How games are driving advances in AI research- Unite Copenhagen 2019
 
Big data and AI presentation slides
Big data and AI presentation slidesBig data and AI presentation slides
Big data and AI presentation slides
 
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query PitfallsMongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
 
Session 5 coding handson Tensorflow
Session 5 coding handson Tensorflow Session 5 coding handson Tensorflow
Session 5 coding handson Tensorflow
 
Nela08
Nela08Nela08
Nela08
 
Devoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game PlayingDevoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game Playing
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
 
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and ChallengesYuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
 
Last.fm API workshop - Stockholm
Last.fm API workshop - StockholmLast.fm API workshop - Stockholm
Last.fm API workshop - Stockholm
 
Infoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz FinalsInfoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz Finals
 
Why AI is shaping our games
Why AI is shaping our gamesWhy AI is shaping our games
Why AI is shaping our games
 
Using (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next gameUsing (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next game
 
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric SeufertUsing (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
 

More from Karel Ha

transcript-master-studies-Karel-Ha
transcript-master-studies-Karel-Hatranscript-master-studies-Karel-Ha
transcript-master-studies-Karel-Ha
Karel Ha
 
Schrodinger poster 2020
Schrodinger poster 2020Schrodinger poster 2020
Schrodinger poster 2020
Karel Ha
 
CapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule NetworkCapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule Network
Karel Ha
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between Capsules
Karel Ha
 
transcript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Hatranscript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-HaKarel Ha
 
Real-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/PhiReal-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/Phi
Karel Ha
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
Karel Ha
 

More from Karel Ha (7)

transcript-master-studies-Karel-Ha
transcript-master-studies-Karel-Hatranscript-master-studies-Karel-Ha
transcript-master-studies-Karel-Ha
 
Schrodinger poster 2020
Schrodinger poster 2020Schrodinger poster 2020
Schrodinger poster 2020
 
CapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule NetworkCapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule Network
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between Capsules
 
transcript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Hatranscript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Ha
 
Real-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/PhiReal-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/Phi
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
 

Recently uploaded

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
SSR02
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
Renu Jangid
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 

Recently uploaded (20)

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 

Mastering the game of Go with deep neural networks and tree search: Presentation

  • 1. Mastering the game of Go with deep neural networks and tree search Karel Ha article by Google DeepMind Spring School of Combinatorics 2016
  • 4. Applications of AI spam filters recommender systems (Netflix, YouTube) 1
  • 5. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) 1
  • 6. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) audio recognition (Shazam, SoundHound) 1
  • 7. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) audio recognition (Shazam, SoundHound) music generation (DeepHear - Composing and harmonizing music with neural networks) 1
  • 8. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) audio recognition (Shazam, SoundHound) music generation (DeepHear - Composing and harmonizing music with neural networks) self-driving cars 1
  • 9. Auto Reply Feature of Google Inbox Corrado 2015 2
  • 10. Artistic-style Painting [1] Gatys, Ecker, and Bethge 2015 [2] Li and Wand 2016 3
  • 11. Artistic-style Painting [1] Gatys, Ecker, and Bethge 2015 [2] Li and Wand 2016 3
  • 12. Baby Names Generated Character by Character Baby Killiel Saddie Char Ahbort With Karpathy 2015 4
  • 13. Baby Names Generated Character by Character Baby Killiel Saddie Char Ahbort With Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy Karpathy 2015 4
  • 14. Baby Names Generated Character by Character Baby Killiel Saddie Char Ahbort With Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy Marylen Hammine Janye Marlise Jacacrie Hendred Romand Charienna Nenotto Ette Dorane Wallen Marly Darine Salina Elvyn Ersia Maralena Minoria Ellia Charmin Antley Nerille Chelon Walmor Evena Jeryly Stachon Charisa Allisa Anatha Cathanie Geetra Alexie Jerin Cassen Herbett Cossie Velen Daurenge Robester Shermond Terisa Licia Roselen Ferine Jayn Lusine Charyanne Sales Sanny Resa Wallon Martine Merus Jelen Candica Wallin Tel Rachene Tarine Ozila Ketia Shanne Arnande Karella Roselina Alessia Chasty Deland Berther Geamar Jackein Mellisand Sagdy Nenc Lessie Rasemy Guen Karpathy 2015 4
  • 15. C code Generated Character by Character Karpathy 2015 5
  • 16. Algebraic Geometry Generated Character by Character Karpathy 2015 6
  • 18. DeepDrumpf https://twitter.com/deepdrumpf = a Twitter bot that has learned the language of Donald Trump from his speeches Hayes 2016 7
  • 19. Atari Player by Google DeepMind https://youtu.be/0X-NdPtFKq0?t=21m13s Mnih et al. 2015 8
  • 20. 8
  • 21. Heads-up Limit Holdem Poker Is Solved! Bowling et al. 2015 9
  • 22. Heads-up Limit Holdem Poker Is Solved! Cepheus http://poker.srv.ualberta.ca/ Bowling et al. 2015 9
  • 23. Basics of Machine learning
  • 24. Supervised versus Unsupervised Learning Supervised learning: 10
  • 25. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled 10
  • 26. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... 10
  • 27. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... 10
  • 28. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: 10
  • 29. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: data set is not labelled 10
  • 30. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: data set is not labelled it can try to cluster the data into different groups 10
  • 31. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: data set is not labelled it can try to cluster the data into different groups e.g. grouping similar news, ... 10
  • 33. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... http://www.nickgillian.com/ 11
  • 34. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set http://www.nickgillian.com/ 11
  • 35. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set http://www.nickgillian.com/ 11
  • 36. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set 4. deployment http://www.nickgillian.com/ 11
  • 37. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set 4. deployment http://www.nickgillian.com/ 11
  • 38. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set 4. deployment http://www.nickgillian.com/ 11
  • 44. Underfitting and Overfitting Beware of overfitting! https://www.researchgate.net/post/How_to_Avoid_Overfitting 15
  • 45. Underfitting and Overfitting Beware of overfitting! It is like learning for a math exam by memorizing proofs. https://www.researchgate.net/post/How_to_Avoid_Overfitting 15
  • 47. Reinforcement Learning Especially: games of self-play https://youtu.be/0X-NdPtFKq0?t=16m57s 16
  • 49. Tree Search Optimal value v∗(s) determines the outcome of the game: Silver et al. 2016 17
  • 50. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s Silver et al. 2016 17
  • 51. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. Silver et al. 2016 17
  • 52. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. Silver et al. 2016 17
  • 53. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where Silver et al. 2016 17
  • 54. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where b is the game's breadth (number of legal moves per position) Silver et al. 2016 17
  • 55. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where b is the game's breadth (number of legal moves per position) d is its depth (game length) Silver et al. 2016 17
  • 56. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 Allis et al. 1994 18
  • 57. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! Allis et al. 1994 18
  • 58. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html Allis et al. 1994 18
  • 59. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? Allis et al. 1994 18
  • 60. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? for the breadth: a neural network to select moves Allis et al. 1994 18
  • 61. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? for the breadth: a neural network to select moves for the depth: a neural network to evaluate current position Allis et al. 1994 18
  • 62. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? for the breadth: a neural network to select moves for the depth: a neural network to evaluate current position for the tree traverse: Monte Carlo tree search (MCTS) Allis et al. 1994 18
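To get a feel for these numbers, here is a minimal back-of-the-envelope sketch in Python; the values of b and d are the rough estimates quoted above, not exact figures.

    import math

    # rough branching factor b and game length d, as quoted on the slides
    games = {"chess": (35, 80), "Go": (250, 150)}

    for name, (b, d) in games.items():
        digits = int(d * math.log10(b)) + 1   # number of decimal digits of b**d
        print(f"{name}: b^d is roughly a {digits}-digit number")

    # Chess comes out around 10^123 and Go around 10^360 move sequences,
    # far beyond the ~10^80 atoms estimated in the observable universe and
    # more than a googol (10^100) times larger than the chess figure.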
  • 63. Monte Carlo tree search 19
  • 66. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 67. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex but on much smaller scales http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 68. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex but on much smaller scales suitable to model systems with a high tolerance to error http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 69. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex but on much smaller scales suitable to model systems with a high tolerance to error e.g. audio or image recognition http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 71. Neural Network: Modes Two modes Dieterle 2003 21
  • 72. Neural Network: Modes Two modes feedforward for making predictions Dieterle 2003 21
  • 73. Neural Network: Modes Two modes feedforward for making predictions backpropagation for learning Dieterle 2003 21
  • 74. Neural Network: an example of feedforward http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/ 22
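As a companion to the feedforward example above, a minimal sketch of a single forward pass through a tiny fully connected network in Python/NumPy; the layer sizes and random weights are made up purely for illustration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    x  = rng.normal(size=3)            # input vector (3 features)
    W1 = rng.normal(size=(4, 3))       # hidden-layer weights (4 units)
    W2 = rng.normal(size=(1, 4))       # output-layer weights (1 unit)

    hidden = sigmoid(W1 @ x)           # feedforward: input -> hidden layer
    output = sigmoid(W2 @ hidden)      # feedforward: hidden -> output layer
    print(output)                      # the network's prediction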
  • 75. Gradient Descent in Neural Networks http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
  • 76. Gradient Descent in Neural Networks Motto: ”Learn by mistakes!” http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
  • 77. Gradient Descent in Neural Networks Motto: ”Learn by mistakes!” However, error functions are not necessarily convex or so “smooth”. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
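A one-parameter illustration of the "learn by mistakes" idea: gradient descent on a deliberately simple (convex) squared-error function; real network error surfaces are not this smooth, as the slide warns.

    # minimize E(w) = (w - 3)**2 by repeatedly stepping against the gradient
    w, learning_rate = 0.0, 0.1
    for step in range(50):
        grad = 2 * (w - 3)             # dE/dw, the "mistake" signal
        w -= learning_rate * grad      # move downhill
    print(w)                           # approaches the minimum at w = 3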
  • 78. Deep Neural Network: Inspiration The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”) http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
  • 79. Deep Neural Network: Inspiration The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”) http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
  • 84. Classic games (1/2) Backgammon: Man vs. Fate Chess: Man vs. Man 26
  • 85. Classic games (2/2) Go: Man vs. Self Robert Šámal (White) versus Karel Král (Black), Spring School of Combinatorics 2016 27
  • 87. Rules of Go Black versus White. Black starts the game. 28
  • 88. Rules of Go Black versus White. Black starts the game. 28
  • 89. Rules of Go Black versus White. Black starts the game. the rule of liberty 28
  • 90. Rules of Go Black versus White. Black starts the game. the rule of liberty the “ko” rule 28
  • 91. Rules of Go Black versus White. Black starts the game. the rule of liberty the “ko” rule Handicap for difference in ranks: Black can place 1 or more stones in advance (compensation for White’s greater strength). 28
  • 92. Scoring Rules: Area Scoring https://en.wikipedia.org/wiki/Go_(game) 29
  • 93. Scoring Rules: Area Scoring A player’s score is: the number of stones that the player has on the board https://en.wikipedia.org/wiki/Go_(game) 29
  • 94. Scoring Rules: Area Scoring A player’s score is: the number of stones that the player has on the board plus the number of empty intersections surrounded by that player’s stones https://en.wikipedia.org/wiki/Go_(game) 29
  • 95. Scoring Rules: Area Scoring A player’s score is: the number of stones that the player has on the board plus the number of empty intersections surrounded by that player’s stones plus komi(dashi) points for the White player which is a compensation for the first move advantage of the Black player https://en.wikipedia.org/wiki/Go_(game) 29
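A toy sketch of area scoring under these rules; the stone and territory counts below are invented for illustration, and the komi value is just one commonly used setting.

    def area_score(stones, territory, komi=0.0):
        # area scoring: stones on the board + surrounded empty points (+ komi for White)
        return stones + territory + komi

    black = area_score(stones=75, territory=106)
    white = area_score(stones=70, territory=102, komi=7.5)
    winner = "Black" if black > white else "White"
    print(f"Black {black} vs. White {white} -> {winner} wins")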
  • 96. Ranks of Players Kyu and Dan ranks https://en.wikipedia.org/wiki/Go_(game) 30
  • 97. Ranks of Players Kyu and Dan ranks or alternatively, ELO ratings https://en.wikipedia.org/wiki/Go_(game) 30
  • 100. Policy and Value Networks Silver et al. 2016 31
  • 101. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 32
  • 102. SL Policy Networks (1/3) 13-layer deep convolutional neural network Silver et al. 2016 33
  • 103. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves Silver et al. 2016 33
  • 104. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification Silver et al. 2016 33
  • 105. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server Silver et al. 2016 33
  • 106. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Silver et al. 2016 33
  • 107. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Silver et al. 2016 33
  • 108. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: Silver et al. 2016 33
  • 109. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: 44.4% accuracy (the state-of-the-art from other groups) Silver et al. 2016 33
  • 110. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: 44.4% accuracy (the state-of-the-art from other groups) 55.7% accuracy (raw board position + move history as input) Silver et al. 2016 33
  • 111. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: 44.4% accuracy (the state-of-the-art from other groups) 55.7% accuracy (raw board position + move history as input) 57.0% accuracy (all input features) Silver et al. 2016 33
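A minimal sketch of the supervised training step just described: one stochastic-gradient-ascent update on the log-likelihood of an expert move. It uses a plain softmax policy over moves instead of the actual 13-layer convolutional network, and all sizes and the learning rate are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    num_features, num_moves = 48, 361               # e.g. feature planes, 19x19 board
    sigma = rng.normal(scale=0.01, size=(num_moves, num_features))  # policy weights

    def policy(s):
        logits = sigma @ s
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                      # p_sigma(a | s)

    def sl_update(s, a, lr=0.01):
        # one gradient-ascent step on log p_sigma(a|s) for an expert pair (s, a)
        global sigma
        p = policy(s)
        grad = -np.outer(p, s)                      # d log p(a|s) / d sigma for a softmax
        grad[a] += s
        sigma += lr * grad

    s = rng.normal(size=num_features)               # a fake board-feature vector
    sl_update(s, a=42)                              # pretend the expert played move 42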
  • 112. SL Policy Networks (2/3) Small improvements in accuracy led to large improvements in playing strength (see the next slide) Silver et al. 2016 34
  • 113. SL Policy Networks (3/3) move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%). Silver et al. 2016 35
  • 114. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 36
  • 115. Rollout Policy Rollout policy pπ(a|s) is faster but less accurate than the SL policy network. Silver et al. 2016 37
  • 116. Rollout Policy Rollout policy pπ(a|s) is faster but less accurate than the SL policy network. accuracy of 24.2% Silver et al. 2016 37
  • 117. Rollout Policy Rollout policy pπ(a|s) is faster but less accurate than the SL policy network. accuracy of 24.2% It takes 2 µs to select an action, compared to 3 ms in the case of the SL policy network. Silver et al. 2016 37
  • 118. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 38
  • 119. RL Policy Networks (1/2) identical in structure to the SL policy network Silver et al. 2016 39
  • 120. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play Silver et al. 2016 39
  • 121. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification Silver et al. 2016 39
  • 122. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ Silver et al. 2016 39
  • 123. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play Silver et al. 2016 39
  • 124. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration Silver et al. 2016 39
  • 125. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy Silver et al. 2016 39
  • 126. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy stochastic gradient ascent: ∆ρ ∝ (∂ log pρ(a_t|s_t) / ∂ρ) z_t at time step t, where the reward function z_t is +1 for winning and −1 for losing. Silver et al. 2016 39
  • 127. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy stochastic gradient ascent: ∆ρ ∝ (∂ log pρ(a_t|s_t) / ∂ρ) z_t at time step t, where the reward function z_t is +1 for winning and −1 for losing. Silver et al. 2016 39
  • 128. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy stochastic gradient ascent: ∆ρ ∝ (∂ log pρ(a_t|s_t) / ∂ρ) z_t at time step t, where the reward function z_t is +1 for winning and −1 for losing. Silver et al. 2016 39
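A sketch of the corresponding reinforcement-learning update, in the REINFORCE style implied above: the same log-likelihood gradient as in the supervised case, but scaled by the game outcome z_t, so that moves from won games are reinforced and moves from lost games are discouraged. Again a toy softmax policy rather than the real network; shapes and the learning rate are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    num_features, num_moves = 48, 361
    rho = rng.normal(scale=0.01, size=(num_moves, num_features))   # RL policy weights

    def policy(s):
        logits = rho @ s
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                      # p_rho(a | s)

    def rl_update(trajectory, z, lr=0.01):
        # trajectory: list of (state, action) pairs from one game of self-play
        # z: +1 if this player won that game, -1 if it lost
        global rho
        for s, a in trajectory:
            p = policy(s)
            grad = -np.outer(p, s)                  # d log p(a|s) / d rho
            grad[a] += s
            rho += lr * z * grad                    # ascent on log p_rho(a_t|s_t) * z_t

    fake_game = [(rng.normal(size=num_features), int(rng.integers(num_moves)))
                 for _ in range(5)]
    rl_update(fake_game, z=+1)                      # pretend this side won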
  • 129. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): Silver et al. 2016 40
  • 130. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network Silver et al. 2016 40
  • 131. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) Silver et al. 2016 40
  • 132. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) The previous state-of-the-art, based only on SL of CNN: Silver et al. 2016 40
  • 133. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) The previous state-of-the-art, based only on SL of CNN: Silver et al. 2016 40
  • 134. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) The previous state-of-the-art, based only on SL of CNN: 11% “win” rate against Pachi Silver et al. 2016 40
  • 135. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 41
  • 136. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution Silver et al. 2016 42
  • 137. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Silver et al. 2016 42
  • 138. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s). Silver et al. 2016 42
  • 139. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s). task of regression Silver et al. 2016 42
  • 140. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s). task of regression stochastic gradient descent: ∆θ ∝ (∂vθ(s) / ∂θ) (z − vθ(s)) (to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z) Silver et al. 2016 42
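And a sketch of that value-regression step: one stochastic-gradient-descent update on the squared error between the predicted value vθ(s) and the observed outcome z, here with a toy tanh-of-linear value function standing in for the convolutional value network (all sizes and the learning rate are assumptions).

    import numpy as np

    rng = np.random.default_rng(2)
    num_features = 48
    theta = np.zeros(num_features)                  # value-function weights

    def v(s):
        return np.tanh(theta @ s)                   # prediction in (-1, 1)

    def value_update(s, z, lr=0.01):
        # one SGD step on the squared error (z - v_theta(s))**2
        global theta
        pred = v(s)
        dv_dtheta = (1.0 - pred**2) * s             # derivative of tanh(theta . s)
        theta += lr * (z - pred) * dv_dtheta        # Delta_theta ∝ (dv/dtheta)(z - v)

    s = rng.normal(size=num_features)               # a fake board-feature vector
    value_update(s, z=+1)                           # this position eventually led to a win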
  • 141. Value Network (2/2) Beware of overfitting! Silver et al. 2016 43
  • 142. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. Silver et al. 2016 43
  • 143. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. Value network memorized the game outcomes, rather than generalizing to new positions. Silver et al. 2016 43
  • 144. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. The value network memorized the game outcomes, rather than generalizing to new positions. Solution: generate 30 million (new) positions, each sampled from a separate game Silver et al. 2016 43
  • 145. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. The value network memorized the game outcomes, rather than generalizing to new positions. Solution: generate 30 million (new) positions, each sampled from a separate game almost the accuracy of Monte Carlo rollouts (using pρ), but with 15,000 times less computation! Silver et al. 2016 43
  • 146. Selection of Moves by the Value Network evaluation of all successors s′ of the root position s, using vθ(s′) Silver et al. 2016 44
  • 147. Evaluation accuracy in various stages of a game Move number is the number of moves that had been played in the given position. Silver et al. 2016 45
  • 148. Evaluation accuracy in various stages of a game Move number is the number of moves that had been played in the given position. Each position evaluated by: forward pass of the value network vθ Silver et al. 2016 45
  • 149. Evaluation accuracy in various stages of a game Move number is the number of moves that had been played in the given position. Each position evaluated by: forward pass of the value network vθ 100 rollouts, played out using the corresponding policy Silver et al. 2016 45
  • 150. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 46
  • 151. ELO Ratings for Various Combinations of Networks Silver et al. 2016 47
  • 152. MCTS Algorithm The next action is selected by lookahead search, using simulation: Silver et al. 2016 48
  • 153. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase Silver et al. 2016 48
  • 154. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase Silver et al. 2016 48
  • 155. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase Silver et al. 2016 48
  • 156. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Silver et al. 2016 48
  • 157. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Silver et al. 2016 48
  • 158. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) Silver et al. 2016 48
  • 159. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) Silver et al. 2016 48
  • 160. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) prior probability P(s, a) (from SL policy network pσ) Silver et al. 2016 48
  • 161. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) prior probability P(s, a) (from SL policy network pσ) Silver et al. 2016 48
  • 162. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) prior probability P(s, a) (from SL policy network pσ) The tree is traversed by simulation (descending the tree) from the root state. Silver et al. 2016 48
  • 164. MCTS Algorithm: Selection At each time step t, an action a_t is selected from state s_t: a_t = argmax_a (Q(s_t, a) + u(s_t, a)) Silver et al. 2016 49
  • 165. MCTS Algorithm: Selection At each time step t, an action a_t is selected from state s_t: a_t = argmax_a (Q(s_t, a) + u(s_t, a)) where the bonus u(s_t, a) ∝ P(s, a) / (1 + N(s, a)) Silver et al. 2016 49
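A minimal sketch of this selection rule, assuming the per-edge statistics Q, N and P are stored in a dictionary per action; the exploration constant c_puct is an assumption not spelled out on the slide.

    def select_action(edges, c_puct=5.0):
        # edges: dict mapping action -> {"Q": ..., "N": ..., "P": ...}
        # returns argmax_a of Q(s, a) + u(s, a), with u ∝ P(s, a) / (1 + N(s, a))
        def score(a):
            e = edges[a]
            u = c_puct * e["P"] / (1.0 + e["N"])    # exploration bonus
            return e["Q"] + u
        return max(edges, key=score)

    # toy example: two candidate moves at the root
    edges = {"D4":  {"Q": 0.52, "N": 120, "P": 0.30},
             "Q16": {"Q": 0.48, "N": 10,  "P": 0.35}}
    print(select_action(edges))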
  • 167. MCTS Algorithm: Expansion A leaf position may be expanded (just once) by the SL policy network pσ. Silver et al. 2016 50
  • 168. MCTS Algorithm: Expansion A leaf position may be expanded (just once) by the SL policy network pσ. The output probabilities are stored as priors P(s, a) := pσ(a|s). Silver et al. 2016 50
  • 170. MCTS: Evaluation evaluation from the value network vθ(s) Silver et al. 2016 51
  • 171. MCTS: Evaluation evaluation from the value network vθ(s) evaluation by the outcome z using the fast rollout policy pπ until the end of game Silver et al. 2016 51
  • 172. MCTS: Evaluation evaluation from the value network vθ(s) evaluation by the outcome z using the fast rollout policy pπ until the end of game Silver et al. 2016 51
  • 173. MCTS: Evaluation evaluation from the value network vθ(s) evaluation by the outcome z using the fast rollout policy pπ until the end of game Using a mixing parameter λ, the final leaf evaluation V (s) is V (s) = (1 − λ)vθ(s) + λz Silver et al. 2016 51
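The leaf-evaluation mix itself is a one-liner; a sketch with vθ(s) and the rollout outcome z given as plain numbers (the paper's best-performing setting was λ = 0.5).

    def leaf_value(v_theta, z, lam=0.5):
        # V(s) = (1 - lambda) * v_theta(s) + lambda * z
        return (1.0 - lam) * v_theta + lam * z

    print(leaf_value(v_theta=0.31, z=+1))   # -> 0.655 for this hypothetical leaf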
  • 174. Tree Evaluation from Value Network action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only) Silver et al. 2016 52
  • 175. Tree Evaluation from Rollouts action values Q(s, a), averaged over rollout evaluations only Silver et al. 2016 53
  • 176. MCTS: Backup At the end of simulation, each traversed edge is updated by accumulating: the action values Q Silver et al. 2016 54
  • 177. MCTS: Backup At the end of simulation, each traversed edge is updated by accumulating: the action values Q visit counts N Silver et al. 2016 54
  • 178. Once the search is complete, the algorithm chooses the most visited move from the root position. Silver et al. 2016 54
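A sketch of the backup phase and of the final move choice, under the simplification that each traversed edge just accumulates a running mean of the leaf evaluations it has seen.

    def backup(path, V):
        # path: list of edge-statistics dicts traversed in this simulation
        # V: the leaf evaluation produced for this simulation
        for e in path:
            e["N"] += 1
            e["Q"] += (V - e["Q"]) / e["N"]     # incremental mean of evaluations

    def choose_move(root_edges):
        # once the search is complete, play the most visited move from the root
        return max(root_edges, key=lambda a: root_edges[a]["N"])

    # toy usage, reusing the edge format from the selection sketch above
    edges = {"D4":  {"Q": 0.0, "N": 0, "P": 0.30},
             "Q16": {"Q": 0.0, "N": 0, "P": 0.35}}
    backup([edges["D4"]], V=0.655)
    print(choose_move(edges))                   # -> "D4" after this single simulation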
  • 179. Percentage of Simulations percentage frequency with which actions were selected from the root during simulations Silver et al. 2016 55
  • 180. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. Silver et al. 2016 56
  • 181. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Silver et al. 2016 56
  • 182. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Fan Hui responded with the move indicated by the white square; Silver et al. 2016 56
  • 183. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Fan Hui responded with the move indicated by the white square; in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo. Silver et al. 2016 56
  • 186. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs Silver et al. 2016 57
  • 187. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs Silver et al. 2016 57
  • 188. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads Silver et al. 2016 57
  • 189. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs Silver et al. 2016 57
  • 190. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Silver et al. 2016 57
  • 191. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Silver et al. 2016 57
  • 192. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Distributed version of AlphaGo (on multiple machines): 40 search threads Silver et al. 2016 57
  • 193. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Distributed version of AlphaGo (on multiple machines): 40 search threads 1202 CPUs Silver et al. 2016 57
  • 194. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Distributed version of AlphaGo (on multiple machines): 40 search threads 1202 CPUs 176 GPUs Silver et al. 2016 57
  • 195. ELO Ratings for Various Combinations of Threads Silver et al. 2016 58
  • 196. Results: the strength of AlphaGo
  • 197. Tournament with Other Go Programs Silver et al. 2016 59
  • 199. Fan Hui professional 2 dan https://en.wikipedia.org/wiki/Fan_Hui 60
  • 200. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 https://en.wikipedia.org/wiki/Fan_Hui 60
  • 201. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 https://en.wikipedia.org/wiki/Fan_Hui 60
  • 202. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 biological neural network: https://en.wikipedia.org/wiki/Fan_Hui 60
  • 203. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 biological neural network: 100 billion neurons https://en.wikipedia.org/wiki/Fan_Hui 60
  • 204. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 biological neural network: 100 billion neurons 100 to 1,000 trillion neuronal connections https://en.wikipedia.org/wiki/Fan_Hui 60
  • 206. AlphaGo versus Fan Hui AlphaGo won 5-0 in a formal match in October 2015. 61
  • 207. AlphaGo versus Fan Hui AlphaGo won 5-0 in a formal match in October 2015. [AlphaGo] is very strong and stable, it seems like a wall. ... I know AlphaGo is a computer, but if no one told me, maybe I would think the player was a little strange, but a very strong player, a real person. Fan Hui 61
  • 213. Lee Sedol “The Strong Stone”: professional 9 dan; holder of the second-highest number of international titles; the 5th youngest player (12 years 4 months) to turn professional in South Korean history. Lee Sedol would win 97 out of 100 games against Fan Hui. Biological neural network comparable to Fan Hui’s (in number of neurons and connections). https://en.wikipedia.org/wiki/Lee_Sedol 62
  • 216. “I heard Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time.” Lee Sedol “...even beating AlphaGo by 4-1 may allow the Google DeepMind team to claim its de facto victory and the defeat of him [Lee Sedol], or even humankind.” interview in JTBC Newsroom 62
  • 222. AlphaGo versus Lee Sedol: In March 2016, AlphaGo won 4-1 against the legendary Lee Sedol. AlphaGo won all but the 4th game; all games were decided by resignation. The winner of the match was slated to receive $1 million; since AlphaGo won, Google DeepMind stated that the prize would be donated to charities, including UNICEF, and to Go organisations. Lee received $170,000 ($150,000 for participating in all five games, plus $20,000 for each game won). https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 63
  • 226. Difficulties of Go: challenging decision-making; an intractable search space; and an optimal solution so complex that it appears infeasible to approximate it directly with a policy or value function. Silver et al. 2016 64
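To see why the search space is intractable, a quick back-of-the-envelope estimate: a game with branching factor b and length d has on the order of b^d possible continuations, with b ≈ 250, d ≈ 150 for Go versus b ≈ 35, d ≈ 80 for chess (Allis et al. 1994, as cited in Silver et al. 2016). The snippet below just prints the resulting orders of magnitude:

```python
from math import log10

def search_space_exponent(branching_factor, game_length):
    """Base-10 exponent of b**d, a rough count of possible game continuations."""
    return game_length * log10(branching_factor)

# Typical figures (Allis et al. 1994): Go b ~ 250, d ~ 150; chess b ~ 35, d ~ 80.
print(f"Go:    ~10^{search_space_exponent(250, 150):.0f}")
print(f"chess: ~10^{search_space_exponent(35, 80):.0f}")
```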
  • 237. AlphaGo: summary. Monte Carlo tree search; effective move selection and position evaluation through deep convolutional neural networks, trained by a novel combination of supervised and reinforcement learning; a new search algorithm combining neural-network evaluation with Monte Carlo rollouts; a scalable implementation: multi-threaded simulations on CPUs, parallel GPU computations, and a distributed version over multiple machines. Silver et al. 2016 65
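The “combining neural-network evaluation with Monte Carlo rollouts” point refers to how leaf positions are scored during the search: the paper mixes the value network's estimate with the outcome of a fast rollout, V(s_L) = (1 − λ)·v_θ(s_L) + λ·z_L, with the mixing constant λ reported as 0.5. A minimal sketch under those assumptions; `value_network` and `fast_rollout` below are stand-ins, not the actual networks:

```python
import random

def value_network(position):
    """Stand-in for the value network v_theta: an estimated outcome in [-1, 1]."""
    return random.uniform(-1, 1)

def fast_rollout(position):
    """Stand-in for the fast rollout policy: play to the end, return the outcome z in {-1, +1}."""
    return random.choice([-1, 1])

def evaluate_leaf(position, mixing=0.5):
    """Mixed leaf evaluation V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1 - mixing) * value_network(position) + mixing * fast_rollout(position)

print(evaluate_leaf("some leaf position"))
```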
  • 245. Novel approach: During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did against Kasparov. It compensated for this by selecting positions more intelligently (the policy network) and evaluating them more precisely (the value network). Deep Blue relied on a handcrafted evaluation function, whereas AlphaGo was trained directly and automatically from gameplay, using general-purpose learning. This approach is not specific to the game of Go: the algorithm can be used for a much wider class of (so far seemingly) intractable problems in AI! Silver et al. 2016 66
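“Selecting positions more intelligently” means the tree descent does not treat moves uniformly: at each node the search picks the action maximising Q(s, a) + u(s, a), where the exploration bonus u(s, a) is proportional to the policy network's prior P(s, a) and shrinks as the visit count N(s, a) grows. A hedged sketch of that selection rule; the constant `c_puct` and the `stats` layout are illustrative, not the paper's code:

```python
from math import sqrt

def select_action(stats, c_puct=5.0):
    """Pick argmax_a Q(s,a) + u(s,a), where u is proportional to the policy prior P(s,a).

    `stats` maps each legal move to a dict with keys 'Q' (mean action value),
    'N' (visit count) and 'P' (prior from the policy network) -- an illustrative
    layout, not AlphaGo's actual data structure.
    """
    total_visits = sum(s['N'] for s in stats.values())

    def score(move):
        s = stats[move]
        u = c_puct * s['P'] * sqrt(total_visits) / (1 + s['N'])
        return s['Q'] + u

    return max(stats, key=score)

example = {
    'D4':  {'Q': 0.52, 'N': 120, 'P': 0.35},
    'Q16': {'Q': 0.48, 'N': 40,  'P': 0.30},
    'K10': {'Q': 0.10, 'N': 2,   'P': 0.05},
}
print(select_action(example))
```

The effect is that moves the policy network considers plausible get explored first, so far fewer positions need to be examined than in a uniform search.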
  • 248. Input features for rollout and tree policy Silver et al. 2016
  • 249. Results of a tournament between different Go programs Silver et al. 2016
  • 250. Results of a tournament between AlphaGo and distributed AlphaGo, testing scalability with hardware Silver et al. 2016
  • 251. AlphaGo versus Fan Hui: Game 1 Silver et al. 2016
  • 252. AlphaGo versus Fan Hui: Game 2 Silver et al. 2016
  • 253. AlphaGo versus Fan Hui: Game 3 Silver et al. 2016
  • 254. AlphaGo versus Fan Hui: Game 4 Silver et al. 2016
  • 255. AlphaGo versus Fan Hui: Game 5 Silver et al. 2016
  • 256. AlphaGo versus Lee Sedol: Game 1 https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 257. AlphaGo versus Lee Sedol: Game 2 (1/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 258. AlphaGo versus Lee Sedol: Game 2 (2/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 259. AlphaGo versus Lee Sedol: Game 3 https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 260. AlphaGo versus Lee Sedol: Game 4 https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 261. AlphaGo versus Lee Sedol: Game 5 (1/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 262. AlphaGo versus Lee Sedol: Game 5 (2/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 263. Further Reading I
AlphaGo:
Google Research Blog http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
an article in Nature http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
a reddit article claiming that AlphaGo is even stronger than it appears to be: “AlphaGo would rather win by less points, but with higher probability.” https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
Articles by Google DeepMind:
Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih et al. 2015)
Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
Artificial Intelligence course at MIT http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
Introduction to Artificial Intelligence at Udacity https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
  • 264. Further Reading II
General Game Playing course https://www.coursera.org/course/ggp
Singularity:
http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
On Numbers and Games (Conway 1976)
Machine Learning:
Machine Learning course https://www.coursera.org/learn/machine-learning/
Reinforcement Learning http://reinforcementlearning.ai-depot.com/
Deep Learning (LeCun, Bengio, and Hinton 2015)
Deep Learning course https://www.udacity.com/course/deep-learning--ud730
Two Minute Papers https://www.youtube.com/user/keeroyz
Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience: http://www.brainfacts.org/
  • 265. References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold'em poker is solved”. In: Science 347.6218, pp. 145–149. url: http://poker.cs.ualberta.ca/15science.html.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Corrado, Greg (2015). Computer, respond to this email. url: http://googleresearch.blogspot.cz/2015/11/computer-respond-to-this-email.html#1 (visited on 03/31/2016).
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks, genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In: CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural Turing machines”. In: arXiv preprint arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
  • 266. References II
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540, pp. 529–533. url: https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Munroe, Randall. Game AIs. url: https://xkcd.com/1002/ (visited on 04/02/2016).
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature 529.7587, pp. 484–489.
Sun, Felix. DeepHear - Composing and harmonizing music with neural networks. url: http://web.mit.edu/felixsun/www/neural-music.html (visited on 04/02/2016).