Mastering the game of Go
with deep neural networks and tree search
Karel Ha
article by Google DeepMind
Spring School of Combinatorics 2016
Why AI?
Applications of AI
spam filters
recommender systems (Netflix, YouTube)
predictive text (Swiftkey)
audio recognition (Shazam, SoundHound)
music generation (DeepHear - Composing and harmonizing
music with neural networks)
self-driving cars
1
Auto Reply Feature of Google Inbox
Corrado 2015 2
Artistic-style Painting
[1] Gatys, Ecker, and Bethge 2015 [2] Li and Wand 2016 3
Baby Names Generated Character by Character
Baby Killiel Saddie Char Ahbort With
Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy
Marylen Hammine Janye Marlise Jacacrie Hendred Romand
Charienna Nenotto Ette Dorane Wallen Marly Darine Salina
Elvyn Ersia Maralena Minoria Ellia Charmin Antley Nerille
Chelon Walmor Evena Jeryly Stachon Charisa Allisa Anatha
Cathanie Geetra Alexie Jerin Cassen Herbett Cossie Velen
Daurenge Robester Shermond Terisa Licia Roselen Ferine Jayn
Lusine Charyanne Sales Sanny Resa Wallon Martine Merus
Jelen Candica Wallin Tel Rachene Tarine Ozila Ketia Shanne
Arnande Karella Roselina Alessia Chasty Deland Berther
Geamar Jackein Mellisand Sagdy Nenc Lessie Rasemy Guen
Karpathy 2015 4
C code Generated Character by Character
Karpathy 2015 5
Algebraic Geometry Generated Character by Character
Karpathy 2015 6
DeepDrumpf
https://twitter.com/deepdrumpf = a Twitter bot that has
learned the language of Donald Trump from his speeches
Hayes 2016 7
Atari Player by Google DeepMind
https://youtu.be/0X-NdPtFKq0?t=21m13s
Mnih et al. 2015 8
Heads-up Limit Hold’em Poker Is Solved!
Cepheus http://poker.srv.ualberta.ca/
Bowling et al. 2015 9
Basics of Machine Learning
Supervised versus Unsupervised Learning
Supervised learning:
data set must be labelled
e.g. which e-mail is regular/spam, which image is duck/face,
...
Unsupervised learning:
data set is not labelled
it can try to cluster the data into different groups
e.g. grouping similar news, ...
10
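To make the distinction concrete, here is a minimal Python sketch using scikit-learn (the library and the tiny toy data are assumptions of this illustration, not part of the talk): the same feature vectors are classified when labels are available and merely clustered when they are not.

# Hypothetical toy data: 2-dimensional "e-mail features", labels 0 = regular, 1 = spam.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[0, 1], [1, 1], [8, 9], [9, 8]]
y = [0, 0, 1, 1]

clf = LogisticRegression().fit(X, y)          # supervised: labels guide the fit
print(clf.predict([[7, 8]]))                  # -> [1], i.e. spam

km = KMeans(n_clusters=2, n_init=10).fit(X)   # unsupervised: only grouping into clusters
print(km.labels_)                             # two groups with arbitrary cluster ids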
Supervised Learning
1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go
Server...
2. training on training set
3. testing on testing set
4. deployment
http://www.nickgillian.com/ 11
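A hedged sketch of steps 2 and 3 in Python with scikit-learn (assumed here purely for illustration; any labelled data set and model would do):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)            # 1. collected, labelled data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # 2. training on the training set
print("test accuracy:", model.score(X_test, y_test))              # 3. testing on the testing set
# 4. deployment: persist the trained model (e.g. joblib.dump) and serve its predictions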
Regression
12
Mathematical Regression
https://thermanuals.wordpress.com/descriptive-analysis/sampling-and-regression/
13
Classification
https://kevinbinz.files.wordpress.com/2014/08/ml-svm-after-comparison.png 14
Underfitting and Overfitting
Beware of overfitting!
It is like learning for a math exam by memorizing proofs.
https://www.researchgate.net/post/How_to_Avoid_Overfitting 15
Reinforcement Learning
Especially: games of self-play
https://youtu.be/0X-NdPtFKq0?t=16m57s 16
Monte Carlo Tree Search
Tree Search
Optimal value v∗(s) determines the outcome of the game:
from every board position or state s
under perfect play by all players.
It is computed by recursively traversing a search tree containing
approximately b^d possible sequences of moves, where
b is the game’s breadth (number of legal moves per position)
d is its depth (game length)
Silver et al. 2016 17
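What “recursively traversing the search tree” means can be sketched in a few lines of Python; the Game interface below (is_terminal, outcome, legal_moves, play) is a hypothetical placeholder, not code from the paper:

def optimal_value(state, game):
    """Exhaustive minimax in negamax form: the value v*(s) under perfect play by all players.

    The recursion visits on the order of b^d nodes -- fine for tiny games,
    utterly infeasible for Go.
    """
    if game.is_terminal(state):
        return game.outcome(state)           # +1 win, -1 loss, 0 draw, for the player to move
    return max(-optimal_value(game.play(state, move), game)
               for move in game.legal_moves(state))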
Game tree of Go
Sizes of trees for various games:
chess: b ≈ 35, d ≈ 80
Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the
universe!
That makes Go a googol
times more complex than
chess.
https://deepmind.com/alpha-go.html
How to handle the size of the game tree?
for the breadth: a neural network to select moves
for the depth: a neural network to evaluate current position
for the tree traverse: Monte Carlo tree search (MCTS)
Allis et al. 1994 18
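These sizes are easy to sanity-check; a quick Python back-of-the-envelope computation with the approximate b and d above (the atom count is the commonly quoted ~10^80 estimate):

chess = 35 ** 80       # ~10^123 possible move sequences
go    = 250 ** 150     # ~10^359 possible move sequences
atoms = 10 ** 80       # rough estimate of atoms in the observable universe

print(f"chess ~ 10^{len(str(chess)) - 1}")
print(f"go    ~ 10^{len(str(go)) - 1}")
print(go > atoms)      # True: far more move sequences than atoms in the universe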
Monte Carlo tree search
19
Neural networks
Neural Network: Inspiration
inspired by the neuronal structure of the mammalian cerebral
cortex
but on much smaller scales
suitable to model systems with a high tolerance to error
e.g. audio or image recognition
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
Neural Network: Modes
Two modes
feedforward for making predictions
backpropagation for learning
Dieterle 2003 21
Neural Network: an example of feedforward
http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/ 22
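A minimal numpy sketch of one feedforward pass, with a 2-3-1 architecture and made-up weights (not the weights from the linked tutorial):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x  = np.array([1.0, 0.0])                              # input vector
W1 = np.array([[0.8, 0.2], [0.4, 0.9], [0.3, 0.5]])    # hidden layer: 3 neurons, 2 inputs each
W2 = np.array([0.3, 0.5, 0.9])                         # output neuron: 3 inputs

hidden = sigmoid(W1 @ x)        # activations of the hidden layer
output = sigmoid(W2 @ hidden)   # the network's prediction
print(output)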
Gradient Descent in Neural Networks
Motto: “Learn by mistakes!”
However, error functions are not necessarily convex or so “smooth”.
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
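Gradient descent itself fits in a few lines; a sketch on a simple convex function (real error surfaces, as noted above, are far less well behaved):

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient -- 'learn by mistakes'."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3):
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))   # converges towards 3.0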
Deep Neural Network: Inspiration
The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”)
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
Convolutional Neural Network
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 25
Rules of Go
Classic games (1/2)
Backgammon: Man vs. Fate
Chess: Man vs. Man
26
Classic games (2/2)
Go: Man vs. Self
Robert Šámal (White) versus Karel Král (Black), Spring School of Combinatorics 2016 27
Rules of Go
Black versus White. Black starts the game.
the rule of liberty
the “ko” rule
Handicap for difference in ranks: Black can place 1 or more stones
in advance (compensation for White’s greater strength). 28
Scoring Rules: Area Scoring
A player’s score is:
the number of stones that the player has on the board
plus the number of empty intersections surrounded by that
player’s stones
plus komi(dashi) points for the White player
which is a compensation for the first move advantage of the Black player
https://en.wikipedia.org/wiki/Go_(game) 29
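The arithmetic of area scoring is simple enough to write down; a sketch that assumes the stone and territory counts are already known (the komi value of 7.5 is one common convention, not a fixed rule):

def area_score(stones_on_board, surrounded_empty_points, komi=0.0):
    """Area score of one player: stones + surrounded empty intersections (+ komi for White)."""
    return stones_on_board + surrounded_empty_points + komi

black = area_score(87, 95)
white = area_score(84, 88, komi=7.5)
print("Black wins" if black > white else "White wins")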
Ranks of Players
Kyu and Dan ranks
or alternatively, ELO ratings
https://en.wikipedia.org/wiki/Go_(game) 30
Chocolate micro-break
30
AlphaGo: Inside Out
Policy and Value Networks
Silver et al. 2016 31
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 32
SL Policy Networks (1/3)
13-layer deep convolutional neural network
goal: to predict expert human moves
task of classification
trained on 30 million positions from the KGS Go Server
stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ
(to maximize the likelihood of the human move a selected in state s)
Results:
44.4% accuracy (the state-of-the-art from other groups)
55.7% accuracy (raw board position + move history as input)
57.0% accuracy (all input features)
Silver et al. 2016 33
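A toy numpy sketch of this update rule for a linear softmax policy over a handful of moves (the feature and move counts are made up; AlphaGo’s pσ is the 13-layer CNN described above):

import numpy as np

rng = np.random.default_rng(0)
n_features, n_moves = 8, 4
sigma = rng.normal(scale=0.1, size=(n_moves, n_features))   # policy parameters

def policy(s):
    logits = sigma @ s
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                                   # p_sigma(. | s)

def sl_update(s, a, learning_rate=0.01):
    """One step of stochastic gradient ascent on log p_sigma(a | s)."""
    p = policy(s)
    grad = -np.outer(p, s)       # d log p(a|s) / d sigma: -p_k * s for every move k ...
    grad[a] += s                 # ... plus s for the move a the human expert actually played
    sigma[:] += learning_rate * grad

s, a = rng.normal(size=n_features), 2
before = policy(s)[a]
sl_update(s, a)
print(before, "->", policy(s)[a])   # likelihood of the expert move goes up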
SL Policy Networks (2/3)
Small improvements in accuracy led to large improvements
in playing strength (see the next slide)
Silver et al. 2016 34
SL Policy Networks (3/3)
move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%).
Silver et al. 2016 35
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 36
Rollout Policy
Rollout policy pπ(a|s) is faster but less accurate than SL
policy network.
accuracy of 24.2%
It takes 2 µs to select an action, compared to 3 ms in the case
of the SL policy network.
Silver et al. 2016 37
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 38
RL Policy Networks (1/2)
identical in structure to the SL policy network
goal: to win in the games of self-play
task of classification
weights ρ initialized to the same values, ρ := σ
games of self-play
between the current RL policy network and a randomly
selected previous iteration
to prevent overfitting to the current policy
stochastic gradient ascent: ∆ρ ∝ ∂ log pρ(at|st) / ∂ρ · zt
at time step t, where the reward zt is +1 for winning and −1 for losing.
Silver et al. 2016 39
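The REINFORCE-style update differs from the supervised one only by the factor zt; a toy sketch in the same linear-softmax setting as before (again an illustration, not the real network):

import numpy as np

def rl_update(rho, s, a, z, learning_rate=0.01):
    """One step of Delta(rho) ~ (d log p_rho(a_t|s_t) / d rho) * z_t.

    z = +1 if the self-play game was eventually won, -1 if it was lost,
    so moves from won games are reinforced and moves from lost games discouraged.
    """
    logits = rho @ s
    exp = np.exp(logits - logits.max())
    p = exp / exp.sum()
    grad = -np.outer(p, s)
    grad[a] += s
    return rho + learning_rate * z * grad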
RL Policy Networks (2/2)
Results (sampling each move at ∼ pρ(·|st)):
80% win rate against the SL policy network
85% win rate against the strongest open-source Go
program, Pachi (Baudiš and Gailly 2011)
The previous state of the art, based only on SL of CNNs:
11% “win” rate against Pachi
Silver et al. 2016 40
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 41
Value Network (1/2)
similar architecture to the policy network, but outputs a single
prediction instead of a probability distribution
goal: to estimate a value function
vp(s) = E[zt | st = s, at...T ∼ p]
that predicts the outcome from position s (of games played by using policy pρ)
Specifically, vθ(s) ≈ vpρ (s) ≈ v∗(s).
task of regression
stochastic gradient descent: ∆θ ∝ ∂vθ(s) / ∂θ · (z − vθ(s))
(to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z)
Silver et al. 2016 42
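A toy sketch of the regression update, for a linear value function squashed by tanh so that vθ(s) ∈ (−1, 1) (an assumption of the sketch; the real vθ is a deep CNN):

import numpy as np

def value(theta, s):
    return np.tanh(theta @ s)                 # v_theta(s) in (-1, 1)

def value_update(theta, s, z, learning_rate=0.01):
    """One SGD step on the squared error (z - v_theta(s))^2 / 2."""
    v = value(theta, s)
    grad_v = (1.0 - v ** 2) * s               # d v_theta(s) / d theta for tanh(theta . s)
    return theta + learning_rate * (z - v) * grad_v

theta = np.zeros(8)
s, z = np.random.default_rng(1).normal(size=8), +1.0   # a position whose game was eventually won
theta = value_update(theta, s, z)
print(value(theta, s))                        # nudged towards +1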
Value Network (2/2)
Beware of overfitting!
Successive positions are strongly correlated.
Value network memorized the game outcomes, rather than
generalizing to new positions.
Solution: generate 30 million (new) positions, each sampled
from a separate game
almost the accuracy of Monte Carlo rollouts (using pρ), but
15000 times less computation!
Silver et al. 2016 43
Selection of Moves by the Value Network
evaluation of all successors s′ of the root position s, using vθ(s′)
Silver et al. 2016 44
Evaluation accuracy in various stages of a game
Move number is the number of moves that had been played in the given position.
Each position evaluated by:
forward pass of the value network vθ
100 rollouts, played out using the corresponding policy
Silver et al. 2016 45
Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 46
ELO Ratings for Various Combinations of Networks
Silver et al. 2016 47
MCTS Algorithm
The next action is selected by lookahead search, using simulation:
1. selection phase
2. expansion phase
3. evaluation phase
4. backup phase (at end of simulation)
Each edge (s, a) keeps:
action value Q(s, a)
visit count N(s, a)
prior probability P(s, a) (from SL policy network pσ)
The tree is traversed by simulation (descending the tree) from the
root state.
Silver et al. 2016 48
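A hedged Python sketch of the per-edge bookkeeping (a simplified stand-in for the paper’s implementation; the selection, evaluation and backup sketches that follow the corresponding slides below read and update these fields):

from dataclasses import dataclass, field

@dataclass
class Edge:
    """Statistics kept for a tree edge (s, a)."""
    P: float = 0.0     # prior probability P(s, a), from the SL policy network p_sigma
    N: int = 0         # visit count N(s, a)
    W: float = 0.0     # accumulated leaf evaluations, so that Q(s, a) = W / N

    @property
    def Q(self) -> float:
        return self.W / self.N if self.N else 0.0

@dataclass
class Node:
    edges: dict = field(default_factory=dict)   # move -> Edge, filled in when the node is expanded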
MCTS Algorithm: Selection
At each time step t, an action at is selected from state st
at = argmax_a ( Q(st, a) + u(st, a) )
where bonus
u(st, a) ∝ P(s, a) / (1 + N(s, a))
Silver et al. 2016 49
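As a sketch over the Edge objects above, with c_puct as an assumed exploration constant (the paper tunes such a constant; the exact functional form here is simplified to match the slide):

def select(node, c_puct=5.0):
    """Pick a_t = argmax_a ( Q(s_t, a) + u(s_t, a) ), with u ~ P(s, a) / (1 + N(s, a))."""
    def score(item):
        move, edge = item
        u = c_puct * edge.P / (1 + edge.N)   # bonus favours high prior and low visit count
        return edge.Q + u
    return max(node.edges.items(), key=score)[0]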
MCTS Algorithm: Expansion
A leaf position may be expanded (just once) by the SL policy network pσ.
The output probabilities are stored as priors P(s, a) := pσ(a|s).
Silver et al. 2016 50
MCTS: Evaluation
evaluation from the value network vθ(s)
evaluation by the outcome z, using the fast rollout policy pπ until the end of the game
Using a mixing parameter λ, the final leaf evaluation V(s) is
V(s) = (1 − λ)vθ(s) + λz
Silver et al. 2016 51
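The mixed leaf evaluation is a one-liner; λ = 0.5 is used below only as an illustrative default (the paper reports that an even mix worked best):

def leaf_value(v_theta, z, lam=0.5):
    """V(s) = (1 - lambda) * v_theta(s) + lambda * z."""
    return (1.0 - lam) * v_theta + lam * z

print(leaf_value(v_theta=0.3, z=+1))   # 0.65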
Tree Evaluation from Value Network
action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only)
Silver et al. 2016 52
Tree Evaluation from Rollouts
action values Q(s, a), averaged over rollout evaluations only
Silver et al. 2016 53
MCTS: Backup
At the end of simulation, each traversed edge is updated by accumulating:
the action values Q
visit counts N
Silver et al. 2016 54
Once the search is complete, the algorithm
chooses the most visited move from the root
position.
Silver et al. 2016 54
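A sketch of the backup over the Edge objects introduced earlier, followed by the final choice of the most visited move:

def backup(path, value):
    """Accumulate the leaf evaluation into every traversed edge (Q is then the mean W / N)."""
    for edge in path:
        edge.N += 1
        edge.W += value

def best_move(root):
    """Once the search is complete, play the most visited move from the root position."""
    return max(root.edges.items(), key=lambda item: item[1].N)[0]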
Percentage of Simulations
percentage frequency with which actions were selected from the root during simulations
Silver et al. 2016 55
Principal Variation (Path with Maximum Visit Count)
The moves are presented in a numbered sequence.
AlphaGo selected the move indicated by the red circle;
Fan Hui responded with the move indicated by the white square;
in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo.
Silver et al. 2016 56
Scalability
asynchronous multi-threaded search
simulations on CPUs
computation of neural networks on GPUs
AlphaGo:
40 search threads
40 CPUs
8 GPUs
Distributed version of AlphaGo (on multiple machines):
40 search threads
1202 CPUs
176 GPUs
Silver et al. 2016 57
ELO Ratings for Various Combinations of Threads
Silver et al. 2016 58
Results: the strength of AlphaGo
Tournament with Other Go Programs
Silver et al. 2016 59
Fan Hui
professional 2 dan
European Go Champion in 2013, 2014 and 2015
European Professional Go Champion in 2016
biological neural network:
100 billion neurons
100 to 1,000 trillion neuronal connections
https://en.wikipedia.org/wiki/Fan_Hui 60
AlphaGo versus Fan Hui
AlphaGo won 5-0 in a formal match in October 2015.
[AlphaGo] is very strong and stable, it seems
like a wall. ... I know AlphaGo is a computer,
but if no one told me, maybe I would think
the player was a little strange, but a very
strong player, a real person.
Fan Hui 61
Lee Sedol “The Strong Stone”
professional 9 dan
the 2nd in international titles
the 5th youngest (12 years 4 months) to become
a professional Go player in South Korean history
Lee Sedol would win 97 out of 100 games against Fan Hui.
biological neural network, comparable to Fan Hui’s (in number
of neurons and connections)
https://en.wikipedia.org/wiki/Lee_Sedol 62
I heard Google DeepMind’s AI is surprisingly
strong and getting stronger, but I am
confident that I can win, at least this time.
Lee Sedol
...even beating AlphaGo by 4-1 may allow
the Google DeepMind team to claim its de
facto victory and the defeat of him
[Lee Sedol], or even humankind.
interview in JTBC
Newsroom
62
AlphaGo versus Lee Sedol
In March 2016 AlphaGo won 4-1 against the legendary Lee Sedol.
AlphaGo won all but the 4th game; all games were won
by resignation.
The winner of the match was slated to win $1 million.
Since AlphaGo won, Google DeepMind stated that the prize would be
donated to charities, including UNICEF, and Go organisations.
Lee received $170,000 ($150,000 for participating in all the five
games, and an additional $20,000 for each game won).
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 63
Conclusion
Difficulties of Go
challenging decision-making
intractable search space
complex optimal solution
It appears infeasible to approximate it directly using a policy or value function!
Silver et al. 2016 64
AlphaGo: summary
Monte Carlo tree search
effective move selection and position evaluation
through deep convolutional neural networks
trained by a novel combination of supervised and reinforcement
learning
new search algorithm combining
neural network evaluation
Monte Carlo rollouts
scalable implementation
multi-threaded simulations on CPUs
parallel GPU computations
distributed version over multiple machines
Silver et al. 2016 65
Novel approach
During the match against Fan Hui, AlphaGo evaluated thousands
of times fewer positions than Deep Blue did against Kasparov.
It compensated for this by:
selecting those positions more intelligently (policy network)
evaluating them more precisely (value network)
Deep Blue relied on a handcrafted evaluation function.
AlphaGo was trained directly and automatically from gameplay.
It used general-purpose learning.
This approach is not specific to the game of Go. The algorithm
can be used for a much wider class of (so far seemingly)
intractable problems in AI!
Silver et al. 2016 66
Thank you!
Questions?
66
Backup slides
Input features for rollout and tree policy
Silver et al. 2016
Results of a tournament between different Go programs
Silver et al. 2016
Results of a tournament between AlphaGo and distributed Al-
phaGo, testing scalability with hardware
Silver et al. 2016
AlphaGo versus Fan Hui: Game 1
Silver et al. 2016
AlphaGo versus Fan Hui: Game 2
Silver et al. 2016
AlphaGo versus Fan Hui: Game 3
Silver et al. 2016
AlphaGo versus Fan Hui: Game 4
Silver et al. 2016
AlphaGo versus Fan Hui: Game 5
Silver et al. 2016
AlphaGo versus Lee Sedol: Game 1
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 2 (1/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 2 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 3
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 4
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 5 (1/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
AlphaGo versus Lee Sedol: Game 5 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
Further Reading I
AlphaGo:
Google Research Blog
http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
an article in Nature
http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
a reddit article claiming that AlphaGo is even stronger than it appears to be:
“AlphaGo would rather win by less points, but with higher probability.”
https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
Articles by Google DeepMind:
Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih
et al. 2015)
Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
Artificial Intelligence course at MIT
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/
6-034-artificial-intelligence-fall-2010/index.htm
Introduction to Artificial Intelligence at Udacity
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
Further Reading II
General Game Playing course https://www.coursera.org/course/ggp
Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
On Numbers and Games (Conway 1976)
Machine Learning:
Machine Learning course
https://www.coursera.org/learn/machine-learning/
Reinforcement Learning http://reinforcementlearning.ai-depot.com/
Deep Learning (LeCun, Bengio, and Hinton 2015)
Deep Learning course https://www.udacity.com/course/deep-learning--ud730
Two Minute Papers https://www.youtube.com/user/keeroyz
Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience:
http://www.brainfacts.org/
References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in
Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold’em poker is solved”. In: Science 347.6218, pp. 145–149. url:
http://poker.cs.ualberta.ca/15science.html.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Corrado, Greg (2015). Computer, respond to this email. url:
http://googleresearch.blogspot.cz/2015/11/computer-respond-to-this-email.html#1 (visited on
03/31/2016).
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks,
genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In:
CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural turing machines”. In: arXiv preprint
arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
References II
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for
Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540,
pp. 529–533. url:
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Munroe, Randall. Game AIs. url: https://xkcd.com/1002/ (visited on 04/02/2016).
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature
529.7587, pp. 484–489.
Sun, Felix. DeepHear - Composing and harmonizing music with neural networks. url:
http://web.mit.edu/felixsun/www/neural-music.html (visited on 04/02/2016).

Devoxx 2017 - AI Self-learning Game PlayingDevoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game Playing
Richard Abbuhl
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
Thomas da Silva Paula
 
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and ChallengesYuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
AI Frontiers
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
Richard Abbuhl
 
Last.fm API workshop - Stockholm
Last.fm API workshop - StockholmLast.fm API workshop - Stockholm
Last.fm API workshop - Stockholm
Matthew Ogle
 
Infoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz FinalsInfoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz Finals
In-X
 
Why AI is shaping our games
Why AI is shaping our gamesWhy AI is shaping our games
Why AI is shaping our games
Förderverein Technische Fakultät
 
Using (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next gameUsing (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next game
Eric Seufert
 
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric SeufertUsing (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Jessica Tams
 

Similar to Mastering the game of Go with deep neural networks and tree search: Presentation (20)

Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...
Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...
Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti...
 
Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...
Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...
Adam Streck - Reinforcement Learning in Unity - Teach Your Monsters - Codemot...
 
Adversarial search
Adversarial searchAdversarial search
Adversarial search
 
Showcase of My Research on Games & AI "till the end of Oct. 2014"
Showcase of My Research on Games & AI "till the end of Oct. 2014"Showcase of My Research on Games & AI "till the end of Oct. 2014"
Showcase of My Research on Games & AI "till the end of Oct. 2014"
 
Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...
Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...
Как мы сделали многопользовательскую браузерную игру для HL++ с воксельной гр...
 
20151020 Metis
20151020 Metis20151020 Metis
20151020 Metis
 
How games are driving advances in AI research- Unite Copenhagen 2019
How games are driving advances in AI research- Unite Copenhagen 2019 How games are driving advances in AI research- Unite Copenhagen 2019
How games are driving advances in AI research- Unite Copenhagen 2019
 
Big data and AI presentation slides
Big data and AI presentation slidesBig data and AI presentation slides
Big data and AI presentation slides
 
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query PitfallsMongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
 
Session 5 coding handson Tensorflow
Session 5 coding handson Tensorflow Session 5 coding handson Tensorflow
Session 5 coding handson Tensorflow
 
Nela08
Nela08Nela08
Nela08
 
Devoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game PlayingDevoxx 2017 - AI Self-learning Game Playing
Devoxx 2017 - AI Self-learning Game Playing
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
 
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and ChallengesYuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges
 
J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
 
Last.fm API workshop - Stockholm
Last.fm API workshop - StockholmLast.fm API workshop - Stockholm
Last.fm API workshop - Stockholm
 
Infoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz FinalsInfoyage 2014 Senior Quiz Finals
Infoyage 2014 Senior Quiz Finals
 
Why AI is shaping our games
Why AI is shaping our gamesWhy AI is shaping our games
Why AI is shaping our games
 
Using (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next gameUsing (Free!) App Annie data to optimize your next game
Using (Free!) App Annie data to optimize your next game
 
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric SeufertUsing (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert
 

More from Karel Ha

transcript-master-studies-Karel-Ha
transcript-master-studies-Karel-Hatranscript-master-studies-Karel-Ha
transcript-master-studies-Karel-Ha
Karel Ha
 
Schrodinger poster 2020
Schrodinger poster 2020Schrodinger poster 2020
Schrodinger poster 2020
Karel Ha
 
CapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule NetworkCapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule Network
Karel Ha
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between Capsules
Karel Ha
 
transcript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Hatranscript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-HaKarel Ha
 
Real-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/PhiReal-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/Phi
Karel Ha
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
Karel Ha
 

More from Karel Ha (7)

transcript-master-studies-Karel-Ha
transcript-master-studies-Karel-Hatranscript-master-studies-Karel-Ha
transcript-master-studies-Karel-Ha
 
Schrodinger poster 2020
Schrodinger poster 2020Schrodinger poster 2020
Schrodinger poster 2020
 
CapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule NetworkCapsuleGAN: Generative Adversarial Capsule Network
CapsuleGAN: Generative Adversarial Capsule Network
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between Capsules
 
transcript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Hatranscript-bachelor-studies-Karel-Ha
transcript-bachelor-studies-Karel-Ha
 
Real-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/PhiReal-time applications on IntelXeon/Phi
Real-time applications on IntelXeon/Phi
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
 

Recently uploaded

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
RASHMI M G
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
HongcNguyn6
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
SSR02
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
Renu Jangid
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 

Recently uploaded (20)

Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Nucleophilic Addition of carbonyl compounds.pptx
Nucleophilic Addition of carbonyl  compounds.pptxNucleophilic Addition of carbonyl  compounds.pptx
Nucleophilic Addition of carbonyl compounds.pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 

Mastering the game of Go with deep neural networks and tree search: Presentation

  • 1. Mastering the game of Go with deep neural networks and tree search Karel Ha article by Google DeepMind Spring School of Combinatorics 2016
  • 4. Applications of AI spam filters recommender systems (Netflix, YouTube) 1
  • 5. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) 1
  • 6. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) audio recognition (Shazam, SoundHound) 1
  • 7. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) audio recognition (Shazam, SoundHound) music generation (DeepHear - Composing and harmonizing music with neural networks) 1
  • 8. Applications of AI spam filters recommender systems (Netflix, YouTube) predictive text (Swiftkey) audio recognition (Shazam, SoundHound) music generation (DeepHear - Composing and harmonizing music with neural networks) self-driving cars 1
  • 9. Auto Reply Feature of Google Inbox Corrado 2015 2
  • 10. Artistic-style Painting [1] Gatys, Ecker, and Bethge 2015 [2] Li and Wand 2016 3
  • 11. Artistic-style Painting [1] Gatys, Ecker, and Bethge 2015 [2] Li and Wand 2016 3
  • 12. Baby Names Generated Character by Character Baby Killiel Saddie Char Ahbort With Karpathy 2015 4
  • 13. Baby Names Generated Character by Character Baby Killiel Saddie Char Ahbort With Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy Karpathy 2015 4
  • 14. Baby Names Generated Character by Character Baby Killiel Saddie Char Ahbort With Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy Marylen Hammine Janye Marlise Jacacrie Hendred Romand Charienna Nenotto Ette Dorane Wallen Marly Darine Salina Elvyn Ersia Maralena Minoria Ellia Charmin Antley Nerille Chelon Walmor Evena Jeryly Stachon Charisa Allisa Anatha Cathanie Geetra Alexie Jerin Cassen Herbett Cossie Velen Daurenge Robester Shermond Terisa Licia Roselen Ferine Jayn Lusine Charyanne Sales Sanny Resa Wallon Martine Merus Jelen Candica Wallin Tel Rachene Tarine Ozila Ketia Shanne Arnande Karella Roselina Alessia Chasty Deland Berther Geamar Jackein Mellisand Sagdy Nenc Lessie Rasemy Guen Karpathy 2015 4
  • 15. C code Generated Character by Character Karpathy 2015 5
  • 16. Algebraic Geometry Generated Character by Character Karpathy 2015 6
  • 18. DeepDrumpf https://twitter.com/deepdrumpf = a Twitter bot that has learned the language of Donald Trump from his speeches Hayes 2016 7
  • 19. Atari Player by Google DeepMind https://youtu.be/0X-NdPtFKq0?t=21m13s Mnih et al. 2015 8
  • 20. 8
  • 21. Heads-up Limit Holdem Poker Is Solved! Bowling et al. 2015 9
  • 22. Heads-up Limit Holdem Poker Is Solved! Cepheus http://poker.srv.ualberta.ca/ Bowling et al. 2015 9
  • 23. Basics of Machine learning
  • 24. Supervised versus Unsupervised Learning Supervised learning: 10
  • 25. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled 10
  • 26. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... 10
  • 27. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... 10
  • 28. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: 10
  • 29. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: data set is not labelled 10
  • 30. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: data set is not labelled it can try to cluster the data into different groups 10
  • 31. Supervised versus Unsupervised Learning Supervised learning: data set must be labelled e.g. which e-mail is regular/spam, which image is duck/face, ... Unsupervised learning: data set is not labelled it can try to cluster the data into different groups e.g. grouping similar news, ... 10
  • 33. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... http://www.nickgillian.com/ 11
  • 34. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set http://www.nickgillian.com/ 11
  • 35. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set http://www.nickgillian.com/ 11
  • 36. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set 4. deployment http://www.nickgillian.com/ 11
  • 37. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set 4. deployment http://www.nickgillian.com/ 11
  • 38. Supervised Learning 1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server... 2. training on training set 3. testing on testing set 4. deployment http://www.nickgillian.com/ 11
  • 44. Underfitting and Overfitting Beware of overfitting! https://www.researchgate.net/post/How_to_Avoid_Overfitting 15
  • 45. Underfitting and Overfitting Beware of overfitting! It is like learning for a math exam by memorizing proofs. https://www.researchgate.net/post/How_to_Avoid_Overfitting 15
  • 47. Reinforcement Learning Especially: games of self-play https://youtu.be/0X-NdPtFKq0?t=16m57s 16
  • 49. Tree Search Optimal value v∗(s) determines the outcome of the game: Silver et al. 2016 17
  • 50. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s Silver et al. 2016 17
  • 51. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. Silver et al. 2016 17
  • 52. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. Silver et al. 2016 17
  • 53. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where Silver et al. 2016 17
  • 54. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where b is the game's breadth (number of legal moves per position) Silver et al. 2016 17
  • 55. Tree Search Optimal value v∗(s) determines the outcome of the game: from every board position or state s under perfect play by all players. It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where b is the game's breadth (number of legal moves per position) d is its depth (game length) Silver et al. 2016 17
  • 56. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 Allis et al. 1994 18
  • 57. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! Allis et al. 1994 18
  • 58. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html Allis et al. 1994 18
  • 59. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? Allis et al. 1994 18
  • 60. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? for the breadth: a neural network to select moves Allis et al. 1994 18
  • 61. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? for the breadth: a neural network to select moves for the depth: a neural network to evaluate current position Allis et al. 1994 18
  • 62. Game tree of Go Sizes of trees for various games: chess: b ≈ 35, d ≈ 80 Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe! That makes Go a googol times more complex than chess. https://deepmind.com/alpha-go.html How to handle the size of the game tree? for the breadth: a neural network to select moves for the depth: a neural network to evaluate current position for the tree traverse: Monte Carlo tree search (MCTS) Allis et al. 1994 18
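To get a feel for these numbers, here is a minimal back-of-the-envelope sketch in Python; the values of b and d are the rough estimates quoted above, not exact figures.

    import math

    # rough branching factor b and game length d, as quoted on the slides
    games = {"chess": (35, 80), "Go": (250, 150)}

    for name, (b, d) in games.items():
        digits = int(d * math.log10(b)) + 1   # number of decimal digits of b**d
        print(f"{name}: b^d is roughly a {digits}-digit number")

    # Chess comes out around 10^123 and Go around 10^360 move sequences,
    # far beyond the ~10^80 atoms estimated in the observable universe and
    # more than a googol (10^100) times larger than the chess figure.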
  • 63. Monte Carlo tree search 19
  • 66. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 67. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex but on much smaller scales http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 68. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex but on much smaller scales suitable to model systems with a high tolerance to error http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 69. Neural Network: Inspiration inspired by the neuronal structure of the mammalian cerebral cortex but on much smaller scales suitable to model systems with a high tolerance to error e.g. audio or image recognition http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
  • 71. Neural Network: Modes Two modes Dieterle 2003 21
  • 72. Neural Network: Modes Two modes feedforward for making predictions Dieterle 2003 21
  • 73. Neural Network: Modes Two modes feedforward for making predictions backpropagation for learning Dieterle 2003 21
  • 74. Neural Network: an example of feedforward http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/ 22
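As a companion to the feedforward example above, a minimal sketch of a single forward pass through a tiny fully connected network in Python/NumPy; the layer sizes and random weights are made up purely for illustration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    x  = rng.normal(size=3)            # input vector (3 features)
    W1 = rng.normal(size=(4, 3))       # hidden-layer weights (4 units)
    W2 = rng.normal(size=(1, 4))       # output-layer weights (1 unit)

    hidden = sigmoid(W1 @ x)           # feedforward: input -> hidden layer
    output = sigmoid(W2 @ hidden)      # feedforward: hidden -> output layer
    print(output)                      # the network's prediction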
  • 75. Gradient Descent in Neural Networks http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
  • 76. Gradient Descent in Neural Networks Motto: ”Learn by mistakes!” http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
  • 77. Gradient Descent in Neural Networks Motto: ”Learn by mistakes!” However, error functions are not necessarily convex or so “smooth”. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
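A one-parameter illustration of the "learn by mistakes" idea: gradient descent on a deliberately simple (convex) squared-error function; real network error surfaces are not this smooth, as the slide warns.

    # minimize E(w) = (w - 3)**2 by repeatedly stepping against the gradient
    w, learning_rate = 0.0, 0.1
    for step in range(50):
        grad = 2 * (w - 3)             # dE/dw, the "mistake" signal
        w -= learning_rate * grad      # move downhill
    print(w)                           # approaches the minimum at w = 3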
  • 78. Deep Neural Network: Inspiration The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”) http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
  • 79. Deep Neural Network: Inspiration The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”) http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
  • 84. Classic games (1/2) Backgammon: Man vs. Fate Chess: Man vs. Man 26
  • 85. Classic games (2/2) Go: Man vs. Self Robert Šámal (White) versus Karel Král (Black), Spring School of Combinatorics 2016 27
  • 87. Rules of Go Black versus White. Black starts the game. 28
  • 88. Rules of Go Black versus White. Black starts the game. 28
  • 89. Rules of Go Black versus White. Black starts the game. the rule of liberty 28
  • 90. Rules of Go Black versus White. Black starts the game. the rule of liberty the “ko” rule 28
  • 91. Rules of Go Black versus White. Black starts the game. the rule of liberty the “ko” rule Handicap for difference in ranks: Black can place 1 or more stones in advance (compensation for White’s greater strength). 28
  • 92. Scoring Rules: Area Scoring https://en.wikipedia.org/wiki/Go_(game) 29
  • 93. Scoring Rules: Area Scoring A player’s score is: the number of stones that the player has on the board https://en.wikipedia.org/wiki/Go_(game) 29
  • 94. Scoring Rules: Area Scoring A player’s score is: the number of stones that the player has on the board plus the number of empty intersections surrounded by that player’s stones https://en.wikipedia.org/wiki/Go_(game) 29
  • 95. Scoring Rules: Area Scoring A player’s score is: the number of stones that the player has on the board plus the number of empty intersections surrounded by that player’s stones plus komi(dashi) points for the White player which is a compensation for the first move advantage of the Black player https://en.wikipedia.org/wiki/Go_(game) 29
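A toy sketch of area scoring under these rules; the stone and territory counts below are invented for illustration, and the komi value is just one commonly used setting.

    def area_score(stones, territory, komi=0.0):
        # area scoring: stones on the board + surrounded empty points (+ komi for White)
        return stones + territory + komi

    black = area_score(stones=75, territory=106)
    white = area_score(stones=70, territory=102, komi=7.5)
    winner = "Black" if black > white else "White"
    print(f"Black {black} vs. White {white} -> {winner} wins")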
  • 96. Ranks of Players Kyu and Dan ranks https://en.wikipedia.org/wiki/Go_(game) 30
  • 97. Ranks of Players Kyu and Dan ranks or alternatively, ELO ratings https://en.wikipedia.org/wiki/Go_(game) 30
  • 100. Policy and Value Networks Silver et al. 2016 31
  • 101. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 32
  • 102. SL Policy Networks (1/3) 13-layer deep convolutional neural network Silver et al. 2016 33
  • 103. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves Silver et al. 2016 33
  • 104. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification Silver et al. 2016 33
  • 105. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server Silver et al. 2016 33
  • 106. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Silver et al. 2016 33
  • 107. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Silver et al. 2016 33
  • 108. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: Silver et al. 2016 33
  • 109. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: 44.4% accuracy (the state-of-the-art from other groups) Silver et al. 2016 33
  • 110. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: 44.4% accuracy (the state-of-the-art from other groups) 55.7% accuracy (raw board position + move history as input) Silver et al. 2016 33
  • 111. SL Policy Networks (1/3) 13-layer deep convolutional neural network goal: to predict expert human moves task of classification trained from 30 million positions from the KGS Go Server stochastic gradient ascent: ∆σ ∝ ∂ log pσ(a|s) / ∂σ (to maximize the likelihood of the human move a selected in state s) Results: 44.4% accuracy (the state-of-the-art from other groups) 55.7% accuracy (raw board position + move history as input) 57.0% accuracy (all input features) Silver et al. 2016 33
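A minimal sketch of the supervised training step just described: one stochastic-gradient-ascent update on the log-likelihood of an expert move. It uses a plain softmax policy over moves instead of the actual 13-layer convolutional network, and all sizes and the learning rate are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    num_features, num_moves = 48, 361               # e.g. feature planes, 19x19 board
    sigma = rng.normal(scale=0.01, size=(num_moves, num_features))  # policy weights

    def policy(s):
        logits = sigma @ s
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                      # p_sigma(a | s)

    def sl_update(s, a, lr=0.01):
        # one gradient-ascent step on log p_sigma(a|s) for an expert pair (s, a)
        global sigma
        p = policy(s)
        grad = -np.outer(p, s)                      # d log p(a|s) / d sigma for a softmax
        grad[a] += s
        sigma += lr * grad

    s = rng.normal(size=num_features)               # a fake board-feature vector
    sl_update(s, a=42)                              # pretend the expert played move 42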
  • 112. SL Policy Networks (2/3) Small improvements in accuracy led to large improvements in playing strength (see the next slide) Silver et al. 2016 34
  • 113. SL Policy Networks (3/3) move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%). Silver et al. 2016 35
  • 114. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 36
  • 115. Rollout Policy Rollout policy pπ(a|s) is faster but less accurate than the SL policy network. Silver et al. 2016 37
  • 116. Rollout Policy Rollout policy pπ(a|s) is faster but less accurate than the SL policy network. accuracy of 24.2% Silver et al. 2016 37
  • 117. Rollout Policy Rollout policy pπ(a|s) is faster but less accurate than the SL policy network. accuracy of 24.2% It takes 2 µs to select an action, compared to 3 ms in the case of the SL policy network. Silver et al. 2016 37
  • 118. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 38
  • 119. RL Policy Networks (1/2) identical in structure to the SL policy network Silver et al. 2016 39
  • 120. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play Silver et al. 2016 39
  • 121. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification Silver et al. 2016 39
  • 122. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ Silver et al. 2016 39
  • 123. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play Silver et al. 2016 39
  • 124. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration Silver et al. 2016 39
  • 125. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy Silver et al. 2016 39
  • 126. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy stochastic gradient ascent: ∆ρ ∝ (∂ log pρ(a_t|s_t) / ∂ρ) z_t at time step t, where the reward function z_t is +1 for winning and −1 for losing. Silver et al. 2016 39
  • 127. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy stochastic gradient ascent: ∆ρ ∝ (∂ log pρ(a_t|s_t) / ∂ρ) z_t at time step t, where the reward function z_t is +1 for winning and −1 for losing. Silver et al. 2016 39
  • 128. RL Policy Networks (1/2) identical in structure to the SL policy network goal: to win in the games of self-play task of classification weights ρ initialized to the same values, ρ := σ games of self-play between the current RL policy network and a randomly selected previous iteration to prevent overfitting to the current policy stochastic gradient ascent: ∆ρ ∝ (∂ log pρ(a_t|s_t) / ∂ρ) z_t at time step t, where the reward function z_t is +1 for winning and −1 for losing. Silver et al. 2016 39
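A sketch of the corresponding reinforcement-learning update, in the REINFORCE style implied above: the same log-likelihood gradient as in the supervised case, but scaled by the game outcome z_t, so that moves from won games are reinforced and moves from lost games are discouraged. Again a toy softmax policy rather than the real network; shapes and the learning rate are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    num_features, num_moves = 48, 361
    rho = rng.normal(scale=0.01, size=(num_moves, num_features))   # RL policy weights

    def policy(s):
        logits = rho @ s
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()                      # p_rho(a | s)

    def rl_update(trajectory, z, lr=0.01):
        # trajectory: list of (state, action) pairs from one game of self-play
        # z: +1 if this player won that game, -1 if it lost
        global rho
        for s, a in trajectory:
            p = policy(s)
            grad = -np.outer(p, s)                  # d log p(a|s) / d rho
            grad[a] += s
            rho += lr * z * grad                    # ascent on log p_rho(a_t|s_t) * z_t

    fake_game = [(rng.normal(size=num_features), int(rng.integers(num_moves)))
                 for _ in range(5)]
    rl_update(fake_game, z=+1)                      # pretend this side won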
  • 129. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): Silver et al. 2016 40
  • 130. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network Silver et al. 2016 40
  • 131. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) Silver et al. 2016 40
  • 132. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) The previous state-of-the-art, based only on SL of CNN: Silver et al. 2016 40
  • 133. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) The previous state-of-the-art, based only on SL of CNN: Silver et al. 2016 40
  • 134. RL Policy Networks (2/2) Results (by sampling each move a_t ∼ pρ(·|s_t)): 80% win rate against the SL policy network 85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011) The previous state-of-the-art, based only on SL of CNN: 11% “win” rate against Pachi Silver et al. 2016 40
  • 135. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 41
  • 136. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution Silver et al. 2016 42
  • 137. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Silver et al. 2016 42
  • 138. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s). Silver et al. 2016 42
  • 139. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s). task of regression Silver et al. 2016 42
  • 140. Value Network (1/2) similar architecture to the policy network, but outputs a single prediction instead of a probability distribution goal: to estimate a value function v^p(s) = E[z_t | s_t = s, a_{t...T} ∼ p] that predicts the outcome from position s (of games played by using policy pρ) Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s). task of regression stochastic gradient descent: ∆θ ∝ (∂vθ(s) / ∂θ) (z − vθ(s)) (to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z) Silver et al. 2016 42
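And a sketch of that value-regression step: one stochastic-gradient-descent update on the squared error between the predicted value vθ(s) and the observed outcome z, here with a toy tanh-of-linear value function standing in for the convolutional value network (all sizes and the learning rate are assumptions).

    import numpy as np

    rng = np.random.default_rng(2)
    num_features = 48
    theta = np.zeros(num_features)                  # value-function weights

    def v(s):
        return np.tanh(theta @ s)                   # prediction in (-1, 1)

    def value_update(s, z, lr=0.01):
        # one SGD step on the squared error (z - v_theta(s))**2
        global theta
        pred = v(s)
        dv_dtheta = (1.0 - pred**2) * s             # derivative of tanh(theta . s)
        theta += lr * (z - pred) * dv_dtheta        # Delta_theta ∝ (dv/dtheta)(z - v)

    s = rng.normal(size=num_features)               # a fake board-feature vector
    value_update(s, z=+1)                           # this position eventually led to a win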
  • 141. Value Network (2/2) Beware of overfitting! Silver et al. 2016 43
  • 142. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. Silver et al. 2016 43
  • 143. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. Value network memorized the game outcomes, rather than generalizing to new positions. Silver et al. 2016 43
  • 144. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. The value network memorized the game outcomes, rather than generalizing to new positions. Solution: generate 30 million (new) positions, each sampled from a separate game Silver et al. 2016 43
  • 145. Value Network (2/2) Beware of overfitting! Successive positions are strongly correlated. The value network memorized the game outcomes, rather than generalizing to new positions. Solution: generate 30 million (new) positions, each sampled from a separate game almost the accuracy of Monte Carlo rollouts (using pρ), but with 15,000 times less computation! Silver et al. 2016 43
  • 146. Selection of Moves by the Value Network evaluation of all successors s′ of the root position s, using vθ(s′) Silver et al. 2016 44
  • 147. Evaluation accuracy in various stages of a game Move number is the number of moves that had been played in the given position. Silver et al. 2016 45
  • 148. Evaluation accuracy in various stages of a game Move number is the number of moves that had been played in the given position. Each position evaluated by: forward pass of the value network vθ Silver et al. 2016 45
  • 149. Evaluation accuracy in various stages of a game Move number is the number of moves that had been played in the given position. Each position evaluated by: forward pass of the value network vθ 100 rollouts, played out using the corresponding policy Silver et al. 2016 45
  • 150. Training the (Deep Convolutional) Neural Networks Silver et al. 2016 46
  • 151. ELO Ratings for Various Combinations of Networks Silver et al. 2016 47
  • 152. MCTS Algorithm The next action is selected by lookahead search, using simulation: Silver et al. 2016 48
  • 153. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase Silver et al. 2016 48
  • 154. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase Silver et al. 2016 48
  • 155. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase Silver et al. 2016 48
  • 156. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Silver et al. 2016 48
  • 157. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Silver et al. 2016 48
  • 158. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) Silver et al. 2016 48
  • 159. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) Silver et al. 2016 48
  • 160. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) prior probability P(s, a) (from SL policy network pσ) Silver et al. 2016 48
  • 161. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) prior probability P(s, a) (from SL policy network pσ) Silver et al. 2016 48
  • 162. MCTS Algorithm The next action is selected by lookahead search, using simulation: 1. selection phase 2. expansion phase 3. evaluation phase 4. backup phase (at end of simulation) Each edge (s, a) keeps: action value Q(s, a) visit count N(s, a) prior probability P(s, a) (from SL policy network pσ) The tree is traversed by simulation (descending the tree) from the root state. Silver et al. 2016 48
  • 164. MCTS Algorithm: Selection At each time step t, an action a_t is selected from state s_t: a_t = argmax_a (Q(s_t, a) + u(s_t, a)) Silver et al. 2016 49
  • 165. MCTS Algorithm: Selection At each time step t, an action a_t is selected from state s_t: a_t = argmax_a (Q(s_t, a) + u(s_t, a)) where the bonus u(s_t, a) ∝ P(s, a) / (1 + N(s, a)) Silver et al. 2016 49
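A minimal sketch of this selection rule, assuming the per-edge statistics Q, N and P are stored in a dictionary per action; the exploration constant c_puct is an assumption not spelled out on the slide.

    def select_action(edges, c_puct=5.0):
        # edges: dict mapping action -> {"Q": ..., "N": ..., "P": ...}
        # returns argmax_a of Q(s, a) + u(s, a), with u ∝ P(s, a) / (1 + N(s, a))
        def score(a):
            e = edges[a]
            u = c_puct * e["P"] / (1.0 + e["N"])    # exploration bonus
            return e["Q"] + u
        return max(edges, key=score)

    # toy example: two candidate moves at the root
    edges = {"D4":  {"Q": 0.52, "N": 120, "P": 0.30},
             "Q16": {"Q": 0.48, "N": 10,  "P": 0.35}}
    print(select_action(edges))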
  • 167. MCTS Algorithm: Expansion A leaf position may be expanded (just once) by the SL policy network pσ. Silver et al. 2016 50
  • 168. MCTS Algorithm: Expansion A leaf position may be expanded (just once) by the SL policy network pσ. The output probabilities are stored as priors P(s, a) := pσ(a|s). Silver et al. 2016 50
  • 170. MCTS: Evaluation evaluation from the value network vθ(s) Silver et al. 2016 51
  • 171. MCTS: Evaluation evaluation from the value network vθ(s) evaluation by the outcome z using the fast rollout policy pπ until the end of game Silver et al. 2016 51
  • 172. MCTS: Evaluation evaluation from the value network vθ(s) evaluation by the outcome z using the fast rollout policy pπ until the end of game Silver et al. 2016 51
  • 173. MCTS: Evaluation evaluation from the value network vθ(s) evaluation by the outcome z using the fast rollout policy pπ until the end of game Using a mixing parameter λ, the final leaf evaluation V (s) is V (s) = (1 − λ)vθ(s) + λz Silver et al. 2016 51
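The leaf-evaluation mix itself is a one-liner; a sketch with vθ(s) and the rollout outcome z given as plain numbers (the paper's best-performing setting was λ = 0.5).

    def leaf_value(v_theta, z, lam=0.5):
        # V(s) = (1 - lambda) * v_theta(s) + lambda * z
        return (1.0 - lam) * v_theta + lam * z

    print(leaf_value(v_theta=0.31, z=+1))   # -> 0.655 for this hypothetical leaf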
  • 174. Tree Evaluation from Value Network action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only) Silver et al. 2016 52
  • 175. Tree Evaluation from Rollouts action values Q(s, a), averaged over rollout evaluations only Silver et al. 2016 53
  • 176. MCTS: Backup At the end of simulation, each traversed edge is updated by accumulating: the action values Q Silver et al. 2016 54
  • 177. MCTS: Backup At the end of simulation, each traversed edge is updated by accumulating: the action values Q visit counts N Silver et al. 2016 54
  • 178. Once the search is complete, the algorithm chooses the most visited move from the root position. Silver et al. 2016 54
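A sketch of the backup phase and of the final move choice, under the simplification that each traversed edge just accumulates a running mean of the leaf evaluations it has seen.

    def backup(path, V):
        # path: list of edge-statistics dicts traversed in this simulation
        # V: the leaf evaluation produced for this simulation
        for e in path:
            e["N"] += 1
            e["Q"] += (V - e["Q"]) / e["N"]     # incremental mean of evaluations

    def choose_move(root_edges):
        # once the search is complete, play the most visited move from the root
        return max(root_edges, key=lambda a: root_edges[a]["N"])

    # toy usage, reusing the edge format from the selection sketch above
    edges = {"D4":  {"Q": 0.0, "N": 0, "P": 0.30},
             "Q16": {"Q": 0.0, "N": 0, "P": 0.35}}
    backup([edges["D4"]], V=0.655)
    print(choose_move(edges))                   # -> "D4" after this single simulation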
  • 179. Percentage of Simulations percentage frequency with which actions were selected from the root during simulations Silver et al. 2016 55
  • 180. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. Silver et al. 2016 56
  • 181. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Silver et al. 2016 56
  • 182. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Fan Hui responded with the move indicated by the white square; Silver et al. 2016 56
  • 183. Principal Variation (Path with Maximum Visit Count) The moves are presented in a numbered sequence. AlphaGo selected the move indicated by the red circle; Fan Hui responded with the move indicated by the white square; in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo. Silver et al. 2016 56
  • 186. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs Silver et al. 2016 57
  • 187. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs Silver et al. 2016 57
  • 188. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads Silver et al. 2016 57
  • 189. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs Silver et al. 2016 57
  • 190. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Silver et al. 2016 57
  • 191. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Silver et al. 2016 57
  • 192. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Distributed version of AlphaGo (on multiple machines): 40 search threads Silver et al. 2016 57
  • 193. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Distributed version of AlphaGo (on multiple machines): 40 search threads 1202 CPUs Silver et al. 2016 57
  • 194. Scalability asynchronous multi-threaded search simulations on CPUs computation of neural networks on GPUs AlphaGo: 40 search threads 40 CPUs 8 GPUs Distributed version of AlphaGo (on multiple machines): 40 search threads 1202 CPUs 176 GPUs Silver et al. 2016 57
  • 195. ELO Ratings for Various Combinations of Threads Silver et al. 2016 58
  • 196. Results: the strength of AlphaGo
  • 197. Tournament with Other Go Programs Silver et al. 2016 59
  • 199. Fan Hui professional 2 dan https://en.wikipedia.org/wiki/Fan_Hui 60
  • 200. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 https://en.wikipedia.org/wiki/Fan_Hui 60
  • 201. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 https://en.wikipedia.org/wiki/Fan_Hui 60
  • 202. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 biological neural network: https://en.wikipedia.org/wiki/Fan_Hui 60
  • 203. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 biological neural network: 100 billion neurons https://en.wikipedia.org/wiki/Fan_Hui 60
  • 204. Fan Hui professional 2 dan European Go Champion in 2013, 2014 and 2015 European Professional Go Champion in 2016 biological neural network: 100 billion neurons 100 to 1,000 trillion neuronal connections https://en.wikipedia.org/wiki/Fan_Hui 60
  • 206. AlphaGo versus Fan Hui AlphaGo won 5-0 in a formal match in October 2015. 61
  • 207. AlphaGo versus Fan Hui AlphaGo won 5-0 in a formal match in October 2015. [AlphaGo] is very strong and stable, it seems like a wall. ... I know AlphaGo is a computer, but if no one told me, maybe I would think the player was a little strange, but a very strong player, a real person. Fan Hui 61
  • 213. Lee Sedol “The Strong Stone”: professional 9 dan; holder of the second-highest number of international titles; the 5th youngest player (12 years 4 months) to turn professional in South Korean history. Lee Sedol would win 97 out of 100 games against Fan Hui. Biological neural network comparable to Fan Hui’s (in number of neurons and connections). https://en.wikipedia.org/wiki/Lee_Sedol 62
  • 216. “I heard Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time.” Lee Sedol “...even beating AlphaGo by 4-1 may allow the Google DeepMind team to claim its de facto victory and the defeat of him [Lee Sedol], or even humankind.” interview in JTBC Newsroom 62
  • 222. AlphaGo versus Lee Sedol: In March 2016, AlphaGo won 4-1 against the legendary Lee Sedol. AlphaGo won all but the 4th game; all games were decided by resignation. The winner of the match was slated to receive $1 million; since AlphaGo won, Google DeepMind stated that the prize would be donated to charities, including UNICEF, and to Go organisations. Lee received $170,000 ($150,000 for participating in all five games, plus $20,000 for each game won). https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 63
  • 226. Difficulties of Go: challenging decision-making; an intractable search space; and an optimal solution so complex that it appears infeasible to approximate it directly with a policy or value function. Silver et al. 2016 64
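To see why the search space is intractable, a quick back-of-the-envelope estimate: a game with branching factor b and length d has on the order of b^d possible continuations, with b ≈ 250, d ≈ 150 for Go versus b ≈ 35, d ≈ 80 for chess (Allis et al. 1994, as cited in Silver et al. 2016). The snippet below just prints the resulting orders of magnitude:

```python
from math import log10

def search_space_exponent(branching_factor, game_length):
    """Base-10 exponent of b**d, a rough count of possible game continuations."""
    return game_length * log10(branching_factor)

# Typical figures (Allis et al. 1994): Go b ~ 250, d ~ 150; chess b ~ 35, d ~ 80.
print(f"Go:    ~10^{search_space_exponent(250, 150):.0f}")
print(f"chess: ~10^{search_space_exponent(35, 80):.0f}")
```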
  • 237. AlphaGo: summary. Monte Carlo tree search; effective move selection and position evaluation through deep convolutional neural networks, trained by a novel combination of supervised and reinforcement learning; a new search algorithm combining neural-network evaluation with Monte Carlo rollouts; a scalable implementation: multi-threaded simulations on CPUs, parallel GPU computations, and a distributed version over multiple machines. Silver et al. 2016 65
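The “combining neural-network evaluation with Monte Carlo rollouts” point refers to how leaf positions are scored during the search: the paper mixes the value network's estimate with the outcome of a fast rollout, V(s_L) = (1 − λ)·v_θ(s_L) + λ·z_L, with the mixing constant λ reported as 0.5. A minimal sketch under those assumptions; `value_network` and `fast_rollout` below are stand-ins, not the actual networks:

```python
import random

def value_network(position):
    """Stand-in for the value network v_theta: an estimated outcome in [-1, 1]."""
    return random.uniform(-1, 1)

def fast_rollout(position):
    """Stand-in for the fast rollout policy: play to the end, return the outcome z in {-1, +1}."""
    return random.choice([-1, 1])

def evaluate_leaf(position, mixing=0.5):
    """Mixed leaf evaluation V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1 - mixing) * value_network(position) + mixing * fast_rollout(position)

print(evaluate_leaf("some leaf position"))
```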
  • 245. Novel approach: During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did against Kasparov. It compensated for this by selecting positions more intelligently (the policy network) and evaluating them more precisely (the value network). Deep Blue relied on a handcrafted evaluation function, whereas AlphaGo was trained directly and automatically from gameplay, using general-purpose learning. This approach is not specific to the game of Go: the algorithm can be used for a much wider class of (so far seemingly) intractable problems in AI! Silver et al. 2016 66
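“Selecting positions more intelligently” means the tree descent does not treat moves uniformly: at each node the search picks the action maximising Q(s, a) + u(s, a), where the exploration bonus u(s, a) is proportional to the policy network's prior P(s, a) and shrinks as the visit count N(s, a) grows. A hedged sketch of that selection rule; the constant `c_puct` and the `stats` layout are illustrative, not the paper's code:

```python
from math import sqrt

def select_action(stats, c_puct=5.0):
    """Pick argmax_a Q(s,a) + u(s,a), where u is proportional to the policy prior P(s,a).

    `stats` maps each legal move to a dict with keys 'Q' (mean action value),
    'N' (visit count) and 'P' (prior from the policy network) -- an illustrative
    layout, not AlphaGo's actual data structure.
    """
    total_visits = sum(s['N'] for s in stats.values())

    def score(move):
        s = stats[move]
        u = c_puct * s['P'] * sqrt(total_visits) / (1 + s['N'])
        return s['Q'] + u

    return max(stats, key=score)

example = {
    'D4':  {'Q': 0.52, 'N': 120, 'P': 0.35},
    'Q16': {'Q': 0.48, 'N': 40,  'P': 0.30},
    'K10': {'Q': 0.10, 'N': 2,   'P': 0.05},
}
print(select_action(example))
```

The effect is that moves the policy network considers plausible get explored first, so far fewer positions need to be examined than in a uniform search.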
  • 248. Input features for rollout and tree policy Silver et al. 2016
  • 249. Results of a tournament between different Go programs Silver et al. 2016
  • 250. Results of a tournament between AlphaGo and distributed AlphaGo, testing scalability with hardware Silver et al. 2016
  • 251. AlphaGo versus Fan Hui: Game 1 Silver et al. 2016
  • 252. AlphaGo versus Fan Hui: Game 2 Silver et al. 2016
  • 253. AlphaGo versus Fan Hui: Game 3 Silver et al. 2016
  • 254. AlphaGo versus Fan Hui: Game 4 Silver et al. 2016
  • 255. AlphaGo versus Fan Hui: Game 5 Silver et al. 2016
  • 256. AlphaGo versus Lee Sedol: Game 1 https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 257. AlphaGo versus Lee Sedol: Game 2 (1/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 258. AlphaGo versus Lee Sedol: Game 2 (2/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 259. AlphaGo versus Lee Sedol: Game 3 https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 260. AlphaGo versus Lee Sedol: Game 4 https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 261. AlphaGo versus Lee Sedol: Game 5 (1/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 262. AlphaGo versus Lee Sedol: Game 5 (2/2) https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
  • 263. Further Reading I
AlphaGo:
Google Research Blog http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
an article in Nature http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
a reddit article claiming that AlphaGo is even stronger than it appears to be: “AlphaGo would rather win by less points, but with higher probability.” https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
Articles by Google DeepMind:
Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih et al. 2015)
Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
Artificial Intelligence course at MIT http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
Introduction to Artificial Intelligence at Udacity https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
  • 264. Further Reading II
General Game Playing course https://www.coursera.org/course/ggp
Singularity:
http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
On Numbers and Games (Conway 1976)
Machine Learning:
Machine Learning course https://www.coursera.org/learn/machine-learning/
Reinforcement Learning http://reinforcementlearning.ai-depot.com/
Deep Learning (LeCun, Bengio, and Hinton 2015)
Deep Learning course https://www.udacity.com/course/deep-learning--ud730
Two Minute Papers https://www.youtube.com/user/keeroyz
Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience: http://www.brainfacts.org/
  • 265. References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold'em poker is solved”. In: Science 347.6218, pp. 145–149. url: http://poker.cs.ualberta.ca/15science.html.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Corrado, Greg (2015). Computer, respond to this email. url: http://googleresearch.blogspot.cz/2015/11/computer-respond-to-this-email.html#1 (visited on 03/31/2016).
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks, genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In: CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural Turing machines”. In: arXiv preprint arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
  • 266. References II
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url: http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540, pp. 529–533. url: https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Munroe, Randall. Game AIs. url: https://xkcd.com/1002/ (visited on 04/02/2016).
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature 529.7587, pp. 484–489.
Sun, Felix. DeepHear - Composing and harmonizing music with neural networks. url: http://web.mit.edu/felixsun/www/neural-music.html (visited on 04/02/2016).