HOW DEEPMIND
MASTERED THE
GAME OF GO
TIM RISER - PWL BOSTON/CAMBRIDGE
Photo: Nature
7/28/2016
PRELUDE
Video: DeepMind
HELLO~
My name is Tim Riser
▸ Researcher @ Berkman Klein Center for Internet & Society
▸ Applied Computational Mathematics & Economics @ BYU
▸ @riserup
1. CONTEXT
NO PAPER WE
LOVE IS AN
ISLAND
Shannon’s
“Programming a Computer
for Playing Chess”
Game tree complexity
First attempts at a Go AI
Enter AlphaGo
“
The computer should make more use of brutal
calculation than humans, but a little selection
goes a long way toward improving blind trial
and error…
It appears that to improve the speed and
strength of play the machine must … select the
variations to be explored by some process so
that the machine does not waste its time in
totally pointless variations.
Claude Shannon
Programming a Computer for Playing Chess
SHANNON & COMPUTER CHESS
Photo: Computer History Museum
SHANNON & COMPUTER CHESS
Claude Shannon - the father of information
theory - thought that creating a computer chess
program would lead to practical results.
He envisioned that the methods devised in
solving chess would be applied to areas such as
circuit design, call routing, proving mathematical
theorems, language translation, military
planning, and even creative works like music
composition.
HOW DO YOU SOLVE
A GAME LIKE CHESS OR GO?
▸ Chess has ~35 legal moves per position
and a game length of ~80 moves
▸ Go has ~250 legal moves per position
and a game length of ~150 moves
▸ Maybe brute force is out, then:
250^150 > 35^80 ≅ 3.3x10^123 > 10^82
(# of atoms in the known universe)
GAME TREE COMPLEXITY
▸ Let b be the approximate breadth
(# of legal moves/position)
and d be the approximate depth
(game length)
▸ Then b^d is a reasonable lower bound on
the complexity of the search space
▸ What can we do to make this problem
computationally tractable?
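To make these magnitudes concrete, here is a quick back-of-the-envelope check (a Python sketch; the exponents are the slide's approximations, not exact counts):

```python
# Back-of-the-envelope search-space sizes, using the approximate
# branching factor b and game length d from the slide.
chess = 35 ** 80     # roughly 3.3 x 10^123 move sequences
go = 250 ** 150      # roughly 10^359 move sequences
atoms = 10 ** 82     # rough count of atoms in the known universe

print(f"chess ~ 10^{len(str(chess)) - 1}")      # chess ~ 10^123
print(f"go    ~ 10^{len(str(go)) - 1}")         # go    ~ 10^359
print(f"go / chess ~ 10^{len(str(go // chess)) - 1}")
```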
GAME TREE COMPLEXITY
1. Remove unneeded branches from the
search tree to reduce search breadth (b)
2. Truncate branches to reduce search
depth (d) and then use a function to
approximate the value of the branches
underneath the cut-off point
BREADTH REDUCTION
Photo: UCLA Stat232B
Remove highlighted
branches from
search candidates
in advance
BREADTH REDUCTION METHODS
1. Train with experts to learn which
branches are uncommon or improbable
2. Randomly sample sequences from the
search tree to avoid exhaustive search
DEPTH REDUCTION
If we have a function v(s) that
can evaluate a state s, we can
truncate a branch and replace
its subtree with a value. This
lets us avoid searching it to the
maximum depth
Photo: UCLA Stat232B
DEPTH REDUCTION METHOD
1. Find a function to approximate the
value of a state, and truncate the search
branches beneath that state.
To choose an action, the program can
then compare that assigned value with
the values of other actions.
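To make the idea concrete, here is a minimal depth-truncated search in Python. This is plain negamax with a cut-off, not AlphaGo's actual search; `v`, `legal_moves`, and `apply` are hypothetical stand-ins for a real game engine:

```python
# A minimal sketch of depth reduction: descend to a fixed depth,
# then let an approximate evaluation function v(s) stand in for
# the entire subtree below the cut-off point.

def truncated_search(state, depth, v, legal_moves, apply):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return v(state)  # replace the untraversed subtree with one number
    # Negamax convention: a position good for the opponent is bad for us.
    return max(-truncated_search(apply(state, m), depth - 1,
                                 v, legal_moves, apply)
               for m in moves)
```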
SURVEY OF AI GAMING MILESTONES
1914 Torres y Quevedo built a device that played a limited chess endgame
1950 Claude Shannon writes “Programming a Computer for Playing Chess”
1952 Alan Turing writes a chess program (with no computer to run it!)
1968 Albert Zobrist’s Go program beats total beginners [minimax tree search]
1979 Bruce Wilcox’s Go program beats low-level amateurs [knowledge systems]
1980 Moor plays the world Othello champion (1-5)
1994 Chinook plays the world checkers champion (2-4-33)
1997 Deep Blue plays Garry Kasparov at chess (2-1-3)
2006 “The year of the Monte Carlo revolution in Go” - Coulom discovers MCTS
2011 IBM Watson plays Ken Jennings at Jeopardy ($77,147 - $24,000)
2013 Crazy Stone beats a handicapped Go pro player [Monte Carlo tree search]
2015 AlphaGo beats the European Go champion Fan Hui (5-0) [neural networks]
2016 AlphaGo beats the Go world champion Lee Sedol (4-1)
A FEW EXPERT PREDICTIONS
2007
Deep Blue’s chief engineer
Feng-Hsiung Hsu:
Monte Carlo
techniques “won’t play a
significant role in creating a
machine that can top the best
human players” in Go.
2014
MCTS researcher
Rémi Coulom:
It will take “10 years” for a
machine to win against a pro
Go player without a handicap...
“I don’t like making
predictions.”
ENTER ALPHAGO
The hunt
Google DeepMind,
enjoying the
success of their
Atari-playing deep Q
network (DQN), was
looking for the next
milestone to tackle.
The opening
In 2006, computer
Go was reenergized
by the discovery of a
new search method:
Monte Carlo tree
search. However,
move evaluation
and selection
policies were still
shallow and/or
linear functions.
The gamble
DeepMind saw the
chance to build a
hybrid system to
beat Go, combining
MCTS with their
state-of-the-art
neural networks,
which have excelled
at complex pattern
recognition.
Photo: Google DeepMind
LEE
SEDOL
IN GAME 5
AGAINST
ALPHAGO
2. METHOD
THE ALPHAGO
APPROACH
Overview
Training Pipeline
Visualization
Policy & Value Networks
Searching with MCTS
OVERVIEW
The program reads in a board state s.
AlphaGo selects an action using a novel MCTS implementation that is
integrated with neural networks, which evaluate and choose actions.
MCTS simulates n search tree sequences, selecting an action at each
timestep based on its action value (initially zero), its prior probability
(the output of the policy network), and an exploration parameter.
The action value of a leaf node is a weighted sum of the output of the value
network and the output (win/loss) of a direct terminal rollout from that node.
At the end of the n simulations, the action values and visit counts for each
traversed leaf node are updated.
AlphaGo then backpropagates - or returns and reports - the mean value of
all subtree evaluations up to each adjacent, legal action. Then, it chooses the
action with the highest action value (or the most-visited leaf node).
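A sketch of that per-timestep selection rule, assuming hypothetical per-node statistics Q (action value), P (prior probability), and N (visit count); the u(s,a) ∝ P(s,a)/(1+N(s,a)) shape of the bonus follows the Nature paper, while c_puct is a tunable exploration constant:

```python
import math

# Pick the action maximizing action value Q plus an exploration
# bonus u that is proportional to the prior P and decays with the
# visit count N, so unvisited high-prior moves get explored first.

def select_action(actions, Q, P, N, c_puct=5.0):
    sqrt_total = math.sqrt(sum(N[a] for a in actions))
    return max(actions,
               key=lambda a: Q[a] + c_puct * P[a] * sqrt_total / (1 + N[a]))
```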
OVERVIEW
Photo: Wikimedia
OVERVIEW
Photo: Wikimedia
AlphaGo is essentially a search algorithm - specifically, a modified implementation of
Monte Carlo tree search with DeepMind’s proprietary neural networks plugged in. The
basic idea of MCTS is to expand the search tree through random sampling of the search
space.
Backpropagation (the reporting stage of the algorithm) ensures that good paths are
weighted more heavily in each additional simulation. Thus, the tree grows
asymmetrically as the method concentrates on the more promising subtrees.
Here comes the main difficulty: 250^150 sequences is still a lot of moves to explore, so the
effect of MCTS, though considerable, is ultimately limited. That is, unless we have some
other method for sorting out which branches are worth exploring… like DeepMind’s
neural networks.
AN ASIDE ON NEURAL NETWORKS
Photo: Wikimedia
Chief among ongoing mysteries is why AlphaGo’s neural networks continue to
outperform other hybrids of MCTS and neural nets.
The realistic answer is that there is much more to their implementation of neural nets than is
outlined in their paper.
What follows does not remotely approximate pseudocode, and is merely the skeleton of
AlphaGo’s structure and training pipeline.
TRAINING PIPELINE VISUALIZATION
Photo: Nature
POLICY SUPERVISED LEARNING (SL)
Photo: Nature
Training data
30 million positions
from the KGS Go server.
Methodology
The SL policy network (13 layers)
and rollout policy network (fewer
layers) are trained on state-action
pairs randomly sampled from the
data.
Outcome
The networks are learning to
recognize which possible action a
has the greatest likelihood of
being selected in state s.
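A minimal sketch of that supervised step (a toy network and synthetic batch stand in for DeepMind's 13-layer architecture and the KGS data):

```python
import torch
import torch.nn as nn

# Minimize cross-entropy so the network assigns high probability
# to the move the expert actually played. The tiny network and
# fake data below are stand-ins for illustration only.

policy_net = nn.Sequential(
    nn.Conv2d(48, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 19 * 19, 19 * 19),  # one logit per board point
)
opt = torch.optim.SGD(policy_net.parameters(), lr=0.003)
loss_fn = nn.CrossEntropyLoss()

states = torch.randn(8, 48, 19, 19)             # 8 fake positions, 48 planes
expert_moves = torch.randint(0, 19 * 19, (8,))  # 8 fake expert moves

loss = loss_fn(policy_net(states), expert_moves)
opt.zero_grad()
loss.backward()
opt.step()
```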
POLICY NETWORK TRANSITION
Photo: Nature
Parallel structures
The weights of the reinforcement
learning (RL) policy network are
initialized to the current weights
of the SL network.
POLICY REINFORCEMENT LEARNING
Photo: Nature
Training the RL policy network
The current policy network plays
games with a random past
iteration of the network - not with
its immediate predecessor, which
would lead to overfitting.
Methodology
The weights in the network are
updated using policy gradient
learning to appropriately reward
winning and losing policies.
Outcome
The policy network is now
optimized for winning games, not
just accurately predicting the next
move.
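A sketch of that policy-gradient (REINFORCE-style) update; `move_log_probs` is a hypothetical tensor of log π(a_t | s_t) along one self-play game:

```python
import torch

# After a self-play game against a randomly sampled past
# iteration, scale the log-probability of every move by the
# outcome z (+1 win, -1 loss), rewarding winning policies and
# punishing losing ones.

def policy_gradient_loss(move_log_probs: torch.Tensor, z: float) -> torch.Tensor:
    # Minimizing -z * sum(log pi) raises the probability of moves
    # from won games and lowers it for moves from lost games.
    return -z * move_log_probs.sum()

# Toy usage with 150 fake per-move log-probabilities from one game.
loss = policy_gradient_loss(torch.log(torch.rand(150)), z=+1.0)
```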
VALUE REINFORCEMENT LEARNING
Photo: Nature
Training the RL value network
Uses 30 million distinct positions
sampled from separate games
between policy networks.
Methodology
The weights in the network are
trained using regression on
state-outcome pairs, minimizing
the error between the predicted
and actual result.
Outcome
The value network outputs a
probability of a game’s expected
outcome given any state of the
board.
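A sketch of that regression objective, with a toy network and synthetic outcomes standing in for DeepMind's architecture and the 30 million self-play positions:

```python
import torch
import torch.nn as nn

# Regress a single scalar v(s) toward the eventual game outcome z
# by minimizing mean squared error over state-outcome pairs.

value_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(48 * 19 * 19, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Tanh(),  # v(s) in (-1, 1)
)
opt = torch.optim.SGD(value_net.parameters(), lr=0.003)

states = torch.randn(8, 48, 19, 19)                      # 8 fake positions
outcomes = torch.randint(0, 2, (8, 1)).float() * 2 - 1   # z in {-1, +1}

loss = nn.functional.mse_loss(value_net(states), outcomes)
opt.zero_grad()
loss.backward()
opt.step()
```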
VALUE REINFORCEMENT LEARNING
Photo: Nature
Note: the network is trained on
state-outcome pairs, not on the
complete game.
DeepMind says: “The problem is
that successive positions are
strongly correlated, differing by
just one stone, but the regression
target is shared for the entire
game.”
Q: What would be the output of the
value network if we could perfectly
search every node in the tree?
A: We would know the outcome
with 100% confidence, so it would
output 1, 0, or -1.
SEARCHING
WITH POLICY & VALUE NETWORKS
Photo: Nature
Actual Gameplay
Let’s say AlphaGo begins a game of Go. By this time, it has
already “asynchronously” trained its neural networks
(learned probability weights for each board position,
hopefully in a smart way) long before the game begins.
At each step of the game, AlphaGo reads in a board position.
Since there are so many board positions, it needs a fast way
to look them up.
AlphaGo uses a modified Monte Carlo tree search to run n
simulations (however many it can run in the time allotted
per turn), evaluates what it learned, and then chooses its
next move.
SEARCHING
WITH POLICY & VALUE NETWORKS
Photo: Nature
SEARCHING
WITH POLICY & VALUE NETWORKS
Photo: Nature
At each step in a simulation, AlphaGo
selects and expands into the next node
based on three things:
1) The action value of the node
(initialized at zero in each simulation)
2) The prior probability associated with
the node (learned from expert games
and self-play)
3) An exploration parameter (to favor
nodes not yet visited)
SEARCHING
WITH POLICY & VALUE NETWORKS
Photo: Nature
After expanding into a node,
AlphaGo evaluates it using two
methods:
1) The value network processes
the node and returns a value.
2) The fast policy network runs a
rollout to the end of the game,
and returns the outcome: win,
loss, or draw.
SEARCHING
WITH POLICY & VALUE NETWORKS
Photo: Nature
The algorithm then takes a
weighted sum of those two values,
and propagates the new
information back up the tree.
That updating changes the action
values to reflect what this
simulation learned, allowing the
next simulation to incorporate
that information into its decision
process.
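A sketch of this evaluate-and-backup step; the Edge statistics are hypothetical, and the equal weighting (lambda = 0.5) is the mixing constant reported in the Nature paper:

```python
from dataclasses import dataclass

# Each tree edge accumulates a visit count and the running mean of
# all evaluations passing through it; the leaf value mixes the
# value network's estimate with the fast-rollout outcome.

@dataclass
class Edge:
    N: int = 0      # visit count
    W: float = 0.0  # total evaluation
    Q: float = 0.0  # mean evaluation (the action value)

def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
    return (1 - lam) * value_net_estimate + lam * rollout_outcome

def backup(path, value):
    for edge in path:  # every edge traversed in this simulation
        edge.N += 1
        edge.W += value
        edge.Q = edge.W / edge.N
```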
SEARCHING
WITH POLICY & VALUE NETWORKS
When n simulations have been completed (i.e. the
allotted time per turn is almost expired), AlphaGo
updates the action values and visit counts of all
traversed edges so that “each edge accumulates the visit
count and mean evaluation of all simulations passing
through that edge.”
At that point, the algorithm will have visit counts and
action values for each legal action, and will choose the
“principal variation”, the move from the current position
that was most frequently visited in the simulations.
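A sketch of that final choice, reusing the hypothetical Edge statistics from the backup sketch above:

```python
# Play the root action with the highest visit count (the start of
# the principal variation), which is more robust than picking the
# highest mean value outright. `root_edges` maps each legal move
# to its Edge statistics.

def choose_move(root_edges):
    return max(root_edges, key=lambda move: root_edges[move].N)
```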
SEARCHING
WITH POLICY & VALUE NETWORKS
Photo: Nature
At this point,
all of our
simulations
combine to
single out the
principal
variation,
the move
that was most
frequently
visited over the
n simulations.
COMPETITION WIN RATES
AlphaGo (single machine) vs. other Go programs: 494/495 (99.8%)
AlphaGo (distributed version) vs. other Go programs: -/- (100.0%)
AlphaGo (distributed version) vs. AlphaGo (single machine): -/- (77.0%)
AlphaGo (distributed version) vs. Fan Hui (European Go champion): 5/5 (100.0%)
AlphaGo (distributed version) vs. Lee Sedol (World Go champion): 4/5 (80.0%)
The chief weakness is that the machine will not
learn by mistakes. The only way to improve its
play is by improving the program.
Some thought has been given to designing a
program which is self-improving but, although
it appears to be possible, the methods thought
of so far do not seem to be very practical.
Claude Shannon
Programming a Computer for Playing Chess
THANK
YOU
Resources
Programming a Computer for Playing Chess
Mastering the Game of Go with Deep Neural
Networks and Tree Search
The Mystery Of Go, The Ancient Game That
Computers Still Can’t Win
UCLA Stats 232B Course Notes
Deep Learning for Artificial General Intelligence