Mastering the game of Go with
deep neural networks and tree
search
Speaker: San-Feng Chang
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche,
G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman,
S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach,
M., Kavukcuoglu, K., Graepel, T., and Hassabis, D.
Nature, 529(7587):484–489, 2016.
2016/3/22 1
Outline
• AI in Game Playing
• Previous Work of Go Research
• Architecture of AlphaGo
• AlphaGo’s methods
• The playing strength of AlphaGo
• Conclusion
AI in Game Playing (1/3)
• Game playing is a well-defined problem for measuring
the performance of an AI.
• One classification for the outcomes of an AI test is:
– Optimal: it is not possible to perform better
– Strong super-human: performs better than all humans
– Super-human: performs better than most humans
– Sub-human: performs worse than most humans
AI in Game Playing (2/3)
Game | Players | Branching factor | Depth (game length) | Complexity
Chess | Deep Blue vs Kasparov (1997) | 35 | 80 | 35^80 ≈ 10^123
Go | AlphaGo vs Lee Sedol (2016) | 250 | 150 | 250^150 ≈ 10^360
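The complexity column can be checked with a few lines of Python. This is only a sketch of the arithmetic b^d for branching factor b and depth d; the helper name `log10_tree_size` is made up for illustration.

```python
import math

def log10_tree_size(branching_factor: int, depth: int) -> float:
    """Return log10(b**d), the order of magnitude of a full game tree."""
    return depth * math.log10(branching_factor)

chess = log10_tree_size(35, 80)    # roughly 123.5
go = log10_tree_size(250, 150)     # roughly 359.7
print(f"chess ~ 10^{chess:.1f}, go ~ 10^{go:.1f}")
```

Working in log space avoids building the huge integers and makes the gap between chess and Go easy to read off.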
Evolution of game tree search:
Brute Force → Minimax & Alpha-Beta → MCTS → AlphaGo’s Method
AI in Game Playing (3/3)
• Minimax & Alpha-Beta Pruning
– Even with pruning, the complexity is still too high for Go.
https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/AB_pruning.svg/1280px-AB_pruning.svg.png?1458451165542
Previous Work of Go Research (1/4)
• Monte Carlo rollouts search to maximum
depth without branching at all, by sampling
long sequences of actions for both players
from a policy p.
• Monte Carlo tree search (MCTS) uses Monte
Carlo rollouts to estimate the value of each
state in a search tree.
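A Monte Carlo rollout can be sketched on a toy game. The game used here (take 1 to 3 stones, whoever takes the last stone wins) and the uniformly random policy are stand-ins chosen for brevity. They are assumptions, not the Go setting of the paper.

```python
import random

def legal_moves(stones):
    """Toy game (an assumption, not Go): take 1-3 stones; taking the last stone wins."""
    return [n for n in (1, 2, 3) if n <= stones]

def random_policy(stones, rng):
    """A uniformly random policy p over the legal moves."""
    return rng.choice(legal_moves(stones))

def rollout(stones, player, policy, rng):
    """Play one game to the end with no branching; return the winner (0 or 1)."""
    while True:
        stones -= policy(stones, rng)
        if stones == 0:
            return player          # the player who just moved took the last stone
        player = 1 - player

def estimate_value(stones, player, n_rollouts=1000, seed=0):
    """Fraction of rollouts won by `player`, who is to move at `stones`."""
    rng = random.Random(seed)
    wins = sum(rollout(stones, player, random_policy, rng) == player
               for _ in range(n_rollouts))
    return wins / n_rollouts

print(estimate_value(stones=3, player=0))
```

Averaging many such rollouts gives a noisy estimate of a state's value, and its quality depends on the policy used to sample moves.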
Previous Work of Go Research (2/4)
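• Monte Carlo Tree Search:
[Figure: a search tree whose nodes are labelled wins/visits (root 2/3; children 1/1 and 1/2; grandchildren 1/1 and 0/1), with levels alternating Player 1, Player 2, Player 1. Panels: Selection (randomly); Expansion (a new 0/0 node is added).]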
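Previous Work of Go Research (3/4)
• Monte Carlo Tree Search:
[Figure: the same tree with a fourth level for Player 2. Panels: Simulation (a playout from the new 0/0 node); Back-Propagation (the new node becomes 1/1 and the counts along the path to the root are updated: 1/1 → 2/2, 1/2 → 2/3, root 2/3 → 3/4).]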
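The four MCTS phases (selection, expansion, simulation, back-propagation) can be sketched as follows. Two things differ from the slides and are assumptions: the game is a toy one (take 1 to 3 stones, last stone wins), and selection uses the common UCB1 rule instead of the random selection shown in the figure.

```python
import math
import random

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones = stones        # stones left in the toy game
        self.player = player        # player to move at this node
        self.parent = parent
        self.children = {}          # move -> Node
        self.wins = 0               # wins for the player who moved INTO this node
        self.visits = 0

    def untried_moves(self):
        return [m for m in (1, 2, 3) if m <= self.stones and m not in self.children]

def ucb1(child, parent_visits, c=1.4):
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(stones, player, n_iter=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while node.stones > 0 and not node.untried_moves():
            node = max(node.children.values(), key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one new child (it starts at 0/0).
        if node.stones > 0:
            move = rng.choice(node.untried_moves())
            child = Node(node.stones - move, 1 - node.player, parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: random playout from the new node.
        s, p = node.stones, node.player
        winner = 1 - p if s == 0 else None   # terminal: the previous mover took the last stone
        while winner is None:
            s -= rng.choice([m for m in (1, 2, 3) if m <= s])
            if s == 0:
                winner = p
            p = 1 - p
        # 4. Back-propagation: update wins/visits on the path back to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:        # the mover into this node is 1 - node.player
                node.wins += 1
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(stones=5, player=0))
```

Each node stores wins/visits like the labels in the figure, counted from the point of view of the player who moved into the node, so every parent can maximize over its children directly.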
Previous Work of Go Research (4/4)
• The strongest current Go programs are based
on MCTS, enhanced by policies that are
trained to predict human expert moves.
• However, prior work has been limited to
shallow policies or value functions based on a
linear combination of input features.
Architecture of AlphaGo
• Neural network training pipeline
– Human expert dataset: KGS server, ~160,000 games, 29.4 million positions
• Two brains
– p(a|s): probability distribution over legal moves a given board position s
– v(s): scalar value of board position s
Convolutional Neural Network (1/2)
[Figure: a regular 3-layer neural network vs. a convolutional neural network]
• Input volume of size: W1 x H1 x D1
• Requires four hyperparameters:
1. Number of filters K (depth)
2. Spatial extent F (kernel size)
3. The stride S
4. The amount of zero padding P
• Output volume size: W2 x H2 x D2
– W2 = (W1 – F + 2P)/S + 1
– H2 = (H1 – F + 2P)/S + 1
– D2 = K
• Parameter sharing:
– total weights: (F * F * D1) * K (plus K biases)
http://cs231n.github.io/convolutional-networks/
Convolutional Neural Network (2/2)
http://cs231n.github.io/convolutional-networks/
• Number of filters K: 2
• Spatial extent F: 3 x 3
• Stride S: 2
• Zero padding P: 1
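The output-size and weight-count formulas can be turned into two small helpers. The function names are made up for this sketch. The 5 x 5 x 3 input and the padding of 2 for the 19 x 19 board are assumed example values, not taken from the slides.

```python
def conv_output_size(w_in: int, f: int, s: int, p: int) -> int:
    """Spatial output size of a convolution: (W1 - F + 2P) / S + 1."""
    span = w_in - f + 2 * p
    if span % s != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return span // s + 1

def conv_weight_count(f: int, d_in: int, k: int) -> int:
    """Shared weights of a conv layer, biases excluded: (F * F * D1) * K."""
    return f * f * d_in * k

# Slide settings K=2, F=3, S=2, P=1; the 5x5x3 input is an assumed example size.
print(conv_output_size(5, f=3, s=2, p=1))      # 3
print(conv_weight_count(f=3, d_in=3, k=2))     # 54
# A 19x19 board with a 5x5 kernel, stride 1 and (assumed) padding 2 keeps its size.
print(conv_output_size(19, f=5, s=1, p=2))     # 19
```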
AlphaGo’s methods –
Trained by Human Expert (1/6)
• Rollout policy pπ:
– Selects an action in 2 μs, but predicts expert moves
with only 24.2% accuracy
– A linear softmax of small pattern features with
weights π
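[Figure: a softmax layer with inputs n1,in, n2,in, n3,in, outputs n1, n2, n3 and output distribution p]
$$ n_{1,\mathrm{out}} = \frac{e^{n_{1,\mathrm{in}}}}{e^{n_{1,\mathrm{in}}} + e^{n_{2,\mathrm{in}}} + e^{n_{3,\mathrm{in}}}} $$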
https://qph.fs.quoracdn.net/main-qimg-9e2d012ef7cb8b29d2bed14d2975c986
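A linear softmax policy of the kind described for the rollout policy can be sketched in a few lines. The feature vectors and weight values below are invented for illustration and are not the pattern features used by AlphaGo.

```python
import math

def linear_softmax_policy(features, weights):
    """p(a|s) proportional to exp(weights . features[a]) for each candidate move a."""
    scores = [sum(w * x for w, x in zip(weights, phi)) for phi in features]
    top = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical binary pattern features for three candidate moves.
features = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
weights = [0.5, -0.2, 1.0]                   # plays the role of the weights π
probs = linear_softmax_policy(features, weights)
print(probs)
```

The per-move cost is one dot product and one exponential, which is why such a policy is fast enough for rollouts even though it is less accurate than a deep network.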
AlphaGo’s methods –
Trained by Human Expert (2/6)
• SL policy pσ:
– Selects an action in 3 ms and predicts expert moves
with 57.0% accuracy
– A 13-layer convolutional neural network with
weights σ
[Figure: SL policy network, outputting the move distribution p]
• Input: size 19 x 19, 48 planes
• 1st layer: Conv + ReLU, kernel size 5 x 5
• 2nd~12th layers: Conv + ReLU, kernel size 3 x 3
• 13th layer: kernel size 1 x 1, 1 filter, softmax
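To get a feel for the size of this network, the layer list can be written down and its weights counted. The number of filters per hidden layer is not given on the slide. The value k=192 below is an assumption (it is the figure reported in the paper), and biases are ignored.

```python
def policy_net_layers(k=192, planes=48):
    """(kernel, in_channels, out_channels) per conv layer; k=192 is an assumption."""
    layers = [(5, planes, k)]        # 1st layer: 5x5 kernels
    layers += [(3, k, k)] * 11       # 2nd-12th layers: 3x3 kernels
    layers.append((1, k, 1))         # 13th layer: 1x1 kernel, 1 filter
    return layers

def weight_count(layers):
    """Total shared weights, using (F * F * D_in) * K per layer."""
    return sum(f * f * d_in * d_out for f, d_in, d_out in layers)

layers = policy_net_layers()
print(len(layers), weight_count(layers))     # 13 3880128
```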
AlphaGo’s methods –
Reinforcement Learning pρ (3/6)
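[Figure: reinforcement learning loop]
• Initialize the weights from the SL policy pσ: ρ = ρ⁻ = σ
• Play games to the end between the current RL policy pρ and an opponent pρ⁻ drawn from the opponent pool
• Receive the reward r at the end of the game
• Update ρ with the policy gradient method
• Add pρ to the opponent pool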
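The policy gradient step in this loop can be sketched with a REINFORCE-style update. To stay short, the policy here is a plain softmax over a vector of logits that ignores the board state. That simplification, the learning rate, and the function names are assumptions for illustration.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(logits, actions, z, lr=0.1):
    """One policy-gradient step: logits += lr * z * sum_t d log p(a_t) / d logits.

    `logits` is a state-independent toy policy (an assumption);
    `actions` are the moves played in one game and `z` is +1 (win) or -1 (loss).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a in actions:
        for j in range(len(logits)):
            # d log softmax(a) / d logit_j = 1[j == a] - p_j
            grad[j] += (1.0 if j == a else 0.0) - probs[j]
    return [x + lr * z * g for x, g in zip(logits, grad)]

start = [0.0, 0.0, 0.0]
after_win = reinforce_update(start, actions=[0], z=+1)
after_loss = reinforce_update(start, actions=[0], z=-1)
print(softmax(after_win)[0], softmax(after_loss)[0])
```

Moves from a won game become more likely and moves from a lost game less likely. That is the whole effect of the reward r in this update.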
AlphaGo’s methods –
Value Network vθ (4/6)
• Supervised Learning:
– Estimates the winning rate from the current position
– A 15-layer CNN
• Input: size 19 x 19, 48 planes + 1 unit (current color)
• 1st~13th layers: the same as the RL policy network
• 14th layer: fully-connected, 256 ReLU units
• 15th layer: fully-connected, 1 tanh unit
AlphaGo’s methods –
Value Network vθ (5/6)
• Randomly sample an integer U in 1 ~ 450
– t = 1 ~ U-1: played by the SL policy network pσ
– t = U: random action
– t = U+1 ~ End: played by the RL policy network pρ
• Reward: the outcome of the game
• Only a single training example (s_{U+1}, z_{U+1}) is
added to the data set from each game.
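The sampling schedule can be sketched as a self-play loop that switches policies at step U. Everything concrete here is an assumption: the game is a toy one (take 1 to 3 stones, last stone wins), both policies are hypothetical callables, and the usage example lowers the upper bound on U so that the short toy game always reaches step U+1.

```python
import random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def uniform_policy(stones, rng):
    """Hypothetical stand-in for both the SL and the RL policy network."""
    return rng.choice(legal_moves(stones))

def generate_value_example(sl_policy, rl_policy, start=40, max_u=450, seed=0):
    """Return one (s_{U+1}, z_{U+1}) pair from a single self-play game, or None.

    A state is (stones, player_to_move) in the toy game.
    """
    rng = random.Random(seed)
    u = rng.randint(1, max_u)                        # U sampled from 1 ~ max_u
    stones, player, t = start, 0, 1
    sample_state, winner = None, None
    while stones > 0:
        if t < u:
            move = sl_policy(stones, rng)            # t = 1 ~ U-1
        elif t == u:
            move = rng.choice(legal_moves(stones))   # t = U: random action
        else:
            move = rl_policy(stones, rng)            # t = U+1 ~ end
        stones -= move
        if stones == 0:
            winner = player                          # took the last stone
        player = 1 - player
        t += 1
        if t == u + 1 and stones > 0:
            sample_state = (stones, player)          # s_{U+1}
    if sample_state is None:
        return None                                  # the game ended before step U+1
    z = 1 if winner == sample_state[1] else -1       # outcome for the player to move
    return sample_state, z

print(generate_value_example(uniform_policy, uniform_policy, max_u=10))
```

Keeping only one position per game avoids training on many strongly correlated positions from the same game.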
Reward: $z_t = \pm r(s_T)$
AlphaGo’s methods –
Searching (6/6)
• Q: action value → winning scores
• u(P): upper confidence bound → exploration vs. exploitation
• P: prior probability → using pσ (SL performed better than RL)
The playing strength of AlphaGo
Conclusion
• Reaching a milestone is the beginning of the
next milestone.
• Stay hungry, stay foolish!
References (1/2)
• Nature:
– Mastering the game of Go with deep neural networks and tree search
• Mark Chang:
– http://www.slideshare.net/ckmarkohchang/alphago-in-depth
• CNN:
– http://cs231n.github.io/convolutional-networks/
References (2/2)
• 陳鍾誠:
– http://www.slideshare.net/ccckmit/30alphago
• Monte Carlo Tree Search:
– https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
• How AlphaGo Works:
– http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
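Formula (1/2)
• Policy Network: classification
$$ \Delta\sigma = \frac{\alpha}{m} \sum_{k=1}^{m} \frac{\partial \log p_\sigma(a^k \mid s^k)}{\partial \sigma} $$
• Policy Network: reinforcement learning
$$ \Delta\rho = \frac{\alpha}{n} \sum_{i=1}^{n} \sum_{t=1}^{T^i} \frac{\partial \log p_\rho(a_t^i \mid s_t^i)}{\partial \rho} \left( z_t^i - v(s_t^i) \right) $$
• Value Network: regression
$$ \Delta\theta = \frac{\alpha}{m} \sum_{k=1}^{m} \left( z^k - v_\theta(s^k) \right) \frac{\partial v_\theta(s^k)}{\partial \theta} $$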
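Formula (2/2)
• Searching:
$$ a_t = \arg\max_a \left( Q(s_t, a) + u(s_t, a) \right) $$
$$ u(s, a) \propto \frac{P(s, a)}{1 + N(s, a)} $$
$$ N(s, a) = \sum_{i=1}^{n} 1(s, a, i) $$
$$ Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} 1(s, a, i)\, V(s_L^i) $$
1(s, a, i) indicates whether edge (s, a) was traversed during the i-th simulation.
$s_L^i$ is the leaf node from the i-th simulation.
$$ V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda z_L $$
$$ u(s, a) = c_{\mathrm{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N_r(s, b)}}{1 + N_r(s, a)} $$
N_r is the rollout visit count used in the paper's detailed form of u(s, a).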
How AlphaGo selected its move
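The in-tree selection rule and the mixed leaf evaluation from the Formula (2/2) slide can be sketched as two small functions. The dictionary-based interface and the default values c_puct=5.0 and lam=0.5 are assumptions chosen for this example.

```python
import math

def puct_select(q, prior, visits, c_puct=5.0):
    """Pick argmax_a Q(s,a) + u(s,a), with u = c_puct * P * sqrt(sum_b N_b) / (1 + N_a).

    `q`, `prior` and `visits` are dicts keyed by action; c_puct=5.0 is an assumed value.
    """
    sqrt_total = math.sqrt(sum(visits.values()))

    def score(a):
        return q[a] + c_puct * prior[a] * sqrt_total / (1 + visits[a])

    return max(q, key=score)

def leaf_value(v_theta, z_rollout, lam=0.5):
    """Mixed leaf evaluation V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1 - lam) * v_theta + lam * z_rollout

q = {"A": 0.5, "B": 0.4}
prior = {"A": 0.2, "B": 0.8}
print(puct_select(q, prior, visits={"A": 10, "B": 0}))     # B: high prior, unvisited
print(puct_select(q, prior, visits={"A": 10, "B": 100}))   # A: the bonus for B has decayed
print(leaf_value(v_theta=0.2, z_rollout=1.0))
```

The bonus u favours moves with a high prior and few visits, and it shrinks as a move is visited more often, so the search gradually shifts towards moves with a high action value Q.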
The playing strength of AlphaGo
(Bonus 1)
The playing strength of AlphaGo
(Bonus 2)

 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 

Mastering the game of Go with deep neural networks and tree search

  • 1. Mastering the game of Go with deep neural networks and tree search. Speaker: San-Feng Chang. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. Nature, 529(7587):484–489, 2016. 2016/3/22
  • 2. Outline • AI in Game Playing • Previous Work of Go Research • Architecture of AlphaGo • AlphaGo’s methods • The playing strength of AlphaGo • Conclusion
  • 3. AI in Game Playing (1/3) • Game playing is a specific problem used to measure the performance of an AI. • One classification for outcomes of an AI test is: Optimal: it is not possible to perform better; Strong super-human: performs better than all humans; Super-human: performs better than most humans; Sub-human: performs worse than most humans.
  • 4. AI in Game Playing (2/3)
      Game (Players): Branching Factor, Depth (Length), Complexity
      Chess (Deep Blue vs Kasparov, 1997): 35, 80, 35^80 ≈ 10^123
      Go (AlphaGo vs Lee Sedol, 2016): 250, 150, 250^150 ≈ 10^360
      Evolution of Game Tree Search: Brute Force → Minimax & Alpha-Beta → MCTS → AlphaGo’s Method
  • 5. AI in Game Playing (3/3) • Minimax & Alpha-Beta Pruning • The complexity is still too high. • Figure source: https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/AB_pruning.svg/1280px-AB_pruning.svg.png?1458451165542
  • 6. Previous Work of Go Research (1/4) • Monte Carlo rollouts search to maximum depth without branching at all, by sampling long sequences of actions for both players from a policy p. • Monte Carlo tree search (MCTS) uses Monte Carlo rollouts to estimate the value of each state in a search tree.
  • 7. Previous Work of Go Research (2/4) • Monte Carlo Tree Search: [Figure: search tree with wins/visits per node (root 2/3; children 1/1 and 1/2; leaves 1/1 and 0/1), levels alternating Player 1, Player 2, Player 1. Panels: Selection (randomly); Expansion (new node 0/0)]
  • 8. Previous Work of Go Research (3/4) • Monte Carlo Tree Search: [Figure: the same tree extended to a Player 2 level. Panels: Simulation (rollout from the new 0/0 node); Back-Propagation (counts updated to root 3/4; children 1/1 and 2/3; leaves 2/2 and 0/1; new node 1/1)]
  • 9. Previous Work of Go Research (4/4) • The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves. • However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.
  • 10. Architecture of AlphaGo • Neural Network Training Pipeline • s: board position; a: legal moves; p(a|s): probability distribution; v(s): scalar value • Two Brains • Human expert dataset: KGS server, ~160,000 games, 29.4 million positions
  • 11. Convolutional Neural Network (1/2) • Figures: a regular 3-layer neural network; a convolutional neural network • Input volume of size W1 x H1 x D1 • Requires four hyperparameters: 1. Number of filters K (depth) 2. Spatial extent F (kernel size) 3. Stride S 4. Amount of zero padding P • Output volume of size W2 x H2 x D2, where W2 = (W1 – F + 2P)/S + 1, H2 = (H1 – F + 2P)/S + 1, D2 = K • Parameter sharing: total weights (F * F * D1) * K • Source: http://cs231n.github.io/convolutional-networks/
  • 12. Convolutional Neural Network (2/2) • Example: number of filters K: 2; spatial extent F: 3 x 3; stride S: 2; zero padding P: 1 • Source: http://cs231n.github.io/convolutional-networks/
  • 13. AlphaGo’s methods – Trained by Human Expert (1/6) • Rollout policy: – Takes 2 μs to select an action, but predicts expert moves with only 24.2% accuracy – Uses a linear softmax of small pattern features with weights π • Softmax over three units n1, n2, n3: $n_{1,\mathrm{out}} = \frac{e^{n_{1,\mathrm{in}}}}{e^{n_{1,\mathrm{in}}} + e^{n_{2,\mathrm{in}}} + e^{n_{3,\mathrm{in}}}}$ • Figure source: https://qph.fs.quoracdn.net/main-qimg-9e2d012ef7cb8b29d2bed14d2975c986
  • 14. AlphaGo’s methods – Trained by Human Expert (2/6) • SL policy: – Takes 3 ms to select an action and predicts expert moves with 57.0% accuracy – Uses a 13-layer convolutional neural network with weights σ • Input size: 19 x 19, 48 planes • First layer: Conv + ReLU, kernel size 5 x 5 • 2nd~12th layers: Conv + ReLU, kernel size 3 x 3 • 13th layer: kernel size 1 x 1, 1 filter, softmax
  • 15. AlphaGo’s methods – Reinforcement Learning pρ (3/6) • Initialize weights ρ = ρ⁻ = σ from the SL policy pσ • The RL policy pρ plays games to the end against an opponent pρ⁻ from the opponent pool • Reward r at the end of the game • Update by the policy gradient method • Add pρ to the opponent pool
  • 16. AlphaGo’s methods – Value Network vθ (4/6) • Supervised Learning: – Used to estimate the winning rate of the current position – Uses a 15-layer CNN • Input size: 19 x 19, 48 planes + 1 unit (current color) • 1st~13th layers: the same as the RL policy network • 14th layer: fully connected, 256 ReLU units • 15th layer: fully connected, 1 tanh unit
  • 17. AlphaGo’s methods – Value Network vθ (5/6) • Randomly sample an integer U in 1 ~ 450 – t = 1 ~ U-1: played by the SL policy network pσ – t = U: random action – t = U+1 ~ End: played by the RL policy network pρ • Reward: $z_t = \pm r(s_T)$ • Only a single training example $(s_{U+1}, z_{U+1})$ is added to the data set from each game.
  • 18. AlphaGo’s methods – Searching (6/6) • Q: action value → winning scores • u(P): upper confidence bound → exploration vs. exploitation • P: prior probability → using pσ (SL performed better than RL)
  • 19. The playing strength of AlphaGo
  • 20. Conclusion • Reaching a milestone is the beginning of the next milestone. • Stay hungry, stay foolish!
  • 21. References (1/2) • Nature: – Mastering the game of Go with deep neural networks and tree search • Mark Chang: – http://www.slideshare.net/ckmarkohchang/alphago-in-depth • CNN: – http://cs231n.github.io/convolutional-networks/
  • 22. References (2/2) • 陳鍾誠 – http://www.slideshare.net/ccckmit/30alphago • Monte Carlo Tree Search – https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ • How AlphaGo Works – http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
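  • 24. Formula (1/2) • Policy Network (classification): $\Delta\sigma = \frac{\alpha}{m} \sum_{k=1}^{m} \frac{\partial \log p_\sigma(a^k \mid s^k)}{\partial \sigma}$ • Policy Network (reinforcement learning): $\Delta\rho = \frac{\alpha}{n} \sum_{i=1}^{n} \sum_{t=1}^{T^i} \frac{\partial \log p_\rho(a_t^i \mid s_t^i)}{\partial \rho} \left(z_t^i - v(s_t^i)\right)$ • Value Network (regression): $\Delta\theta = \frac{\alpha}{m} \sum_{k=1}^{m} \left(z^k - v_\theta(s^k)\right) \frac{\partial v_\theta(s^k)}{\partial \theta}$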
  • 25. Formula (2/2) • Searching: $a_t = \arg\max_a \left(Q(s_t, a) + u(s_t, a)\right)$; $u(s, a) \propto \frac{P(s, a)}{1 + N(s, a)}$; $N(s, a) = \sum_{i=1}^{n} 1(s, a, i)$; $Q(s, a) = \frac{1}{N(s, a)} \sum_{i=1}^{n} 1(s, a, i)\, V(s_L^i)$; $V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda z_L$; $u(s, a) = c_{\mathrm{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N_r(s, b)}}{1 + N_r(s, a)}$ • $1(s, a, i)$ indicates whether edge (s, a) was traversed during the i-th simulation • $s_L^i$ is the leaf node of the i-th simulation
  • 26. How AlphaGo selected its move
  • 27. The playing strength of AlphaGo (Bonus 1)
  • 28. The playing strength of AlphaGo (Bonus 2)
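
The search formulas on the Searching and Formula (2/2) slides can be sketched in a few lines of Python. This is a minimal illustration, not AlphaGo's implementation: the names `Edge`, `select_action`, `leaf_value` and `backup` are hypothetical, the values `c_puct = 5.0` and `lam = 0.5` are assumed defaults chosen for the example, and the sketch ignores the sign flip between the two players' perspectives.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics stored on one edge (s, a) of the search tree."""
    prior: float             # P(s, a), e.g. taken from the SL policy network
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # sum of leaf evaluations V(s_L) backed up through this edge

    @property
    def q(self) -> float:
        # Q(s, a): mean leaf evaluation over the simulations that used this edge
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(edges: dict, c_puct: float = 5.0):
    """Pick argmax_a Q(s, a) + u(s, a) with
    u(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))."""
    total_visits = sum(e.visits for e in edges.values())

    def score(item):
        _, edge = item
        u = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.q + u

    return max(edges.items(), key=score)[0]


def leaf_value(v_theta: float, z: float, lam: float = 0.5) -> float:
    """V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1 - lam) * v_theta + lam * z


def backup(path: list, value: float) -> None:
    """Add one simulation's leaf evaluation to every edge on the traversed path."""
    for edge in path:
        edge.visits += 1
        edge.value_sum += value


# Usage: two candidate moves with assumed priors; one losing simulation through "a".
edges = {"a": Edge(prior=0.6), "b": Edge(prior=0.4)}
backup([edges["a"]], leaf_value(v_theta=0.2, z=-1.0))
chosen = select_action(edges)
```

After the single losing simulation, the exploration bonus u makes the unvisited move "b" score higher than "a", which is the exploration versus exploitation trade-off the slide refers to.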

Editor's Notes

  1. 50 GPUs, 3 weeks
  2. 50 GPUs, 1 day, 80% win rate against SL