AlphaGo Analysis from Deep Learning Perspective
Chayan Chakrabarti
July 11, 2016
Pleasanton, CA
  
Mastering the game of Go
•  DeepMind problem domain
•  Deep learning and reinforcement learning concepts
•  Design of AlphaGo
•  Execution
  
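Go: a perfect information game
Size of the search space: $b^d \approx 250^{150}$ possible move sequences (breadth $b \approx 250$ legal moves per position, depth $d \approx 150$ moves per game), far more than the number of atoms in the observable universe (roughly $10^{80}$)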
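The $b^d$ figure is far too large to print usefully, so a quick sanity check works in log space. The breadth and depth are the approximate values quoted above; the atom count is the commonly cited order-of-magnitude estimate, not a precise constant.

```python
import math

def search_space_log10(breadth: int, depth: int) -> float:
    """Return log10(breadth ** depth) without materialising the huge integer."""
    return depth * math.log10(breadth)

# Approximate Go figures from the slide: b ≈ 250 moves per position, d ≈ 150 moves.
go_exponent = search_space_log10(250, 150)
# Commonly cited order-of-magnitude estimate for atoms in the observable universe.
ATOMS_LOG10 = 80

print(f"Go game tree ~ 10^{go_exponent:.1f}")  # Go game tree ~ 10^359.7
print(go_exponent > ATOMS_LOG10)               # True
```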
  	
  	
  
Reduce search space
•  Reduce breadth
– Not all moves are equally likely
– Some moves are better
– Leverage moves made by expert players
•  Reduce depth
– Evaluate strength of board (likelihood of winning)
– Collapse symmetrical or similar boards
– Simulate the games
  
	
  
	
  
Monte Carlo tree search
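A minimal sketch of plain ("normal") MCTS with the UCT selection rule, assuming a toy single-pile Nim game instead of Go so that it runs in milliseconds. The four phases (selection, expansion, simulation with random rollouts, backup) are the ones AlphaGo later augments with neural networks; the game, the exploration constant `c=1.4` and the iteration count are illustrative choices, not values from the slides.

```python
import math
import random

# Toy game (an assumption, not Go): one pile of stones, a move removes 1-3
# stones, and the player who takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones            # stones left for the player to move
        self.parent = parent
        self.move = move                # move that led from parent to here
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who made `move`

    def uct_child(self, c=1.4):
        # UCT: average win rate plus an exploration bonus.
        return max(self.children, key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(stones, rng):
    """Random playout; return 1.0 if the player to move at `stones` wins."""
    mover = 0                           # 0 = player to move at the start
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        mover ^= 1
    winner = mover ^ 1                  # whoever moved last took the last stone
    return 1.0 if winner == 0 else 0.0

def mcts(root_stones, iterations=5000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one untried child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout from the new node.
        result = rollout(node.stones, rng)
        # 4. Backup: credit each node from the perspective of the player
        #    who moved into it, flipping the result at every ply.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 - result
            result = 1.0 - result
            node = node.parent
    # Play the most visited move, a common robust choice.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(5))
```

From 5 stones the search should settle on taking 1 stone, leaving a multiple of 4 for the opponent; in Go the random rollouts and uniform expansion are exactly what AlphaGo replaces with its policy and value networks.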
  
Supervised learning using neural networks
  
Convolutional neural networks
Encode local or spatial features
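To make "local or spatial features" concrete, here is a minimal sketch of one convolutional filter sliding over a board plane, written in pure Python. The 5×5 board, the binary stone encoding and the neighbour-counting kernel are hypothetical; AlphaGo's networks stack many learned filters (5×5 in the first layer, 3×3 afterwards) over many input planes.

```python
def conv2d_same(board, kernel):
    """Cross-correlate `board` with an odd-sized `kernel` using zero padding,
    so the output has the same shape as the input (a CNN 'same' layer)."""
    rows, cols = len(board), len(board[0])
    k = len(kernel)
    pad = k // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < rows and 0 <= cc < cols:  # zero padding
                        total += kernel[i][j] * board[rr][cc]
            out[r][c] = total
    return out

# Hypothetical 5x5 board plane: 1 = black stone, 0 = empty.
board = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
# Hand-made kernel counting orthogonal neighbours (a learned filter would differ).
neighbours = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
features = conv2d_same(board, neighbours)
print(features[2][2])  # 2
```

The same small kernel is applied at every point, which is why a CNN can learn a local shape once and recognise it anywhere on the board.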
  
Reinforcement learning
[Figure: agent/environment loop] The agent observes state $S_t$, takes action $A_t$, and receives reward (feedback) $R_t$ from the environment.
•  Feedback is delayed.
•  No supervisor, only a reward signal.
•  Rules of the game are unknown.
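A minimal sketch of the agent/environment loop, assuming a hypothetical corridor environment in which the only reward arrives at the goal. It shows the two properties listed above: the agent receives no supervision about which action was right, and the feedback is delayed until the end of the episode.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to cell `length`.
    The only reward (+1) arrives at the goal, so feedback is delayed."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent is never told this rule.
        self.state = max(0, self.state + action)
        done = self.state == self.length
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def run_episode(env, policy, max_steps=100):
    """Agent/environment loop: observe S_t, choose A_t, receive R_{t+1}."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards

def always_right(state):
    return +1

rewards = run_episode(CorridorEnv(length=5), always_right)
print(rewards)  # [0.0, 0.0, 0.0, 0.0, 1.0]
```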
Deterministic policy: $a = \pi(s)$

Stochastic policy: $\pi(a|s) = \mathbb{P}[A_t = a \mid S_t = s]$
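The difference between the two kinds of policy can be sketched in a few lines. The states, the two actions and the probabilities below are hypothetical; the point is that a deterministic policy maps a state to one action, while a stochastic policy defines a distribution that is sampled.

```python
import random

# Hypothetical two-action example; actions are board points labelled "A" and "B".
def deterministic_policy(state):
    """a = pi(s): always the same action for a given state."""
    return "A" if state == "opening" else "B"

def stochastic_policy_probs(state):
    """pi(a|s): a probability distribution over actions."""
    if state == "opening":
        return {"A": 0.7, "B": 0.3}
    return {"A": 0.2, "B": 0.8}

def sample_action(state, rng):
    """Draw one action from pi(.|s)."""
    probs = stochastic_policy_probs(state)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

rng = random.Random(0)
samples = [sample_action("opening", rng) for _ in range(10_000)]
print(deterministic_policy("opening"))      # A
print(samples.count("A") / len(samples))     # roughly 0.7
```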
  
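Value: expected long-term reward, $v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s]$, with discount factor $0 \le \gamma \le 1$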
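A minimal sketch of this definition: compute the (discounted) return of each episode and average over episodes that start from the same state, which is the Monte Carlo estimate of the value. The episodes are hypothetical Go-style outcomes with a reward only at the end (1 for a win, 0 for a loss).

```python
def discounted_return(rewards, gamma=1.0):
    """G_t = R_{t+1} + gamma * R_{t+2} + ... for a reward list starting at t+1."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def mc_value_estimate(episodes, gamma=1.0):
    """Estimate v(s) as the average return over episodes that start in s."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)

# Hypothetical episodes from the same starting position: reward only at the end.
episodes = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]
print(mc_value_estimate(episodes))               # 0.75
print(discounted_return([0, 0, 0, 1], gamma=0.9))
```

With $\gamma = 1$ (undiscounted, as in a finite board game) the value of a position is simply its expected game outcome.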
  
Monte Carlo tree search combined with deep neural networks
[Figure: normal MCTS compared with AlphaGo's MCTS guided by neural networks]
AlphaGo schematic architecture
[Figure: AlphaGo's neural networks, used in the selection and evaluation steps of the search]
Reducing breadth of moves
  
Predicting the move
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
•  Expert Moves Imitator Model (w/ CNN), trained on pairs of (current board, next action) from expert games
•  Prediction model $g: s \mapsto p(a|s)$; the predicted next action is $\arg\max_a p(a|s)$
[Figure: board grid of zeros with a single 1 marking the predicted next action]
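The last step of the prediction model can be sketched directly: turn one score per board point into $p(a|s)$ with a softmax, take the argmax as the next move, and keep only the top few moves as "action candidates". The logits below are hypothetical stand-ins for the CNN's output on a 9×9 board, placed so the predicted move matches the single 1 in the slide's grid.

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_move(logits, size):
    """Turn per-point scores into p(a|s) and return argmax_a as (row, col)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return divmod(best, size), probs

def top_k_candidates(probs, k):
    """Keep only the k most probable moves (breadth reduction)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

SIZE = 9
logits = [0.0] * (SIZE * SIZE)           # hypothetical network output
logits[4 * SIZE + 5] = 3.0
logits[2 * SIZE + 2] = 2.0
move, probs = predict_move(logits, SIZE)
print(move)  # (4, 5)
```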
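Two kinds of policies
● Used a large database of online expert games
● Learned two versions of the policy:
○ a fast rollout policy $p_\pi$ for use in evaluation (about 2 µs per move, 24.2% accuracy at predicting expert moves)
○ an accurate policy network $p_\sigma$ for use in selection, where it supplies the prior move probabilities (about 3 ms per move, 57.0% accuracy)
Step 1: learn to predict human moves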
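The fast rollout policy in the paper is a linear softmax over small hand-crafted pattern features rather than a deep CNN, which is where its speed comes from. A minimal sketch, assuming hypothetical feature names and weights:

```python
import math
import random

def linear_softmax_policy(move_features, weights):
    """Fast rollout-style policy: p(a|s) proportional to exp(w . phi(s, a))."""
    scores = {a: sum(w * f for w, f in zip(weights, feats))
              for a, feats in move_features.items()}
    m = max(scores.values())
    exps = {a: math.exp(s - m) for a, s in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

# Hypothetical binary features per candidate move:
# [saves a group in atari, matches a 3x3 pattern, adjacent to the last move]
move_features = {
    "d4": [1, 0, 1],
    "q16": [0, 1, 0],
    "k10": [0, 0, 0],
}
weights = [2.0, 1.0, 0.5]                # hypothetical learned weights
probs = linear_softmax_policy(move_features, weights)
rng = random.Random(1)
move = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
```

Each move costs only a dot product over a few features, against a full forward pass through a 13-layer network for $p_\sigma$.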
Further reduce search space: symmetries
•  Input board
•  Rotation by 90, 180 and 270 degrees
•  Vertical reflection of each of the four orientations above
Together these give 8 equivalent views of the same position.
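A minimal sketch of the eight symmetries: four rotations, each with and without a reflection. The `canonical` helper is a hypothetical addition that shows how symmetric boards can be collapsed to a single key; it is one way to exploit the symmetry, not necessarily AlphaGo's.

```python
def rotate90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def reflect_vertical(board):
    """Mirror the board across its vertical axis (flip left-right)."""
    return [row[::-1] for row in board]

def symmetries(board):
    """Return the 8 dihedral symmetries: 4 rotations, each with and without reflection."""
    out = []
    current = board
    for _ in range(4):
        out.append(current)
        out.append(reflect_vertical(current))
        current = rotate90(current)
    return out

def canonical(board):
    """Pick one representative so that symmetric boards share a single key."""
    return min(tuple(map(tuple, b)) for b in symmetries(board))

corner_a = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
corner_b = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
print(canonical(corner_a) == canonical(corner_b))  # True
```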
Reduce depth by board evaluation
Value Prediction Model (Regression)
•  Training: board positions generated by the updated model (ver 1,000,000), each labeled with the game result (win / loss)
•  Adds a regression layer to the model
•  Predicts values between 0 and 1
– Close to 1: a good board position
– Close to 0: a bad board position
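A minimal sketch of the regression idea: a sigmoid output on top of board features, trained with squared error against win (1) / loss (0) labels by plain gradient descent. The two-number "features" and the training data are hypothetical stand-ins for the network's learned representation. The Nature paper's value network instead ends in a single tanh unit with outputs in [-1, 1]; this sketch follows the slide's 0 to 1 convention.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def value_head(features, weights, bias):
    """Regression layer on top of board features: returns a value in (0, 1)."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def mse_loss(predictions, outcomes):
    """Mean squared error between predicted values and win (1) / loss (0) labels."""
    return sum((p - z) ** 2 for p, z in zip(predictions, outcomes)) / len(outcomes)

def train(data, n_features, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the squared error (a sketch only)."""
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, z in data:
            v = value_head(features, weights, bias)
            # Derivative of (v - z)^2 through the sigmoid: 2 (v - z) v (1 - v).
            grad = 2.0 * (v - z) * v * (1.0 - v)
            weights = [w - lr * grad * f for w, f in zip(weights, features)]
            bias -= lr * grad
    return weights, bias

# Hypothetical 2-feature position summaries, labelled with the final result.
data = [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.2, 1.0], 0)]
weights, bias = train(data, n_features=2)
preds = [value_head(f, weights, bias) for f, _ in data]
labels = [z for _, z in data]
```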
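Value follows from policy
Step 3: learn a board evaluation network, V
● Use random samples from the self-play database (one position per game, because successive positions from the same game are strongly correlated and lead to overfitting)
● Prediction target: probability that black wins from a given board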
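A minimal sketch of the sampling step, assuming a hypothetical record format in which each self-play game stores its positions and its final outcome. Taking one random position per game keeps the training pairs roughly independent.

```python
import random

def sample_training_positions(games, rng):
    """Draw one (position, outcome) pair per self-play game.

    Successive positions in one game differ by a single stone but share the
    same outcome label, so one position per game avoids near-duplicate pairs."""
    samples = []
    for game in games:
        positions, outcome = game["positions"], game["outcome"]
        samples.append((rng.choice(positions), outcome))
    return samples

# Hypothetical self-play records; positions are just string labels here.
games = [
    {"positions": [f"g{g}_move{m}" for m in range(200)], "outcome": g % 2}
    for g in range(1000)
]
rng = random.Random(42)
dataset = sample_training_positions(games, rng)
```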
Putting it all together
Looking ahead (with Monte Carlo tree search)
•  Action candidates reduction (policy network)
•  Board evaluation (value network)
•  Rollout: a faster way of estimating $p(a|s)$ that uses shallow networks (3 ms → 2 µs per move)
Selection
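The slide only names this step. In the paper's search, each simulation descends the tree by choosing $a_t = \arg\max_a \big(Q(s_t, a) + u(s_t, a)\big)$, where $u(s,a) = c_{puct} P(s,a) \sqrt{\sum_b N_r(s,b)} / (1 + N_r(s,a))$ favours moves with a high prior and few visits, and $Q(s,a) = (1-\lambda) W_v(s,a)/N_v(s,a) + \lambda W_r(s,a)/N_r(s,a)$ mixes value-network and rollout averages. A minimal sketch under those formulas; `lam=0.5` matches the mixing weight the paper reports as best, while `c_puct=5.0` and the edge statistics are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class EdgeStats:
    """Statistics stored for each edge (s, a), named as on the slides."""
    prior: float        # P(s, a) from the policy network
    n_v: int = 0        # N_v(s, a): value-network evaluations through this edge
    w_v: float = 0.0    # W_v(s, a): sum of value-network evaluations
    n_r: int = 0        # N_r(s, a): rollouts through this edge
    w_r: float = 0.0    # W_r(s, a): sum of rollout outcomes

def q_value(e, lam=0.5):
    """Mix value-network and rollout averages; unvisited edges get Q = 0."""
    q_v = e.w_v / e.n_v if e.n_v else 0.0
    q_r = e.w_r / e.n_r if e.n_r else 0.0
    return (1 - lam) * q_v + lam * q_r

def select_action(edges, c_puct=5.0, lam=0.5):
    """argmax_a Q(s,a) + u(s,a), u = c_puct * P * sqrt(sum_b N_r) / (1 + N_r)."""
    total_n = sum(e.n_r for e in edges.values())
    def score(a):
        e = edges[a]
        u = c_puct * e.prior * math.sqrt(total_n) / (1 + e.n_r)
        return q_value(e, lam) + u
    return max(edges, key=score)

edges = {
    "a": EdgeStats(prior=0.6),                                 # unvisited, high prior
    "b": EdgeStats(prior=0.3, n_v=10, w_v=8.0, n_r=10, w_r=7.0),
    "c": EdgeStats(prior=0.1, n_v=2, w_v=0.0, n_r=2, w_r=0.0),
}
print(select_action(edges))               # exploration bonus dominates
print(select_action(edges, c_puct=0.1))   # Q dominates
```

A large `c_puct` pushes the search toward moves the policy network likes but that have barely been tried; a small one exploits the best average so far.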
  
Expansion
1. If the visit count of edge $(s, a)$ exceeds a threshold, $N_r(s, a) > n_{thr}$, insert the node for the successor state $s'$.
2. For every possible action $a'$ from $s'$, initialize the statistics:
$N_v(s', a') = N_r(s', a') = 0$
$W_v(s', a') = W_r(s', a') = 0$
$P(s', a') = p_\sigma(a'|s')$
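A minimal sketch of the expansion rule, storing the tree as a plain dictionary from state to per-action statistics. The `successor` and `policy_net` callables are hypothetical stand-ins for the game rules and the SL policy network $p_\sigma$, and `n_thr=40` is a placeholder default.

```python
def maybe_expand(tree, s, a, successor, policy_net, n_thr=40):
    """Expand edge (s, a) once N_r(s, a) > n_thr.

    tree: dict mapping state -> {action: statistics dict}.
    successor(s, a) -> s' and policy_net(s') -> {a': p_sigma(a'|s')} are
    hypothetical stand-ins for the game rules and the SL policy network."""
    if tree[s][a]["n_r"] <= n_thr:
        return None                      # not visited often enough yet
    s_next = successor(s, a)
    if s_next not in tree:
        # Fresh statistics for every legal a' from s', prior from p_sigma.
        tree[s_next] = {
            a_next: {"n_v": 0, "n_r": 0, "w_v": 0.0, "w_r": 0.0, "prior": p}
            for a_next, p in policy_net(s_next).items()
        }
    return s_next

tree = {"root": {"d4": {"n_v": 41, "n_r": 41, "w_v": 20.0, "w_r": 22.0, "prior": 0.3}}}
s_new = maybe_expand(
    tree, "root", "d4",
    successor=lambda s, a: f"{s}/{a}",
    policy_net=lambda s: {"q16": 0.75, "k10": 0.25},
)
```

Deferring expansion until an edge has been visited many times keeps the expensive policy-network call for positions the search actually returns to.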
Evaluation
1. Evaluate $s'$ by the value network: $v_\theta(s')$
2. Simulate the action by the rollout policy network $p_\pi$; when reaching the terminal state $s_T$, calculate the reward $r(s_T)$
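In the paper the two leaf estimates are combined into a single leaf value, $V(s_L) = (1-\lambda)\, v_\theta(s_L) + \lambda z_L$, where $z_L = r(s_T)$ is the rollout outcome. A minimal sketch, assuming a hypothetical counting game in place of Go and a constant stand-in for the value network:

```python
import random

def toy_rollout(state, rng):
    """Hypothetical stand-in for a p_pi rollout: a counter game where each move
    subtracts 1 or 2 until the count reaches 0 or below. The toy reward r(s_T)
    is 1.0 if the count lands exactly on 0, else 0.0."""
    while state > 0:
        state -= rng.choice((1, 2))
    return 1.0 if state == 0 else 0.0

def evaluate_leaf(state, value_net, rollout_fn, lam=0.5):
    """V(s_L) = (1 - lam) * v_theta(s_L) + lam * z_L."""
    v = value_net(state)        # value-network estimate v_theta(s_L)
    z = rollout_fn(state)       # rollout outcome z_L = r(s_T)
    return (1 - lam) * v + lam * z

rng = random.Random(0)
leaf_value = evaluate_leaf(
    7,
    value_net=lambda s: 0.6,                  # hypothetical constant value network
    rollout_fn=lambda s: toy_rollout(s, rng),
)
```

Setting `lam=0` trusts only the value network and `lam=1` only the rollouts; mixing the two lets each compensate for the other's errors.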
Backup
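A minimal sketch of the backup pass: every edge on the path from the root to the leaf gets its visit counts incremented and its totals updated with the value-network estimate and the rollout outcome. This simplified version uses outcomes in [-1, 1] and flips the sign each ply because the players alternate; the paper's asynchronous version also applies a temporary "virtual loss", which is omitted here.

```python
def new_edge():
    return {"n_v": 0, "w_v": 0.0, "n_r": 0, "w_r": 0.0}

def backup(tree, path, v_leaf, z):
    """Update statistics on every edge (s_t, a_t) traversed in this simulation.

    v_leaf = v_theta(s_L) and z = rollout outcome, both in [-1, 1] and both
    from the perspective of the player who chose the first action in `path`."""
    sign = 1.0
    for s, a in path:
        edge = tree[s][a]
        edge["n_v"] += 1
        edge["w_v"] += sign * v_leaf
        edge["n_r"] += 1
        edge["w_r"] += sign * z
        sign = -sign            # the other player chose the next action

tree = {"root": {"a": new_edge()}, "root/a": {"b": new_edge()}}
backup(tree, [("root", "a"), ("root/a", "b")], v_leaf=0.5, z=1.0)
```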
  
Distribute search through GPUs
Distributed search
•  Main search tree: master CPU
•  Policy & value networks, computing $p_\sigma(a'|s')$ and $v_\theta(s')$: 176 GPUs
•  Rollout policy networks, computing $p_\pi$ and $r(s_T)$: 1,202 CPUs
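The division of labour above can be sketched with two worker pools: the master thread owns the tree and farms out network evaluations and rollouts, then merges the results. The evaluator functions and pool sizes are hypothetical, and Python threads only mimic the architecture; the real system ran an asynchronous search (APV-MCTS) across separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_network_eval(leaf):
    """Hypothetical stand-in for a GPU call returning priors and a value."""
    return {"priors": {"a": 0.5, "b": 0.5}, "value": 0.1 * leaf}

def fake_rollout(leaf):
    """Hypothetical stand-in for a CPU rollout returning r(s_T)."""
    return 1.0 if leaf % 2 == 0 else -1.0

def evaluate_leaves(leaves, gpu_workers=4, cpu_workers=8):
    """Dispatch network and rollout work to separate pools; merge on the master."""
    leaves = list(leaves)
    with ThreadPoolExecutor(gpu_workers) as gpus, ThreadPoolExecutor(cpu_workers) as cpus:
        net_futures = {leaf: gpus.submit(fake_network_eval, leaf) for leaf in leaves}
        roll_futures = {leaf: cpus.submit(fake_rollout, leaf) for leaf in leaves}
        # Only the master thread touches the results, so the tree needs no locks here.
        return {leaf: (net_futures[leaf].result()["value"], roll_futures[leaf].result())
                for leaf in leaves}

results = evaluate_leaves(range(4))
```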
Apply trained networks to tasks with different loss functions
Takeaways
•  Use the networks trained for a certain task (with different loss objectives) for several other tasks
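A minimal sketch of reusing one feature extractor under two objectives: a policy head scored with cross-entropy against expert moves and a value head scored with squared error against the game outcome. In AlphaGo the RL policy network starts from the SL network's weights and the value network shares its architecture; the one-layer "trunk" and all weights below are hypothetical.

```python
import math

def trunk(features, weights):
    """Shared feature extractor: a hypothetical one-layer ReLU stand-in for the CNN."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def policy_head(hidden, w_policy):
    """Softmax over moves, trained with cross-entropy against expert moves."""
    logits = [sum(w * x for w, x in zip(row, hidden)) for row in w_policy]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def value_head(hidden, w_value):
    """Single scalar in (-1, 1), trained with squared error against outcomes."""
    return math.tanh(sum(w * x for w, x in zip(w_value, hidden)))

def cross_entropy(probs, target):
    return -math.log(probs[target])

def squared_error(v, z):
    return (v - z) ** 2

# Hypothetical weights: the trunk is shared; only the heads and losses differ.
trunk_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = trunk([1.0, 0.5], trunk_w)                  # [1.0, 0.5, 1.5]
probs = policy_head(hidden, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
value = value_head(hidden, [0.1, 0.1, 0.1])
policy_loss = cross_entropy(probs, target=1)         # expert played move 1
value_loss = squared_error(value, z=1.0)             # game was won
```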
Single most important takeaway
•  Feature abstraction is the key component of any machine learning algorithm
•  Convolutional neural networks are great at automated feature abstraction
  
Reference
Silver et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484–489, January 2016.
  
	
  
About the speaker
Chayan Chakrabarti
https://www.linkedin.com/in/chayanchakrabarti
  
	
  

 
