AlphaGo Analysis from Deep Learning Perspective
Chayan Chakrabarti
July 11, 2016
Pleasanton, CA
  
Mastering the game of Go
• DeepMind problem domain
• Deep learning and reinforcement learning concepts
• Design of AlphaGo
• Execution
  
Go: perfect information game
All possible Go games ≈ 250^150 (about 250 legal moves per position, over about 150 moves) > number of atoms in the universe
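The comparison on this slide can be checked with a little arithmetic. The sketch below works in log10 so the huge number is never built. The atom count of roughly 10^80 is an assumption (a commonly quoted order of magnitude, not a figure from the slides), and the names are illustrative.

```python
import math

BRANCHING_FACTOR = 250  # approximate legal moves per position (from the slide)
GAME_LENGTH = 150       # approximate moves per game (from the slide)
LOG10_ATOMS = 80        # assumption: commonly quoted order of magnitude


def log10_game_tree_size(breadth: int, depth: int) -> float:
    """Return log10(breadth ** depth) without building the huge integer."""
    return depth * math.log10(breadth)


log10_sequences = log10_game_tree_size(BRANCHING_FACTOR, GAME_LENGTH)
print(f"250^150 is about 10^{log10_sequences:.1f}")
print(f"excess over the atom estimate: 10^{log10_sequences - LOG10_ATOMS:.1f}")
```

Even with generous rounding, exhaustive search is out of reach, which is why the following slides focus on cutting breadth and depth.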
  	
  	
  
Reduce search space
• Reduce breadth
  – Not all moves are equally likely
  – Some moves are better
  – Leverage moves made by expert players
• Reduce depth
  – Evaluate strength of board (likelihood of winning)
  – Collapse symmetrical or similar boards
  – Simulate the games
  
	
  
	
  
Monte Carlo tree search

Supervised learning using neural networks

Convolutional neural networks

Encode local or spatial features
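To make "local or spatial features" concrete, here is a minimal sketch of a single convolution filter sliding over a small board. The board encoding (1 for own stone, -1 for opponent, 0 for empty) and the hand-written kernel are assumptions for illustration; a real network learns many such kernels.

```python
def conv2d(board, kernel):
    """Valid 2D cross-correlation of a board with a kernel (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(board) - kh + 1
    out_w = len(board[0]) - kw + 1
    return [
        [
            sum(board[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 4x4 position: 1 = own stone, -1 = opponent stone, 0 = empty.
board = [
    [1, 1, 0, 0],
    [1, 1, 0, -1],
    [0, 0, 0, -1],
    [0, 0, 0, 0],
]
# A 2x2 kernel of ones responds most strongly to a solid 2x2 block of own stones.
block_detector = [[1, 1], [1, 1]]

feature_map = conv2d(board, block_detector)
```

The same kernel is applied at every location, so a pattern is detected wherever it appears on the board.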
  
Reinforcement learning
[Figure: agent-environment loop. The agent sends action A_t; the environment returns state S_t and reward (feedback) R_t.]
• Feedback is delayed.
• No supervisor, only a reward signal.
• Rules of the game are unknown.

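The agent-environment loop can be sketched in a few lines. The environment below is a hypothetical toy (a walk on a line with a single reward at the end), chosen only to show delayed feedback; it is not part of the original talk.

```python
import random


class DelayedRewardEnv:
    """Hypothetical toy environment: walk on a line starting at 0.
    The only non-zero reward arrives at the final step (+1 if position > 0)."""

    def __init__(self, horizon: int = 10):
        self.horizon = horizon
        self.t = 0
        self.position = 0

    def reset(self) -> int:
        self.t = 0
        self.position = 0
        return self.position  # state S_t

    def step(self, action: int):
        self.position += action  # action A_t is -1 or +1
        self.t += 1
        done = self.t >= self.horizon
        reward = 1.0 if (done and self.position > 0) else 0.0  # reward R_t
        return self.position, reward, done


def random_policy(state, rng):
    """The agent does not know the rules; it only sees states and rewards."""
    return rng.choice((-1, 1))


def run_episode(env, policy, rng):
    state = env.reset()
    rewards = []
    done = False
    while not done:
        action = policy(state, rng)
        state, reward, done = env.step(action)
        rewards.append(reward)
    return rewards


rewards = run_episode(DelayedRewardEnv(), random_policy, random.Random(0))
```

Every intermediate reward is zero, so the agent has to work out which earlier actions deserve credit for the final outcome.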
Deterministic policy

Stochastic policy

Value: expected long term reward
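A minimal sketch of the three concepts above, assuming a made-up action distribution for one state. The deterministic policy always picks the most probable action, the stochastic policy samples from the distribution, and the value is estimated as the average return over sampled episodes.

```python
import random

# Hypothetical p(a|s) for a single state; the numbers are illustrative only.
ACTION_PROBS = {"a": 0.7, "b": 0.2, "c": 0.1}


def deterministic_policy(probs):
    """Always return the most probable action."""
    return max(probs, key=probs.get)


def stochastic_policy(probs, rng):
    """Sample an action in proportion to its probability."""
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def discounted_return(rewards, gamma=1.0):
    """Sum of rewards, each discounted by gamma per time step."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_value(sample_episode, n, rng, gamma=1.0):
    """Estimate the expected long-term reward by averaging sampled returns."""
    returns = [discounted_return(sample_episode(rng), gamma) for _ in range(n)]
    return sum(returns) / n


def coin_flip_episode(rng):
    """Hypothetical episode: no reward until the end, then a win half the time."""
    return [0.0, 0.0, 1.0 if rng.random() < 0.5 else 0.0]


estimated_value = monte_carlo_value(coin_flip_episode, 1000, random.Random(0))
```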
  
Monte Carlo tree search combined with deep neural networks
[Figure: AlphaGo = neural networks + normal MCTS]

AlphaGo schematic architecture
[Figure: AlphaGo neural networks feeding the selection and evaluation steps of the search]

Reducing breadth of moves

Predicting the move
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
[Figure: current board → Expert Moves Imitator Model (w/ CNN) → next action]
Training: s → p(a|s); next action = argmax_a p(a|s)
[Figure: prediction model output, a board-shaped grid of zeros with a single 1 marking the predicted move]

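The mapping s → p(a|s) followed by an argmax can be sketched as below. The raw scores stand in for the output of the CNN and are made up; the softmax over legal moves is a common way to turn scores into probabilities and is an assumption here, not a detail given on the slide.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def move_probabilities(scores, legal):
    """Return {(row, col): p(a|s)} over legal moves only.

    scores: 2D list of raw network outputs (hypothetical values here).
    legal:  2D list of booleans, True where a stone may be placed.
    """
    moves = [(r, c)
             for r, row in enumerate(legal)
             for c, ok in enumerate(row) if ok]
    probs = softmax([scores[r][c] for r, c in moves])
    return dict(zip(moves, probs))


def greedy_move(probs):
    """argmax_a p(a|s): the single most likely next action."""
    return max(probs, key=probs.get)


scores = [
    [0.1, 0.3, 0.0],
    [0.2, 9.0, 2.5],   # the centre is occupied, so its high score is ignored
    [0.0, 0.4, 0.1],
]
legal = [
    [True, True, True],
    [True, False, True],
    [True, True, True],
]
probs = move_probabilities(scores, legal)
best = greedy_move(probs)
```

Keeping the full distribution, rather than only the argmax, is what lets the search later concentrate on a handful of promising candidates.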
Two kinds of policies
● used a large database of online expert games
● learned two versions of the neural network
○ a fast network p_π for use in evaluation
○ an accurate network p_σ for use in selection
Step 1: learn to predict human moves
Further reduce search space
Symmetries
[Figure: the input board under rotations of 0, 90, 180, and 270 degrees, each with and without a vertical reflection (8 equivalent boards)]

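The eight symmetric variants can be generated with two primitives, a rotation and a reflection. The sketch below also picks one canonical representative so that equivalent boards collapse to the same key. That canonical-form trick is one possible way to exploit the symmetry and is an assumption, not necessarily how AlphaGo uses it.

```python
def rotate90(board):
    """Rotate a square board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect(board):
    """Mirror the board left to right (reflection about the vertical axis)."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 variants: 4 rotations, each with and without reflection."""
    variants = []
    current = [list(row) for row in board]
    for _ in range(4):
        variants.append(current)
        variants.append(reflect(current))
        current = rotate90(current)
    return variants


def canonical(board):
    """Pick one representative so that symmetric boards share a single key."""
    return min(tuple(map(tuple, v)) for v in symmetries(board))


board = [
    [1, 2, 0],
    [0, 0, 0],
    [0, 0, 0],
]
variants = symmetries(board)
```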
Reduce depth by board evaluation
[Figure: board position → Value Prediction Model (regression) → win / loss (0~1)]
Training: Updated Model ver 1,000,000
• Adds a regression layer to the model
• Predicts values between 0~1
• Close to 1: a good board position
• Close to 0: a bad board position

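A minimal sketch of a regression head that predicts a value between 0 and 1, assuming a sigmoid output trained with squared error against the game outcome. The feature vector is a made-up stand-in for whatever the convolutional layers produce; none of the names come from the slides.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def value_head(features, weights, bias):
    """Regression layer: squash a weighted sum into the range (0, 1)."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def sgd_step(weights, bias, features, outcome, lr=0.5):
    """One gradient step on the squared error (v - outcome)^2 for one example."""
    v = value_head(features, weights, bias)
    grad = 2.0 * (v - outcome) * v * (1.0 - v)  # chain rule through the sigmoid
    new_weights = [w - lr * grad * f for w, f in zip(weights, features)]
    new_bias = bias - lr * grad
    return new_weights, new_bias


# Hypothetical features of one board position that ended in a win (outcome 1).
features = [1.0, 0.5, -0.3]
weights, bias = [0.0, 0.0, 0.0], 0.0
before = value_head(features, weights, bias)
for _ in range(200):
    weights, bias = sgd_step(weights, bias, features, outcome=1.0)
after = value_head(features, weights, bias)
```

After training on a won position the prediction moves from 0.5 towards 1, matching the "close to 1: a good board position" reading.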
Value follows from policy
Step 3: learn a board evaluation network, V
● use random samples from the self-play database
● prediction target: probability that black wins from a given board

Putting it all together
Looking ahead (w/ Monte Carlo tree search)
• Action candidates reduction (policy network)
• Board evaluation (value network)
• Rollout: faster version of estimating p(a|s); uses shallow networks (3 ms → 2 µs)

Selection

Expansion
1. If the visit count exceeds a threshold, N_r(s, a) > n_thr, insert the node for the successor state s'.
2. For every possible a', initialize the statistics:
   N_v(s', a') = N_r(s', a') = 0
   W_r(s', a') = W_v(s', a') = 0
   P(s', a') = p_σ(a'|s')

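The expansion rule can be sketched with two small records. The toy game (a pile of stones, take 1 or 2), the uniform policy, and all function names are assumptions for illustration; only the statistics N_v, N_r, W_v, W_r, P and the threshold test come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeStats:
    prior: float            # P(s', a'), taken from the policy network
    n_value: int = 0        # N_v: visits evaluated by the value network
    n_rollout: int = 0      # N_r: visits evaluated by rollouts
    w_value: float = 0.0    # W_v: accumulated value-network estimates
    w_rollout: float = 0.0  # W_r: accumulated rollout outcomes


@dataclass
class Node:
    state: int
    edges: dict = field(default_factory=dict)     # action -> EdgeStats
    children: dict = field(default_factory=dict)  # action -> Node


# Hypothetical toy game: a pile of stones, each move removes 1 or 2.
def legal_actions(pile):
    return [take for take in (1, 2) if take <= pile]


def next_state(pile, take):
    return pile - take


def uniform_policy(pile):
    """Stand-in for the policy network: equal prior for every legal move."""
    actions = legal_actions(pile)
    return {a: 1.0 / len(actions) for a in actions}


def maybe_expand(parent, action, n_thr):
    """Insert the successor node once N_r(s, a) exceeds the threshold."""
    if action in parent.children:
        return parent.children[action]
    if parent.edges[action].n_rollout <= n_thr:
        return None
    child = Node(state=next_state(parent.state, action))
    for a, p in uniform_policy(child.state).items():
        child.edges[a] = EdgeStats(prior=p)  # counts and totals start at 0
    parent.children[action] = child
    return child


root = Node(state=5)
root.edges = {a: EdgeStats(prior=p) for a, p in uniform_policy(5).items()}
too_early = maybe_expand(root, 1, n_thr=2)
root.edges[1].n_rollout = 3
child = maybe_expand(root, 1, n_thr=2)
```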
Evaluation
1. Evaluate s' by the value network v_θ, giving v_θ(s').
2. Simulate the action by the rollout policy network p_π. When reaching the terminal state s_T, calculate the reward r(s_T).

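The two evaluation paths can be sketched on a toy game. The counter game, the constant value function, and the random rollout policy are all made up for illustration. Blending the two estimates with a mixing weight follows the referenced paper rather than anything shown on the slide, so treat `mix` as an assumption.

```python
import random

TARGET = 10  # hypothetical toy game: add 1 or 2 until the counter reaches TARGET or more


def is_terminal(state):
    return state >= TARGET


def step(state, action):
    return state + action


def reward(state):
    """r(s_T): 1 for landing exactly on TARGET, 0 for overshooting."""
    return 1.0 if state == TARGET else 0.0


def rollout_policy(state, rng):
    """Stand-in for the fast rollout policy p_pi: a random legal action."""
    return rng.choice((1, 2))


def rollout(state, rng):
    """Play to a terminal state s_T with the rollout policy, return r(s_T)."""
    while not is_terminal(state):
        state = step(state, rollout_policy(state, rng))
    return reward(state)


def value_network(state):
    """Stand-in for v_theta: a fixed guess, not a trained model."""
    return 0.5


def evaluate_leaf(state, rng, mix=0.5):
    """Blend the value estimate with one rollout outcome."""
    return (1.0 - mix) * value_network(state) + mix * rollout(state, rng)


rng = random.Random(0)
leaf_value = evaluate_leaf(7, rng)
```

The value network gives a quick but learned guess, while the rollout gives a noisy but rule-grounded outcome; averaging them hedges against the weaknesses of each.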
Backup	
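Selection and backup work as a pair: selection reads the edge statistics, and backup writes them after each evaluation. The sketch below uses a PUCT-style score (mean value plus a prior-weighted exploration bonus) in the spirit of the referenced paper. The exact constants, the `+ 1` inside the square root, and the single-player value convention are simplifying assumptions.

```python
import math


class Edge:
    """Statistics for one (state, action) edge; names are illustrative."""

    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.total_value = 0.0

    @property
    def q(self):
        """Mean value of the evaluations backed up through this edge."""
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges, c_puct=1.0):
    """Pick the action maximising Q plus an exploration bonus.

    The bonus is large for high-prior, rarely visited edges and shrinks
    as an edge accumulates visits.
    """
    total_visits = sum(edge.visits for edge in edges.values())

    def score(action):
        edge = edges[action]
        bonus = c_puct * edge.prior * math.sqrt(total_visits + 1) / (1 + edge.visits)
        return edge.q + bonus

    return max(edges, key=score)


def backup(path, leaf_value):
    """Propagate a leaf evaluation to every edge traversed in the simulation.

    Assumption: values are kept from one player's point of view, so no sign
    flip is applied between levels.
    """
    for edge in path:
        edge.visits += 1
        edge.total_value += leaf_value


edges = {"a": Edge(prior=0.8), "b": Edge(prior=0.2)}
first_choice = select_action(edges)
for _ in range(10):
    backup([edges["a"]], leaf_value=0.0)  # "a" keeps evaluating poorly
later_choice = select_action(edges)
```

In the usage above, the high-prior move is tried first; once its backed-up results stay poor and its visit count grows, the search shifts to the alternative.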
  
Distribute search through GPUs
Distributed search
• Main search tree: master CPU
• Policy & value networks, p_σ(a'|s') and v_θ(s'): 176 GPUs
• Rollout policy networks, p_π and r(s_T): 1,202 CPUs

Apply trained networks to tasks with different loss functions
Takeaways
Use the networks trained for a certain task (with different loss objectives) for several other tasks

Single most important takeaway
• Feature abstraction is the key component of any machine learning algorithm
• Convolutional neural networks are great at automated feature abstraction
  
Reference
Silver et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484–489. January 2016.
  
	
  
About the speaker
Chayan Chakrabarti
https://www.linkedin.com/in/chayanchakrabarti
  
	
  
