UNDERSTANDING ALPHA GO
How Deep Learning Made the Impossible Possible
ABOUT MYSELF
 M.Sc. in Computer Science, HUJI
 Research interests: Deep Learning in Computer Vision, NLP, and Reinforcement Learning.
 Also DL theory and other ML topics.
 Working at a DL start-up (Imubit)
 Contact: mangate@gmail.com
CREDITS
 Many slides were adapted from the following publicly
available slideshows:
 https://www.slideshare.net/ShaneSeungwhanMoon/how-
alphago-works
 https://www.slideshare.net/ckmarkohchang/alphago-in-depth
 https://www.slideshare.net/KarelHa1/alphago-mastering-the-
game-of-go-with-deep-neural-networks-and-tree-search
 Original AlphaGo article:
Silver, David, et al. "Mastering the game of Go with
deep neural networks and tree search." Nature 529.7587
(2016): 484-489.
Available here:
http://web.iitd.ac.in/~sumeet/Silver16.pdf
DEEP LEARNING IS CHANGING OUR LIVES
 Search engines (also for images and audio)
 Spam filters
 Recommender systems (Netflix, YouTube)
 Self-driving cars
 Cyber security (and physical security, via computer vision)
 Machine translation
 Speech-to-text, audio recognition
 Image recognition, smart shopping
 And more and more and more…
AI VERSUS HUMAN
 In 1997, a supercomputer called Deep Blue (IBM) defeated Garry
Kasparov.
 This was the first defeat of a reigning world chess champion
by a computer under tournament conditions.
AI VERSUS HUMAN
 In 2011 Watson, another IBM supercomputer, "crushed"
the two best players in Jeopardy!, a popular question-answering
TV show.
GO
 An ancient Chinese Game
(2,500 years old!)
 Despite its relatively simple
rules, Go is very complex,
even more so than chess.
 Winning at Go requires a
great deal of intuition, and beating
top humans was therefore considered
out of reach for computers for at least the next 30
years.
AI VERSUS HUMAN
 In 2016, AlphaGo, a computer program by
DeepMind (part of Google), played a five-game Go
match against Lee Sedol.
 Lee Sedol:
 Professional 9-dan (the highest ranking in Go), considered
among the top 3 players in the world.
 2nd in international titles.
 Won 97 out of 100 games
against European Go
champion Fan Hui.
AI VERSUS HUMAN
 “I’m confident that I can win, at least this time” – Lee Sedol
 AlphaGo won 4-1
 “I kind of felt powerless… misjudged the capabilities of
AlphaGo” – Lee Sedol
 How was this possible? Deep Learning.
AI IN GAME PLAYING
 Almost every game can be “simulated” with a tree search.
 A move is chosen if it has the best chance of ending in a victory.
AI IN GAMES
 More formally: an optimal value function V*(s)
determines the outcome of the game:
 From every board position (state=s)
 Under perfect play by all players.
 This is done by going over the tree containing
possible move sequences where:
 b is the game's breadth (number of legal moves in each
position)
 d is the game's depth (game length in moves)
 Tic-Tac-Toe: b ≈ 4, d ≈ 4
 Chess: b ≈ 35, d ≈ 80
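To make the tree-search idea concrete, here is a minimal sketch (not from the slides) of exhaustive search over a hypothetical game interface; the game object and its legal_moves / play / is_over / winner methods are illustrative placeholders. The cost grows roughly as b^d nodes.

def minimax(game, maximizing=True):
    """Exhaustively evaluate a position under perfect play by both sides."""
    if game.is_over():
        return game.winner()              # e.g. +1 if the first player won, -1 otherwise
    values = [minimax(game.play(move), not maximizing)
              for move in game.legal_moves()]   # b branches at every node
    return max(values) if maximizing else min(values)
# Roughly b**d positions are visited for breadth b and depth d:
# feasible for Tic-Tac-Toe, hopeless for chess, unthinkable for Go.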
TREE SEARCH IN GO
 However, in Go: b ≈ 250, d ≈ 150
 The search space is far larger than 10^100 (a googol).
 This is more than the number of atoms in the entire universe!
 Go is more complex than chess!
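A quick sanity check of those orders of magnitude in plain Python (the b and d values are the rough estimates from the slides):

chess = 35 ** 80        # naive chess search space
go = 250 ** 150         # naive Go search space
googol = 10 ** 100
print(len(str(chess)))  # about 124 digits
print(len(str(go)))     # about 360 digits
print(go > googol)      # True: the Go tree dwarfs a googol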
KEY: REDUCE THE SEARCH SPACE
 Reducing b (the space of possible actions)
KEY: REDUCE THE SEARCH SPACE
 Reducing d – Position evaluation ahead of time
 Instead of simulating all the way to the end:
Both reductions are done with Deep Learning.
SOME CONCEPTS
 Supervised Learning (classification)
 Given some data, predict a class (i.e. choose one option
out of a known set of options)
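As a toy illustration (not from the slides), here is what supervised classification looks like in code; scikit-learn is assumed to be installed and the data is made up:

from sklearn.linear_model import LogisticRegression

# Made-up data: four 2-D points, each labelled with class 0 or 1.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]          # label depends mostly on the first coordinate

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.95, 0.3]]))   # predicts one class out of the known set, e.g. [1]
# Regression (next slide) is the same setup with a real-valued target instead of a class.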
SOME CONCEPTS
 Supervised Learning (regression)
 Given some data, predict a real number
SOME CONCEPTS
 Reinforcement Learning
 Given a state (observation), perform actions that
lead toward a goal (e.g. winning a game)
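The reinforcement-learning loop can be sketched as below (a generic sketch, not AlphaGo-specific; env and policy are hypothetical placeholders with a Gym-like interface):

def run_episode(env, policy):
    """Interact with an environment until the episode ends and return the total reward."""
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = policy(state)                   # the agent picks an action from the state
        state, reward, done = env.step(action)   # the environment responds
        total_reward += reward                   # in Go, the only reward is win/lose at the end
    return total_reward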
SOME CONCEPTS
 CNNs are able to learn abstract features of a given image
REDUCING ACTION CANDIDATES
 Done by learning to “imitate” expert moves
 Data: online expert games (160K games, ~30M moves).
 This is supervised classification (given a board position, predict
the expert's move out of all legal ones)
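A rough PyTorch sketch of such a move-prediction (policy) network; the layer sizes are illustrative and the paper's 48 input feature planes are collapsed into one plane here, so this is not the actual 13-layer architecture:

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Outputs one logit per board point; softmax gives a probability for each of the 361 moves."""
    def __init__(self, planes=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(planes, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),   # one score per intersection
        )

    def forward(self, board):                   # board: (batch, planes, 19, 19)
        return self.conv(board).flatten(1)      # logits: (batch, 361)

net = PolicyNet()
board = torch.zeros(1, 1, 19, 19)               # an empty board, as a placeholder input
probs = torch.softmax(net(board), dim=1)        # move probabilities
# Supervised training step: cross-entropy against the expert's move index (here a made-up 60).
loss = nn.functional.cross_entropy(net(board), torch.tensor([60]))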
REDUCING ACTION CANDIDATES
 This deep CNN achieved 55% test accuracy on predicting
expert moves.
 Imitators with no Deep Learning reached only 22% accuracy.
 A small improvement in accuracy leads to a big improvement in
playing ability.
ROLLOUT NETWORK
 Train an additional, smaller network
(pπ) for imitation.
 This network achieves only 24.2%
accuracy.
 It runs about 1,000 times faster (2 μs
versus 3 ms per move).
 This network is used for rollouts
(explained later).
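The key idea can be sketched as a linear softmax over simple hand-crafted pattern features instead of a deep CNN (a minimal illustration; the feature vector and its size are placeholders):

import numpy as np

class RolloutPolicy:
    """A tiny linear-softmax move predictor: much less accurate, but orders of magnitude faster."""
    def __init__(self, n_features, n_moves=361):
        self.W = np.zeros((n_features, n_moves))   # learned with the same imitation objective

    def move_probs(self, features):                 # features: (n_features,) for the current position
        logits = features @ self.W
        exp = np.exp(logits - logits.max())          # numerically stable softmax
        return exp / exp.sum()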
IMPROVING THE NETWORK
 Improve the imitator network through self-play
(reinforcement learning)
 An entire game is played and the parameters are
updated according to the result.
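In the spirit of that policy-gradient stage, a hedged sketch of one update: states and actions are assumed to hold the positions and moves of one self-played game, z is +1 for a win and -1 for a loss, and the network and optimizer come from the earlier sketch.

import torch

def reinforce_update(policy_net, optimizer, states, actions, z):
    """REINFORCE: raise the log-probability of every move in a won game, lower it in a lost one."""
    optimizer.zero_grad()
    logits = policy_net(states)                               # (T, 361) for the T positions played
    log_probs = torch.log_softmax(logits, dim=1)
    chosen = log_probs[torch.arange(len(actions)), actions]   # log pi(a_t | s_t) for the moves played
    loss = -(z * chosen).mean()                               # gradient ascent on z * log pi
    loss.backward()
    optimizer.step()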
IMPROVING THE NETWORK
 Keep generating better models by self-playing newer models
against older ones
 The final network also won 85% of games against the best Go
software (the model without self-play won only 11%)
 However, this model was eventually not used during the match
itself; it was used to generate the value function.
REDUCING SEARCH DEPTH - DATASET
 Self-play with the imitator (SL) model for a random
number of steps (0 to 450).
 Make one random move. This is the starting
position s.
 Self-play until the end of the game with the RL network
(latest model).
 If black won, z = 1; otherwise z = 0.
 Save (s,z) to the dataset.
 Generated 30M (s,z) pairs from 30M games.
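A sketch of that data-generation recipe (the Go environment env and the two policies are hypothetical placeholders; one (s, z) pair is kept per game so the examples stay uncorrelated):

import random

def generate_value_example(env, sl_policy, rl_policy, max_prefix=450):
    """Produce one (position, outcome) training pair for the value network."""
    state = env.reset()
    for _ in range(random.randint(0, max_prefix)):            # prefix played by the imitator (SL) policy
        state = env.play(state, sl_policy(state))
    state = env.play(state, random.choice(env.legal_moves(state)))  # one random move -> position s
    s = state
    while not env.is_over(state):                             # finish the game with the RL policy
        state = env.play(state, rl_policy(state))
    z = 1 if env.winner(state) == "black" else 0              # label: did black win?
    return s, z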
REDUCING SEARCH DEPTH –
VALUE FUNCTION
 A regression task: for a given position s, output a number between
0 and 1.
 Now, for each possible position we can have an evaluation of
how “good” it is for the black player.
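The value network can be sketched like the policy network but with a single sigmoid output and a regression loss (illustrative PyTorch, not the paper's exact architecture):

import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Maps a board position to a number in (0, 1): the estimated chance that black wins."""
    def __init__(self, planes=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(planes, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1), nn.Flatten(),
        )
        self.head = nn.Linear(19 * 19, 1)

    def forward(self, board):                     # board: (batch, planes, 19, 19)
        return torch.sigmoid(self.head(self.features(board)))

# Trained by regression on the (s, z) dataset, e.g.:
#   loss = nn.functional.mse_loss(value_net(boards).squeeze(1), outcomes.float())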
REDUCING SEARCH SPACE
PUTTING IT ALL TOGETHER - MCTS
 During game time a method called Monte Carlo
Tree Search (MCTS) is applied.
 This method has 4 steps:
 Selection
 Expansion
 Evaluation
 Backup (update)
 For each move in the game this process is repeated
about 10K times.
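A high-level sketch of those four phases (the tree-node methods and the three networks are assumed placeholders; the real system runs this asynchronously across many CPUs and GPUs):

def mcts_move(root, policy_net, value_net, rollout_policy, n_simulations=10_000):
    """Run select / expand / evaluate / backup repeatedly, then play the most-visited move."""
    for _ in range(n_simulations):
        node, path = root, [root]
        while node.children:                                   # 1. Selection: descend by Q + u(P)
            node = node.select_child()
            path.append(node)
        node.expand(policy_net)                                # 2. Expansion: add children with priors P(s, a)
        value = node.evaluate(value_net, rollout_policy)       # 3. Evaluation: value net + fast rollout
        for visited in path:                                   # 4. Backup: update N and Q along the path
            visited.update(value)
    return max(root.children, key=lambda c: c.visit_count).move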
MCTS - SELECTION
 At each step we have a starting
position (the board at this point).
 An action is selected
using a combination of the imitator
network's prior P(s, a) and an action
value Q, which is set to 0 at the start.
 The prior is divided by the number of
times the state/action pair has been
visited, to encourage diversity (exploration).
u(P) ∝ P(s, a) / (1 + N(s, a))
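In code, the selection rule can be sketched as follows (a simplification of the rule in the paper; prior, visit_count, and q_value are placeholder fields on a tree node):

def select_child(children):
    """Pick the child maximizing Q(s, a) + u(s, a), where u(s, a) ~ P(s, a) / (1 + N(s, a))."""
    def score(child):
        u = child.prior / (1 + child.visit_count)   # exploration bonus from the imitator's prior
        return child.q_value + u                    # q_value starts at 0 before any simulation
    return max(children, key=score)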
MCTS - EXPANSION
 When building the tree, a
position can be expanded once
(creating new leaf nodes in the tree)
using the imitator network.
 This provides the new priors P(s, a),
and hence u(P), for the next searches.
MCTS - EVALUATION
 After simulating 3-4 steps
with the imitator network
we evaluate the board
position.
 This is done in two ways:
 The value network prediction.
 Using the smaller imitator (rollout)
network to self-play to the end
(rollout), and save the result
(1 for a black win, 0 for a white win)
 Both evaluations are combined
to give this board position a
number between 0 and 1.
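A sketch of how the two evaluations can be mixed into one leaf value (the paper weights them equally, which the constant lam = 0.5 reflects; the environment helpers are placeholders):

def rollout_to_end(state, rollout_policy, env):
    """Play the position out with the fast rollout policy and report who won."""
    while not env.is_over(state):
        state = env.play(state, rollout_policy(state))
    return 1 if env.winner(state) == "black" else 0

def evaluate_leaf(state, value_net, rollout_policy, env, lam=0.5):
    """Combine the value-network estimate with a rollout result into a number in [0, 1]."""
    v = value_net(state)                              # learned estimate of black's winning chance
    z = rollout_to_end(state, rollout_policy, env)    # 1 if black wins the rollout, else 0
    return (1 - lam) * v + lam * z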
MCTS – BACKUP (UPDATE)
 After the simulation we
update the tree.
 Update Q (which was
0 at the beginning) with
the value computed from
the value network and the
rollouts.
 Update N(s,a): Increase
by one for each
state/action pair visited.
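The backup step can be sketched as keeping a running mean of the leaf evaluations seen through each visited node (placeholder node fields again):

def backup(path, leaf_value):
    """Propagate one simulation's evaluation back up the visited path."""
    for node in path:
        node.visit_count += 1                                            # N(s, a) += 1
        node.q_value += (leaf_value - node.q_value) / node.visit_count   # Q becomes the mean of leaf values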
CHOOSING AN ACTION
 For each move during the game, MCTS is run about
10K times.
 In the end, the action that was visited the most
times from the root position (the current board) is
taken.
 Notes:
 Since this process is slow, they had to use the smaller
network for rollouts to keep it feasible (otherwise each
move would have taken several days to
compute).
 The imitator (SL) network was better than the RL network
at choosing the first actions, probably because
humans play a more diverse set of moves.
ALPHA GO WEAKNESSES
 In the 4th game, Lee Sedol steered the board into a
position that was not in AlphaGo's search tree,
causing the program to choose poor moves and
eventually lose the game.
 Most assumptions that hold for AlphaGo do not carry
over to real-life RL problems. See:
https://medium.com/@karpathy/alphago-in-context-
c47718cb95a5
RETIREMENT
 In May 2017 AlphaGo defeated Ke Jie, the world's top-ranked
player, 3-0.
 Google's DeepMind unit announced that this would be the last
competitive match the AI would play.
SUMMARY
 To this day, AlphaGo is considered one of the greatest AI
achievements in recent history.
 This achievement was made by combining Deep
Learning with standard methods (like MCTS) to "simplify"
the very complex game of Go.
 4 deep neural networks were used:
 3 almost identical Convolutional Neural Networks:
 An imitation (SL) network for action-space reduction.
 An RL network created through self-play, for generating the dataset
for the value network.
 A value network for search-depth reduction.
 1 small network for rollouts.
 Deep Learning keeps achieving amazing new goals
every day, and is one of the fastest growing fields in
both academia and industry.
QUESTIONS?
Thank you!