@rabbuhl#Devoxx #AISelfLearningGamePlaying
AI Self-Learning Game Playing
Richard Abbuhl
ING
Overview
• Introduction
• Machine Learning? Why Game Playing?
• History of Game Playing and Machine Learning
• Machine Learning Basics / Neural Networks
• TD Gammon / Reinforcement Learning
• Tic-Tac-Toe / Q-Learning / Demo
• AlphaGo / AlphaGo Zero
• Useful Links? Questions?
Introduction
•Thesis: Enhancements to Back-propagation and its Application
to Large-Scale Pattern Recognition Problems
•Product: Database Mining Marksman
Machine Learning?
Drive a car?
Play a game?
Tell a joke?
Tell a story?
Make a prediction?
Game Playing: Why?
•Data Warehouse (processed)
•Data Lake (raw)
Game Playing: ML (50’s – 70’s)
| When | Who, What | Year | Book/Article | Category |
|------|-----------|------|--------------|----------|
| 1950's/60's | Arthur Samuel, invents alpha-beta pruning | 1959 | Some Studies in Machine Learning Using the Game of Checkers | Algorithm (early AI is search tree) |
| | Frank Rosenblatt, invents Perceptrons | 1962 | Principles of Neurodynamics | Connectionist (neural net) |
| 1970's | Marvin Minsky, criticizes Perceptrons (XOR) | 1969 | Perceptrons | Symbolist (Expert Systems); AI Winter except Expert Systems (until 80's) |
Game Playing: ML (80’s)
| When | Who, What | Year | Book/Article | Category |
|------|-----------|------|--------------|----------|
| 1980's | Rumelhart, McClelland, Hinton: multi-layer perceptrons trained with back-propagation, UCSD | 1986 | Parallel Distributed Processing | Connectionist (neural net), AI revived |
| | Christopher Watkins develops Q-learning | 1989 | Q-learning | Reinforcement Learning |
| | Richard Sutton and Andrew Barto | 1998 | Reinforcement Learning: An Introduction | Reinforcement Learning |
Game Playing: ML (90’s)
| When | Who, What | Year | Book/Article | Category |
|------|-----------|------|--------------|----------|
| 1990's | RA, Mentor | 1991 to 1994 | Neural Nets and Tic-Tac-Toe | Neural network |
| | N. Schraudolph, HNC, UCSD PhD student | 1993 | Temporal Difference Learning and Go | Neural network |
| | Gerald Tesauro, develops TD-Gammon | 1995 | Temporal Difference Learning and TD-Gammon | Connectionist (neural net) and Reinforcement Learning (TD-lambda) |
| | IBM Deep Blue chess-playing computer | 1997 | Deep Blue Overview, 1997 | Brute-force hardware, used alpha-beta minimax search |
Game Playing: ML (00’s and beyond)
| When | Who, What | Year | Book/Article | Category |
|------|-----------|------|--------------|----------|
| 2000's | RA / JMentor | 2004 | Q-Learning and Tic-Tac-Toe | Reinforcement Learning (Q-Learning), QMiniMax |
| | IBM Watson | 2011 | Jeopardy! | Machine Learning, Nat Lang Processing, information retrieval |
| | Facebook DeepFace, facial recognition; humans 97.53% correct, DeepFace 97.25% correct | 2014 | Facebook Creates Software That Matches Faces Almost as Well as You Do, 2014 | Connectionist, Neural Networks |
| 2016 | AlphaGo, beats human player | 2016 | Mastering the game of Go with deep neural networks and tree search | RL, Monte Carlo Tree Search, Machine Learning, etc. |
Machine Learning Basics
•Basics:
•Machine learning can be implemented using a feed-forward
multi-layer neural network
Machine Learning Basics
Basics:
•Training is done by presenting a set of p pattern pairs to
the network:
• Ki = {Ai, Bi}, i = 0,…,p – 1
• Where
Ai = {Xi,0, …, Xi,n-1} is the input pattern
Bi = {Yi,0, …, Yi,m-1} is the desired output pattern
Machine Learning Basics
Basics:
• Training is done as follows:
1.Initialize the weights and thresholds
2.Present training set Ki to the network
3.Calculate the forward pass of the network
4.Calculate the desired output
5.Adapt the weights
6.Calculate the error for the training set
7.Repeat by going to step 2
Training stops when the error for every training pair is less than 0.01 (so that the network generalizes).
Machine Learning Basics
Example:
• For the XOR problem the network is:
• 2 inputs, 8 hidden, 1 output
• The training set is defined as:
• 0.0 0.0 0.9
• 0.0 1.0 -0.9
• 1.0 0.0 -0.9
• 1.0 1.0 0.9
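The training procedure above can be sketched in Java on the slide's XOR training set (2 inputs, 8 hidden tanh units, 1 output, targets ±0.9). This is an illustrative back-propagation implementation, not JMentor's actual code; the class and method names are made up:

```java
import java.util.Random;

// Minimal 2-8-1 feed-forward network trained with back-propagation
// on the XOR training set from the slide (targets +/-0.9).
public class XorNet {
    static final int HID = 8;
    static double[][] w1 = new double[HID][3]; // 2 inputs + 1 bias per hidden unit
    static double[] w2 = new double[HID + 1];  // HID hidden + 1 output bias

    static double forward(double x0, double x1, double[] h) {
        for (int j = 0; j < HID; j++)
            h[j] = Math.tanh(w1[j][0] * x0 + w1[j][1] * x1 + w1[j][2]);
        double out = w2[HID];
        for (int j = 0; j < HID; j++) out += w2[j] * h[j];
        return Math.tanh(out);
    }

    // Returns the final total squared error over the training set.
    public static double train() {
        double[][] in = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] t = {0.9, -0.9, -0.9, 0.9};       // training set from the slide
        Random rnd = new Random(42);
        for (int j = 0; j < HID; j++) {            // step 1: initialize weights
            for (int k = 0; k < 3; k++) w1[j][k] = rnd.nextDouble() - 0.5;
            w2[j] = rnd.nextDouble() - 0.5;
        }
        w2[HID] = rnd.nextDouble() - 0.5;
        double lr = 0.3, err = 0;
        for (int epoch = 0; epoch < 20000; epoch++) {
            err = 0;
            for (int p = 0; p < 4; p++) {
                double[] h = new double[HID];
                double out = forward(in[p][0], in[p][1], h); // steps 2-3: forward pass
                double e = t[p] - out;                       // step 4: desired vs actual
                err += e * e;
                double dOut = e * (1 - out * out);           // tanh derivative
                for (int j = 0; j < HID; j++) {              // step 5: adapt weights
                    double dH = dOut * w2[j] * (1 - h[j] * h[j]);
                    w2[j] += lr * dOut * h[j];
                    w1[j][0] += lr * dH * in[p][0];
                    w1[j][1] += lr * dH * in[p][1];
                    w1[j][2] += lr * dH;
                }
                w2[HID] += lr * dOut;
            }
            if (err < 0.01) break;   // steps 6-7: repeat until the error is small
        }
        return err;
    }

    public static void main(String[] args) {
        System.out.println("final squared error: " + train());
    }
}
```

The learning rate, seed, and epoch limit are arbitrary choices; with 8 hidden units the network has far more capacity than XOR strictly needs, which makes convergence easy.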
Machine Learning Basics: Kolmogorov’s Theorem
•Robert Hecht-Nielsen (1987): Kolmogorov's mapping
neural network existence theorem
TD Gammon
•Backgammon (1995):
http://www.bkgm.com/articles/tesauro/tdl.html
•Tesauro used this approach to teach a multi-layer perceptron
to play backgammon
TD Gammon / Reinforcement Learning
• Backgammon:
• Tesauro switched to reinforcement learning and self-play to teach TD-
Gammon to play at a world-class level.
• Version 2.1 used a heuristic 2-ply search in real time
• A 3-ply search would further improve its playing ability
Reinforcement Learning
RL differs from supervised learning, in which learning is done from examples
provided by a knowledgeable external supervisor.
RL instead learns from its own experience and has four parts:
• Policy: defines the learning agent's way of behaving at a given time,
• Reward function: defines the goal of the RL problem,
• Value function: defines what is good in the long run,
• Model: mimics the behavior of the environment
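The four components above can be sketched as plain Java interfaces; the names and generic parameters are illustrative, not taken from any particular RL library:

```java
// Sketch of the four RL components as Java interfaces.
public class RlComponents {
    interface Policy<S, A> { A chooseAction(S state); }          // way of behaving at a given time
    interface RewardFunction<S> { double reward(S state); }      // goal of the RL problem
    interface ValueFunction<S> { double value(S state); }        // what is good in the long run
    interface Model<S, A> { S nextState(S state, A action); }    // mimics the environment

    public static void main(String[] args) {
        // Toy instantiation on integer states/actions, just to show the shapes fit together:
        Policy<Integer, Integer> policy = s -> s + 1;
        Model<Integer, Integer> chain = (s, a) -> s + a;
        System.out.println(chain.nextState(0, policy.chooseAction(0)));
    }
}
```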
Tic-Tac-Toe / Q-Learning
• Tic-Tac-Toe:
• Board size is 3 x 3
• Training is done using Q-Learning.
Tic-Tac-Toe / Q-Learning
Policy:
•Rule which tells the player which move to make for every
state of the game
Values:
•First, set up a table of numbers, one for each state of the
game
•Each number is the estimated probability of winning from that state
Tic-Tac-Toe / Q-Learning
We play many games against our opponent:
•We examine states which result from each possible move
•We look up their current values in the table
Most of the time:
•We move greedily and select the move which has the highest
probability of winning
•However, sometimes we randomly select from other moves
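The selection rule above (mostly greedy, occasionally exploratory) can be sketched in Java; the board-state encoding and names are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Epsilon-greedy move selection over a table of state values:
// usually pick the successor state with the highest estimated win
// probability, occasionally pick a random move to keep exploring.
public class MoveSelector {
    static final Random RND = new Random(7);

    static String select(List<String> successors, Map<String, Double> values, double epsilon) {
        if (RND.nextDouble() < epsilon) {
            // exploratory move: select uniformly at random
            return successors.get(RND.nextInt(successors.size()));
        }
        // greedy move: highest estimated value (unseen states default to 0.5)
        String best = successors.get(0);
        for (String s : successors)
            if (values.getOrDefault(s, 0.5) > values.getOrDefault(best, 0.5)) best = s;
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> values = new HashMap<>();
        values.put("X..|...|...", 0.6);   // corner opening, valued higher here
        values.put("...|.X.|...", 0.4);
        List<String> moves = Arrays.asList("...|.X.|...", "X..|...|...");
        // With epsilon = 0 the selection is purely greedy:
        System.out.println(select(moves, values, 0.0)); // prints X..|...|...
    }
}
```

A small nonzero epsilon (say 0.1) is the usual compromise between exploiting the current table and exploring moves whose values are still poorly estimated.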
Tic-Tac-Toe / Q-Learning
When we are playing:
• We adjust the states using the temporal difference:
• V(s1) = V(s1) + alpha [V(s2) – V(s1)]
• s1 is the state before the greedy move
• s2 is the state after the move
• Alpha is the step-size parameter, which sets the rate of learning
• Number of states for Tic-Tac-Toe: 3 ^ 9 = 19,683
• Number of states for Backgammon: 10 ^ 20 = 100,000,000,000,000,000,000
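The temporal-difference update above, applied to a table of state values, might look like this in Java (a sketch; the state encoding and names are illustrative, and alpha = 0.1 is an arbitrary choice):

```java
import java.util.HashMap;
import java.util.Map;

// Tabular temporal-difference update for tic-tac-toe state values.
public class TdUpdate {
    static final Map<String, Double> V = new HashMap<>();
    static final double ALPHA = 0.1; // step-size parameter (rate of learning)

    // V(s1) = V(s1) + alpha [V(s2) - V(s1)], where s1 is the state before
    // the greedy move and s2 the state after it. Unseen states default to 0.5.
    static double update(String s1, String s2) {
        double v1 = V.getOrDefault(s1, 0.5);
        double v2 = V.getOrDefault(s2, 0.5);
        double updated = v1 + ALPHA * (v2 - v1);
        V.put(s1, updated);
        return updated;
    }

    public static void main(String[] args) {
        V.put("X..|...|...", 0.5);   // state before the move
        V.put("X..|.O.|X..", 0.8);   // state after the move
        // The update nudges V(s1) toward V(s2): 0.5 + 0.1 * (0.8 - 0.5) = 0.53
        System.out.println(update("X..|...|...", "X..|.O.|X.."));
    }
}
```

A full table for tic-tac-toe is feasible (19,683 entries at most), which is exactly why backgammon's ~10^20 states forced Tesauro to replace the table with a neural network.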
Tic-Tac-Toe / Demo
JMentor: https://github.com/richardabbuhl/jmentor
Download and build with Maven
• XOR problem: jmentor\jbackprop\out\jbackprop.bat -p xor.trn xor.xml, then jbackprop.bat -p xor.trn -a xor.xml
• MinMax problem: jmentor\jbackprop\out\jbackprop.bat -p mm.trn mm.xml, then jbackprop.bat -p mm.trn -a mm.xml
• Look at the training set ttt.trn
• Train the network to learn TTT: run qgosh.sh in a Bourne shell
• See the results: showbest.sh
• TTT GUI: jmentor\jtictactoegui\out\jtictactoegui.bat
• Play a game
AlphaGo
• Go is a game of profound complexity.
• There are more than 10^170 possible positions
• That's more than the number of atoms in the universe, and more than a
googol times larger than the state space of chess.
AlphaGo: using machine learning to master the ancient game of go, Machine Learning 2016
AlphaGo
AlphaGo combines:
• Advanced tree search (Monte Carlo tree search)
• Deep Neural Networks (neural networks and reinforcement learning)
Neural Networks:
• One neural network, the “policy network”, selects the next move to play,
• The other neural network, the “value network”, predicts the winner of
the game
AlphaGo
Neural Networks (policy network):
•Trained using 30 million moves played by human experts
•It could then predict the human move 57% of the time
•The policy networks discovered new strategies by playing many
games against each other and improving through
reinforcement learning
•The improved policy networks, even without a tree search, can beat
state-of-the-art Go programs that use enormous tree searches
AlphaGo
Neural Networks (value network):
•The policy networks were used to train the value networks
and again improved using reinforcement learning
•The value networks can evaluate a Go position and estimate
an eventual winner
AlphaGo
Monte-Carlo Tree Search (MCTS)
• AlphaGo combines the policy and value networks in an MCTS
algorithm that selects actions by lookahead search,
• Evaluating the policy and value networks requires several orders of
magnitude more computation than traditional search heuristics
Mastering the game of Go with Deep Neural Networks and Tree Search
https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf
AlphaGo: using machine learning to master the ancient game of go, Machine Learning 2016
AlphaGo
Monte-Carlo Tree Search (MCTS)
• The final version of AlphaGo used 40 search threads, 48 CPUs, and 8
GPUs.
• The distributed version of AlphaGo ran across multiple machines with 40
search threads, 1,202 CPUs, and 176 GPUs.
Mastering the game of Go with Deep Neural Networks and Tree Search
https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf
AlphaGo: using machine learning to master the ancient game of go, Machine Learning 2016
AlphaGo
For the match against Fan Hui, the researchers used:
• A larger network of computers that spanned about 170 GPU cards and
1,200 standard processors, or CPUs.
• This larger computer network both trained the system and played the
actual game, drawing on the results of the training.
In a Huge Breakthrough, Google’s AI Beats a Top Player at the Game of Go, Wired 01.26.16
AlphaGo Zero
A new version of AlphaGo has emerged:
• AlphaGo learned from training data drawn from hundreds of thousands of
games played by human experts,
• AlphaGo Zero uses no training data; instead it learns by playing millions
of games against itself, improving with each game.
• No human data is needed any more.
AlphaGo Zero Shows Machines Can Become Superhuman Without Any Help, Intelligent Machines, Will Knight, October 18, 2017.
https://www.technologyreview.com/s/609141/alphago-zero-shows-machines-can-become-superhuman-without-any-help/
ML Links / Questions?
• Deeplearning4j: https://deeplearning4j.org/
• TensorFlow: https://www.tensorflow.org/
• Google Cloud Machine Learning: https://cloud.google.com/ml-engine/
• Azure Machine Learning: https://azure.microsoft.com/en-
us/services/machine-learning/
• Amazon Machine Learning: https://aws.amazon.com/machine-learning/
• Deep Mind: https://deepmind.com/blog/open-sourcing-deepmind-lab/
Thank You!!

More Related Content

What's hot

Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetAmazon Web Services
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Sean Everett
 
KERAS Python Tutorial
KERAS Python TutorialKERAS Python Tutorial
KERAS Python TutorialMahmutKAMALAK
 
AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)
AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)
AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)Chida Chidambaram
 

What's hot (8)

Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Deep Domain
Deep DomainDeep Domain
Deep Domain
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016
 
KERAS Python Tutorial
KERAS Python TutorialKERAS Python Tutorial
KERAS Python Tutorial
 
AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)
AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)
AWS re:Invent Deep Learning: Goin Beyond Machine Learning (BDT311)
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 

Similar to Devoxx 2017 - AI Self-learning Game Playing

J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingRichard Abbuhl
 
The Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondThe Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondNUS-ISS
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsSanghamitra Deb
 
Deep Learning, Where Are You Going?
Deep Learning, Where Are You Going?Deep Learning, Where Are You Going?
Deep Learning, Where Are You Going?NAVER Engineering
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jKevin Watters
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...Lucidworks
 
Understanding and improving games through machine learning - Natasha Latysheva
Understanding and improving games through machine learning - Natasha LatyshevaUnderstanding and improving games through machine learning - Natasha Latysheva
Understanding and improving games through machine learning - Natasha LatyshevaLauren Cormack
 
Adversarial search with Game Playing
Adversarial search with Game PlayingAdversarial search with Game Playing
Adversarial search with Game PlayingAman Patel
 
Machine Learning with TensorFlow 2
Machine Learning with TensorFlow 2Machine Learning with TensorFlow 2
Machine Learning with TensorFlow 2Sarah Stemmler
 
Synthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningS N
 
Supersize your production pipe enjmin 2013 v1.1 hd
Supersize your production pipe    enjmin 2013 v1.1 hdSupersize your production pipe    enjmin 2013 v1.1 hd
Supersize your production pipe enjmin 2013 v1.1 hdslantsixgames
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
transferlearning.pptx
transferlearning.pptxtransferlearning.pptx
transferlearning.pptxAmit Kumar
 
LearningKit.ppt
LearningKit.pptLearningKit.ppt
LearningKit.pptbutest
 
convolutional neural networks, swift and iOS 11
convolutional neural networks, swift and iOS 11convolutional neural networks, swift and iOS 11
convolutional neural networks, swift and iOS 11Brett Koonce
 
Password Storage Sucks!
Password Storage Sucks!Password Storage Sucks!
Password Storage Sucks!nerdybeardo
 
Machine Learning Models on Mobile Devices
Machine Learning Models on Mobile DevicesMachine Learning Models on Mobile Devices
Machine Learning Models on Mobile DevicesLars Gregori
 
Ropossum: A Game That Generates Itself
Ropossum: A Game That Generates ItselfRopossum: A Game That Generates Itself
Ropossum: A Game That Generates ItselfMohammad Shaker
 

Similar to Devoxx 2017 - AI Self-learning Game Playing (20)

J-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game PlayingJ-Fall 2017 - AI Self-learning Game Playing
J-Fall 2017 - AI Self-learning Game Playing
 
The Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondThe Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and Beyond
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Deep Learning, Where Are You Going?
Deep Learning, Where Are You Going?Deep Learning, Where Are You Going?
Deep Learning, Where Are You Going?
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
 
Understanding and improving games through machine learning - Natasha Latysheva
Understanding and improving games through machine learning - Natasha LatyshevaUnderstanding and improving games through machine learning - Natasha Latysheva
Understanding and improving games through machine learning - Natasha Latysheva
 
Adversarial search with Game Playing
Adversarial search with Game PlayingAdversarial search with Game Playing
Adversarial search with Game Playing
 
Machine Learning with TensorFlow 2
Machine Learning with TensorFlow 2Machine Learning with TensorFlow 2
Machine Learning with TensorFlow 2
 
Synthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep Learning
 
Supersize your production pipe enjmin 2013 v1.1 hd
Supersize your production pipe    enjmin 2013 v1.1 hdSupersize your production pipe    enjmin 2013 v1.1 hd
Supersize your production pipe enjmin 2013 v1.1 hd
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
transferlearning.pptx
transferlearning.pptxtransferlearning.pptx
transferlearning.pptx
 
LearningKit.ppt
LearningKit.pptLearningKit.ppt
LearningKit.ppt
 
convolutional neural networks, swift and iOS 11
convolutional neural networks, swift and iOS 11convolutional neural networks, swift and iOS 11
convolutional neural networks, swift and iOS 11
 
Understanding AlphaGo
Understanding AlphaGoUnderstanding AlphaGo
Understanding AlphaGo
 
Password Storage Sucks!
Password Storage Sucks!Password Storage Sucks!
Password Storage Sucks!
 
Machine Learning Models on Mobile Devices
Machine Learning Models on Mobile DevicesMachine Learning Models on Mobile Devices
Machine Learning Models on Mobile Devices
 
Ropossum: A Game That Generates Itself
Ropossum: A Game That Generates ItselfRopossum: A Game That Generates Itself
Ropossum: A Game That Generates Itself
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Devoxx 2017 - AI Self-learning Game Playing

  • 2. @rabbuhl#Devoxx #AISelfLearningGamePlaying Overview • Introduction • Machine Learning? Why Game Playing? • History Game Playing and Machine Learning • Machine Learning Basics / Neural Networks • TD Gammon / Reinforcement Learning • Tic-Tac-Toe / Q-Learning / Demo • AlphaGo / AlphaGo Zero • Useful Links? Questions?
  • 3. @rabbuhl#Devoxx #AISelfLearningGamePlaying Introduction •Thesis: Enhancements to Back-propagation and its Application to Large-Scale Pattern Recognition Problems •Product: Database Mining Marksman
  • 4. @rabbuhl#Devoxx #AISelfLearningGamePlaying Machine Learning? Drive a car? Play a game? Tell a joke? Tell a story? Make a prediction?
  • 5. @rabbuhl#Devoxx #AISelfLearningGamePlaying Game Playing: Why? •Data Warehouse (processed) •Data Lake (raw)
  • 6. @rabbuhl#Devoxx #AISelfLearningGamePlaying Game Playing: ML (50’s – 70’s) When Who, What Year Book/Article Category 1950’s/60’s Arthur Samuels, invents alpha-beta pruning 1959 Some Studies in Machine Learning Using the Game of Checkers Algorithm (early AI is search tree) Frank Rosenblatt, invents Perceptrons 1962 Principles of Neurodynamics Connectionist (neural net) 1970’s Marvin Minsky, criticizes Perceptrons (XOR) 1969 Perceptrons Symbolist (Expert Systems), AI Winter except Expert Systems (until 80’s)
  • 7. Game Playing: ML (80’s)
    - 1980’s: Rumelhart, McClelland, Hinton; multi-layer perceptrons trained with back-propagation, UCSD (1986, Parallel Distributed Processing); category: Connectionist (neural net), AI revived
    - Christopher Watkins develops Q-learning (1989, Q-learning); category: Reinforcement Learning
    - Richard Sutton and Andrew Barto (1998, Reinforcement Learning: An Introduction); category: Reinforcement Learning
  • 8. Game Playing: ML (90’s)
    - 1991 to 1994: RA, Mentor; Neural Nets and Tic-Tac-Toe; category: Neural network
    - 1993: N. Schraudolph, HNC, UCSD PhD student; Temporal Difference Learning and Go; category: Neural network
    - 1995: Gerald Tesauro, develops TD-Gammon; Temporal Difference Learning and TD-Gammon; category: Connectionist (neural net) and Reinforcement Learning (TD-lambda)
    - 1997: IBM Deep Blue chess-playing computer; Deep Blue Overview, 1997; category: Brute-force hardware, used alpha-beta minimax search
  • 9. Game Playing: ML (00’s and beyond)
    - 2004: RA / JMentor; Q-Learning and Tic-Tac-Toe; category: Reinforcement Learning (Q-Learning), QMiniMax
    - 2011: IBM Watson; Jeopardy!; category: Machine Learning, Natural Language Processing, information retrieval
    - 2014: Facebook DeepFace, facial recognition (humans 97.53% correct, DeepFace 97.25% correct); Facebook Creates Software That Matches Faces Almost as Well as You Do, 2014; category: Connectionist, Neural Networks
    - 2016: AlphaGo, beats a human professional player; Mastering the Game of Go with Deep Neural Networks and Tree Search; category: RL, Monte Carlo Tree Search, Machine Learning, etc.
  • 10. Machine Learning Basics •Basics: •Machine learning can be implemented using a feed-forward multi-layer neural network
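As an illustration of that idea (not code from the talk), the forward pass of a small feed-forward network with one hidden layer can be sketched in Java; the layer sizes, weight values, and class name below are invented for the example:

```java
// Minimal sketch of a forward pass through a feed-forward network with one
// hidden layer and sigmoid activations. The weights are illustrative values,
// not a trained network.
public class FeedForward {
    public static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One layer: out[j] = sigmoid(sum_i in[i] * w[i][j] + bias[j])
    public static double[] layer(double[] in, double[][] w, double[] bias) {
        double[] out = new double[bias.length];
        for (int j = 0; j < out.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) sum += in[i] * w[i][j];
            out[j] = sigmoid(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 0.0};
        double[][] w1 = {{0.5, -0.4}, {0.3, 0.8}};   // 2 inputs -> 2 hidden
        double[] b1 = {0.1, -0.1};
        double[][] w2 = {{0.7}, {-0.6}};             // 2 hidden -> 1 output
        double[] b2 = {0.05};
        double[] hidden = layer(input, w1, b1);
        double[] output = layer(hidden, w2, b2);
        System.out.println("output = " + output[0]);
    }
}
```

Stacking `layer` calls like this is all a multi-layer perceptron does at prediction time; the learning part is in how the weights are adapted, which the next slides describe.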
  • 11. Machine Learning Basics Basics: •Training is done by presenting pairs of patterns to the network: • Ki = {Ai, Bi}, i = 0,…,p – 1 • where Ai = {Xi,0, …, Xi,n-1} is the input pattern and Bi = {Yi,0, …, Yi,m-1} is the desired output pattern
  • 12. Machine Learning Basics Basics: • Training is done as follows: 1. Initialize the weights and thresholds 2. Present training pair Ki to the network 3. Calculate the forward pass of the network 4. Compare the actual output with the desired output 5. Adapt the weights 6. Calculate the error for the training set 7. Repeat by going to step 2 (*) Training stops when the error for all training pairs is less than 0.01 (so the network can generalize).
  • 13. Machine Learning Basics Example: • For the XOR problem the network is: • 2 inputs, 8 hidden, 1 output • The training set is defined as: • 0.0 0.0 0.9 • 0.0 1.0 -0.9 • 1.0 0.0 -0.9 • 1.0 1.0 0.9
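The seven training steps above can be sketched against this XOR set. This is not the thesis or JMentor code: the 2-8-1 sizes and the +/-0.9 targets come from the slides, while the tanh activation (which suits targets in [-1, 1]), learning rate, epoch count, and seed are illustrative choices.

```java
import java.util.Random;

// Back-propagation sketch for the 2-8-1 XOR network from the slide.
public class XorBackprop {
    static final int IN = 2, HID = 8;
    double[][] w1 = new double[IN][HID]; // input -> hidden weights
    double[] b1 = new double[HID];       // hidden thresholds
    double[] w2 = new double[HID];       // hidden -> output weights
    double b2;                           // output threshold
    double[] hid = new double[HID];      // last hidden activations

    XorBackprop(long seed) {
        Random rnd = new Random(seed);
        // 1. Initialize the weights and thresholds with small random values
        for (int j = 0; j < HID; j++) {
            b1[j] = rnd.nextGaussian() * 0.5;
            w2[j] = rnd.nextGaussian() * 0.5;
            for (int i = 0; i < IN; i++) w1[i][j] = rnd.nextGaussian() * 0.5;
        }
        b2 = rnd.nextGaussian() * 0.5;
    }

    // 3. Forward pass (tanh units, matching the +/-0.9 targets)
    double forward(double[] x) {
        double out = b2;
        for (int j = 0; j < HID; j++) {
            double s = b1[j];
            for (int i = 0; i < IN; i++) s += x[i] * w1[i][j];
            hid[j] = Math.tanh(s);
            out += hid[j] * w2[j];
        }
        return Math.tanh(out);
    }

    // 4.-5. Compare actual with desired output, then adapt the weights
    void train(double[] x, double target, double lr) {
        double out = forward(x);
        double dOut = (target - out) * (1.0 - out * out); // tanh derivative
        for (int j = 0; j < HID; j++) {
            double dHid = (1.0 - hid[j] * hid[j]) * dOut * w2[j];
            w2[j] += lr * dOut * hid[j];
            b1[j] += lr * dHid;
            for (int i = 0; i < IN; i++) w1[i][j] += lr * dHid * x[i];
        }
        b2 += lr * dOut;
    }

    public static void main(String[] args) {
        double[][] X = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] T = {0.9, -0.9, -0.9, 0.9}; // training set from the slide
        XorBackprop net = new XorBackprop(42);
        // 2. and 7. Present each training pair and repeat over many epochs
        for (int epoch = 0; epoch < 20000; epoch++)
            for (int p = 0; p < X.length; p++) net.train(X[p], T[p], 0.1);
        for (int p = 0; p < X.length; p++)
            System.out.printf("%.0f %.0f -> %+.3f (target %+.1f)%n",
                    X[p][0], X[p][1], net.forward(X[p]), T[p]);
    }
}
```

After training, each output should at least carry the sign of its target; how close it gets to +/-0.9 depends on the learning rate and number of epochs.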
  • 14. Machine Learning Basics: Kolmogorov’s Theorem •Robert Hecht-Nielsen (1987): Kolmogorov's mapping neural network existence theorem
  • 15. TD Gammon •Backgammon (1995): http://www.bkgm.com/articles/tesauro/tdl.html •Tesauro used this approach to teach a multi-layer perceptron to play backgammon
  • 16. TD Gammon / Reinforcement Learning • Backgammon: • Tesauro switched to reinforcement learning and self-learning to teach TD-Gammon to play at a world-class level. • Version 2.1 used a heuristic 2-ply search in real time • A 3-ply search would improve its playing ability
  • 17. Reinforcement Learning RL differs from supervised learning, where learning is done from examples provided by a knowledgeable external supervisor. RL attempts to learn from its own experience and has four parts: • Policy: defines the learning agent’s way of behaving at a given time, • Reward function: defines the goal of the RL problem, • Value function: defines what is good in the long run, • Model: mimics the behavior of the environment
  • 18. Tic-Tac-Toe / Q-Learning • Tic-Tac-Toe: • Board size is 3 x 3 • Training is done using Q-Learning.
  • 19. Tic-Tac-Toe / Q-Learning Policy: •Rule which tells the player which move to make for every state of the game Values: •First, set up a table of numbers, one for each state of the game •Each number is the probability of winning from that state
  • 20. Tic-Tac-Toe / Q-Learning We play many games against our opponent: •We examine the states which result from each possible move •We look up their current values in the table Most of the time: •We move greedily and select the move which has the highest probability of winning •However, sometimes we randomly select from the other moves
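This mostly-greedy, sometimes-random rule is commonly called epsilon-greedy selection, and it can be sketched in a few lines. The method name and the epsilon parameter are illustrative, not from the talk:

```java
import java.util.Random;

// Sketch of the move-selection rule above: with probability epsilon pick a
// random move (exploration); otherwise pick the move whose resulting state
// has the highest estimated probability of winning (greedy exploitation).
public class MoveSelector {
    public static int select(double[] afterStateValues, double epsilon, Random rnd) {
        if (rnd.nextDouble() < epsilon)
            return rnd.nextInt(afterStateValues.length); // exploratory move
        int best = 0;
        for (int i = 1; i < afterStateValues.length; i++)
            if (afterStateValues[i] > afterStateValues[best]) best = i;
        return best; // greedy move
    }

    public static void main(String[] args) {
        // Values looked up in the table for the states each legal move leads to
        double[] values = {0.4, 0.7, 0.5};
        System.out.println(select(values, 0.0, new Random())); // prints 1 (greedy)
    }
}
```

The occasional random move matters: without it the player never revisits moves it currently believes are bad, so their table values never get corrected.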
  • 21. Tic-Tac-Toe / Q-Learning When we are playing: • We adjust the state values using the temporal difference: • V(s1) = V(s1) + alpha [V(s2) – V(s1)] • s1 is the state before the greedy move • s2 is the state after the move • alpha is the step-size parameter, which sets the rate of learning • Number of states for Tic-Tac-Toe: 3 ^ 9 = 19,683 • Number of states for Backgammon: 10 ^ 20 = 100,000,000,000,000,000,000
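The temporal-difference update above is a one-liner once the value table exists. A minimal sketch, assuming a 9-character board string as the state key and a default value of 0.5 for unseen states (the state encoding, class name, and alpha value are invented for the example, not taken from JMentor):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the temporal-difference update V(s1) = V(s1) + alpha[V(s2) - V(s1)].
// The table maps each board state to its estimated probability of winning.
public class TdUpdate {
    static Map<String, Double> values = new HashMap<>();
    static double alpha = 0.1; // step-size parameter (rate of learning)

    static double value(String state) {
        return values.getOrDefault(state, 0.5); // unseen states start at 0.5
    }

    static void update(String s1, String s2) {
        values.put(s1, value(s1) + alpha * (value(s2) - value(s1)));
    }

    public static void main(String[] args) {
        String before = "X-O-X----"; // state before the greedy move
        String after  = "X-O-XO---"; // state after the move
        values.put(after, 0.9);      // suppose this state usually leads to a win
        update(before, after);
        System.out.println(values.get(before)); // 0.5 moved a step toward 0.9
    }
}
```

Repeated over many self-play games, these small steps back up win probabilities from terminal positions toward the opening, which is why only a table and the update rule are needed for a game as small as Tic-Tac-Toe's 19,683 states.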
  • 22. Tic-Tac-Toe / Demo JMentor: https://github.com/richardabbuhl/jmentor Download and build with Maven • XOR problem: jmentor\jbackprop\out\jbackprop.bat -p xor.trn xor.xml, jbackprop.bat -p xor.trn -a xor.xml • MinMax problem: jmentor\jbackprop\out\jbackprop.bat -p mm.trn mm.xml, jbackprop.bat -p mm.trn -a mm.xml • Look at training set ttt.trn • Training the network to learn TTT: run in a Bourne shell: qgosh.sh • See the results: showbest.sh • TTT GUI: jmentor\jtictactoegui\out\jtictactoegui.bat • Play a game
  • 23. AlphaGo • Go is a game of profound complexity. • There are about 10^170 possible positions • That’s more than the number of atoms in the universe, and more than a googol times larger than chess. AlphaGo: using machine learning to master the ancient game of go, Machine Learning 2016
  • 24. AlphaGo AlphaGo combines: • Advanced tree search (Monte Carlo tree search) • Deep neural networks (neural networks and reinforcement learning) Neural networks: • One neural network, the “policy network”, selects the next move to play, • The other neural network, the “value network”, predicts the winner of the game
  • 25. AlphaGo Neural Networks (policy network): •Trained using 30 million moves played by human experts •It could then predict the human moves 57% of the time •The policy networks discovered new strategies by playing lots of games between the neural networks and improving them using reinforcement learning •The improved policy networks, without a tree search, can beat state-of-the-art Go programs which use enormous tree searches
  • 26. AlphaGo Neural Networks (value network): •The policy networks were used to train the value networks, again improved using reinforcement learning •The value networks can evaluate a Go position and estimate the eventual winner
  • 27. AlphaGo Monte-Carlo Tree Search (MCTS) • AlphaGo combines the policy and value networks in an MCTS algorithm that selects actions by lookahead search, • Evaluating the policy and value networks requires several orders of magnitude more computation than traditional search heuristics Mastering the game of Go with Deep Neural Networks and Tree Search https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf AlphaGo: using machine learning to master the ancient game of go, Machine Learning 2016
  • 28. AlphaGo Monte-Carlo Tree Search (MCTS) • The final version of AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs. • The distributed version of AlphaGo used multiple machines, 40 search threads, 1202 CPUs, and 176 GPUs. Mastering the game of Go with Deep Neural Networks and Tree Search https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf AlphaGo: using machine learning to master the ancient game of go, Machine Learning 2016
  • 29. AlphaGo For the match against Fan Hui, the researchers used: • A larger network of computers that spanned about 170 GPU cards • and 1,200 standard processors, or CPUs. • This larger computer network both trained the system and played the actual game, drawing on the results of the training. In a Huge Breakthrough, Google’s AI Beats a Top Player at the Game of Go, Wired 01.26.16
  • 30. AlphaGo Zero A new version of AlphaGo has emerged: • AlphaGo learned using training data from hundreds of thousands of games played by human experts, • AlphaGo Zero uses no training data; instead it learns by playing millions of games against itself, learning from each game how to improve. • No human data is needed any more. AlphaGo Zero Shows Machines Can Become Superhuman Without Any Help, Intelligent Machines, Will Knight, October 18, 2017. https://www.technologyreview.com/s/609141/alphago-zero-shows-machines-can-become-superhuman-without-any-help/
  • 31. ML Links / Questions? • Deeplearning4j: https://deeplearning4j.org/ • TensorFlow: https://www.tensorflow.org/ • Google Cloud Machine Learning: https://cloud.google.com/ml-engine/ • Azure Machine Learning: https://azure.microsoft.com/en-us/services/machine-learning/ • Amazon Machine Learning: https://aws.amazon.com/machine-learning/ • DeepMind: https://deepmind.com/blog/open-sourcing-deepmind-lab/