Building a Deep Learning AI
Project repo: https://github.com/DanielSlater/PythonDeepLearningSamples
Daniel Slater
● Deep learning can be used to create AI agents that can master games
● Introduction to Reinforcement Learning (RL)
● We will look at an example that learns to play Pong using Actor Critic methods
We will talk about...
Why do we care about this?
● It’s fun
● It’s challenging
● If we can develop generalized learning algorithms they could apply to many other fields
● Games are an interesting field for testing intelligence
https://gym.openai.com/
Great framework for running games
● Pong
● Breakout
● Doom
● Cart-Pole
How to run AI agents on games?
https://gym.openai.com/
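Example:
pip install -e '.[atari]'  # shell command, run from a checkout of the gym repo
import gym
env = gym.make('SpaceInvaders-v0')
obs = env.reset()
env.render()
action = env.action_space.sample()  # random action, a placeholder for the agent's choice
obs, reward, done, _ = env.step(action)  # 4-tuple return of the classic gym API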
How to run AI agents on games?
Other options
PyGame:
● 1000s of games
● Easy to change game code
● PyGamePlayer
● Half pong
PyGamePlayer:
https://github.com/DanielSlater/PyGamePlayer
How to run AI agents on games?
Deep neural networks
● TensorFlow is a good, flexible deep learning framework
● Backpropagation and deep neural networks do a lot of the work; the reinforcement learning
challenge is finding the best loss function to train on
● There are examples of all 3 categories of reinforcement learning (value, policy and model learning) in:
https://github.com/DanielSlater/PythonDeepLearningSamples
● Also contains code for a range of different techniques and games
● Also AlphaToe may be interesting:
https://github.com/DanielSlater/AlphaToe
Resources
Reinforcement learning
● Agents are run within an environment.
● As they take actions they receive feedback, known as reward
● They aim to maximize good feedback and minimize bad feedback
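The loop described above can be sketched in a few lines. This is only an illustration: `CoinGuessEnv` and `run_episode` are hypothetical names, and the toy game merely mimics the `reset()` / `step(action)` interface used in the gym example so that it runs without any dependencies.

```python
import random


class CoinGuessEnv:
    """Hypothetical stand-in for a game: guess a hidden bit on each of 10 steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.steps_left = 10
        self.hidden = self.rng.randint(0, 1)
        return 0  # the observation carries no information in this toy game

    def step(self, action):
        # Feedback (reward): +1 for a correct guess, -1 for a wrong one.
        reward = 1.0 if action == self.hidden else -1.0
        self.hidden = self.rng.randint(0, 1)
        self.steps_left -= 1
        done = self.steps_left == 0
        return 0, reward, done, {}


def run_episode(env, policy):
    """Run one episode: the agent acts, the environment answers with a reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
    return total


total_reward = run_episode(CoinGuessEnv(), policy=lambda obs: 1)
print(total_reward)
```

A learning agent would replace the fixed `policy` with one that changes its behaviour to increase `total_reward` over many episodes.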
3 categories of reinforcement learning
● Value learning : Q-learning
● Policy learning : Policy gradients
● Model learning
Reinforcement learning
Reinforcement learning
Value learning:
What is the value of being in a state?
Reinforcement learning
Pong valuing a state
● Given a state and a set of possible actions, determine the best action to take to
maximize reward
● Any action will put us into a new state that itself has a set of possible actions
● Our best action now depends on what our best action will be in the next state and so on
Q-Learning
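The recursion in these bullets is usually written as the Bellman equation for the optimal action-value function. The notation below is the standard textbook one rather than something taken from the slides: $s'$ is the state reached after taking action $a$ in state $s$, $r(s, a)$ is the immediate reward, $\gamma \in [0, 1)$ is the discount factor and $\alpha$ is a learning rate.

```latex
Q^{*}(s, a) = r(s, a) + \gamma \max_{a'} Q^{*}(s', a')

% Q-learning moves the current estimate towards that target after each step:
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```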
Q Learning
● The Q-function is the idealized state-action value function: the expected discounted future reward of taking a given action in a given state
● We will use a neural network to approximate this Q-function
Images from http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Bunny must navigate a maze
Reward = 100 in state 5 (a carrot)
Discount factor = 0.8
Q-Learning Maze example
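A minimal sketch of tabular Q-learning on a maze of this kind, assuming a hypothetical 6-state layout (the door layout in `REWARD` is an assumption, not read from the slide images). Only the reward of 100 for entering state 5 and the discount factor of 0.8 come from the slide. For determinism it sweeps every state-action pair instead of sampling random episodes.

```python
GAMMA = 0.8  # discount factor from the slide
N = None     # marks "no door between these two states"

# REWARD[s][a]: reward for moving from state s to state a (hypothetical layout).
REWARD = [
    [N, N, N, N, 0, N],
    [N, N, N, 0, N, 100],
    [N, N, N, 0, N, N],
    [N, 0, 0, N, 0, N],
    [0, N, N, 0, N, 100],
    [N, 0, N, N, 0, 100],
]


def learn_q(reward, gamma, sweeps=200):
    """Repeatedly apply Q(s, a) = R(s, a) + gamma * max_a' Q(a, a')."""
    n = len(reward)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for s in range(n):
            for a in range(n):
                if reward[s][a] is None:
                    continue
                # Taking action a means moving to state a.
                best_next = max(
                    q[a][a2] for a2 in range(n) if reward[a][a2] is not None
                )
                q[s][a] = reward[s][a] + gamma * best_next
    return q


def greedy_path(q, reward, start, goal):
    """Follow the highest-valued available move from start until goal."""
    path, state = [start], start
    while state != goal and len(path) <= len(q):
        valid = [a for a in range(len(q)) if reward[state][a] is not None]
        state = max(valid, key=lambda a: q[state][a])
        path.append(state)
    return path


q_table = learn_q(REWARD, GAMMA)
print(greedy_path(q_table, REWARD, start=2, goal=5))
```

With a neural network in place of `q_table`, the same update target becomes the regression target for the network's output.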
Convolutional networks
Convolutional net:
● Use a deep convolutional architecture to turn the huge screen image into a much
smaller representation of the state of the game.
● Key insight: pixels next to each other are much more likely to be related...
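To make the size reduction concrete, here is a dependency-free sketch of a strided convolution. The layer sizes (an 84x84 input followed by 8x8/stride 4, 4x4/stride 2 and 3x3/stride 1 kernels) are an assumption borrowed from common Atari setups, not taken from the slides, and a real network would use a framework such as TensorFlow rather than nested lists.

```python
def conv_output_size(size, kernel, stride):
    """Width of a 'valid' (unpadded) convolution output along one axis."""
    return (size - kernel) // stride + 1


def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a square image, summing local products."""
    k = len(kernel)
    out = conv_output_size(len(image), k, stride)
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out)
        ]
        for r in range(out)
    ]


# Each output value depends only on a small patch of neighbouring pixels.
tiny = conv2d([[1] * 4 for _ in range(4)], [[1, 1], [1, 1]], stride=2)

# Stacking strided layers shrinks the spatial size quickly.
final_size = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    final_size = conv_output_size(final_size, kernel, stride)

print(tiny, final_size)
```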
Network architecture
Q-Learning - convergence issues
If I behave in a certain way, what will the reward be?
Policy learning
● An approach that aims to optimize a policy given a function
● Function = The reward we get from the game we are playing given the actions we take
● Policy = The choice of actions playing the game
● Network outputs the probability of a move in a given board position
● Moves are chosen randomly based on the output of the network.
● Better moves will tend to get more reward
Policy gradients
[Figure: game frames labelled with rewards -1.0, -0.99, -0.97, -0.93]
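A minimal sketch of the two ingredients, under stated assumptions: `discount_rewards` spreads a final reward back over earlier steps (the 0.99 discount is an assumption), and `train_bandit` applies the basic policy gradient (REINFORCE) update to a hypothetical one-step game with two moves, where move 0 always loses and move 1 always wins. A Pong agent would use a neural network instead of two logits.

```python
import math
import random


def discount_rewards(rewards, gamma=0.99):
    """Give each step the discounted sum of the rewards that follow it."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(steps=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    rewards = [-1.0, 1.0]  # hypothetical: move 0 loses the point, move 1 wins it
    for _ in range(steps):
        probs = softmax(logits)
        # Moves are chosen randomly according to the policy's output.
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]
        for a in range(2):
            # Gradient of log pi(action) with respect to logit a.
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * reward * grad_log_prob
    return softmax(logits)


print(discount_rewards([0.0, 0.0, 0.0, -1.0]))
print(train_bandit())
```

Moves that were followed by reward become more probable and moves followed by a penalty become less probable, which is all the update does.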
● Both aim to achieve the same thing in very different ways
● Q-learning has convergence issues
● Policy gradients have issues with local minima
● Is there an approach that gets the best of both worlds?
Policy gradients vs Q-learning
● Policy learning - The actor uses policy gradients to find the best path through the network
● Value learning - A critic tries to learn how the actor performs in different positions
● The actor uses the critic's evaluation for its gradients
Actor critic methods
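The interplay can be sketched on a toy problem. This is a simplified one-state illustration, not the Pong implementation from the repo: the critic is a single number estimating the expected reward, the actor is a pair of logits, and the reward values and learning rates are hypothetical.

```python
import math
import random


def train_actor_critic(steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]   # actor: preferences over two moves
    value = 0.0           # critic: estimated reward under the current policy
    rewards = [0.0, 1.0]  # hypothetical: move 1 is the better move

    def policy():
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    for _ in range(steps):
        probs = policy()
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]
        # The critic's evaluation replaces the raw reward in the actor's gradient:
        # positive when the move did better than expected, negative otherwise.
        advantage = reward - value
        for a in range(2):
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += actor_lr * advantage * grad_log_prob
        # The critic learns how well the actor is doing.
        value += critic_lr * (reward - value)
    return policy(), value


final_probs, final_value = train_actor_critic()
print(final_probs, final_value)
```

Compared with plain policy gradients, the actor is trained on "better or worse than expected" rather than on the raw reward, which tends to reduce the variance of the updates.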
Actor critic methods
[Figure: the same four frames scored two ways. Policy gradients: -1.0, -0.99, -0.97, -0.93. Critic gradients: -1.0, -0.65, 0.15, 0.00]
● Coach / Player
● Coach (critic) provides extra feedback to the player
on where they went wrong
● Player (actor) tries to do what the coach
wants
Actor critic methods
● The same architecture can work on all kinds of other games:
○ Breakout
○ Q*bert
○ Seaquest
○ Space invaders
This works
Model based:
Learn a simulation of the environment
Reinforcement learning
● Learn transitions between states.
● If I’m in state x and take action y I will be in state z
● Looks like a supervised learning problem now
● Unroll forward in time
● Apply techniques from board game AIs
○ Min-Max
Model based learning
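A small sketch of both halves, assuming a hypothetical 5-cell corridor as the environment. The "learning" here is just a lookup table filled from observed transitions, where a real system would fit a neural network. Because the toy game has a single player, the search takes a plain max over actions. Min-max would alternate max and min layers for two players.

```python
from itertools import product


def true_step(state, action):
    """Hidden environment: a 5-cell corridor, reward 1 for arriving at cell 4."""
    next_state = min(4, max(0, state + action))
    reward = 1.0 if next_state == 4 and state != 4 else 0.0
    return next_state, reward


def learn_model(states, actions):
    """Supervised step: record (state, action) -> (next_state, reward)."""
    return {(s, a): true_step(s, a) for s, a in product(states, actions)}


def plan(model, state, actions, depth):
    """Unroll the learned model `depth` steps ahead.

    Returns (best total reward, first action of the best sequence).
    """
    if depth == 0:
        return 0.0, None
    best = (float("-inf"), None)
    for a in actions:
        next_state, reward = model[(state, a)]
        future, _ = plan(model, next_state, actions, depth - 1)
        if reward + future > best[0]:
            best = (reward + future, a)
    return best


ACTIONS = (-1, 1)
model = learn_model(range(5), ACTIONS)
print(plan(model, state=0, actions=ACTIONS, depth=4))
```

Note that `plan` never calls `true_step`: once the transitions are learned, the agent can look ahead entirely inside its own simulation.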
Repo: https://github.com/DanielSlater/AlphaToe
Blog post: http://www.danielslater.net/2016/10/alphatoe.html
Alpha Toe
Thank you! Hope you enjoyed the talk!
contact me @:
http://www.danielslater.net/

Editor's Notes

  1. Mention my own experiences at Skimlinks trying to find a use for RL
  2. http://mnemstudio.org/path-finding-q-learning-tutorial.htm
  3. Show convolutional net code in TensorFlow?
  4. Show convolutional net code in TensorFlow?
  5. Show half pong
  6. Buy my book, blah about Skimlinks