Building a Deep Learning AI
Project repo: https://github.com/DanielSlater/PythonDeepLearningSamples
Daniel Slater
● Deep learning can be used to create AI agents that can master games
● Introduction to Reinforcement Learning (RL)
● We will look at an example that learns to play Pong using Actor Critic methods
We will talk about...
Why do we care about this?
● It's fun
● It's challenging
● If we can develop generalized learning algorithms, they could apply to many other fields
● Games are an interesting field for testing intelligence
How to run AI agents on games?
https://gym.openai.com/
Great framework for running games
● Pong
● Breakout
● Doom
● Cart-Pole
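How to run AI agents on games?
https://gym.openai.com/
Example:
# Shell, run from a clone of the gym repository:
pip install -e '.[atari]'
# Python (classic gym API, where step() returns 4 values):
import gym
env = gym.make('SpaceInvaders-v0')
obs = env.reset()
env.render()
action = env.action_space.sample()  # pick a random action
obs, reward, done, _ = env.step(action)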
How to run AI agents on games?
Other options
PyGame:
● 1000s of games
● Easy to change game code
● PyGamePlayer
● Half Pong
PyGamePlayer:
https://github.com/DanielSlater/PyGamePlayer
Deep neural networks
● TensorFlow is a good, flexible deep learning framework
● Backpropagation and deep neural networks do a lot of the work; the reinforcement learning challenge is finding the best loss function to train on
● There are examples of all 3 in:
https://github.com/DanielSlater/PythonDeepLearningSamples
● Also contains code for a range of different techniques and games
● Also AlphaToe may be interesting:
https://github.com/DanielSlater/AlphaToe
Resources
Reinforcement learning
● Agents are run within an environment
● As they take actions they receive feedback, known as reward
● They aim to maximize good feedback and minimize bad feedback
Reinforcement learning
3 categories of reinforcement learning
● Value learning: Q-learning
● Policy learning: Policy gradients
● Model learning
Reinforcement learning
Value learning:
What is the value of being in a state?
Reinforcement learning
Pong valuing a state
● Given a state and an a set of possible actions determine the best action to take to
maximize reward
● Any action will put us into a new state that itself has a set of possible actions
● Our best action now depends on what our best action will be in the next state and so on
Q-Learning
Q-Learning
● The Q-function is the ideal action-value function: the future reward we can expect from taking a given action in a given state
● We will use a neural network to approximate this Q-function
Q-Learning Maze example
Images from http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Bunny must navigate a maze
Reward = 100 in state 5 (a carrot)
Discount factor = 0.8
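The maze example can be worked through in a few lines of plain Python. This is only a sketch: the door layout in REWARD is my reading of the linked tutorial and should be treated as an assumption, as should the helper names. It also sweeps over every state and action in turn rather than sampling random episodes, so it is closer to value iteration than to sampled Q-learning, but it applies the same update: Q(state, action) = reward + discount * max Q(next state, any action).

```python
GAMMA = 0.8  # discount factor from the slide
GOAL = 5     # state 5 holds the carrot

# REWARD[state][action]: None = no door, 0 = ordinary move, 100 = reaching the carrot.
# Assumed layout (a reading of the linked tutorial, not taken from the slides).
REWARD = [
    [None, None, None, None, 0, None],
    [None, None, None, 0, None, 100],
    [None, None, None, 0, None, None],
    [None, 0, 0, None, 0, None],
    [0, None, None, 0, None, 100],
    [None, 0, None, None, 0, 100],
]


def legal_actions(state):
    """Rooms that can be reached directly from this state."""
    return [a for a, r in enumerate(REWARD[state]) if r is not None]


def learn_q(sweeps=200):
    """Repeatedly apply the Q update to every (state, action) pair."""
    q = [[0.0] * 6 for _ in range(6)]
    for _ in range(sweeps):
        for state in range(6):
            for action in legal_actions(state):
                next_state = action  # moving through a door lands in that room
                best_next = max(q[next_state][a] for a in legal_actions(next_state))
                q[state][action] = REWARD[state][action] + GAMMA * best_next
    return q


def greedy_path(q, start):
    """Follow the highest-valued action from each state until the carrot."""
    path = [start]
    while path[-1] != GOAL:
        state = path[-1]
        path.append(max(legal_actions(state), key=lambda a: q[state][a]))
    return path


q_table = learn_q()
print(greedy_path(q_table, start=2))
```

Values shrink by the discount factor with each step away from the carrot, which is what lets the greedy path find it.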
Convolutional networks
Convolutional net:
● Use a deep convolutional architecture to turn the huge screen image into a much smaller representation of the state of the game
● Key insight: pixels next to each other are much more likely to be related...
Network architecture
Q-Learning - convergence issues
Policy learning
If I behave in a certain way, what will the reward be?
● An approach that aims to optimize a policy given a function
● Function = The reward we get from the game we are playing given the actions we take
● Policy = The choice of actions playing the game
● Network outputs the probability of a move in a given board position
● Moves are chosen randomly based on the output of the network.
● Better moves will tend to get more reward
Policy gradients
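Choosing a move "randomly based on the output of the network" can be sketched with the standard library alone. The function name and the example probabilities are hypothetical, and the list stands in for whatever the network actually outputs.

```python
import random


def choose_move(move_probabilities, rng=random):
    """Sample a move index in proportion to the network's output probabilities."""
    moves = range(len(move_probabilities))
    return rng.choices(moves, weights=move_probabilities, k=1)[0]


# Hypothetical network output for three possible moves (for example up, stay, down).
network_output = [0.7, 0.2, 0.1]
rng = random.Random(0)  # seeded so repeated runs pick the same moves
picks = [choose_move(network_output, rng) for _ in range(1000)]
print([picks.count(move) for move in range(3)])
```

Sampling rather than always taking the most likely move keeps the agent exploring, so moves that turn out to earn more reward get a chance to be reinforced.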
Policy gradients
[Figure: game frames labeled with the values -1.0, -0.99, -0.97, -0.93]
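The values on these slides look like a final reward of -1.0 being credited back to the moves that led up to it, shrinking a little at each step. That reading is an assumption, as is the function below, which is one common way to compute such discounted rewards. The discount value used in the talk is not stated, so the default here is only a placeholder.

```python
def discounted_rewards(rewards, discount=0.99):
    """Credit each step with its own reward plus the discounted rewards that follow."""
    running = 0.0
    result = []
    for reward in reversed(rewards):
        running = reward + discount * running
        result.append(running)
    result.reverse()
    return result


# Three uneventful frames, then the point is lost on the last one.
episode = [0.0, 0.0, 0.0, -1.0]
print(discounted_rewards(episode, discount=0.5))
```

Each move's probability is then pushed up or down in proportion to the value credited to it.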
● Both aim to achieve the same thing in very different ways
● Q-learning has convergence issues
● Policy gradients has issues of local minima
● Is there an approach that gets the best of both worlds
Policy gradients vs Q-learning
● Policy learning - Actor uses policy gradients to find the best path through the network
● Value learning - A critic tries to learns how the actor performs in different positions
● Actor uses the critics evaluation for it’s gradients
Actor critic methods
Actor critic methods
[Figure: per-frame values. Policy gradients: -1.0, -0.99, -0.97, -0.93. Critic gradients: -1.0, -0.65, 0.15, 0.00]
● Coach / Player
● Coach (critic) provides extra feedback for the player
where he went wrong
● Player (actor) learns tries to do what the coach
wants
Actor critic methods
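One common way to turn the coach's feedback into numbers is to subtract the critic's estimate from the reward the actor actually ended up with, often called the advantage. The sketch below assumes that formulation, which the slides do not confirm, and every name and number in it is made up for illustration.

```python
def advantages(returns, critic_values):
    """How much better or worse each step turned out than the critic expected."""
    return [g - v for g, v in zip(returns, critic_values)]


def critic_loss(returns, critic_values):
    """Mean squared error the critic is trained to reduce."""
    errors = advantages(returns, critic_values)
    return sum(e * e for e in errors) / len(errors)


# Discounted rewards the actor received, and what the critic predicted for the same steps.
returns = [-0.25, -0.5, -1.0]
critic_values = [0.0, -0.5, -0.75]

print(advantages(returns, critic_values))
print(critic_loss(returns, critic_values))
```

The actor scales its gradient for each move by the advantage instead of the raw reward, so a move is only discouraged when it did worse than the coach expected.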
● The same architecture can work on all kinds of other games:
β—‹ Breakout
β—‹ Q*bert
β—‹ Seaquest
β—‹ Space invaders
This works
Reinforcement learning
Model based:
Learn a simulation of the environment
● Learn transitions between states.
● If I’m in state x and take action y I will be in state z
● Looks like a supervised learning problem now
Model based learning
● Learn transitions between states.
● If I’m in state x and take action y I will be in state z
● Looks like a supervised learning problem now
● Unroll forward in time
● Apply techniques from board game AI’s
β—‹ Min-Max
Model based learning
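Once there is a model that maps state x and action y to state z, unrolling it forward with minimax can be sketched as below. The game is a toy stand-in chosen for brevity (players alternately take 1 or 2 stones, and whoever takes the last stone wins), and the hand-written model function stands in for a learned one. All names here are hypothetical.

```python
from functools import lru_cache


def legal_actions(stones):
    """Take 1 or 2 stones, as long as that many remain."""
    return [a for a in (1, 2) if a <= stones]


def model(stones, action):
    """Stand-in for a learned transition model: state x, action y -> state z."""
    return stones - action


@lru_cache(maxsize=None)
def minimax(stones, maximizing):
    """Value of the position for the maximizing player: +1 win, -1 loss."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    values = [minimax(model(stones, a), not maximizing) for a in legal_actions(stones)]
    return max(values) if maximizing else min(values)


def best_action(stones):
    """Pick the action whose resulting state is best for the player to move."""
    return max(legal_actions(stones), key=lambda a: minimax(model(stones, a), False))


print(best_action(4), best_action(5))
```

With a learned model the search works the same way, but prediction errors add up as the model is unrolled further forward.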
Model based learning
Alpha Toe
Repo: https://github.com/DanielSlater/AlphaToe
Blog post: http://www.danielslater.net/2016/10/alphatoe.html
Thank you! Hope you enjoyed the talk!
Contact me at:
http://www.danielslater.net/

Editor's Notes

  1. Mention my own experiences at Skimlinks trying to find a use for RL
  2. http://mnemstudio.org/path-finding-q-learning-tutorial.htm
  3. Show convolutional net code in TensorFlow?
  4. Show convolutional net code in TensorFlow?
  5. Show half pong
  6. Buy my book, blah about Skimlinks