Building a Deep Learning AI
Project repo: https://github.com/DanielSlater/PythonDeepLearningSamples
Daniel Slater
● Deep learning can be used to create AI agents that can master games
● Introduction to Reinforcement Learning (RL)
● We will look at an example that learns to play Pong using Actor Critic methods
We will talk about...
Why do we care about this?
● It’s fun
● It’s challenging
● If we can develop generalized learning algorithms they could apply to many other fields
● Games are an interesting field for testing intelligence
https://gym.openai.com/
Great framework for running games
● Pong
● Breakout
● Doom
● Cart-Pole
How to run AI agents on games?
https://gym.openai.com/
Example:
pip install -e '.[atari]'
import gym
env = gym.make('SpaceInvaders-v0')
obs = env.reset()
env.render()
action = env.action_space.sample()  # pick an action, e.g. at random
obs, reward, done, _ = env.step(action)
How to run AI agents on games?
Other options
PyGame:
● 1000s of games
● Easy to change game code
● PyGamePlayer
● Half pong
PyGamePlayer:
https://github.com/DanielSlater/PyGamePlayer
Deep neural networks
● TensorFlow is a good, flexible deep learning framework
● Backpropagation and deep neural networks do a lot of the work; the reinforcement learning challenge is finding the best loss function to train against
● There are examples of all 3 in:
https://github.com/DanielSlater/PythonDeepLearningSamples
● Also contains code for a range of different techniques and games
● Also AlphaToe may be interesting:
https://github.com/DanielSlater/AlphaToe
Resources
Reinforcement learning
● Agents are run within an environment.
● As they take actions they receive feedback, known as reward
● They aim to maximize good feedback and minimize bad feedback
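The agent/environment/reward loop above can be sketched with a toy, hypothetical environment (a coin-guessing game, not a real Gym env) to show how actions and the reward signal fit together:

```python
import random

class CoinFlipEnv:
    """Toy environment: guess the coin; right guess -> +1, wrong -> -1."""
    def reset(self):
        self.steps = 0
        return 0                          # observation (unused here)

    def step(self, action):
        self.steps += 1
        coin = random.randrange(2)
        reward = 1.0 if action == coin else -1.0
        done = self.steps >= 10           # episode lasts 10 steps
        return 0, reward, done

random.seed(0)
env = CoinFlipEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = random.randrange(2)          # a random agent
    obs, reward, done = env.step(action)
    total += reward                       # the agent aims to maximize this
print(total)
```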
3 categories of reinforcement learning
● Value learning : Q-learning
● Policy learning : Policy gradients
● Model learning
Reinforcement learning
Value learning: what is the value of being in a state?
Reinforcement learning
Pong valuing a state
● Given a state and a set of possible actions, determine the best action to take to maximize reward
● Any action will put us into a new state that itself has a set of possible actions
● Our best action now depends on what our best action will be in the next state and so on
Q-Learning
Q-Learning
● The Q-function is the ideal state-action value function: the expected reward of taking an action in a state and acting optimally from then on
● We will use a neural network to approximate this Q-function
Images from http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Bunny must navigate a maze
Reward = 100 in state 5 (a carrot)
Discount factor = 0.8
Q-Learning Maze example
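A minimal tabular Q-learning sketch for the maze, assuming the tutorial's room layout (rooms 0-5, the carrot worth 100 in room 5, discount factor 0.8); treat it as illustrative rather than a copy of the tutorial's code:

```python
import random

# Room connectivity: moving into room 5 (the carrot) earns reward 100,
# every other move earns 0. The "action" is the room you move into.
doors = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
GOAL, GAMMA, ALPHA = 5, 0.8, 1.0

Q = {(s, a): 0.0 for s in doors for a in doors[s]}

random.seed(0)
for _ in range(1000):                      # episodes of random exploration
    s = random.randrange(6)
    while s != GOAL:
        a = random.choice(doors[s])
        r = 100.0 if a == GOAL else 0.0
        best_next = max(Q[(a, a2)] for a2 in doors[a])
        # Q-learning update: reward plus discounted best next value
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = a

# Follow the greedy policy from room 2 to the carrot
path, s = [2], 2
while s != GOAL:
    s = max(doors[s], key=lambda a: Q[(s, a)])
    path.append(s)
print(path)
```

Notice how values propagate backwards: Q(1, 5) converges to 100, so Q(3, 1) converges to 0.8 × 100 = 80, and so on down the maze.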
Convolutional networks
Convolutional net:
● Use a deep convolutional architecture to turn the huge screen image into a much smaller representation of the state of the game.
● Key insight: pixels next to each other are much more likely to be related...
Network architecture
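The slide's network isn't reproduced here, but the core operation is a strided convolution; a minimal pure-Python sketch (no TensorFlow) of how it shrinks a screen into a smaller feature map:

```python
def conv2d(image, kernel, stride):
    """Valid 2D convolution (cross-correlation) with a stride."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# An 8x8 checkerboard "screen" shrinks to 3x3 with a 2x2 kernel, stride 3;
# each output value summarizes a neighbourhood of adjacent pixels.
image = [[(i + j) % 2 for j in range(8)] for i in range(8)]
kernel = [[0.25, 0.25], [0.25, 0.25]]
feature_map = conv2d(image, kernel, 3)
print(len(feature_map), len(feature_map[0]))
```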
Q-Learning - convergence issues
If I behave in a certain way, what will its reward be?
Policy learning
● An approach that aims to optimize a policy given a function
● Function = The reward we get from the game we are playing given the actions we take
● Policy = The choice of actions playing the game
● Network outputs the probability of a move in a given board position
● Moves are chosen randomly based on the output of the network.
● Better moves will tend to get more reward
Policy gradients
[Figure: the final reward of a lost point is assigned, discounted, to the moves that led to it: -1.0, -0.99, -0.97, -0.93]
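The per-move targets above come from discounting the final reward backwards through the episode; a minimal sketch, assuming a discount factor of 0.99 and a single -1.0 reward on the last move (the slide's exact figures may use a slightly different factor):

```python
def discounted_returns(rewards, gamma=0.99):
    """Work backwards: each step's return is its reward plus the
    discounted return of the step after it."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Only the last action receives a reward (e.g. losing the point in Pong)
print([round(g, 2) for g in discounted_returns([0.0, 0.0, 0.0, -1.0])])
```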
● Both aim to achieve the same thing in very different ways
● Q-learning has convergence issues
● Policy gradients have issues with local minima
● Is there an approach that gets the best of both worlds?
Policy gradients vs Q-learning
● Policy learning - the actor uses policy gradients to find the best path through the network
● Value learning - a critic tries to learn how the actor performs in different positions
● The actor uses the critic's evaluation for its gradients
Actor critic methods
[Figure: policy-gradient targets -1.0, -0.99, -0.97, -0.93 vs critic-based targets -1.0, -0.65, 0.15, 0.00]
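A minimal actor-critic sketch on a two-armed bandit (a toy stand-in for Pong): the actor is a softmax policy updated by policy gradients, the critic a running value estimate, and the advantage (reward minus value) scales the actor's update:

```python
import math, random

random.seed(0)
prefs = [0.0, 0.0]            # actor: softmax preferences over two actions
V = 0.0                       # critic: running estimate of expected reward
ACTOR_LR, CRITIC_LR = 0.1, 0.1

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(2000):
    pi = softmax(prefs)
    a = 0 if random.random() < pi[0] else 1   # sample from the policy
    r = 1.0 if a == 1 else 0.0                # action 1 is the good action
    advantage = r - V                         # the critic's feedback
    # policy-gradient step: gradient of log softmax, scaled by advantage
    for i in range(2):
        grad = (1.0 - pi[i]) if i == a else -pi[i]
        prefs[i] += ACTOR_LR * advantage * grad
    V += CRITIC_LR * (r - V)                  # critic learns from reward

pi = softmax(prefs)
print(round(pi[1], 2), round(V, 2))
```

The critic's baseline means the actor is nudged by how much better or worse the outcome was than expected, rather than by the raw reward alone.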
● Coach / Player
● The coach (critic) provides extra feedback for the player on where they went wrong
● The player (actor) tries to do what the coach wants
Actor critic methods
● The same architecture can work on all kinds of other games:
○ Breakout
○ Q*bert
○ Seaquest
○ Space invaders
This works
Model based:
Learn a simulation of the environment
Reinforcement learning
● Learn transitions between states.
● If I’m in state x and take action y I will be in state z
● Looks like a supervised learning problem now
Model based learning
● Unroll forward in time
● Apply techniques from board game AIs
○ Min-Max
Model based learning
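A sketch of the idea on a hypothetical five-state corridor: transitions are recorded like a supervised dataset of (state, action) → next state, then the learned model is unrolled forward to plan (min-max would play this role in adversarial games):

```python
import random

random.seed(0)

def true_step(s, a):
    """The real environment: a corridor of states 0..4, actions -1/+1."""
    return max(0, min(4, s + a))

# 1. Learn the model: record observed transitions, exactly like a
#    supervised dataset of inputs and targets.
model = {}
for _ in range(200):
    s = random.randrange(5)
    a = random.choice([-1, 1])
    model[(s, a)] = true_step(s, a)

# 2. Plan by unrolling the learned model forward in time, greedily
#    stepping towards the rewarding state 4.
def plan(start, goal, depth=10):
    s, actions = start, []
    for _ in range(depth):
        if s == goal:
            break
        a = min([-1, 1], key=lambda a: abs(goal - model.get((s, a), s)))
        actions.append(a)
        s = model.get((s, a), s)
    return actions, s

actions, final = plan(0, 4)
print(actions, final)
```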
Repo: https://github.com/DanielSlater/AlphaToe
Blog post: http://www.danielslater.net/2016/10/alphatoe.html
Alpha Toe
Thank you! Hope you enjoyed the talk!
contact me @:
http://www.danielslater.net/
