1. Building a Deep Learning AI
Project repo: https://github.com/DanielSlater/PythonDeepLearningSamples
Daniel Slater
2. ● Deep learning can be used to create AI agents that can master games
● Introduction to Reinforcement Learning (RL)
● We will look at an example that learns to play Pong using Actor Critic methods
We will talk about...
3. Why do we care about this?
● It’s fun
● It’s challenging
● If we can develop generalized learning algorithms they could apply to many other fields
● Games are an interesting field for testing intelligence
5. https://gym.openai.com/
Example:
pip install -e '.[atari]'
import gym
env = gym.make('SpaceInvaders-v0')
obs = env.reset()
env.render()
action = env.action_space.sample()  # e.g. a random action
obs, reward, done, _ = env.step(action)
How to run AI agents on games?
7. Other options
PyGame:
● 1000s of games
● Easy to change game code
● PyGamePlayer
● Half pong
PyGamePlayer:
https://github.com/DanielSlater/PyGamePlayer
How to run AI agents on games?
8. Deep neural networks
● TensorFlow is a good, flexible deep learning framework
● Backpropagation and deep neural networks do most of the heavy lifting; the reinforcement
learning challenge is finding the right loss function to train against
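That last bullet can be made concrete. A minimal pure-Python sketch of a Q-learning-style loss (the function name, discount value, and squared-error form are illustrative assumptions, not taken from the talk):

```python
def q_loss(predicted_q, reward, next_max_q, gamma=0.99):
    """Squared error between the network's Q estimate and the
    one-step Bellman target: reward + gamma * max_a' Q(s', a')."""
    target = reward + gamma * next_max_q
    return (predicted_q - target) ** 2
```

In a deep learning framework this would be the loss the optimizer minimizes by backpropagation; the RL-specific work is in constructing the target.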
9. ● There are examples of all three approaches in:
https://github.com/DanielSlater/PythonDeepLearningSamples
● Also contains code for a range of different techniques and games
● Also AlphaToe may be interesting:
https://github.com/DanielSlater/AlphaToe
Resources
10. Reinforcement learning
● Agents are run within an environment.
● As they take actions they receive feedback, known as reward
● They aim to maximize good feedback and minimize bad feedback
11. 3 categories of reinforcement learning
● Value learning : Q-learning
● Policy learning : Policy gradients
● Model learning
Reinforcement learning
14. ● Given a state and a set of possible actions, determine the best action to take to
maximize reward
● Any action will put us into a new state that itself has a set of possible actions
● Our best action now depends on what our best action will be in the next state and so on
Q-Learning
15. Q-learning
● The Q-function is the ideal action-value function: the best achievable reward for
each action in a given state
● We will use a neural network to approximate this Q-function
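The recursive idea above ("our best action now depends on our best action in the next state") is easiest to see in the tabular version of the update. A hedged sketch, with a lookup table standing in for the neural network the talk actually uses; the states, actions, and learning-rate values are made up for illustration:

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the Bellman
    target reward + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q

# Two states, two actions, all values start at zero.
q = {'s0': {'left': 0.0, 'right': 0.0},
     's1': {'left': 0.0, 'right': 0.0}}
q = q_update(q, 's0', 'right', 1.0, 's1')
```

With a neural network, the same target is used in the loss instead of being written into a table.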
17. Convolutional networks
Convolutional net:
● Use a deep convolutional architecture to turn the huge screen image into a much
smaller representation of the state of the game.
● Key insight: pixels next to each other are much more likely to be related...
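To see how much a strided convolutional stack shrinks the input, here is a small sketch. The 84x84 input and the three layer shapes follow the common DQN setup and are assumptions, not necessarily this talk's exact network:

```python
def conv_out_size(size, kernel, stride):
    """Width/height after a 'valid' convolution with a square kernel."""
    return (size - kernel) // stride + 1

# Three strided conv layers shrink an 84x84 screen image
# down to a small grid of learned features.
size = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    size = conv_out_size(size, kernel, stride)
# size is now 7, i.e. a 7x7 feature map
```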
21. If I behave in a certain way, what will
my reward be?
Policy learning
22. ● An approach that aims to optimize a policy given a function
● Function = The reward we get from the game we are playing given the actions we take
● Policy = The choice of actions playing the game
● Network outputs the probability of a move in a given board position
● Moves are chosen randomly based on the output of the network.
● Better moves will tend to get more reward
Policy gradients
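The bullets above can be sketched as a REINFORCE-style loss: moves that earned reward have their log-probability pushed up. A minimal pure-Python sketch; the function name and example numbers are illustrative:

```python
import math

def policy_gradient_loss(action_probs, chosen_action, reward):
    """REINFORCE-style loss: -log pi(a|s) * reward.  Minimizing it
    raises the probability of actions that earned positive reward."""
    return -math.log(action_probs[chosen_action]) * reward
```

Actions themselves would be sampled from `action_probs` (e.g. with `random.choices`), matching the "moves are chosen randomly based on the output of the network" bullet.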
26. ● Both aim to achieve the same thing in very different ways
● Q-learning has convergence issues
● Policy gradients have issues with local minima
● Is there an approach that gets the best of both worlds?
Policy gradients vs Q-learning
27. ● Policy learning - Actor uses policy gradients to find the best path through the network
● Value learning - A critic tries to learn how the actor performs in different positions
● Actor uses the critic's evaluation for its gradients
Actor critic methods
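The actor/critic split can be sketched in a few lines: the critic predicts the value of a position, and the actor is trained on the *advantage* (how much better the outcome was than the critic expected) instead of the raw reward. A hedged pure-Python sketch with illustrative names and loss forms:

```python
def actor_critic_losses(log_prob, value_estimate, reward):
    """The critic predicts the reward the actor will get from this
    state; the actor's gradient is scaled by the advantage
    (reward minus that prediction) rather than by raw reward."""
    advantage = reward - value_estimate
    actor_loss = -log_prob * advantage     # policy-gradient term
    critic_loss = advantage ** 2           # value-regression term
    return actor_loss, critic_loss
```

When the critic's prediction is exactly right the advantage is zero, so the actor only gets a learning signal where outcomes surprise the critic.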
29. ● Coach / Player
● Coach (critic) provides extra feedback for the player
on where he went wrong
● Player (actor) tries to do what the coach
wants
Actor critic methods
30. ● The same architecture can work on all kinds of other games:
○ Breakout
○ Q*bert
○ Seaquest
○ Space invaders
This works
33. ● Learn transitions between states.
● If I’m in state x and take action y I will be in state z
● Looks like a supervised learning problem now
● Unroll forward in time
● Apply techniques from board game AIs
○ Min-Max
Model based learning
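The "looks like supervised learning" and "unroll forward in time" bullets can be sketched together. Here a lookup table stands in for the learned model (in practice a neural network would be fitted to observed (state, action) -> next state pairs); the states and actions are illustrative:

```python
# A learned transition model, sketched as a lookup table.
transitions = {}

def observe(state, action, next_state):
    """Record a training example: (state, action) -> next_state."""
    transitions[(state, action)] = next_state

def predict(state, action):
    """The 'model': predict the next state for a state-action pair."""
    return transitions.get((state, action))

def unroll(state, actions):
    """Roll the model forward in time through a sequence of actions,
    which is what search techniques like min-max need."""
    for action in actions:
        state = predict(state, action)
    return state

observe('x', 'y', 'z')  # "if I'm in state x and take action y..."
observe('z', 'y', 'w')
```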