Building a Deep Learning AI
Project repo: https://github.com/DanielSlater/PythonDeepLearningSamples
Daniel Slater
● Deep learning can be used to create AI agents that can master games
● Introduction to Reinforcement Learning (RL)
● We will look at an example that learns to play Pong using Actor Critic methods
We will talk about...
Why do we care about this?
● It’s fun
● It’s challenging
● If we can develop generalized learning algorithms they could apply to many other fields
● Games are an interesting field for testing intelligence
https://gym.openai.com/
Great framework for running games
● Pong
● Breakout
● Doom
● Cart-Pole
How to run AI agents on games?
https://gym.openai.com/
Example:
pip install -e '.[atari]'
import gym
env = gym.make('SpaceInvaders-v0')
obs = env.reset()
env.render()
action = env.action_space.sample()  # pick an action, e.g. at random
obs, reward, done, _ = env.step(action)
How to run AI agents on games?
Other options
PyGame:
● 1000s of games
● Easy to change game code
● PyGamePlayer
● Half pong
PyGamePlayer:
https://github.com/DanielSlater/PyGamePlayer
Deep neural networks
● TensorFlow is a good, flexible deep learning framework
● Backpropagation and deep neural networks do a lot of the work; the reinforcement learning challenge is finding the best loss function to train against
● There are examples of all 3 in:
https://github.com/DanielSlater/PythonDeepLearningSamples
● Also contains code for a range of different techniques and games
● Also AlphaToe may be interesting:
https://github.com/DanielSlater/AlphaToe
Resources
Reinforcement learning
● Agents are run within an environment.
● As they take actions they receive feedback, known as reward
● They aim to maximize good feedback and minimize bad feedback
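The agent/environment/reward loop above can be sketched with a toy, hypothetical environment (a coin-guessing game, not a real Gym env) to show how actions and the reward signal fit together:

```python
import random

class CoinFlipEnv:
    """Toy environment: guess the coin; right guess -> +1, wrong -> -1."""
    def reset(self):
        self.steps = 0
        return 0                          # observation (unused here)

    def step(self, action):
        self.steps += 1
        coin = random.randrange(2)
        reward = 1.0 if action == coin else -1.0
        done = self.steps >= 10           # episode lasts 10 steps
        return 0, reward, done

random.seed(0)
env = CoinFlipEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = random.randrange(2)          # a random agent
    obs, reward, done = env.step(action)
    total += reward                       # the agent aims to maximize this
print(total)
```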
3 categories of reinforcement learning
● Value learning : Q-learning
● Policy learning : Policy gradients
● Model learning
Reinforcement learning
Value learning: what is the value of being in a state?
Reinforcement learning
Pong valuing a state
● Given a state and a set of possible actions, determine the best action to take to maximize reward
● Any action will put us into a new state that itself has a set of possible actions
● Our best action now depends on what our best action will be in the next state and so on
Q-Learning
Q-Learning
● The Q-function is the ideal state-action value function: the expected reward of taking an action in a state and acting optimally from then on
● We will use a neural network to approximate this Q-function
Images from http://mnemstudio.org/path-finding-q-learning-tutorial.htm
Bunny must navigate a maze
Reward = 100 in state 5 (a carrot)
Discount factor = 0.8
Q-Learning Maze example
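A minimal tabular Q-learning sketch for the maze, assuming the tutorial's room layout (rooms 0-5, the carrot worth 100 in room 5, discount factor 0.8); treat it as illustrative rather than a copy of the tutorial's code:

```python
import random

# Room connectivity: moving into room 5 (the carrot) earns reward 100,
# every other move earns 0. The "action" is the room you move into.
doors = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
GOAL, GAMMA, ALPHA = 5, 0.8, 1.0

Q = {(s, a): 0.0 for s in doors for a in doors[s]}

random.seed(0)
for _ in range(1000):                      # episodes of random exploration
    s = random.randrange(6)
    while s != GOAL:
        a = random.choice(doors[s])
        r = 100.0 if a == GOAL else 0.0
        best_next = max(Q[(a, a2)] for a2 in doors[a])
        # Q-learning update: reward plus discounted best next value
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = a

# Follow the greedy policy from room 2 to the carrot
path, s = [2], 2
while s != GOAL:
    s = max(doors[s], key=lambda a: Q[(s, a)])
    path.append(s)
print(path)
```

Notice how values propagate backwards: Q(1, 5) converges to 100, so Q(3, 1) converges to 0.8 × 100 = 80, and so on down the maze.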
Convolutional networks
Convolutional net:
● Use a deep convolutional architecture to turn the huge screen image into a much smaller representation of the state of the game.
● Key insight: pixels next to each other are much more likely to be related...
Network architecture
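The slide's network isn't reproduced here, but the core operation is a strided convolution; a minimal pure-Python sketch (no TensorFlow) of how it shrinks a screen into a smaller feature map:

```python
def conv2d(image, kernel, stride):
    """Valid 2D convolution (cross-correlation) with a stride."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# An 8x8 checkerboard "screen" shrinks to 3x3 with a 2x2 kernel, stride 3;
# each output value summarizes a neighbourhood of adjacent pixels.
image = [[(i + j) % 2 for j in range(8)] for i in range(8)]
kernel = [[0.25, 0.25], [0.25, 0.25]]
feature_map = conv2d(image, kernel, 3)
print(len(feature_map), len(feature_map[0]))
```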
Q-Learning - convergence issues
If I behave in a certain way, what will its reward be?
Policy learning
● An approach that aims to optimize a policy given a function
● Function = The reward we get from the game we are playing given the actions we take
● Policy = The choice of actions playing the game
● Network outputs the probability of a move in a given board position
● Moves are chosen randomly based on the output of the network.
● Better moves will tend to get more reward
Policy gradients
[Figure: the final reward of a lost point is assigned, discounted, to the moves that led to it: -1.0, -0.99, -0.97, -0.93]
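The per-move targets above come from discounting the final reward backwards through the episode; a minimal sketch, assuming a discount factor of 0.99 and a single -1.0 reward on the last move (the slide's exact figures may use a slightly different factor):

```python
def discounted_returns(rewards, gamma=0.99):
    """Work backwards: each step's return is its reward plus the
    discounted return of the step after it."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Only the last action receives a reward (e.g. losing the point in Pong)
print([round(g, 2) for g in discounted_returns([0.0, 0.0, 0.0, -1.0])])
```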
● Both aim to achieve the same thing in very different ways
● Q-learning has convergence issues
● Policy gradients have issues with local minima
● Is there an approach that gets the best of both worlds?
Policy gradients vs Q-learning
● Policy learning - the actor uses policy gradients to find the best path through the network
● Value learning - a critic tries to learn how the actor performs in different positions
● The actor uses the critic's evaluation for its gradients
Actor critic methods
[Figure: policy-gradient targets -1.0, -0.99, -0.97, -0.93 vs critic-based targets -1.0, -0.65, 0.15, 0.00]
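A minimal actor-critic sketch on a two-armed bandit (a toy stand-in for Pong): the actor is a softmax policy updated by policy gradients, the critic a running value estimate, and the advantage (reward minus value) scales the actor's update:

```python
import math, random

random.seed(0)
prefs = [0.0, 0.0]            # actor: softmax preferences over two actions
V = 0.0                       # critic: running estimate of expected reward
ACTOR_LR, CRITIC_LR = 0.1, 0.1

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(2000):
    pi = softmax(prefs)
    a = 0 if random.random() < pi[0] else 1   # sample from the policy
    r = 1.0 if a == 1 else 0.0                # action 1 is the good action
    advantage = r - V                         # the critic's feedback
    # policy-gradient step: gradient of log softmax, scaled by advantage
    for i in range(2):
        grad = (1.0 - pi[i]) if i == a else -pi[i]
        prefs[i] += ACTOR_LR * advantage * grad
    V += CRITIC_LR * (r - V)                  # critic learns from reward

pi = softmax(prefs)
print(round(pi[1], 2), round(V, 2))
```

The critic's baseline means the actor is nudged by how much better or worse the outcome was than expected, rather than by the raw reward alone.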
● Coach / Player
● The coach (critic) provides extra feedback for the player on where they went wrong
● The player (actor) tries to do what the coach wants
Actor critic methods
● The same architecture can work on all kinds of other games:
○ Breakout
○ Q*bert
○ Seaquest
○ Space invaders
This works
Model based:
Learn a simulation of the environment
Reinforcement learning
● Learn transitions between states.
● If I’m in state x and take action y I will be in state z
● Looks like a supervised learning problem now
Model based learning
● Unroll forward in time
● Apply techniques from board game AIs
○ Min-Max
Model based learning
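A sketch of the idea on a hypothetical five-state corridor: transitions are recorded like a supervised dataset of (state, action) → next state, then the learned model is unrolled forward to plan (min-max would play this role in adversarial games):

```python
import random

random.seed(0)

def true_step(s, a):
    """The real environment: a corridor of states 0..4, actions -1/+1."""
    return max(0, min(4, s + a))

# 1. Learn the model: record observed transitions, exactly like a
#    supervised dataset of inputs and targets.
model = {}
for _ in range(200):
    s = random.randrange(5)
    a = random.choice([-1, 1])
    model[(s, a)] = true_step(s, a)

# 2. Plan by unrolling the learned model forward in time, greedily
#    stepping towards the rewarding state 4.
def plan(start, goal, depth=10):
    s, actions = start, []
    for _ in range(depth):
        if s == goal:
            break
        a = min([-1, 1], key=lambda a: abs(goal - model.get((s, a), s)))
        actions.append(a)
        s = model.get((s, a), s)
    return actions, s

actions, final = plan(0, 4)
print(actions, final)
```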
Repo: https://github.com/DanielSlater/AlphaToe
Blog post: http://www.danielslater.net/2016/10/alphatoe.html
Alpha Toe
Thank you! Hope you enjoyed the talk!
contact me @:
http://www.danielslater.net/
