Deep Reinforcement Learning
By Sachin Bijalwan
End-to-end deep learning
● Predicts steering angles directly from the camera image
● Supervised learning
● Trained on logged driving data (see the sketch below)
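A minimal sketch of this supervised setup, assuming PyTorch; the stand-in network, the 66x200 input size, and the learning rate are illustrative, not from the slides. The network maps a camera frame to one angle, regressed onto the angle logged from a human driver:

```python
import torch
import torch.nn as nn

# Stand-in image-to-angle network; any CNN that outputs one value works here.
# The 3x66x200 input size is an assumption.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 66 * 200, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate assumed

def supervised_step(frames, human_angles):
    """One training step: regress the predicted steering angle onto the
    angle logged from a human driver (mean squared error)."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(frames).squeeze(1), human_angles)
    loss.backward()
    opt.step()
    return loss.item()
```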
Motivation
● Scenario: what if we want our model to crash into a tree rather than a human?
● What if our model swerves to avoid a harmless leaf lying under the car?
● What if we want our model to take minimal turns?
Deep Reinforcement Learning
● Collects its own data through experimentation
● Not supervised: learns from a reward signal rather than labeled examples
DeepMind’s paper
Deep Reinforcement Learning in a Self-Driving Car
● The environment is complex
● Outputs (steering angles) are continuous
● The model is much deeper
● Many hyperparameters to tune
Architecture
● Previous architecture: 4 convolutional layers + 5 fully connected layers (sketched below)
● Removed the dropout layer
● Removed data augmentation
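A sketch of such a network in PyTorch. Only the layer counts (4 convolutional + 5 fully connected, no dropout) come from the slide; the channel counts, kernel sizes, and the 66x200 input are assumptions loosely following NVIDIA's PilotNet:

```python
import torch
import torch.nn as nn

# Layer counts per the slide: 4 convolutional + 5 fully connected, no dropout.
# Channel/kernel sizes below are assumed, loosely following NVIDIA's PilotNet.
driving_net = nn.Sequential(
    nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(1164), nn.ReLU(),   # input size inferred on first forward pass
    nn.Linear(1164, 100), nn.ReLU(),
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 10), nn.ReLU(),
    nn.Linear(10, 1),                 # single steering-angle output
)

angle = driving_net(torch.randn(1, 3, 66, 200))  # e.g. one 66x200 RGB frame
```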
Hill Climbing with Nvidia’s model
● The agent initially drives using the model's randomly initialized weights
● Each image and its predicted angle are stored
● When the agent crashes into a wall, the run ends ("dies")
● Take the last 30 images and their predictions
● Generate random values as new target predictions
● Retrain the model on these targets
● Validate the update by running in the simulator again (see the sketch below)
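A rough sketch of this loop. The simulator interface (run_until_crash), the train_on method, and the [-1, 1] angle range are hypothetical placeholders, since the slides don't name an API:

```python
import random

def hill_climb(model, simulator, rounds=100):
    """Hill climbing as described above. `simulator.run_until_crash` and
    `model.train_on` are hypothetical placeholders for the real simulator
    and training code."""
    for _ in range(rounds):
        # Drive until the agent hits a wall, keeping every frame and the
        # angle the model predicted for it.
        frames, predicted_angles = simulator.run_until_crash(model)
        # Keep the last 30 frames before the crash and assign them random
        # candidate angles as new training targets.
        last_frames = frames[-30:]
        random_targets = [random.uniform(-1.0, 1.0) for _ in last_frames]
        model.train_on(last_frames, random_targets)
        # Validation happens on the next run_until_crash call, which
        # shows whether the agent now survives longer.
```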
Deep Q Learning
● Q-value(s, a) = expected cumulative reward after taking action a in state s
● Learn the Q-values of state-action pairs
● Choose the action with the maximum Q-value
● Update equation for the Q-value (sketched in code below):
○ Q(s, a) = R + gamma * max_{a'} Q(s', a')
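A minimal tabular sketch of this update; in deep Q-learning the dictionary below is replaced by a neural network, and the discount value is an assumption:

```python
GAMMA = 0.9  # discount factor; the exact value is an assumption

def q_update(Q, s, a, reward, s_next, actions):
    """Apply the update above: Q(s, a) = R + gamma * max over a' of Q(s', a').
    Q is a dict from (state, action) pairs to value estimates; a deep
    Q-network replaces this table with a function approximator."""
    Q[(s, a)] = reward + GAMMA * max(Q.get((s_next, a2), 0.0) for a2 in actions)
```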
Episodes
● Each one is the tuple (s, a, s', R)
● Episodes are stored as they occur
● The model is trained on them later
● The memory that holds them is called replay memory (sketched below)
● Resembles how biological systems learn by replaying experiences
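A minimal replay-memory sketch, assuming a fixed capacity and uniform random sampling (both standard choices, not stated on the slide):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (s, a, s_next, R) tuples. Sampling at random
    decorrelates training batches from the order the agent experienced them."""
    def __init__(self, capacity=10_000):          # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)      # oldest episodes drop off

    def store(self, s, a, s_next, reward):
        self.buffer.append((s, a, s_next, reward))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```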
How to train the model?
● The agent selects a random action with probability epsilon and the model's action with probability 1 - epsilon (epsilon-greedy)
● Decrease epsilon over time
● Store the episodes
● Use them later to train the model (see the sketch below)
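An epsilon-greedy sketch of this scheme. The schedule constants and the model's q_value method are assumptions for illustration:

```python
import random

EPSILON_MIN, EPSILON_DECAY = 0.05, 0.995  # assumed schedule constants
epsilon = 1.0

def select_action(model, state, actions):
    """Random action with probability epsilon, otherwise the model's
    best action (probability 1 - epsilon). `model.q_value` is a
    hypothetical accessor for the network's Q-value estimate."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: model.q_value(state, a))

def decay_epsilon():
    """Shrink epsilon over time so the agent explores less as it learns."""
    global epsilon
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)
```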
Challenges
● Many hyperparameters to tune
● The model takes a long time to train