Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020
Outline
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
The Driving Problem
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
The Driving Problem
Self-Driving I
Goals
Reach the destination
Avoid dangerous states
Provide comfort to passengers (e.g. avoid oscillations, sudden moves)
Settings
Multi-agent problem
One has to consider the others
Interaction is both
Cooperative, all the agents desire safety
Competitive, each agent has its own goal
Environment
Non-deterministic
Partially observable
Dynamic
The Driving Problem
Self-Driving II
Three sub-problems
Recognition Identify the environment's components
Prediction Predict the evolution of the surroundings
Decision making Take decisions and act to pursue the goal
Two approaches
Single task handling
Use human ingenuity to inject knowledge about the domain
End-to-end
Let the algorithm optimize towards the final goal without constraints
The Driving Problem
Self-Driving III
Figure: Modules of autonomous driving systems (source [Talpaert et al., 2019] - modified)
The Driving Problem
Sensing & Control
Sensing
Which sensors?
Single/multiple
Sensor expense
Operating conditions
Control
Continuous – steering, acceleration, braking
Discrete – forward, right, left, brake
The Driving Problem Single task approach
Recognition
Convolutional Neural Networks (CNNs)
YOLO [Redmon et al., 2015]
Computer Vision techniques for object detection
Feature based
Local descriptor based
Mixed
The Driving Problem Single task approach
Prediction
Model-based
Physics-based
Maneuver-based
Interaction-aware
Data-driven
Recurrent NNs (like Long Short-Term Memory, LSTM)
The Driving Problem Single task approach
Decision Making
Planning/Tree Search/Optimization
Agent-oriented programming (like the BDI model)
Deep Learning (DL)
Reinforcement Learning (RL)
The Driving Problem End-to-end approach
End-to-end
Supervised Learning
Based on imitation
Current approach by major car manufacturers
Robotic drivers are expected to be perfect
Drawbacks
Training data Huge amounts of labeled data or human effort
Covering all possible driving scenarios is very hard
Unsupervised Learning
Learns behaviors by trial and error
like a young human driving student
Does not require explicit supervision from humans
Mainly used in simulated environments
to avoid consequences in real life
The Driving Problem Pitfalls of Simulations
Simulations
Keep in mind
An agent trained in a virtual environment will not perform well in the real setting
For cameras: different visual appearance, especially for textures
You need some kind of transfer learning or conversion of images/data
Always enable sensor/actuator noise
Some "ground truth" sensors present in some simulators aren't available in real life
For example, in TORCS (a circuit simulator popular in the ML field) you can have exact
distances from other cars and from the borders. Too easy...
Reinforcement Learning Basic Concepts
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
Reinforcement Learning Basic Concepts Problem Definition
Agent-environment interaction
In RL, an agent learns how to fulfill a task by interacting with its environment
The interaction between the agent and the environment can be reduced to:
State All the information about the environment that is useful to predict the future
Action What the agent does
Reward A real number that indicates how well the agent is doing
Figure: At each time step t the agent performs an action A_t and receives the pair S_{t+1}, R_{t+1}
(source [Sutton and Barto, 2018])
Reinforcement Learning Basic Concepts Problem Definition
Terminology
Episode
A complete run from a starting state to a ļ¬nal state
Episodic task A task that has an end
Continuing task A task that goes on without limit
We can split it e.g. by setting a maximum number of steps
Reinforcement Learning Basic Concepts Problem Definition
Markov Decision Process (MDP)
An RL problem can be formalized as an MDP, i.e. the 4-tuple (a toy example in code follows below):
S set of states
A_s set of actions available in state s
P(s′|s, a) state-transition probabilities (probability of reaching s′ given s and a)
R(s, s′) reward distribution associated with state transitions
Figure: Example of a simple MDP (source Wikipedia)
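As a toy illustration of the tuple above, one possible way to write down a tiny, entirely made-up driving-flavored MDP in Python; the states, actions, probabilities, and rewards are invented for the example.

```python
import random

# Hypothetical toy MDP: for each (state, action), a list of
# (probability, next_state, reward) triples. All values are illustrative.
toy_mdp = {
    "states": ["cruising", "obstacle_ahead", "crashed"],
    "actions": {"cruising": ["keep", "brake"], "obstacle_ahead": ["keep", "brake"]},
    "transitions": {
        ("cruising", "keep"):        [(0.9, "cruising", 0.4), (0.1, "obstacle_ahead", 0.0)],
        ("cruising", "brake"):       [(1.0, "cruising", -0.1)],
        ("obstacle_ahead", "keep"):  [(0.8, "crashed", -1.0), (0.2, "cruising", 0.4)],
        ("obstacle_ahead", "brake"): [(1.0, "cruising", 0.0)],
    },
}

def step(mdp, state, action):
    """Sample a next state and reward from P(s'|s, a) and R(s, s')."""
    triples = mdp["transitions"][(state, action)]
    probs = [p for p, _, _ in triples]
    _, next_state, reward = random.choices(triples, weights=probs, k=1)[0]
    return next_state, reward
```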
Reinforcement Learning Basic Concepts Problem Definition
Environment features
MDPs can deal with non-deterministic environments
(because they have probability distributions)
But we need to cope with observability
Markov Property ā€“ Full observability
Classical RL algorithms rely on the Markov property
A state satisfies the property if it includes information about all aspects of the (past)
agent-environment interaction that make a difference for the future
From states to observations ā€“ Partial observability
Sometimes, an agent has only a partial view of the world state
The agent gets information (observations) about some aspects of the environment
Abusing the terminology, observations/histories are called "state" and denoted S_t
How to deal with partial observability?
Approximation methods (e.g. NNs) do not rely on the Markov property
Sometimes we can reconstruct a Markov state using a history
Stacking the last n states
Using an RNN (e.g. LSTM)
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy I
We want the agent to learn a strategy i.e. a policy
Policy Ļ€
A map from states to actions used by the agent to choose the next action
In particular, we are searching for the optimal policy
→ The policy that maximizes the cumulative reward
Cumulative reward
The discounted sum of rewards over time
R = Σ_{t=0}^{n} γ^t r_{t+1}, with 0 ≤ γ ≤ 1
For larger values of γ, the agent focuses more on long-term rewards
For smaller values of γ, the agent focuses more on immediate rewards
What does this mean?
The actions of the agent affect its possibilities later in time
It could be better to choose an action that leads to a better state than an action that
gives a greater immediate reward but leads to a worse state
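As a small numeric illustration of the discounted return defined above (the reward sequence and the values of γ are chosen arbitrarily):

```python
def discounted_return(rewards, gamma=0.99):
    """R = sum over t of gamma^t * r_{t+1}, for a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A delayed +1: with gamma close to 1 it still counts, with a small gamma it barely does.
rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.99))  # ~0.970
print(discounted_return(rewards, gamma=0.10))  # 0.001
```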
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy II
Problem
We don't know the MDP model (state-transition probabilities and reward distribution)
Solution
Monte Carlo approach: let's make estimations based on experience
A way to produce a policy is to estimate the value (or action-value) function
Given the function, the agent only needs to perform the action with the greatest value
Expected return E[G_t]
The expectation of the cumulative reward, i.e. our current estimate
Value function V_π(s) = E[G_t | S_t = s]
(state) → expected return when starting in s and following π thereafter
Action-value function Q_π(s, a) = E[G_t | S_t = s, A_t = a]
(state, action) → expected return when performing a and following π thereafter
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy III
SARSA and Q-learning
They are algorithms that approximate the action-value function
At every iteration the estimation is refined thanks to the new experience.
Every time the evaluation becomes more precise, the policy gets closer to the
optimal one.
Generalized Policy Iteration (GPI)
Almost all RL methods could be described as GPI
They all have identifiable policies and value functions
Learning consists of two interacting processes
Policy evaluation Compute the value function
Policy improvement Make the policy greedy
Figure: [Sutton and Barto, 2018]
Reinforcement Learning Basic Concepts Learning a Policy
Learning a Policy IV
The exploration/exploitation dilemma
The agent seeks to learn action values of optimal behavior, but it needs to behave
non-optimally in order to explore all actions (to find the optimal actions)
ε-greedy policy: the agent behaves greedily, but there is a (small) ε probability of
selecting a random action.
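A minimal sketch of ε-greedy action selection over a tabular action-value function; the Q-table layout and the default value of 0.0 are assumptions of this example.

```python
import random

def epsilon_greedy(q_table, state, actions, epsilon=0.1):
    """With probability epsilon explore (random action), otherwise exploit the current estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```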
On-policy learning
The agent learns action values of the policy that produces its own behavior (q_π)
It learns about a near-optimal policy (ε-greedy)
Target policy and behavior policy correspond
Off-policy learning
The agent learns action values of the optimal policy (q*)
There are two policies: target and behavior
Every n steps the target policy is copied into the behavior one
If n = 1 there is only one action-value function
But it is not the same as on-policy (the target and behavior policies still do not correspond)
Reinforcement Learning Basic Concepts Learning a Policy
How to estimate the value function?
A table that associates states (or state-action pairs) to a value
Randomly initialized
Temporal Difference (TD) update
V(S_t) ← V(S_t) + α [R_{t+1} + γ V(S_{t+1}) − V(S_t)]
TD error
δ_t = R_{t+1} + γ V(S_{t+1}) − V(S_t)
R_{t+1} + γ V(S_{t+1}) is a better estimation of V(S_t)
The actual reward obtained plus the expected rewards for the next state
By subtracting the old estimate we get the error made at time t
Sarsa (on-policy)
Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t)]
Q-learning (off-policy)
Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)]
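The Q-learning update above, as a small tabular sketch in Python; the zero initialization and the hyperparameter values are illustrative choices.

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)], implicitly initialized to 0
alpha, gamma = 0.1, 0.99    # step size and discount factor, example values

def q_learning_update(s, a, r, s_next, actions):
    """Q(S_t, A_t) += alpha * [R_{t+1} + gamma * max_a Q(S_{t+1}, a) - Q(S_t, A_t)]"""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
```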
Example of a Driving Agent: DQN
Next in Line...
1 The Driving Problem
Single task approach
End-to-end approach
Pitfalls of Simulations
2 Reinforcement Learning Basic Concepts
Problem Definition
Learning a Policy
3 Example of a Driving Agent: DQN
Neural Network and Training Cycle
Issues and Solutions
Example of a Driving Agent: DQN Neural Network and Training Cycle
Deep Q-Network
DQN
When the number of states increases, it becomes impossible to use a table to
store the Q-function (alias for action-value function)
A neural network can approximate the Q-function
We also get better generalization and the ability to deal with non-Markovian environments
How to represent the action-value function in a NN?
1 Input: the state (as a vector) → Output: the values of each action
2 Input: the state and an action → Output: the value for that action
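A sketch of option 1 (state in, one value per action out), roughly following the convolutional architecture of [Mnih et al., 2015]; the use of PyTorch here is an assumption, not something prescribed by the slides.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of 4 grayscale 84x84 frames to one Q-value per discrete action."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):       # x: (batch, 4, 84, 84), pixel values in [0, 1]
        return self.net(x)      # -> (batch, n_actions)
```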
Design the driver agent
Environment: states and actions, reward function
Neural network
Training cycle
Example of a Driving Agent: DQN Neural Network and Training Cycle
Neural Network
Pre-processing
It's important how we present the data to the network
Let's take images as an example
Resize to small images (e.g. 84x84)
If colors are not discriminatory, use grayscale
How many frames to stack?
Remember: the larger the input vector, the larger the search space
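A possible pre-processing pipeline along these lines; OpenCV and a 4-frame stack are assumptions of the sketch, any image library would do.

```python
import cv2
import numpy as np
from collections import deque

def preprocess(frame_rgb):
    """RGB frame -> 84x84 grayscale, scaled to [0, 1]."""
    gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return small.astype(np.float32) / 255.0

frames = deque(maxlen=4)  # the last 4 processed frames form the network input

def stack(frame_rgb):
    frames.append(preprocess(frame_rgb))
    while len(frames) < 4:            # pad at episode start by repeating the first frame
        frames.append(frames[-1])
    return np.stack(frames, axis=0)   # shape (4, 84, 84)
```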
Convolutional Neural Networks
A good idea for structured/spatially local data
e.g. images, lidar data ...
Example of a Driving Agent: DQN Neural Network and Training Cycle
Training Cycle
Transition samples are used to compute the target update
Training uses a gradient form of Q-learning
([Mnih et al., 2013] and [Sutton and Barto, 2018] for details)
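To make the cycle concrete, a sketch of a single training step on a batch of transition samples; the online/target networks and the replay batch are the hypothetical components sketched elsewhere in these slides, and the hyperparameters and the use of PyTorch are assumptions.

```python
import torch
import torch.nn.functional as F

def dqn_train_step(online_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient step on a batch of transitions (s, a, r, s_next, done)."""
    s, a, r, s_next, done = batch                      # s, s_next, r, done: float tensors; a: int64
    q_sa = online_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) for the taken actions
    with torch.no_grad():                              # targets are treated as constants
        max_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * max_next   # no bootstrap on terminal states
    loss = F.smooth_l1_loss(q_sa, target)              # Huber loss, a common choice for DQN
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```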
Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions I
Each of these features significantly improves performance
Experience Replay
We put samples in a buffer and we take random batches from it for training (sketched below)
When full, the oldest samples are discarded
Why? Experience replay is needed to break temporal correlations
In supervised learning, inputs should be independent and identically distributed (i.i.d.)
i.e. the generative process has no memory of past generated samples
Other advantages:
More efficient use of experience (samples are used several times)
Avoids catastrophic forgetting (by presenting old samples again)
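A minimal replay buffer sketch along these lines; the capacity and batch size are illustrative values.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest samples are discarded automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        """Uniform random batch, which breaks the temporal correlation between samples."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```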
Double DQN
We have two networks, one behavior NN and one target NN
We train the target NN
Every N steps we copy the target NN into the behavior one
Why? To reduce target instability
Supervised learning, to perform well, requires that the label for a given input does not
change over time
In RL, the target changes constantly (as we refine the estimations)
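In code, keeping the second network consistent reduces to a periodic copy of the weights; a PyTorch sketch, using the usual online/target naming convention, with SYNC_EVERY as an illustrative value.

```python
def sync_target(online_net, target_net):
    """Copy the trained network's weights into the frozen copy used for the TD targets."""
    target_net.load_state_dict(online_net.state_dict())

# e.g. inside the training loop:
# if step % SYNC_EVERY == 0:
#     sync_target(online_net, target_net)
```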
Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions II
Rewards scaling
Scaling rewards to the range [−1, 1] dramatically improves efficiency [Mnih et al., 2013]
Frame skipping
Consecutive frames are very similar → we can skip frames without losing much
information and speed up training by a factor of N
The selected action is repeated for N frames
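A sketch of action repeat around a gym-style environment; the env.step API and N = 4 are assumptions of the example.

```python
def step_with_skip(env, action, skip=4):
    """Repeat the chosen action for `skip` frames, accumulating the reward."""
    total_reward, done, info = 0.0, False, {}
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```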
Initial sampling
It could be useful to save K samples in the replay buffer before starting training
So many parameters...
[Mnih et al., 2015] is a good place to start
The table of hyperparameters used in their work is in the next slide
More recent works and experiments could provide better suggestions
Example of a Driving Agent: DQN Issues and Solutions
Hyperparameters used in [Mnih et al., 2015]
Example of a Driving Agent: DQN Issues and Solutions
Last things
Reward function
Designing a reward function is not an easy task
It could severely affect learning
In the case of a driving agent (as in RL for robotics), the rewards are internal and
do not come from the environment
You should give rewards also for intermediate steps to direct the agent (i.e. not
only +1 when it succeeds and −1 when it fails)
Some naive hints (sketched in code below)
Reach target +1
Crash āˆ’1
Car goes forward +0.4 (if too large, it will outweigh the crash penalty)
Car turns +0.1 (if too large, the car will keep circling around)
−0.2 instead of the positive reward if the agent turns in one direction and immediately after
turns to the opposite one (i.e. right-left or left-right), to reduce car oscillations
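A sketch of a reward function implementing the naive hints above; the event flags and action names are hypothetical inputs that the environment or simulator would have to provide.

```python
def compute_reward(reached_target, crashed, action, previous_action):
    """Shaped reward following the naive hints above; all magnitudes are illustrative."""
    if reached_target:
        return 1.0
    if crashed:
        return -1.0
    if action == "forward":
        return 0.4
    if action in ("left", "right"):
        # penalize an immediate direction reversal to discourage oscillations
        if previous_action in ("left", "right") and previous_action != action:
            return -0.2
        return 0.1
    return 0.0
```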
Exploration/exploitation
Start with a big ε
Gradually reduce it with an ε-decay (e.g. 0.999)
ε_{t+1} = ε_t · decay
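The same schedule in code; the minimum value eps_min is an added assumption (a common safeguard so the agent never stops exploring completely).

```python
epsilon, decay, eps_min = 1.0, 0.999, 0.05   # illustrative values

def decay_epsilon(epsilon):
    """eps_{t+1} = eps_t * decay, floored at eps_min."""
    return max(eps_min, epsilon * decay)
```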
Appendix: Reading Suggestions
Appendix: Reading Suggestions
Appendix: Reading Suggestions
Reading Suggestions: Access to Resources
Books and papers are available through the university proxy
Search engine: https://sba-unibo-it.ezproxy.unibo.it/AlmaStart
If you want to access an article by link
convert the dots into dashes and add .ezproxy.unibo.it at the end of the URL, e.g.:
https://www.nature.com/articles/nature14236
↓
https://www-nature-com.ezproxy.unibo.it/articles/nature14236
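The rewriting rule above as a small helper; a hypothetical convenience function, not something provided by the library service.

```python
from urllib.parse import urlparse

def to_proxy_url(url: str) -> str:
    """Replace dots with dashes in the host name and append the UNIBO proxy domain."""
    parts = urlparse(url)
    host = parts.netloc.replace(".", "-") + ".ezproxy.unibo.it"
    return f"{parts.scheme}://{host}{parts.path}"

print(to_proxy_url("https://www.nature.com/articles/nature14236"))
# -> https://www-nature-com.ezproxy.unibo.it/articles/nature14236
```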
Appendix: Reading Suggestions
Reading Suggestions: Driving Problem
Extended review of autonomous driving approaches [Leon and Gavrilescu, 2019]
Overview of RL for autonomous driving and its challenges [Talpaert et al., 2019]
Tutorial of RL with CARLA simulator https://pythonprogramming.net/
introduction-self-driving-autonomous-cars-carla-python
End-to-end approach examples
CNN to steering on a real car [Bojarski et al., 2016]
DQN car control [Yu et al., 2016]
Single task approach example
NN for recognition + RL for control [Li et al., 2018]
Converting virtual images to real ones
[Pan et al., 2017]
Appendix: Reading Suggestions
Reading Suggestions: RL & DL
In depth (books)
[Sutton and Barto, 2018] is the bible of RL – the second edition (2018) also covers
recent breakthroughs
[Goodfellow et al., 2016] for Deep Learning
Compact introductions
Brief explanation of RL basics and DQN: https://rubenfiszel.github.
io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN
CNN: https://cs231n.github.io/convolutional-networks/
Even more compact: https://pathmind.com/wiki/convolutional-network
DQN
[Mnih et al., 2015] Revised version (better)
[Mnih et al., 2013] Original version
[van Hasselt et al., 2015] Double Q-learning
References
References I
Bojarski, M., Testa, D. D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel,
L. D., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., and Zieba, K. (2016).
End to end learning for self-driving cars.
Goodfellow, I., Bengio, Y., and Courville, A. (2016).
Deep Learning.
MIT Press.
http://www.deeplearningbook.org.
Leon, F. and Gavrilescu, M. (2019).
A review of tracking, prediction and decision making methods for autonomous
driving.
Li, D., Zhao, D., Zhang, Q., and Chen, Y. (2018).
Reinforcement learning and deep learning based lateral control for autonomous
driving.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and
Riedmiller, M. (2013).
Playing atari with deep reinforcement learning.
References
References II
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G.,
Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C.,
Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and
Hassabis, D. (2015).
Human-level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Pan, X., You, Y., Wang, Z., and Lu, C. (2017).
Virtual to real reinforcement learning for autonomous driving.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2015).
You only look once: Unified, real-time object detection.
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017).
Deep reinforcement learning framework for autonomous driving.
Electronic Imaging, 2017(19):70–76.
Sutton, R. S. and Barto, A. G. (2018).
Reinforcement learning: an introduction.
The MIT Press.
References
References III
Talpaert, V., Sobh, I., Kiran, B. R., Mannion, P., Yogamani, S., El-Sallab, A., and
Perez, P. (2019).
Exploring applications of deep reinforcement learning for real-world autonomous
driving systems.
van Hasselt, H., Guez, A., and Silver, D. (2015).
Deep reinforcement learning with double q-learning.
Yu, A., Palefsky-Smith, R., and Bedi, R. (2016).
Deep reinforcement learning for simulated autonomous vehicle control.
Course Project Reports: Winter, pages 1–7.

More Related Content

What's hot

Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
MeetupDataScienceRoma
Ā 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
NAVER Engineering
Ā 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
Big Data Colombia
Ā 
Particle swarm optimization
Particle swarm optimizationParticle swarm optimization
Particle swarm optimization
anurag singh
Ā 
Generalized Reinforcement Learning
Generalized Reinforcement LearningGeneralized Reinforcement Learning
Generalized Reinforcement Learning
Po-Hsiang (Barnett) Chiu
Ā 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
pauldix
Ā 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic Programming
Seung Jae Lee
Ā 
Particle swarm optimization
Particle swarm optimizationParticle swarm optimization
Particle swarm optimization
Suman Chatterjee
Ā 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
Subrat Panda, PhD
Ā 
Reinforcement learningļ¼špolicy gradient (part 1)
Reinforcement learningļ¼špolicy gradient (part 1)Reinforcement learningļ¼špolicy gradient (part 1)
Reinforcement learningļ¼špolicy gradient (part 1)
Bean Yen
Ā 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
Bill Liu
Ā 
Reinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision ProcessesReinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision Processes
Seung Jae Lee
Ā 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
Nikolay Pavlov
Ā 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
ConnorShorten2
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learningbutest
Ā 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithm
Jie-Han Chen
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningJungyeol
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
DongHyun Kwak
Ā 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
Omar Enayet
Ā 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-Learning
Kai-Wen Zhao
Ā 

What's hot (20)

Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
Ā 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
Ā 
An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
Ā 
Particle swarm optimization
Particle swarm optimizationParticle swarm optimization
Particle swarm optimization
Ā 
Generalized Reinforcement Learning
Generalized Reinforcement LearningGeneralized Reinforcement Learning
Generalized Reinforcement Learning
Ā 
An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)An introduction to reinforcement learning (rl)
An introduction to reinforcement learning (rl)
Ā 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic Programming
Ā 
Particle swarm optimization
Particle swarm optimizationParticle swarm optimization
Particle swarm optimization
Ā 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
Ā 
Reinforcement learningļ¼špolicy gradient (part 1)
Reinforcement learningļ¼špolicy gradient (part 1)Reinforcement learningļ¼špolicy gradient (part 1)
Reinforcement learningļ¼špolicy gradient (part 1)
Ā 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
Ā 
Reinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision ProcessesReinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision Processes
Ā 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
Ā 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Ā 
Actor critic algorithm
Actor critic algorithmActor critic algorithm
Actor critic algorithm
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Ā 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
Ā 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-Learning
Ā 

Similar to Autonomous Driving and Reinforcement Learning - an Introduction

MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & ErrorMSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
Michael Bosello
Ā 
Concepts of predictive control
Concepts of predictive controlConcepts of predictive control
Concepts of predictive control
JARossiter
Ā 
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET Journal
Ā 
Planning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task DomainsPlanning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task Domains
Waqas Tariq
Ā 
Towards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingTowards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate Computing
Gianluca Aguzzi
Ā 
FurkatKochkarov-propoal
FurkatKochkarov-propoalFurkatKochkarov-propoal
FurkatKochkarov-propoalFurkat Kochkarov
Ā 
ARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptxARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptx
VinodhKumar658343
Ā 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes
University of Bologna
Ā 
Towards Systematically Engineering Ensembles - Martin Wirsing
Towards Systematically Engineering Ensembles - Martin WirsingTowards Systematically Engineering Ensembles - Martin Wirsing
Towards Systematically Engineering Ensembles - Martin Wirsing
FET AWARE project - Self Awareness in Autonomic Systems
Ā 
Mobile data offloading
Mobile data offloadingMobile data offloading
Mobile data offloading
Akshay Salunkhe
Ā 
Quantitative management
Quantitative managementQuantitative management
Quantitative managementsmumbahelp
Ā 
čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)
čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)
čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)butest
Ā 
Aj Copulas V4
Aj Copulas V4Aj Copulas V4
Aj Copulas V4
jainan33
Ā 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
felicidaddinwoodie
Ā 
Codecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo SimulationCodecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo SimulationCodecamp Romania
Ā 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
VaishnavGhadge1
Ā 
Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO
Martin Butz
Ā 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
James by CrowdProcess
Ā 
On Collective Reinforcement Learning
On Collective Reinforcement LearningOn Collective Reinforcement Learning
On Collective Reinforcement Learning
Gianluca Aguzzi
Ā 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
Sanjeev Deshmukh
Ā 

Similar to Autonomous Driving and Reinforcement Learning - an Introduction (20)

MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & ErrorMSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
MSN 2019 - Robot Drivers: Learning to Drive by Trial & Error
Ā 
Concepts of predictive control
Concepts of predictive controlConcepts of predictive control
Concepts of predictive control
Ā 
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
IRJET- A Review on Deep Reinforcement Learning Induced Autonomous Driving Fra...
Ā 
Planning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task DomainsPlanning in Markov Stochastic Task Domains
Planning in Markov Stochastic Task Domains
Ā 
Towards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingTowards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate Computing
Ā 
FurkatKochkarov-propoal
FurkatKochkarov-propoalFurkatKochkarov-propoal
FurkatKochkarov-propoal
Ā 
ARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptxARTIFICIAL INTELLIGENCE ppt.pptx
ARTIFICIAL INTELLIGENCE ppt.pptx
Ā 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes
Ā 
Towards Systematically Engineering Ensembles - Martin Wirsing
Towards Systematically Engineering Ensembles - Martin WirsingTowards Systematically Engineering Ensembles - Martin Wirsing
Towards Systematically Engineering Ensembles - Martin Wirsing
Ā 
Mobile data offloading
Mobile data offloadingMobile data offloading
Mobile data offloading
Ā 
Quantitative management
Quantitative managementQuantitative management
Quantitative management
Ā 
čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)
čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)
čÆ¾å ‚č®²ä¹‰(ęœ€åŽę›“ę–°:2009-9-25)
Ā 
Aj Copulas V4
Aj Copulas V4Aj Copulas V4
Aj Copulas V4
Ā 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
Ā 
Codecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo SimulationCodecamp Iasi 7 mai 2011 Monte Carlo Simulation
Codecamp Iasi 7 mai 2011 Monte Carlo Simulation
Ā 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
Ā 
Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO Demolition Derby 2011 at GECCO
Demolition Derby 2011 at GECCO
Ā 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
Ā 
On Collective Reinforcement Learning
On Collective Reinforcement LearningOn Collective Reinforcement Learning
On Collective Reinforcement Learning
Ā 
Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
Ā 

Recently uploaded

extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
Ā 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
Ā 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
Ā 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
Ā 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
Ā 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
Ā 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
Ā 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
Ā 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
Ā 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
Ā 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
Ā 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
Ā 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
Ā 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
SĆ©rgio Sacani
Ā 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
Ā 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
Ā 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
Ā 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
Ā 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
Ā 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
Ā 

Recently uploaded (20)

extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
Ā 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Ā 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
Ā 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
Ā 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Ā 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
Ā 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
Ā 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
Ā 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Ā 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
Ā 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
Ā 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Ā 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
Ā 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Ā 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
Ā 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
Ā 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
Ā 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
Ā 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Ā 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Ā 

Autonomous Driving and Reinforcement Learning - an Introduction

  • 1. Autonomous Driving and Reinforcement Learning ā€“ an Introduction Michael Bosello michael.bosello@studio.unibo.it Universit`a di Bologna ā€“ Department of Computer Science and Engineering, Cesena, Italy Smart Vehicular Systems A.Y. 2019/2020 Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 1 / 30
  • 2. Outline 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Deļ¬nition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 2 / 30
  • 3. The Driving Problem Next in Line... 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Deļ¬nition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 3 / 30
  • 4. The Driving Problem Self-Driving I Goals Reach the destination Avoid dangerous states Provide comfort to passengers (e.g. avoid oscillations, sudden moves) Settings Multi-agent problem One have to consider others Interaction is both Cooperative, all the agents desire safety Competitive, each agent has its own goal Environment Non-deterministic Partially observable Dynamic Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 4 / 30
  • 5. The Driving Problem Self-Driving II Three sub-problems Recognition Identify environmentā€™s components Prediction Predict the evolution of the surrounding Decision making Take decisions and act to pursue the goal Two approaches Single task handling Use human ingenuity to inject knowledge about the domain End-to-end Let the algorithm optimize towards the ļ¬nal goal without constrains Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 5 / 30
  • 6. The Driving Problem Self-Driving III Figure: Modules of autonomous driving systems (source [Talpaert et al., 2019] - modiļ¬ed) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 6 / 30
  • 7. The Driving Problem Sensing & Control Sensing Which sensors? Single/multiple Sensor expense Operating conditions Control Continuous ā€“ steering, acceleration, braking Discrete ā€“ forward, right, left, brake Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 7 / 30
  • 8. The Driving Problem Single task approach Recognition Convolutional Neural Networks (CNNs) YOLO [Redmon et al., 2015] Computer Vision techniques for object detection Feature based Local descriptor based Mixed Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 8 / 30
  • 9. The Driving Problem Single task approach Prediction Model-based Physics-based Maneuver-based Interaction-aware Data-driven Recurrent NN (like Long-Short Term Memory LSTM) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 9 / 30
  • 10. The Driving Problem Single task approach Decision Making Planning/Tree Search/Optimization Agent-oriented programming (like the BDI model) Deep Learning (DL) Reinforcement Learning (RL) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 10 / 30
  • 11. The Driving Problem End-to-end approach End-to-end Supervised Learning Based on imitation Current approach by major car manufacturer Robotic drivers are expected to be perfect Drawbacks Training data Huge amounts of labeled data or human effort Covering all possible driving scenarios is very hard Unsupervised Learning Learns behaviors by trial-and-error like a young human driving student Does not require explicit supervision from humans. Mainly used on simulated environment to avoid consequences in real-life Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 11 / 30
  • 12. The Driving Problem Pitfalls of Simulations Simulations Keep in mind An agent trained in a virtual environment will not perform well in the real setting For cameras: different visual appearance, especially for textures You need some kind of transfer learning or conversion of images/data Always enable noises of sensors/actuators Some ā€œgrand truthā€ sensors present in some simulators arenā€™t available in real life For example, in TORCS (a circuit simulator popular in the ML ļ¬eld) you can have exact distances from other cars and from the borders. Too easy... Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 12 / 30
  • 13. Reinforcement Learning Basic Concepts Next in Line... 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Deļ¬nition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 13 / 30
  • 14. Reinforcement Learning Basic Concepts Problem Deļ¬nition Agent-environment interaction In RL, an agent learns how to fulļ¬ll a task by interacting with its environment The interaction between the agent and the environment can be reduced to: State Every information about the environment useful to predict the future Action What the agent do Reward A real number that Indicates how well the agent is doing Figure: At each time step t the agent perform an action At and receives the couple St+1, Rt+1 (source [Sutton and Barto, 2018]) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 14 / 30
  • 15. Reinforcement Learning Basic Concepts Problem Deļ¬nition Terminology Episode A complete run from a starting state to a ļ¬nal state Episodic task A task that has an end Continuing task A task that goes on without limit We can split it e.g. by setting a maximum number of steps Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 15 / 30
  • 16. Reinforcement Learning Basic Concepts Problem Deļ¬nition Markov Decision Process (MDP) A RL problem can be formalized as a MDP. The 4-tuple: S set of states As set of actions available in state s P(s |s, a) state-transition probabilities (probability to reach s given s and a) R(s, s ) reward distribution associate to state transitions Figure: Example of a simple MDP (source Wikipedia) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 16 / 30
  • 17. Reinforcement Learning Basic Concepts Problem Deļ¬nition Environment features MDPs can deal with non-deterministic environments (because they have probability distributions) But we need to cope with observability Markov Property ā€“ Full observability Classical RL algorithms rely on the Markov property A state satisfy the property if it includes information about all aspects of the (past) agent-environment interaction that make a difference for the future From states to observations ā€“ Partial observability Sometimes, an agent has only a partial view of the world state The agent gets information (observations) about some aspects of the environment Abusing the language, observations/history are called ā€œstateā€ and symbolized St How to deal with partial observability? Approximation methods (e.g. NN) doesnā€™t rely on the Markov property Sometimes we can reconstruct a Markov state using a history Stacking the last n states Using a RNN (e.g. LSTM) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 17 / 30
  • 18. Reinforcement Learning Basic Concepts Learning a Policy Learning a Policy I We want the agent to learn a strategy i.e. a policy Policy Ļ€ A map from states to actions used by the agent to choose the next action In particular, we are searching for the optimal policy ā†’ The policy that maximize the cumulative reward Cumulative reward The discounted sum of rewards over time R = n t=0 Ī³t rt+1 | 0 ā‰¤ Ī³ ā‰¤ 1 For larger values of Ī³, the agent focuses more about the long term rewards For smaller values of Ī³, the agent focuses more about the immediate rewards What does this mean? The actions of the agent affect its possibilities later in time It could be better to choose an action that led to a better state than an action that gives you a greater reward but led to a worse state Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 18 / 30
  • 19. Reinforcement Learning Basic Concepts Learning a Policy Learning a Policy II Problem we donā€™t know the MDP model (state transition probabilities and reward distribution) Solution Monte Carlo approach: letā€™s make estimations based on experience A way to produce a policy is to estimate the value (or action-value) function given the function, the agent needs only to perform the action with the greatest value Expected return E[Gt] The expectation of cumulative reward i.e. our current estimation Value function VĻ€(s) = E[Gt|St = s] (state) ā†’ expected return when starting in s and following Ļ€ thereafter Action-value function QĻ€(s, a) = E[Gt|St = s, At = a] (state, action) ā†’ expected return performing a and following Ļ€ thereafter Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 19 / 30
  • 20. Reinforcement Learning Basic Concepts Learning a Policy Learning a Policy III SARSA and Q-learning They are algorithms that approximate the action-value function At every iteration the estimation is reļ¬ned thanks to the new experience. Every time the evaluation becomes more precise, the policy gets closer to the optimal one. Generalized Policy Iteration (GPI) Almost all RL methods could be described as GPI They all have identiļ¬able policies and value functions Learning consist of two interacting processes Policy evaluation Compute the value function Policy improvement Make the policy greedy Figure: [Sutton and Barto, 2018] Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 20 / 30
  • 21. Reinforcement Learning Basic Concepts Learning a Policy Learning a policy IV The exploration/exploitation dilemma The agent seeks to learn action values of optimal behavior, but it needs to behave non-optimally in order to explore all actions (to ļ¬nd the optimal actions) Īµ-greedy policy: the agent behave greedily but there is a (small) Īµ probability to select a random action. On-policy learning The agent learns action values of the policy that produces its own behavior (qĻ€) It learns about a near-optimal policy (Īµ-greedy) Target policy and behavior policy correspond Off-policy learning The agent learns action values of the optimal policy (qāˆ—) There are two policies: target and behavior Every n steps the target policy is copied into the behavior one If n = 1 there is only one action-value function But itā€™s not the same of on-policy (the target and behavior policy still not correspond) Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 21 / 30
  • 22. Reinforcement Learning Basic Concepts Learning a Policy How to estimate the value function? A table that associate state (or state-action pairs) to a value Randomly initialized Temporal Difference (TD) update V (St) ā† V (St) + Ī±[Rt+1 + Ī³V (St+1) āˆ’ V (St)] TD error Ī“t = Rt+1 + Ī³V (St+1) āˆ’ V (St) Rt+1 + Ī³V (St+1) is a better estimation of V (St) The actual reward obtained plus the expected rewards for the next state By subtracting the old estimate we get the error made at time t Sarsa (on-policy) Q(St, At) ā† Q(St, At) + Ī±[Rt+1 + Ī³Q(St+1, At+1) āˆ’ Q(St, At)] Q-learning (off-policy) Q(St, At) ā† Q(St, At) + Ī±[Rt+1 + Ī³ maxa Q(St+1, a) āˆ’ Q(St, At)] Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 22 / 30
  • 23. Example of a Driving Agent: DQN Next in Line... 1 The Driving Problem Single task approach End-to-end approach Pitfalls of Simulations 2 Reinforcement Learning Basic Concepts Problem Deļ¬nition Learning a Policy 3 Example of a Driving Agent: DQN Neural Network and Training Cycle Issues and Solutions Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 23 / 30
  • 24. Example of a Driving Agent: DQN Neural Network and Training Cycle Deep Q-Network DQN When the number of states increases, it becomes impossible to use a table to store the Q-function (alias for action-value function) A neural network can approximate the Q-function We also get better generalization and the ability to deal with non-Markovian envs How to represent the action-value function in a NN? 1 Input: the state (as a vector) ā†’ Output: the values of each action 2 Input: the state and an action ā†’ Output: the value for that action Design the driver agent Environment: states and actions, reward function Neural network Training cycle Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 24 / 30
  • 25. Example of a Driving Agent: DQN Neural Network and Training Cycle Neural Network Pre-processing Itā€™s important how we present the data to the network Letā€™s take images as an example Resize to small images (e.g. 84x84) If colors are not discriminatory, use grayscale How many frames to stack? Remember: the larger the input vector, the larger the searching space Convolutional Neural Networks A good idea for structured/spatially local data e.g. images, lidar data ... Michael Bosello Autonomous Driving and RL ā€“ an introduction SVS 2020 25 / 30
Example of a Driving Agent: DQN Neural Network and Training Cycle
Training Cycle
Transition samples are used to compute the target update
Training uses a gradient form of Q-learning (see [Mnih et al., 2013] and [Sutton and Barto, 2018] for details)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 26 / 30
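A minimal sketch of one training step on a sampled batch, assuming the compiled Keras q_network from the earlier sketch plus a separate target_network used for the bootstrap term (actions is assumed to be an integer array); this is not the authors' exact code, just the r + γ maxa Q(s', a) target written out.

```python
import numpy as np

GAMMA = 0.99

def train_step(batch, q_network, target_network):
    states, actions, rewards, next_states, dones = batch   # NumPy arrays of equal length

    # Q-learning target: r + gamma * max_a Q_target(s', a); no bootstrap on terminal states
    next_q = target_network.predict(next_states, verbose=0)
    targets = rewards + GAMMA * np.max(next_q, axis=1) * (1.0 - dones)

    # only the output of the action actually taken is pushed towards the target
    q_values = q_network.predict(states, verbose=0)
    q_values[np.arange(len(actions)), actions] = targets

    q_network.fit(states, q_values, epochs=1, verbose=0)    # one gradient update with the MSE loss
```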
Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions I
Each of these features significantly improves performance
Experience Replay
We put samples in a buffer and we take random batches from it for training (see the sketch below)
When full, the oldest samples are discarded
Why? Experience replay is needed to break the temporal correlation between samples
In supervised learning, inputs should be independent and identically distributed (i.i.d.)
i.e. the generative process has no memory of past generated samples
Other advantages:
More efficient use of experience (samples are used several times)
Avoid catastrophic forgetting (by presenting old samples again)
Double DQN
We have two networks, one behavior NN and one target NN
We train the target NN
Every N steps we copy the target NN into the behavior one
Why? To reduce target instability
To perform well, supervised learning requires that the label for a given input does not change over time
In RL, the target changes constantly (as we refine the estimates)
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 27 / 30
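A minimal sketch of the experience replay buffer described above; the capacity and batch size are illustrative values (see the hyperparameter table for the ones used in [Mnih et al., 2015]).

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)      # when full, the oldest samples are dropped

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # random batches break the temporal correlation between consecutive samples
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```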
Example of a Driving Agent: DQN Issues and Solutions
DQN Issues and Solutions II
Reward scaling
Scaling rewards to the range [−1, 1] dramatically improves efficiency [Mnih et al., 2013]
Frame skipping
Consecutive frames are very similar → we can skip frames without losing much information and speed up training by N times
The selected action is repeated for N frames (sketched below)
Initial sampling
It could be useful to save K samples in the replay buffer before starting training
So many parameters...
[Mnih et al., 2015] is a good place to start
The table of hyperparameters used in their work is in the next slide
More recent works and experiments could provide better suggestions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 28 / 30
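Minimal sketches of reward clipping and frame skipping, assuming a Gym-style env.step(action) returning (obs, reward, done, info); the skip of 4 frames is just the usual Atari value, not a recommendation for driving.

```python
import numpy as np

def clip_reward(reward):
    return float(np.clip(reward, -1.0, 1.0))        # keep rewards in [-1, 1]

def skipped_step(env, action, skip=4):
    """Repeat the selected action for `skip` frames and accumulate the (clipped) reward."""
    total_reward = 0.0
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += clip_reward(reward)
        if done:
            break
    return obs, total_reward, done, info
```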
Example of a Driving Agent: DQN Issues and Solutions
Hyperparameters used in [Mnih et al., 2015]
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 29 / 30
Example of a Driving Agent: DQN Issues and Solutions
Last things
Reward function
Designing a reward function is not an easy task
It can severely affect learning
In the case of a driving agent (as in RL for robotics in general), the rewards are internal and do not come from the environment
You should give rewards also for intermediate steps to direct the agent (i.e. not only +1 when it succeeds and −1 when it fails)
Some naive hints (sketched below)
Reach target +1
Crash −1
Car goes forward +0.4 (if too big, it will inhibit the crash penalty)
Car turns +0.1 (if too big, the car will keep circling around)
−0.2 instead of the positive reward if the agent turns in one direction and immediately after turns in the opposite one (i.e. right-left or left-right), to reduce car oscillations
Exploration/exploitation
Start with a large ε
Gradually reduce it with an ε-decay (e.g. 0.999): εt+1 = εt * decay
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 30 / 30
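A minimal sketch of the naive reward shaping and the ε-decay above; the boolean event flags and the minimum ε floor are hypothetical additions for illustration, not part of the slides.

```python
def shaped_reward(reached_target, crashed, oscillated, moved_forward, turned):
    """Naive shaping: terminal events dominate, small bonuses direct the agent."""
    if reached_target:
        return 1.0
    if crashed:
        return -1.0
    if oscillated:                 # right-left or left-right in consecutive steps
        return -0.2
    reward = 0.0
    if moved_forward:
        reward += 0.4              # small enough not to drown the crash penalty
    if turned:
        reward += 0.1              # small enough not to encourage circling
    return reward

def next_epsilon(epsilon, decay=0.999, epsilon_min=0.05):
    """Epsilon decay: eps_{t+1} = eps_t * decay, with a floor (the floor is an assumption)."""
    return max(epsilon * decay, epsilon_min)
```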
Appendix: Reading Suggestions
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 31 / 30
Appendix: Reading Suggestions
Reading Suggestions: Access to Resources
Books and papers are available through the university proxy
Search engine: https://sba-unibo-it.ezproxy.unibo.it/AlmaStart
If you want to access an article by link, convert the dots in the domain to dashes and add .ezproxy.unibo.it at the end of the URL
e.g.: https://www.nature.com/articles/nature14236
↓
https://www-nature-com.ezproxy.unibo.it/articles/nature14236
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 32 / 30
Appendix: Reading Suggestions
Reading Suggestions: Driving Problem
Extended review of autonomous driving approaches [Leon and Gavrilescu, 2019]
Overview of RL for autonomous driving and its challenges [Talpaert et al., 2019]
Tutorial on RL with the CARLA simulator: https://pythonprogramming.net/introduction-self-driving-autonomous-cars-carla-python
End-to-end approach examples
CNN for steering on a real car [Bojarski et al., 2016]
DQN car control [Yu et al., 2016]
Single task approach examples
NN for recognition + RL for control [Li et al., 2018]
Converting virtual images to real ones [Pan et al., 2017]
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 33 / 30
Appendix: Reading Suggestions
Reading Suggestions: RL & DL
In depth (books)
[Sutton and Barto, 2018] is the bible of RL – the second edition (2018) also covers recent breakthroughs
[Goodfellow et al., 2016] for Deep Learning
Compact introductions
Brief explanation of RL basics and DQN: https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN
CNN: https://cs231n.github.io/convolutional-networks/
Even more compact: https://pathmind.com/wiki/convolutional-network
DQN
[Mnih et al., 2015] Revised version (better)
[Mnih et al., 2013] Original version
[van Hasselt et al., 2015] Double Q-learning
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 34 / 30
References
References I
Bojarski, M., Testa, D. D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., and Zieba, K. (2016). End to end learning for self-driving cars.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
Leon, F. and Gavrilescu, M. (2019). A review of tracking, prediction and decision making methods for autonomous driving.
Li, D., Zhao, D., Zhang, Q., and Chen, Y. (2018). Reinforcement learning and deep learning based lateral control for autonomous driving.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 35 / 30
References
References II
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
Pan, X., You, Y., Wang, Z., and Lu, C. (2017). Virtual to real reinforcement learning for autonomous driving.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2015). You only look once: Unified, real-time object detection.
Sallab, A. E., Abdou, M., Perot, E., and Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 36 / 30
References
References III
Talpaert, V., Sobh, I., Kiran, B. R., Mannion, P., Yogamani, S., El-Sallab, A., and Perez, P. (2019). Exploring applications of deep reinforcement learning for real-world autonomous driving systems.
van Hasselt, H., Guez, A., and Silver, D. (2015). Deep reinforcement learning with double Q-learning.
Yu, A., Palefsky-Smith, R., and Bedi, R. (2016). Deep reinforcement learning for simulated autonomous vehicle control. Course Project Reports: Winter, pages 1–7.
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 37 / 30
Autonomous Driving and Reinforcement Learning – an Introduction
Michael Bosello
michael.bosello@studio.unibo.it
Università di Bologna – Department of Computer Science and Engineering, Cesena, Italy
Smart Vehicular Systems A.Y. 2019/2020
Michael Bosello Autonomous Driving and RL – an introduction SVS 2020 38 / 30