This document surveys reinforcement learning approaches for autonomous driving systems. It gives an overview of reinforcement learning and how it can be applied to autonomous driving tasks such as vehicle control, path planning, and decision making; outlines the state and action spaces used; and discusses the challenges of designing reward functions and exploring the environment safely. It compares reinforcement learning with imitation learning and inverse reinforcement learning, and closes with the role of simulators and the difficulties of moving learned policies from simulation to the real world.
Deep RL for Autonomous Driving exploring applications Cognitive vehicles 2019
1. 100% Autonomous, 100% Driverless, 100% Electric
Exploring deep reinforcement learning for real-world autonomous driving systems
Victor Talpaert (1), Ibrahim Sobh (2), B Ravi Kiran (3), Patrick Mannion (4), Senthil Yogamani (5), Ahmad El-Sallab (2), Patrick Perez (7)
(1) U2IS, ENSTA ParisTech, Palaiseau, AKKA Technologies; (2) Valeo Egypt, Cairo; (3) Navya Labs, Paris; (4) Galway-Mayo Institute of Technology, Ireland; (5) Valeo Vision Systems, Ireland; (7) Valeo.ai, France
ravi.kiran@navya.tech
2. OVERVIEW
Quick overview of reinforcement learning
• History & applications
• Taxonomy of RL methods today
Autonomous driving tasks
• Taxonomy of autonomous driving tasks
• Which tasks require reinforcement learning
• Which tasks require inverse reinforcement learning
• Role of simulators
Challenges in RL for autonomous driving
• Designing reward functions: sparse rewards, scalar reward functions
• Long-tail effect; sample-efficient RL/IL
• Moving from simulation to reality
• Validating, testing and safety
Conclusion
• Current solutions deployed in industry
• Summary and open questions
3. AUTONOMOUS DRIVING
Scene interpretation tasks:
•2D, 3D Object detection & tracking
•Traffic light/traffic sign
•Semantic segmentation
•Free/Drive space estimation
•Lane extraction
•HD Maps : 3D map, Lanes, Road topology
•Crowd sourced Maps
Fusion tasks:
•Multimodal sensor fusion
•Odometry
•Localization
•Landmark extraction
•Relocalization with HD Maps
Reinforcement learning tasks:
•Controller optimization
•Path planning and Trajectory optimization
•Motion and dynamic path planning
•High level driving policy : Highway,
intersections, merges
•Actor (pedestrian/vehicles) prediction
•Safety and risk estimation
4. WHAT IS REINFORCEMENT LEARNING
Learning what to do, i.e. how to map situations to actions optimally: an optimal policy*
*Maximization of the expected value of the cumulative sum of a received scalar reward
[Diagram: an RL agent (a states-to-actions policy) receives states from the environment (real world or simulator) through sensor streams (cameras / lidars / radars) and emits actions; an assessment module evaluates state-action pairs and returns a reward along with other supervisor streams.]
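The agent-environment loop above can be sketched in a few lines of Python. `DrivingEnv` and the one-line policy are hypothetical stand-ins, not any real simulator API:

```python
import random

class DrivingEnv:
    """Toy stand-in for a simulator: the state is the distance from the lane centre."""
    def reset(self):
        self.offset = 0.0
        return self.offset

    def step(self, action):
        # action in {-1, 0, +1} steers left / straight / right
        self.offset += 0.1 * action + random.uniform(-0.05, 0.05)
        reward = -abs(self.offset)          # scalar reward: stay centred
        return self.offset, reward

def policy(state):
    # states-to-actions mapping: steer back toward the centre
    return -1 if state > 0 else 1

env = DrivingEnv()
state = env.reset()
total_reward = 0.0
for _ in range(100):
    action = policy(state)
    state, reward = env.step(action)     # actions change what is observed next
    total_reward += reward
```

Each action changes the next observed state, which is exactly what separates this loop from supervised prediction.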
5. MACHINE LEARNING AND AI TODAY
Supervised learning
• Given input examples (X, Y)
• Learn an implicit function approximation f: X → Y, e.g. (X: images) to (Y: class label)
• Empirical risk (loss function): the price paid for inaccurate prediction
• Predictions do not affect the environment (samples are IID)
Reinforcement learning
• Given an input state space, rewards and transitions
• Learn a policy from states to actions π: S → A, e.g. (S: vehicle state, images; A: speed, direction)
• Value function: the long-term reward achieved
• Predictions affect both what is observed and future rewards
• Requires exploration, learning and interaction
6. STATE SPACE, ACTIONS AND REWARDS
Vehicle state space
• Geometry (vehicle size, occupancy grid)
• Road topology and curvature
• Traffic signs and laws
• Vehicle pose and velocity (v)
• Configuration of obstacles (with poses/v)
• Drivable zone
Actions
• Continuous control: speed, steering
• Discrete control: up, down, left, right, …
• High level (temporal abstraction): slow down, follow, exit route, merge
Reward (positive/negative)
• Distances to obstacles (real)
• Lateral error from trajectory (real)
• Longitudinal: time to collision (real)
• Percentage of car on the road (sim)
• Variation in speed profile (real)
• Actor/agent intentions in the scene
• Damage to vehicle/other agents (sim)
Reinforcement Learning for Autonomous Maneuvering in Highway Scenarios
A Survey of State-Action Representations for Autonomous Driving
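A hedged sketch of how the state, action and reward items above might be laid out in code; every field name and weight is a hypothetical choice, not a value from the talk:

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    # a minimal subset of the state space listed above (hypothetical fields)
    pose_x: float
    pose_y: float
    heading: float
    velocity: float
    lateral_error: float      # distance from the reference trajectory
    time_to_collision: float  # longitudinal safety margin, seconds

@dataclass
class Action:
    steering: float  # continuous control, radians
    speed: float     # continuous control, m/s

def reward(state: VehicleState, prev_speed: float) -> float:
    """Scalar reward combining a few of the terms on this slide.
    The weights are illustrative, not tuned values."""
    r = 0.0
    r -= 1.0 * abs(state.lateral_error)                  # lateral error term
    r -= 2.0 * max(0.0, 3.0 - state.time_to_collision)   # penalise low TTC
    r -= 0.1 * abs(state.velocity - prev_speed)          # speed-profile variation
    return r

s = VehicleState(0.0, 0.0, 0.0, 10.0, 0.5, 5.0)
reward(s, 10.0)  # → -0.5 (only the lateral-error term is active)
```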
7. ORIGINS OF REINFORCEMENT LEARNING
Roots: animal psychology and neuroscience; adaptive signal processing; stochastic approximation theory; robotics and control theory
• 1930s-70s: trial-and-error learning (psychology, Woodworth); credit assignment (Minsky); Least Mean Squares (LMS, Widrow-Gupta); learning automata; k-armed bandits
• 1950s: optimal control (Pontryagin, Bellman)
• 1960s: dynamic programming and stochastic optimal control (Richard Bellman)
• 1980s: temporal difference learning (R. Sutton's thesis)
• 1990s: Q-learning; neuro-dynamic programming (DP + ANNs, Bertsekas/Tsitsiklis)
• 2000s: policy gradient methods (Sutton et al.)
• 2005: Neural Fitted Q Iteration (Martin Riedmiller)
• 2006: Monte-Carlo Tree Search for RL on the game tree of Go (Rémi Coulom & others)
• 2014: Deterministic Policy Gradient algorithms (David Silver et al.)
• 2015: Playing Atari with Deep Reinforcement Learning (DeepMind, Mnih et al.)
• 2016: asynchronous deep RL methods A2C, A3C (DeepMind, Mnih et al.)
• 2015-2019: AlphaGo and AlphaZero (MCTS + deep RL) for Go, chess and shogi (DeepMind); AlphaStar; OpenAI Dota
A History of Reinforcement Learning, Prof. A. G. Barto
8. TERMINOLOGIES
• Markovian Decision Processes (MDP, POMDP): the underlying assumption
• Prediction: evaluation of a policy. Control: inferring the optimal policy (policy/value iteration)
• Model-based (P and R known): control via policy/value iteration; prediction via policy evaluation
• Model-free (P and R unknown): prediction via Monte Carlo (MC) and temporal difference (TD); control via MC control steps and Q-learning
• On-policy methods: learning using the current policy. Off-policy methods: learning from observations
• Exploitation: learn the policy given (P, R, value function). Exploration: learn (P, R, policy) using the current policy
• Action spaces: continuous (vehicle controller) or discrete (Go, chess)
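A minimal concrete instance of model-free, off-policy control from the taxonomy above: tabular Q-learning on a made-up four-state MDP. The agent behaves randomly (pure exploration) yet still learns the greedy value function, which is what "off-policy" means in practice:

```python
import random

# Toy deterministic MDP: states 0..3 on a line, action 0 = left, 1 = right.
# Reaching state 3 yields reward 1 and ends the episode.
def step(s, a):
    s2 = min(3, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

random.seed(0)
Q = [[0.0, 0.0] for _ in range(4)]
alpha, gamma = 0.5, 0.9

# Off-policy control: behave randomly, yet learn the greedy value
# function by bootstrapping on max_a Q(s', a).
for _ in range(500):                 # episodes
    s = 0
    for _ in range(20):
        a = random.randrange(2)      # random behaviour policy
        s2, r, done = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(3)]  # greedy policy in states 0-2
```

After training, the greedy policy heads right in every state, with Q-values close to the discounted optimum (1, 0.9, 0.81 along the path to the goal).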
9. BEHAVIORAL CLONING / IMITATION LEARNING (IL) ≠ RL
State-action map as supervised learning: directly map inputs/states to outputs/actions (control), treating samples as IID and ignoring the sequential decision process. Also known as end-to-end learning, since sensor streams are directly mapped to control.
Issues:
• The agent mimics expert behavior at the risk of not recovering from unseen scenarios: unseen driver behavior, vehicle orientations, adversarial behavior of other agents (it overfits the expert)
• Poorly defined reward functions cause poor exploration
• Requires a huge number (>30M samples) of human expert demonstrations
Improvements:
• Heuristics to improve data collection in corner cases (DAgger)
• Imitation is efficient in practice and remains a strong alternative
DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Hierarchical Imitation and Reinforcement Learning
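Behavioral cloning reduces to ordinary supervised regression on (observation, action) pairs. The sketch below uses synthetic data and a linear model in place of images and a CNN; all names are illustrative:

```python
import numpy as np

# Synthetic "expert" data: feature vector per frame -> steering angle.
# A real pipeline would use images and a CNN; a linear model keeps the
# supervised-learning structure visible.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                     # observations (hypothetical features)
w_expert = np.array([0.5, -0.2, 0.0, 0.1, 0.3])    # the expert's (hidden) steering rule
y = X @ w_expert + 0.01 * rng.normal(size=1000)    # expert steering labels

# Behavioral cloning = ordinary least squares on (observation, action) pairs
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def clone_policy(obs):
    return obs @ w                                 # mimics the expert's mapping
```

Note that the fit is only trustworthy on states the expert actually visited, which is precisely the recovery problem described above.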
11. MODERN DAY REINFORCEMENT LEARNING APPLICATIONS
Learning vehicle controllers
• For well-defined tasks (lane following, ACC), classical solutions such as MPC are good
• Tuning/choosing better controllers based on vehicle and state dynamics is where RL can be impactful
• ACC and braking assistance
Path planning and trajectory optimization
• Choose the path that minimizes a certain cost function, e.g. lane following, jerk minimization
Actor (pedestrian/vehicle) behavior prediction
Decision making in complex scenarios
• Highway driving: a large space of obstacle configurations (translations/orientations/velocities) where rule-based methods fail
• Negotiating intersections: dynamic path planning
• Merging into and splitting out from traffic
Planning Algorithms, Steven LaValle; real-time motion planning
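The "choose the path that minimizes a cost function" step can be made concrete with a toy jerk-plus-lateral-error cost; the weights and candidate profiles below are invented for illustration:

```python
import numpy as np

def trajectory_cost(xs, reference, dt=0.1, w_jerk=1.0, w_lat=1.0):
    """Cost of a candidate lateral-position profile: squared jerk
    (third finite difference) plus squared lateral error from the
    reference path. Weights are illustrative, not tuned."""
    jerk = np.diff(xs, n=3) / dt**3
    lat_err = xs - reference
    return w_jerk * np.sum(jerk**2) * dt + w_lat * np.sum(lat_err**2) * dt

t = np.linspace(0.0, 1.0, 11)
reference = np.zeros_like(t)                      # lane centre
smooth = 0.5 * (1 + np.cos(np.pi * t))            # eased return to the centre
abrupt = np.where(t < 0.5, 1.0, 0.0)              # step change

best = min([smooth, abrupt], key=lambda xs: trajectory_cost(xs, reference))
```

The abrupt candidate is penalized heavily by the jerk term, so the planner prefers the smooth profile even though both end on the centreline.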
12. INVERSE REINFORCEMENT LEARNING APPLICATIONS
Inverse RL (or inverse optimal control)
• Given: states, action space, roll-outs from an expert policy, and a model of the environment (state dynamics)
• Goal: learn the reward function, then learn a new policy
• Challenges: not well defined; the optimal reward is hard to evaluate
• Applications: predicting pedestrian and vehicle behavior on the road; basic lane following and obstacle avoidance
"It is commonly assumed that the purpose of observation is to learn a policy, i.e. a direct representation of the mapping from states to actions. We propose instead to recover the expert's reward function and use this to generate desirable behavior. We suggest that the reward function offers a much more parsimonious description of behavior. After all, the entire field of RL is founded on the presupposition that the reward function, rather than the policy, is the most succinct, robust and transferable definition of the task."
Algorithms for Inverse Reinforcement Learning, Ng & Russell, 2000
DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents
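A small sketch of the central quantity behind many IRL algorithms with a linear reward r(s) = w · φ(s): the discounted feature expectations of expert roll-outs. The feature map and the trajectories below are toy assumptions, not data from any real system:

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Discounted feature expectations mu = E[sum_t gamma^t * phi(s_t)],
    estimated from roll-outs. With a linear reward r(s) = w . phi(s), the
    expected return of a policy is just w . mu, which is why matching mu
    against the expert's is the workhorse of many IRL algorithms."""
    mus = []
    for traj in trajectories:
        mu = sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        mus.append(mu)
    return np.mean(mus, axis=0)

# Toy example: state = lateral offset; features = [|offset|, 1].
phi = lambda s: np.array([abs(s), 1.0])
expert = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.0]]      # expert stays near the centre
erratic = [[1.0, -1.0, 1.0], [0.5, 1.5, -0.5]]   # erratic driving

mu_expert = feature_expectations(expert, phi)
mu_bad = feature_expectations(erratic, phi)
```

Any reward weight that penalizes the |offset| feature then scores the expert roll-outs above the erratic ones, which is the behavior an IRL learner tries to recover.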
13. CHALLENGES IN REWARD FUNCTION DESIGN
Where do rewards come from?
• Simulations (low to high cost depending on the dynamics and detail required); large sample complexity; positive rewards without negative rewards can have dangerous consequences
• The real world (very costly, and dangerous when the agent needs to explore)
Temporal abstraction, credit assignment and the exploration-exploitation dilemma
Cobra effect: RL algorithms are blind maximizers of expected reward
Other ways to learn a reward
• Decompose the problem into multiple easier subproblems
• Guide training with expert supervision, using imitation learning as initialization
• Reduce the hype and embrace the inherent problems of RL: use domain knowledge
Cobra effect: the British government was concerned about the number of venomous cobras in Delhi and offered a reward for every dead cobra. Initially this was a success, as large numbers of snakes were killed for the reward. Eventually, however, enterprising people began to breed cobras for the income. When the government became aware of this, the reward program was scrapped, causing the breeders to set the now-worthless snakes free. As a result, the wild cobra population increased further: the apparent solution made the situation worse.
https://www.alexirpan.com/2018/02/14/rl-hard.html
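The cobra effect can be reproduced in miniature: a reward with only a positive term (distance covered) is maximized by circling rather than progressing, while adding progress and collision terms flips the ranking. All numbers are illustrative:

```python
# Miniature "cobra effect": a blind maximizer of distance covered
# prefers circling; a shaped reward with progress and collision terms
# prefers actually driving to the goal. Weights are illustrative.

def distance_only_reward(step):
    return step["distance"]

def shaped_reward(step):
    return step["distance"] + 50.0 * step["progress_to_goal"] - 10.0 * step["collision"]

circling = [{"distance": 1.0, "progress_to_goal": 0.0, "collision": 0} for _ in range(100)]
driving  = [{"distance": 0.8, "progress_to_goal": 0.01, "collision": 0} for _ in range(100)]

def ret(episode, reward_fn):
    """Undiscounted return of an episode under a given reward function."""
    return sum(reward_fn(s) for s in episode)
```

Under the distance-only reward, circling scores 100 against driving's 80; under the shaped reward the ordering reverses, which is the whole point of the design challenge.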
14. CHALLENGES IN REWARD FUNCTION DESIGN
Hierarchy of tasks
• Decompose the problem into multiple easier subproblems, then combine the learnt policies
• Guide training for complex problems: train on the principal task, then on subtasks, with expert supervision using imitation learning as initialization
https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/
Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning
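The decomposition above can be sketched as a two-level policy: a high-level chooser picks an option (sub-task), each handled by a simpler low-level policy. The option names and switching rules are hypothetical:

```python
# Two-level decomposition: a high-level policy picks an option
# (sub-task); each option is a simpler low-level policy.
# All names and rules here are hypothetical.

def lane_keep(state):
    return {"steering": -0.5 * state["lateral_error"], "throttle": 0.3}

def overtake(state):
    return {"steering": 0.2, "throttle": 0.6}

def emergency_brake(state):
    return {"steering": 0.0, "throttle": -1.0}

OPTIONS = {"lane_keep": lane_keep, "overtake": overtake, "brake": emergency_brake}

def high_level_policy(state):
    """Chooses which easier sub-problem to solve right now."""
    if state["time_to_collision"] < 1.0:
        return "brake"
    if state["lead_vehicle_slow"] and state["left_lane_free"]:
        return "overtake"
    return "lane_keep"

def act(state):
    option = high_level_policy(state)
    return option, OPTIONS[option](state)
```

In a learned version, both levels would be trained (the chooser over options, the options over low-level actions); the hand-written rules stand in for those learned policies.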
15. LONG TAIL DRIVER POLICY
Rare and adversarial scenarios are difficult to learn: a core issue for the safe deployment of autonomous driving systems.
• Models perform well in the average case but scale poorly to low-frequency events, compounded by sparse rewards
• Hairpin bends and U-turns: rare, with state-space dynamics that are difficult to model
• Creating simpler small-sample models blended with the average-case model cures the symptom, not the disease
Drago Anguelov (Waymo), MIT Self-Driving Cars
17. CHALLENGES: SIMULATION-REALITY GAP
Handling domain transfer
• How do we create a simulated environment that both faithfully emulates the real world and allows the agent in the simulation to gain valuable real-world experience?
• Can we map real-world images to simulation?
18. CHALLENGES: SIMULATION-REALITY GAP
Learning a generative model of reality
https://worldmodels.github.io/
World models enable agents to construct latent-space representations of the world's dynamics, while learning a robust controller/actuator module on top of this representation.
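The information flow of a world model can be shown in a few lines: encode the observation into a latent z, predict the next latent from (z, action), and let the controller act on the latent alone. The matrices below are untrained random parameters; only the structure, not the behavior, is meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, LATENT, ACT = 16, 4, 2

# Untrained parameters: in the World Models paper these roles are played
# by a trained VAE encoder and an RNN dynamics model.
W_enc = rng.normal(scale=0.1, size=(LATENT, OBS))            # observation -> latent
W_dyn = rng.normal(scale=0.1, size=(LATENT, LATENT + ACT))   # (z, action) -> next z

def encode(obs):
    return np.tanh(W_enc @ obs)

def predict_next(z, action):
    return np.tanh(W_dyn @ np.concatenate([z, action]))

def controller(z, W_ctrl):
    # the controller never sees raw observations, only the compact latent state
    return np.tanh(W_ctrl @ z)

obs = rng.normal(size=OBS)
z = encode(obs)
z_next = predict_next(z, np.zeros(ACT))
```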
19. CHALLENGES: SAFETY AND REPRODUCIBILITY
Safe policies for autonomous agents
• SafeDAgger: a safety policy that learns to predict the error made by the primary policy w.r.t. a reference policy
• Define a feasible set of core safe state spaces that can be incrementally grown through exploration
Reproducibility (code) on benchmarks
• Variance intrinsic to the methods' hyperparameter initialization
• Cross-validation for RL is not well defined, as opposed to supervised learning problems
Future standardized benchmarks
• Evaluating autonomous-vehicle control algorithms even before the agent leaves for real-world testing
• NHTSA-inspired pre-crash scenarios: control loss without a previous action, longitudinal control after the leading vehicle brakes, crossing traffic running a red light at an intersection, and many others
• Inspiration from the aeronautics community on risk
• CARLA Challenge 2019
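The SafeDAgger idea in miniature: a safety policy decides whether the primary policy's action would deviate from the reference policy by more than a threshold, and hands control back when it would. The policies below are hypothetical one-liners, and the safety classifier is an oracle stub standing in for the learned predictor:

```python
# SafeDAgger in miniature: control is deferred to the reference policy
# whenever the (learned, here oracle-stubbed) safety classifier predicts
# the primary policy deviates from it by more than a threshold tau.

TAU = 0.1

def primary(state):            # learned policy (hypothetical)
    return 0.5 * state

def reference(state):          # expert / fallback policy (hypothetical)
    return 0.4 * state

def safety_classifier(state):
    # stands in for a learned predictor of |primary - reference| > TAU
    return abs(primary(state) - reference(state)) > TAU

def safe_act(state):
    if safety_classifier(state):
        return "reference", reference(state)
    return "primary", primary(state)
```

Near the lane centre the primary policy is trusted; far from it, where the two policies disagree by more than the threshold, the safe fallback takes over.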
20. DRLAD: MODERN DAY DEPLOYMENTS
• The agent maximizes the reward of distance travelled before intervention by a safety driver
• Options graph
• Recovering from a trajectory perturbation; future trajectory prediction on logged data
• Robust imitation learning using perturbations and simulated expert variations, and an augmented imitation loss function
https://sites.google.com/view/waymo-learn-to-drive/
21. CONCLUSION
• How do we design rewards?
• How should the problem be decomposed to simplify learning a policy?
• How do we train in different levels of simulation (efficiently)?
• How do we handle long-tail cases, especially risk-intensive ones?
• How do we intelligently perform the domain change from simulation to reality?
• Can we use imitation to solve the problem before switching to reinforcement learning?
• How can we use a multi-agent setup to scale up learning?
23. WHERE IS THE HYPE ON DEEP RL?
Hype is highly non-stationary.
24. LECTURES AND SOURCES
• Reinforcement Learning: An Introduction, Sutton & Barto, 2018 [book]
• David Silver's RL course, 2015 [link]
• Berkeley Deep Reinforcement Learning [course]
• Deep RL Bootcamp lectures, Berkeley [course]
• Reinforcement Learning and Optimal Control, D. P. Bertsekas, 2019 [book]
25. REFERENCES
• World Models, Ha & Schmidhuber, NeurIPS 2018.
• Jianyu Chen, Zining Wang, and Masayoshi Tomizuka. "Deep Hierarchical Reinforcement Learning for Autonomous Driving with Distinct Behaviors". IEEE Intelligent Vehicles Symposium (IV), 2018.
• Peter Henderson et al. "Deep Reinforcement Learning That Matters". AAAI-18, 2018.
• Andrew Y. Ng, Stuart J. Russell, et al. "Algorithms for Inverse Reinforcement Learning".
• Daniel Chi Kit Ngai and Nelson Hon Ching Yung. "A Multiple-Goal Reinforcement Learning Method for Complex Vehicle Overtaking Maneuvers".
• Stéphane Ross and Drew Bagnell. "Efficient Reductions for Imitation Learning". Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.
• Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks, Sahand Sharifzadeh et al.
• Shai Shalev-Shwartz, Shaked Shammah, and Amnon Shashua. "Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving", 2016.
• Learning to Drive in a Day, Alex Kendall et al., 2018 (Wayve).
• ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst.
• StreetLearn, DeepMind/Google.
• A Systematic Review of Perception System and Simulators for Autonomous Vehicles Research [pdf].
• Meta-Sim: Learning to Generate Synthetic Datasets, Sanja Fidler et al. [link][pdf].
• Deep Reinforcement Learning in the Enterprise: Bridging the Gap from Games to Industry, 2017 [link].
28. CHALLENGES: IMITATION LEARNING
Learning from Demonstrations (LfD) / imitation / behavioral cloning: demonstrations are hard to collect.
• Measure the divergence between your expert and the current policy
• Give priority in a replay buffer
• Iteratively collect samples (DAgger)
• Hierarchical imitation: reduce sample complexity through data aggregation, by organizing the action spaces in a hierarchy
[Figure: improving the diversity of steering angles]
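The DAgger loop mentioned above, on a one-dimensional toy problem: roll out the current policy, have the expert label the states actually visited, aggregate everything collected so far, and refit. The dynamics and the expert rule are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def expert(s):
    return -0.8 * s                     # expert steering rule (toy)

def rollout(w, n=50):
    """Visit states under the *current* policy, starting off-centre."""
    s, states = 2.0, []
    for _ in range(n):
        states.append(s)
        s = s + 0.5 * (w * s) + rng.normal(scale=0.01)
    return np.array(states)

# DAgger: iteratively collect states with the learner, label them with
# the expert, aggregate the whole dataset, and refit the policy.
w = 0.0                                 # initial (bad) policy: never steers
X, Y = np.empty(0), np.empty(0)
for _ in range(5):
    states = rollout(w)
    X = np.concatenate([X, states])
    Y = np.concatenate([Y, expert(states) + rng.normal(scale=0.05, size=states.size)])
    w = float(np.dot(X, Y) / np.dot(X, X))       # least-squares refit
```

Because the expert labels the learner's own visited states, later iterations train exactly on the distribution the learner induces, which is what plain behavioral cloning misses.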