Reinforcement Learning
for Self-Driving Cars
Vinay Sameer Kadi and Mayank Gupta, with Prof. Jeff Schneider
Sponsored by Argo AI
Outline
• Introduction
• Experiments
• Results
• Future Work
Introduction
Setting up the problem
Problem Statement
• Train a self-driving agent using RL algorithms in simulation.
• Have an algorithm that can be run on Argo’s driving logs.
• Aim for sample-efficient algorithms.
(Figure: an agent exploring the CARLA environment)
Motivation – Why Reinforcement Learning?
• End-to-end system.
• Verifiable performance through simulation.
• Behavior cloning is capped by the expert’s performance, while RL isn’t.
If we can run it on one video log, we can run it on any video log!
Problem Setting
A short description of the setup
Problem Setting
• State space – either an encoded image, waypoints, or a manual feature vector, for example:
    WP              0.4
    Obstacle        1
    Traffic Light   0
    …               …
Problem Setting
• State space – either an encoded image, waypoints, or a manual feature vector
• Action space – speed and steer (bounded and continuous)
• PID controller – for low-level control (a minimal sketch follows below)
• Test scenario – navigation with dynamic actors
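To make the action interface concrete, here is a minimal sketch of how bounded (speed, steer) actions could be mapped to low-level controls through a PID speed controller. The gains, the throttle/brake clipping, and the dictionary output are illustrative assumptions, not the project’s actual controller.

```python
# Minimal sketch of a low-level speed PID (illustrative; gains and the output
# format are assumptions, not the project's actual controller).

class SpeedPID:
    def __init__(self, kp=0.5, ki=0.05, kd=0.1, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target_speed, current_speed):
        """PID on the speed error; the output is later split into throttle/brake."""
        error = target_speed - current_speed
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def to_low_level_control(action, current_speed, pid):
    """Map the RL action (target speed, steer), both bounded and continuous,
    to low-level controls: positive PID output becomes throttle, negative
    becomes brake, and steer is passed through unchanged."""
    target_speed, steer = action
    u = pid.step(target_speed, current_speed)
    throttle = max(0.0, min(1.0, u))
    brake = max(0.0, min(1.0, -u))
    return {"throttle": throttle, "steer": float(steer), "brake": brake}
```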
Experiments
Decoupling the problem
Input images and data from simulator → State space construction → RL algorithm → Reward optimization

Which components need to be improved?
Decoupling the problem
Input images and data from simulator → State space construction → Policy network → Reward optimization

Focusing solely on RL – Handcrafted Input
Previous Semester
• Used Soft Actor-Critic (SAC)
• 8-dimensional state space:
  • Mean angle to the next 5 waypoints
  • Nearest obstacle distance and speed
  • Vehicle speed and steer
  • Distance from trajectory, goal and red light
• Reward (a minimal sketch follows below):
  • Speed-based reward
  • Distance to trajectory
  • Collision reward
Final trained agent using SAC
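A minimal sketch of how the three reward terms above might be combined; the weights, speed normalization, and collision penalty value are illustrative assumptions, not the exact reward used in these experiments.

```python
def shaped_reward(speed, target_speed, dist_to_trajectory, collided,
                  w_speed=1.0, w_dist=0.5, collision_penalty=100.0):
    """Illustrative combination of the reward terms on the slide: a speed-based
    term, a distance-to-trajectory term, and a collision term."""
    # Reward driving close to the target speed (1 at target, 0 when stopped or 2x over).
    r_speed = w_speed * (1.0 - abs(speed - target_speed) / max(target_speed, 1e-6))
    # Penalize lateral deviation from the reference trajectory.
    r_dist = -w_dist * dist_to_trajectory
    # Large penalty on collision (the episode would typically also terminate here).
    r_collision = -collision_penalty if collided else 0.0
    return r_speed + r_dist + r_collision
```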
N-step SAC outperforms PPO
• Naïve SAC is not as good as PPO
• N-step SAC performs slightly better than PPO (an n-step target sketch follows below)
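For reference, a sketch of the n-step bootstrapped critic target that an n-step SAC update would use (single sequence, plain Python); the discount, entropy temperature, and done-handling convention are illustrative assumptions.

```python
def n_step_sac_target(rewards, dones, next_q, next_logp, gamma=0.99, alpha=0.2):
    """Sketch of an n-step SAC critic target.
    rewards, dones: length-n lists (r_t ... r_{t+n-1} and terminal flags)
    next_q:         min of the target critics at s_{t+n}
    next_logp:      log pi(a_{t+n} | s_{t+n})
    """
    target, discount = 0.0, 1.0
    for r, done in zip(rewards, dones):
        target += discount * r
        if done:  # episode ended: no further rewards and no bootstrap
            return target
        discount *= gamma
    # Entropy-regularized bootstrap at s_{t+n}, as in SAC.
    return target + discount * (next_q - alpha * next_logp)
```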
Decoupling the problem
Input images and data from simulator → State space construction → RL algorithm → Reward optimization

Focusing on representation – Imitation Learning
Decoupling the problem
Input images and data from simulator → Pretrained model → Policy network → Reward optimization

Combining progress in both
Learning By Cheating – Pretrained Model
• Move from the manual state space to an image state space + waypoints.
• We want to leverage an imitation-learning-based pretrained model to accelerate training.
• Essentially treating LBC as the “ImageNet model” for this task.
Channel visualization from the output of the conv layers of LBC’s ResNet-34 [1], trained to drive
[1] Chen et al., “Learning by Cheating”, Conference on Robot Learning (CoRL), 2019
Experiments with Policy Networks
• Instead of a 2-layer MLP to get actions as in previous experiments, we use the following network for the policy and value function.
• The initial channel mixing (1x1 conv) helps keep the number of parameters small.
• Result: only passes 1 test case of 25 (straight navigation).
Architecture:
384 x 160 RGB image
→ ResNet-34 (pretrained) → bs x 512 x 5 x 12
→ Conv1x1 → bs x 64 x 5 x 12 → ReLU
→ Conv3x3 → bs x 32 x 3 x 10 → ReLU
→ Conv3x3 → bs x 16 x 1 x 8 → ReLU
→ flatten + concat(speed, steer, wp)
→ FC layers
→ Mean and variances for actions
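For concreteness, a minimal PyTorch sketch of this policy head over the frozen ResNet-34 features; the conv shapes follow the slide, while the extra-feature dimension, hidden width, and two-dimensional action output are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ChannelMixPolicyHead(nn.Module):
    """Channel-mixing policy head over frozen ResNet-34 features (bs x 512 x 5 x 12).
    Conv shapes follow the slide; everything else is an illustrative assumption."""

    def __init__(self, extra_dim=7, action_dim=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(512, 64, kernel_size=1), nn.ReLU(),  # bs x 64 x 5 x 12
            nn.Conv2d(64, 32, kernel_size=3), nn.ReLU(),   # bs x 32 x 3 x 10
            nn.Conv2d(32, 16, kernel_size=3), nn.ReLU(),   # bs x 16 x 1 x 8
        )
        self.fc = nn.Sequential(
            nn.Linear(16 * 1 * 8 + extra_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * action_dim),  # mean and (log-)variance per action
        )

    def forward(self, feats, extra):
        # feats: bs x 512 x 5 x 12 (frozen ResNet-34 output); extra: speed/steer/waypoints
        x = self.conv(feats).flatten(1)
        x = torch.cat([x, extra], dim=1)
        mean, log_var = self.fc(x).chunk(2, dim=1)
        return mean, log_var
```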
Experiments with Policy Networks
• Spatial softmax to reduce dimensions.
• The idea was that RL performs best in low-dimensional state spaces.
• Result: only passes 1 of 25 test cases (~20,000 reward).
• We will revisit this once we have end-to-end training instead of frozen nets.
Architecture:
384 x 160 RGB image
→ ResNet-34 (pretrained) → bs x 512 x 5 x 12
→ Spatial softmax → bs x 1024
→ ReLU + flatten + concat(speed, steer, wp)
→ FC layers
→ Mean and variances for actions
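Spatial softmax converts each of the 512 feature channels into an expected (x, y) image location, which is presumably where the 1024-dimensional output (512 x 2) comes from. A minimal sketch, not the project’s exact implementation:

```python
import torch

def spatial_softmax(feats):
    """feats: bs x C x H x W -> bs x 2C expected (x, y) coordinates per channel.
    With C = 512 this yields the bs x 1024 vector shown on the slide."""
    bs, c, h, w = feats.shape
    # Softmax over each channel's spatial locations.
    probs = torch.softmax(feats.view(bs, c, h * w), dim=-1).view(bs, c, h, w)
    # Normalized coordinate grids in [-1, 1].
    ys = torch.linspace(-1.0, 1.0, h, device=feats.device).view(1, 1, h, 1)
    xs = torch.linspace(-1.0, 1.0, w, device=feats.device).view(1, 1, 1, w)
    expected_x = (probs * xs).sum(dim=(2, 3))  # bs x C
    expected_y = (probs * ys).sum(dim=(2, 3))  # bs x C
    return torch.cat([expected_x, expected_y], dim=1)  # bs x 2C
```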
Experiments with Policy Networks
• Regular conv layers into FC.
• Result: only passes 3 test cases of 25.
• ~10,000 reward higher, but not much.
• Several other architectures were also tried, yielding similar results.
Architecture:
384 x 160 RGB image
→ ResNet-34 (pretrained) → bs x 512 x 5 x 12
→ Conv3x3 → bs x 64 x 3 x 10 → ReLU
→ Conv3x3 → bs x 8 x 1 x 8 → ReLU
→ flatten + concat(speed, steer, wp)
→ FC layers
→ Mean and variances for actions
Approaches so far
Input images and data from simulator → Pretrained visual model → Policy network → Reward optimization

The pretrained visual model comes from either behavior cloning (LBC) or an autoencoder.
Proposed
• Leverage the trained policy network (privileged agent)
Privileged agent: Input images and data from simulator → State space construction → Policy network → Reward optimization
Visual policy: Input images and data from simulator → Visual policy network, trained by behavior cloning against the privileged policy network
RL Network – 8 Dim Input
• Privileged agent
BC Network – Image + 5 dim
• Visual policy
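To make the proposed privileged-to-visual distillation concrete, here is a minimal behavior-cloning sketch in which the visual policy regresses onto the privileged agent’s actions on simulator rollouts. The dataloader contents, network interfaces, and MSE loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def distill_visual_policy(privileged_agent, visual_policy, loader,
                          epochs=10, lr=1e-4):
    """Behavior-clone the privileged (low-dimensional state) agent into a visual
    policy. `loader` is assumed to yield (image, low_dim_state, extra_features)
    tuples collected by rolling out the privileged agent in the simulator."""
    opt = torch.optim.Adam(visual_policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for image, low_dim_state, extra in loader:
            with torch.no_grad():
                target_action = privileged_agent(low_dim_state)  # e.g. (speed, steer)
            pred_action = visual_policy(image, extra)
            loss = loss_fn(pred_action, target_action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return visual_policy
```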
Initial experiment settings
• Semantically segmented images
• Removed manual obstacle information
• Simple conv network to ensure a fair comparison
• The expert can be trained off-policy or on-policy
• Took an expert trained using PPO for easy comparison
Comparison of RL+BC against pure RL
• The “expert” RL agent used in the previous slides is shown in red. It is the best-performing and fastest to train.
• The BC agent is comparable to the yellow curve (I).
• Our BEV agent trains much faster and achieves a much higher success rate of 96% (24/25).
• Our front-view agent heavily outperforms the pre-existing one, with 92% success against 60-70%.
[5] T. Agarwal, “On-Policy Reinforcement Learning for Learning to Drive in Urban Settings“, Master's Thesis, Tech. Report, CMU-RI-TR-20-32, August, 2020
Performance of RL using 8-dim (A), Image + 8-dim (A+I) and
Image + 6-dim networks (I) from a former lab member’s work [5].
Qualitative results
Advantages of Proposed approach
• Unlike LBC, there is no requirement of an expert (fully RL)
• Training of the initial visual policy is very fast
• The visual policy can be fine-tuned using RL
• The privileged information can be easily obtained from sensors
• Easy to train
• Can incorporate priors
Advantages of Proposed approach
• Easy to transfer the policies obtained from the manual state space
• But behavior cloning can sometimes fail:
  • So we can’t do behavior cloning all the time
  • Instead, use the privileged policy as a behavior policy, with the visual policy as the target, and do RL
Improvements of Proposed approach
• Naïve behavior cloning can miss important parts of the tail of the distribution:
  • Traffic-light scenarios, which are relatively rare in a video
• Heuristics can be applied to handle those (sketched below)
  • e.g., a custom prioritized experience replay
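As one possible heuristic, a sketch of a replay buffer that upweights rare transitions (e.g., those tagged as occurring near a traffic light) during sampling. The boost factor, the `is_rare` flag, and the buffer interface are assumptions for illustration, not the project’s implementation.

```python
import random

class WeightedReplayBuffer:
    """Sketch of a custom prioritized replay: transitions flagged as rare
    (e.g., near a traffic light) get a higher sampling weight."""

    def __init__(self, rare_boost=10.0):
        self.transitions = []  # (state, action, reward, next_state, done) tuples
        self.weights = []
        self.rare_boost = rare_boost

    def add(self, transition, is_rare=False):
        self.transitions.append(transition)
        self.weights.append(self.rare_boost if is_rare else 1.0)

    def sample(self, batch_size):
        # Weighted sampling with replacement: rare transitions are
        # rare_boost times more likely to be drawn.
        return random.choices(self.transitions, weights=self.weights, k=batch_size)
```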
Future Work
• Train on RGB images
• Remove the traffic-light information from the manual state space
• Test in different weather conditions
• Fine-tune the visual policy using CURL [8] (RL + auxiliary task)
• Dense traffic scenarios
• Lane-change scenarios
[8] M. Laskin, A. Srinivas, P. Abbeel, “CURL: Contrastive Unsupervised Representations for Reinforcement Learning”, ICML, July 2020
References
[1] Chen et al., “Learning by Cheating”, CoRL, 2019.
[2] Prof. Jeff Schneider’s RI Seminar Talk.
[3] Liang, Xiaodan, et al., “CIRL: Controllable Imitative Reinforcement Learning for Vision-Based Self-Driving”, ECCV, 2018.
[4] Kendall, Alex, et al., “Learning to Drive in a Day”, ICRA, IEEE, 2019.
[5] Agarwal, et al., “Learning to Drive using Waypoints”, NeurIPS 2019 Workshop on Machine Learning for Autonomous Driving (ML4AD).
[6] Hernandez-Garcia, J. Fernando, and Richard S. Sutton, “Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target”.
[7] Hessel, et al., “Rainbow: Combining Improvements in Deep Reinforcement Learning”, AAAI, 2018.
[8] Laskin, M., Srinivas, A., Abbeel, P., “CURL: Contrastive Unsupervised Representations for Reinforcement Learning”, ICML, 2020.