Simulation To Reality: Reinforcement Learning For Autonomous Vehicles
Donal Byrne
October 13th, 2019
Who am I?
Donal Byrne
dbyrne6@jaguarlandrover.com
https://www.linkedin.com/in/donal-byrne-ai/
Objective
● Practical talk
● Understand what RL is
● How it can be used
● Using RL in the real world
Agenda
Intro
● What is RL
● Where it can be applied
AD Case Study
● Identifying the problem
● Designing the agent
● Training
● Simulation to reality
How To Get Started
● Simple project
● Learning resources
● Libraries
Intro
What is Reinforcement Learning?
● Originates from behaviourism
● Learn the best action to take, given a specific scenario
● Can be applied to a wide range of problems
RL Lifecycle
[Diagram: the agent observes a state from the environment, takes an action, and receives a reward along with the next state.]
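To make the lifecycle concrete, here is a minimal sketch of that loop using the OpenAI Gym API (Gym appears later in the deck). The environment and the random-action "agent" are placeholders; a trained policy would choose the action instead.

```python
import gym

# Minimal sketch of the RL lifecycle: state -> action -> reward -> next state.
# 'CartPole-v1' is just a stand-in environment; the random action below is
# where a learned policy would go.
env = gym.make("CartPole-v1")

for episode in range(3):
    state = env.reset()                      # initial state from the environment
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()   # placeholder for agent.act(state)
        state, reward, done, info = env.step(action)  # environment responds
        total_reward += reward
    print(f"episode {episode}: return = {total_reward}")

env.close()
```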
Use Cases
Gaming:
● Atari Games - 2013
● Go - 2015
● Dota 2 - 2018
● AlphaStar - 2019
Live Use Cases:
● Robotics/Manufacturing
● Medical
● Finance
● Autonomous Driving
● Education
● Resource Management
Case Study: RL For Autonomous Driving
Project Steps
1. Identify your problem
Autonomous Driving: Motion Control
Should we use RL?
● Can it be optimized/learned?
● Can you explain the goal simply?
● Will the agent have enough information?
● Can you simulate it?
● Is there a better solution?
Project Steps
1. Identify your problem
2. Reward
What is good driving?
Can you explain this in a single sentence?
Good vs Bad
Accuracy, Smoothness, Adaptability
What about...
How to design a reward
● Find its simplest form
● Base it on the outcome, not the method
● Shaping
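To illustrate "find its simplest form" and "base it on the outcome, not the method", below is a hypothetical lane-keeping reward built from the accuracy and smoothness outcomes above. The signal names, terms and weights are assumptions for illustration, not the reward actually used in the project.

```python
def driving_reward(cross_track_error, speed, target_speed, steering_delta):
    """Hypothetical shaped reward: good driving = accurate, smooth, on pace.

    All inputs and weights are illustrative; in practice the terms are
    iterated on many times and scaled to comparable magnitudes.
    """
    accuracy   = -abs(cross_track_error)     # stay close to the lane centre
    pace       = -abs(speed - target_speed)  # hold the desired speed
    smoothness = -abs(steering_delta)        # penalise jerky steering

    # Weight each outcome so no single term dominates the total reward.
    return 1.0 * accuracy + 0.5 * pace + 0.25 * smoothness
```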
Project Steps
1. Identify your problem
2. Reward
3. State Space
State Information
Key Info:
● Position on the road
● Current speed
● Angle of the steering wheel
● Velocity (yaw, pitch, roll)
Concerns:
● How much data is being taken in?
● What format?
● Correct credit assignment
● Is this info generic?
● Will it reflect the real environment?
● Should it have noise or latency?
Learning Environment
• Where the agent will live and learn
• Will produce all state features
• Bespoke or prebuilt: OpenAI Gym, Unity ML-Agents
Criteria
• Provide all required state features
• Run at 10x real-time speed
• Utilise GPU or parallel processing
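A bespoke environment usually means wrapping your simulator in the Gym interface so it produces the state features listed earlier. The skeleton below is a rough sketch; the `sim` handle and its methods are hypothetical stand-ins for whatever simulation backend you use.

```python
import gym
import numpy as np
from gym import spaces

class DrivingEnv(gym.Env):
    """Skeleton of a bespoke driving environment (simulator calls are hypothetical)."""

    def __init__(self, sim):
        self.sim = sim  # hypothetical handle to the underlying simulator
        # State: position on road, speed, steering angle, yaw rate.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: steering command in [-1, 1].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self):
        self.sim.reset(random_start=True)      # random reset helps generalization
        return self._get_state()

    def step(self, action):
        self.sim.apply_steering(float(action[0]))
        self.sim.advance()                     # tick the simulator forward
        state = self._get_state()
        reward = -abs(state[0])                # placeholder: penalise lane offset
        done = self.sim.off_track() or self.sim.episode_timeout()
        return state, reward, done, {}

    def _get_state(self):
        return np.array([self.sim.lane_offset(), self.sim.speed(),
                         self.sim.steering_angle(), self.sim.yaw_rate()],
                        dtype=np.float32)
```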
Project Steps
1. Identify your problem
2. Reward
3. State Space
4. Algorithm
The Brain
Key Factors:
● How does it learn?
● Is it sample efficient?
● Does it scale?
● Does it require a model of its world?
● Curiosity driven
● Probabilistic vs Deterministic
A non-exhaustive, but useful taxonomy of algorithms in modern RL - Spinning Up In Deep RL
What Algorithm Should I Choose?
It depends…
3 Requirements For Practical RL Algos:
• Sample efficient
• Learn from previous experiences (Off Policy)
• Robust to hyperparameters and environment
Algorithm for Autonomous Driving
Then: Deep Deterministic Policy Gradient (DDPG)
• State of the art (at the time…)
• Deterministic
• Sample efficient ✓
• Off policy ✓
• Robust ✗ (brittle to hyperparameters)
Now: Soft Actor Critic (SAC)
• Improves upon DDPG's shortcomings
• Real-world robotics
• Non-deterministic (stochastic policy)
How To Choose An Algorithm
• Identify what is critical
• Find papers or examples of similar problems
• Quickly experiment with high-level libraries
• Try with simple toy environments
Good places to start:
SAC: https://github.com/tensorflow/agents/blob/master/tf_agents/colabs/7_SAC_minitaur_tutorial.ipynb
PPO: https://github.com/tensorflow/agents/tree/master/tf_agents/agents/ppo
rlkit: https://github.com/vitchyr/rlkit
TF Agents: https://github.com/tensorflow/agents
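Quickly experimenting with a high-level library can be as little as the sketch below, which follows the shape of the linked TF-Agents SAC minitaur tutorial on a toy continuous-control task. Module paths and constructor arguments differ between TF-Agents versions, so treat this as an approximate outline rather than exact, copy-paste code.

```python
import tensorflow as tf
from tf_agents.agents.ddpg import critic_network
from tf_agents.agents.sac import sac_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import actor_distribution_network

# Toy continuous-control environment to build intuition before scaling up.
env = tf_py_environment.TFPyEnvironment(suite_gym.load("Pendulum-v0"))

critic_net = critic_network.CriticNetwork(
    (env.observation_spec(), env.action_spec()),
    joint_fc_layer_params=(256, 256))

actor_net = actor_distribution_network.ActorDistributionNetwork(
    env.observation_spec(), env.action_spec(),
    fc_layer_params=(256, 256))

agent = sac_agent.SacAgent(
    env.time_step_spec(),
    env.action_spec(),
    actor_network=actor_net,
    critic_network=critic_net,
    actor_optimizer=tf.keras.optimizers.Adam(3e-4),
    critic_optimizer=tf.keras.optimizers.Adam(3e-4),
    alpha_optimizer=tf.keras.optimizers.Adam(3e-4))
agent.initialize()
# From here, collect experience into a replay buffer and call agent.train(...)
# in a loop, as shown in the linked tutorial.
```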
Project Steps
1. Identify your problem
2. Reward
3. State Space
4. Algorithm
5. Training
Training
Training plan:
● Complex enough to learn how to generalize
● Not so hard that the agent can't succeed
● Reset the agent at random locations on the track
Curriculum Based Training:
● Introduce small tasks one at a time
● The agent learns a task and then builds upon it with the next task
● Meta Learning
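A rough sketch of how that training plan might look in code: random resets on every episode plus a simple curriculum that only advances once the agent copes with the current task. The `env`/`agent` interfaces, task names and thresholds are all hypothetical.

```python
# Hypothetical curriculum stages and per-stage success thresholds.
curriculum = ["straight_road", "gentle_curves", "sharp_corners", "full_track"]
success_threshold = {"straight_road": 200, "gentle_curves": 150,
                     "sharp_corners": 100, "full_track": 100}

for task in curriculum:
    env.set_task(task)                        # hypothetical: configure the scenario
    solved = False
    while not solved:
        state = env.reset()                   # reset at a random location on the track
        done, episode_return = False, 0.0
        while not done:
            action = agent.act(state)         # hypothetical agent interface
            state, reward, done, _ = env.step(action)
            agent.observe(state, reward, done)
            episode_return += reward
        agent.learn()                         # update from collected experience
        # Move to the next task only once this one is reliably solved.
        solved = agent.average_return(last_n=50) > success_threshold[task]
```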
You Are Here
Things Will Go Wrong…
• When deep learning fails, it fails silently
• Parameters are hard to find
• Can get stuck in local optima
• Catastrophic forgetting
• Exploration / Exploitation
• Cost: Time / Money
Training the agent
Fully Trained Agent
Project Steps
1. Identify your problem
2. Reward
3. State Space
4. Algorithm
5. Training
6. Evaluation
Is your agent learning?
After training comes evaluation:
● Test across tracks, speeds and vehicles
● Use KPIs and metrics that are not just based on your reward function
● Track every run, make one change at a time!
Validation Experiments
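In code, a validation sweep like the one described above might look roughly like this: run the frozen agent over every combination of track, vehicle and speed, and record KPIs that are independent of the training reward. The environment factory, `info` fields and metrics are hypothetical.

```python
import itertools
import numpy as np

def evaluate(agent, make_env, tracks, vehicles, speeds, episodes=10):
    """Hypothetical validation sweep; make_env builds an evaluation environment."""
    results = {}
    for track, vehicle, speed in itertools.product(tracks, vehicles, speeds):
        env = make_env(track=track, vehicle=vehicle, max_speed=speed)
        lane_errors, completions = [], []
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                action = agent.act(state, explore=False)       # deterministic policy
                state, _, done, info = env.step(action)
                lane_errors.append(abs(info["lane_offset"]))   # hypothetical KPI signal
            completions.append(info["lap_completed"])          # hypothetical KPI signal
        # KPIs independent of the reward: tracking error and completion rate.
        results[(track, vehicle, speed)] = {
            "mean_lane_error": float(np.mean(lane_errors)),
            "completion_rate": float(np.mean(completions)),
        }
    return results
```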
Project Steps
1. Identify your problem
2. Reward
3. State Space
4. Algorithm
5. Training
6. Evaluation
7. Simulation To Reality
How to go from simulation to reality?
Good Parenting
• Algorithm is important for learning a task
• For moving to reality, how it is trained is more important
• Nurture vs Nature
Key Criteria
• Generalizable Reward
• Independent State Space
• Randomized & Noisy Training
• Mix Of Challenges During Training
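One common way to get "randomized & noisy training" is an observation wrapper that injects sensor noise and a small random latency, so the policy never overfits to a perfectly clean simulator signal. This is a generic sketch of that idea, not the project's actual implementation; the noise level and delay are illustrative.

```python
import random
from collections import deque

import gym
import numpy as np

class NoisyDelayedObs(gym.ObservationWrapper):
    """Adds Gaussian sensor noise and a small random latency to observations."""

    def __init__(self, env, noise_std=0.02, max_delay=2):
        super().__init__(env)
        self.noise_std = noise_std
        self.buffer = deque(maxlen=max_delay + 1)

    def reset(self, **kwargs):
        self.buffer.clear()                  # drop stale observations between episodes
        return super().reset(**kwargs)

    def observation(self, obs):
        noisy = obs + np.random.normal(0.0, self.noise_std, size=obs.shape)
        self.buffer.append(noisy)
        # Randomly serve a slightly stale observation to mimic sensor latency.
        delay = random.randint(0, len(self.buffer) - 1)
        return self.buffer[-1 - delay]
```

Usage would simply wrap the training environment, e.g. `env = NoisyDelayedObs(DrivingEnv(sim))` with the hypothetical environment sketched earlier.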
Additional Considerations
Effect Of Memory
1. Adding an LSTM head to the agent
2. Allows the agent to know the rate of change in state values
3. Greater capacity to generalize
Additional Training Info
1. Arbitrary information given to the value network
2. Learns to evaluate good actions faster
3. Learns the optimal policy faster
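A minimal Keras sketch of the memory idea: feed a short history of states through an LSTM before the policy output so the network can infer rates of change. The layer sizes, history length and state layout are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

HISTORY_LEN = 8    # number of past states fed to the agent (illustrative)
STATE_DIM   = 4    # e.g. lane offset, speed, steering angle, yaw rate
ACTION_DIM  = 1    # steering command

# Policy network with an LSTM head: the recurrent layer summarises how the
# state has been changing, which a single-frame input cannot capture.
states = layers.Input(shape=(HISTORY_LEN, STATE_DIM))
x = layers.LSTM(64)(states)
x = layers.Dense(64, activation="relu")(x)
action = layers.Dense(ACTION_DIM, activation="tanh")(x)   # bounded steering output

policy = tf.keras.Model(states, action)
policy.summary()
```

The second point follows the same spirit: the value/critic network can be fed extra, simulator-only information during training, as long as the actor never depends on it at inference time.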
Reinforcement Learning: How To Get Started
Where to get started
Go over the theory:
● Learn the basics of how RL works
● DeepMind Lectures
● OpenAI Spinning Up
Get your hands dirty:
● Take some small toy examples
● Train some prebuilt agents
● See what works
● TF Agents
● OpenAI Gym
Build your Proof of Concept:
● Now you have some intuition
● Go through the project steps
● Build a custom environment
● Unity ML-Agents
Final Thoughts
Should you use RL?
Pros:
● Can achieve incredible results
● Capable of finding the optimal solution to a wide range of problems
● Technology is growing rapidly, providing better and simpler solutions
Cons:
● Not quite there yet
● Can be very expensive and time consuming
● Not a silver bullet
Thank You!


Editor's Notes

  • #3 College, Work, Passions
  • #4 Practical talk, understand what RL is, how it can be used, how it is currently being used, how to go about starting your own RL project
  • #7 A type of machine learning that mimics how humans and animals learn. Learn behaviours from the outcomes of past experiences. A very generalized method of learning; this is the benefit of RL, it can generalize.
  • #8 Best example: training your dog. When you are trying to teach a dog good behaviour, what do you do? Well, you are going to have an idea of what that good behaviour is (learning to sit on command). Every time they get close to the desired behaviour, you give them a treat, or a reward. But when they exhibit bad behaviour, like eating your shoes, you are going to give out to them. Over time, they learn the behaviour with the maximum positive outcome.
  • #9 What do you think is the most important part here? Why?
  • #10 Very popular and exciting. Achieved a lot of amazing milestones, mainly in gaming: Atari, Go, Dota, Hide and Seek. Started to be used in real world use cases: NLP, Recommender Systems, Manufacturing, Energy Optimisation.
  • #12 Now that we know a little bit about what RL is, let's go through what is actually involved in applying this to a real problem. A few months ago Asanka was asked to look into RL and its potential for autonomous driving. This next section is going to go through how that project unfolded and the process the team went through.
  • #13 There are several steps involved in taking an RL project from start to finish. Going to go through these one by one and share the lessons we learned along the way.
  • #14 What are we trying to solve, and should we use RL? Can it be optimized? RL is an optimization technique. Can you explain the goal simply? Is there enough information? Can you simulate it? Is there a better solution? Our problem was controlling a vehicle; there were good solutions out there, but we wanted to see what RL was capable of. The key benefit is that it's generalizable. Could we teach a controller to learn the essence of good driving, and apply it to any vehicle or scenario? ADAS stack, where the controller sits.
  • #15 Arguably the most important part.
  • #16 A perfectly built AI will fail if it isn't being rewarded correctly. When we are creating our reward, we want to make it as simple as possible. Can we explain good driving in a single sentence? We all know what it is: we want things like accuracy, smoothness and adaptability.
  • #17 If the task is very complex, then break it up. We broke it up into longitudinal and lateral control. Simple rewards make it easy to assign credit. Credit assignment can be a big problem: if your agent is being rewarded or punished, it must be able to associate the behaviour with it. Design based on the outcome, not the method: the cobra story. Finally, what shape will your data take? Is it sparse or continuous? Scale matters; scale everything, as it is easier to weight the rewards based on scale. Spent a long time iteratively designing the reward; it is key to a good agent.
  • #19 Much like the reward function, if the agent doesn't have enough info, it won't be able to learn. We talked about credit assignment before. If you are training your dog and they pee on the rug, but you only find out hours later, giving out to them won't help; they don't know why you're upset and won't be able to connect the two events. We need to give our AI enough key info so that it can identify what good behaviour is based on the rewards it's getting at each state. Spent a lot of time experimenting with what the right state parameters were. Practical concerns: it can't be too big or too small. Is it generic to the problem? Is it noisy?
  • #20 The state space will be represented through a learning environment, usually some sort of simulation. There are some good environments like Gym for getting familiar with RL, but you will need to build a custom environment for your problem. When building, take into account some practical concerns: it should be capable of replicating the real environment as closely as possible and run faster than real time. It takes a lot of experience to train these agents, so you'll want to be training at about 10x. When we were building ours we were limited by the environment and could only run at 0.3x speed; it took 10 hours just to get about 4 hours of driving time.
  • #22 This is an example of the environment we used. High fidelity, but very slow and very restricting; this took 70% of the time.
  • #23 Gonna get a little technical
  • #24 There are a lot of options out there. Not going to go into all of them, but the main split is model-based and model-free. Most modern methods are Actor Critic based (a driving student and instructor, but both learning). Curiosity driven.
  • #25 What algorithm should I choose? Dunno; a shitty answer, but it depends… 3 requirements for practical RL.
  • #26 What we chose. At the time it was SOTA. Meets 2/3 criteria; brittle to hyperparameters (the knobs that you turn to make it work better). Spent 70% of the time tuning. There are better ways of doing it now, BOHB etc. Would now use something more sophisticated like SAC.
  • #27 Take a high level library. Quickly experiment on simple problems and get some intuition about the algorithms. Then scale up and find what works best for your problem. We didn't have TF Agents, so we had to hand build most implementations and test them.
  • #29 The fun part…
  • #30 So after you have sorted out all the other stuff (you've got a reward, a state space, some sort of simulator and even an algorithm that you are sure of), you can now start training your real agent. Because this is still a relatively new field, the act of designing these things is a bit of a fine art. There is no set path; this is changing, but for the moment you need to use your intuition. The training environment needs to be complex enough that the agent can generalize, but simple enough to be capable of converging. Requires experience of lots of scenarios, curiosity, random resets, memory, curriculum based learning. Let's go back to our project. Phase 1: pick the reward, state and algorithm. Made a simple 2D environment; works great. Now let's train it on a real simulator…
  • #31 Progress isn't linear. Big jump from phase 1 to phase 2.
  • #32 Things are going to go wrong, very wrong. Will this ever work? Silently failing, plateauing and forgetting. When these happen, repeat the previous steps and simplify. It takes a long time and training can be expensive. Train for a small amount locally, then when most bugs are worked out, move to the cloud. Supervised learning techniques do not directly transfer to RL. Things like dropout and batch norm, which give great improvements to supervised networks, don't provide much here.
  • #33 Not gonna lie, it looked like that for a while… Going from the pygame version to the real simulator was a big leap, and the agent struggled to grasp driving at this level.
  • #34 After several iterations of working at this stage, we managed to get these results
  • #35 This all sounds great, but is it doing well?
  • #36 So now that you are actually getting some results, how do you know how good they are? You need to identify some metrics to base this on, not just how high the reward is. Reward is relative, can't be compared to other techniques, and can change over time. Create a benchmark, or several, and constantly compare results on these benchmarks. During our project we had two validation tracks, 3 vehicles and different max speeds. We would run new algorithms across all of these and compare the results to truly distinguish improvements. Automate as much as you can; there will be a lot of tests. It's very easy to get sucked into watching your agent training for hours.
  • #39 So, we have finally come to taking the agent from the simulation to the real world. This is undoubtedly the most difficult part of the process, as it is the true test of how well the agent has learned to generalise. The team put a lot of thought and discussion into teaching the agent not just how to succeed on the training track, but how to take what it had learned and apply it to new scenarios. Writing the code correctly and using the best algorithms will get you a sophisticated agent that will solve difficult tasks, but generalization depends on how it was trained, not how it was built. Nature vs Nurture: good AI requires good parenting.
  • #40 Some other cool things that would improve this, but we didn't have time to implement: 1) improving memory by using LSTM cells, 2) arbitrary information given during training but not used at inference. It's like preparing for a test by doing a few open-book tests.