Driving Behavior for ADAS
and Autonomous Driving V
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• An Auto-tuning Framework for AVs
• Adaptive Behavior Generation for AD using Deep RL with Compact Semantic
States
• Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse RL
• Zero-shot Deep RL Driving Policy Transfer for AVs based on Robust Control
• Risk-averse Behavior Planning for AD under Uncertainty
• Multimodal Trajectory Predictions for AD using Deep Convolutional Networks
• Exploring the Limitations of Behavior Cloning for AD
An Auto-tuning Framework for Autonomous Vehicles
• Many AD motion planners generate trajectories by optimizing a reward/cost functional.
• Designing and tuning a high-performance reward/cost functional for Level-4 autonomous
driving vehicles with exposure to different driving conditions is challenging.
• Traditionally, reward/cost functional tuning involves substantial human effort and time spent on
both simulations and road tests.
• As the scenario becomes more complicated, tuning to improve the motion planner
performance becomes increasingly difficult.
• To systematically solve this issue, develop a data-driven auto-tuning framework based on the
Apollo autonomous driving framework.
• The framework includes a rank-based conditional inverse reinforcement learning (IRL) algorithm, an
offline training strategy and an automatic method of collecting and labeling data.
• The auto-tuning framework has several advantages that make it suitable for tuning an autonomous
driving motion planner:
• compared to most IRL algorithms, training is efficient and adaptable to different scenarios.
• offline training strategy offers a safe way to adjust the parameters before public road testing.
• expert driving data and information about the surrounding environment are collected and
automatically labeled, which considerably reduces the manual effort.
• motion planner tuned by the framework is examined via both simulation and public road testing.
An Auto-tuning Framework for Autonomous Vehicles
Data-driven autonomous driving motion
planner on the Apollo platform.
Algorithm tuning loop for the motion planner in
the Apollo autonomous driving platform
An Auto-tuning Framework for Autonomous Vehicles
• The idea for learning the reward functional
includes two key parts: conditional
comparison and rank-based learning.
• Instead of comparing the expectation of
value functions of the expert demonstration
and optimal policy, compare the value
functions state by state.
• To accelerate the training process and
extend the coverage of corner cases,
sample random policies and compare
against the expert demonstration instead of
generating the optimal policy first, as in
policy gradient.
• The background (driving scenarios)
differences may impact the tuned reward
functional significantly.
Online and offline pipeline of the auto-tuning framework by module.
An Auto-tuning Framework for Autonomous Vehicles
Feature Description Given A Trajectory Point
Given a path profile in a station-lateral
coordinate system, obstacles and predicted
moving trajectories are projected on the station-
time graph if there are any interactions with the
moving path of the ego car.
Then, the goal is to generate a speed profile on
the station-time graph that can safely avoid
obstacles and maintain smooth driving.
The optimal speed profile is generated by
optimizing the cost/reward functional, which
captures the trajectory smoothness, distance to
different obstacles, and path smoothness.
Application of speed profile generation inside the EM planner
An Auto-tuning Framework for Autonomous Vehicles
Siamese network used in RC-IRL. The
value networks of both the human and
the sampled trajectories share the same
network parameter settings. The loss
function evaluates the difference between
the sampled data and the generated
trajectory via the value network outputs.
The value network inside the Siamese model is used to
capture driving behavior based on encoded features. The
network is a trainable linear combination of encoded
rewards at different times. The weight of the encoded
reward is a learnable time decay factor. The encoded
reward includes an input layer with 21 raw features and a
hidden layer with 15 nodes.
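As a rough illustration of the RC-IRL idea, the sketch below pairs a human trajectory with a randomly
sampled one under a shared (Siamese) value network and applies a pairwise ranking loss. The shapes follow
the description above (21 raw features, 15 hidden nodes, a time-decay weight over encoded rewards), but the
function names and the logistic loss form are illustrative assumptions, not the Apollo implementation.

```python
import numpy as np

def encode_reward(raw_features, W1, b1, w2, b2):
    """Encoded reward per time step: 21 raw features -> 15 hidden nodes -> scalar."""
    h = np.tanh(raw_features @ W1 + b1)           # hidden layer with 15 nodes
    return h @ w2 + b2                            # scalar encoded reward

def trajectory_value(traj_features, params, gamma):
    """Value = time-decayed linear combination of encoded rewards over the trajectory."""
    W1, b1, w2, b2 = params
    rewards = np.array([encode_reward(f, W1, b1, w2, b2) for f in traj_features])
    decay = gamma ** np.arange(len(rewards))      # learnable time-decay factor
    return float(np.sum(decay * rewards))

def siamese_ranking_loss(human_traj, sampled_traj, params, gamma):
    """Rank-based loss: the human trajectory should score higher than a randomly
    sampled trajectory under the shared value network."""
    v_h = trajectory_value(human_traj, params, gamma)
    v_s = trajectory_value(sampled_traj, params, gamma)
    # logistic (pairwise ranking) loss on the value difference
    return np.log1p(np.exp(-(v_h - v_s)))
```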
An Auto-tuning Framework for Autonomous Vehicles
• To validate the general performance in terms of conditional IRL, consider a generative adversarial
network (GAN) for comparison.
• In GAN, each sampled trajectory is labeled as human driving (1) or random trajectory (0) and
trained with cross entropy loss.
• The network is expected to distinguish human driving from a randomly generated trajectory.
• Since the GAN cannot distinguish different scenarios, the procedure compares the average
performance of human driving trajectories and that of randomly generated trajectories.
Simulation Results of the Motion Planner
• IEEE Intelligent Vehicles Symposium (IV), June 2018
• Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as
well as the surrounding environment. Therefore it is hard to model solely based on expert knowledge.
• It uses Deep Reinforcement Learning to learn maneuver decisions based on a compact semantic state
representation.
• This ensures a consistent model of the environment across scenarios as well as a behavior adaptation
function, enabling on-line changes of desired behaviors without re-training.
• The input for the neural network is a simulated object list similar to that of Radar or Lidar sensors,
superimposed by a relational semantic scene description.
• The state & reward are extended by a behavior adaptation function and a parameterization respectively.
• With little expert knowledge and a set of mid-level actions, it can be seen that the agent is capable
of adhering to traffic rules and learns to drive safely in a variety of situations.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
• A reinforcement learning process is commonly modeled as an MDP (S, A, R, δ, T),
where S is the set of states, A the set of actions, R : S × A × S → ℝ the reward
function, δ : S × A → S the state transition model and T the set of terminal
states. At timestep i an agent in state s ∈ S can choose an action a ∈ A
according to a policy π and will progress into the successor state s′, receiving
reward r. This is defined as a transition t = (s, a, s′, r).
• The aim of reinforcement learning is to maximize the expected future discounted
return Σ_{k≥0} γ^k r_{i+k}, with discount factor γ ∈ [0, 1].
• A DQN uses Q-Learning to learn Q-values for each action given input state s
based on past transitions. The predicted Q-values of the DQN are used to
adapt the policy π and therefore change the agent’s behavior.
The initial traffic scene is transformed into a compact semantic state representation s and used as input for the RL agent. The
agent estimates the action a with the highest return (Q-value) and executes it, e.g., changing lanes. Afterwards a reward r is
collected and a new state s′ is reached. The transition (s, a, r, s′ ) is stored in the agent’s replay memory.
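A minimal sketch of the replay-memory and action-selection loop described in the caption; the buffer size,
the ε-greedy exploration and the dictionary-based Q-values are assumptions for illustration rather than the
paper's exact setup.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

class ReplayMemory:
    """Fixed-size buffer of transitions (s, a, r, s') used to train the DQN."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def select_action(q_values, epsilon):
    """Epsilon-greedy policy over the predicted Q-values of the semantic actions."""
    actions = list(q_values.keys())
    if random.random() < epsilon:
        return random.choice(actions)                # explore
    return max(actions, key=lambda a: q_values[a])   # exploit: highest Q-value
```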
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
(a) The ego vehicle (blue) is driving on a two-lane road. Five other
vehicles are in its sensor range. Based on a vehicle scope with Λ_lateral =
Λ_ahead = Λ_behind = 1, only four vehicles (green) are considered in the
semantic state representation.
(b) The relational grid resulting from the scene
of (a) with Λ_lateral = Λ_ahead = Λ_behind = 1. Moreover,
there is no adjacent lane to the right, which is
represented in the grid as well.
An example scene (a) is transformed to a relational grid (b) using a
vehicle scope Λ with Λ_lateral = Λ_ahead = Λ_behind = 1. The red vehicle, which is
in sensor range, is not represented in the grid. Since there is no
vehicle driving on the same lane in front of the ego vehicle, the
respective relational position in the grid is empty.
Define a relational grid centered at the ego
vehicle v_ego ∈ V. The rows correspond to the
relational lane topology, whereas the columns
correspond to the vehicle topology on these lanes.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
• For the input state representation, adapt the ontology-based concept focusing on relations
with other traffic participants as well as the road topology.
• Design the state representation to use high-level preprocessed sensory inputs such as object
lists provided by common Radar and Lidar sensors and lane boundaries from visual sensors or
map data. To generate the desired behavior, the reward comprises different factors with
varying priorities.
• A traffic scene τ is described by a semantic entity- relationship model, consisting of all scene
objects and relations between them.
• The scene objects contain all static and dynamic objects, such as vehicles, pedestrians, lane
segments, signs and traffic lights.
• This work focuses on vehicles V ⊂ E, lane segments L ⊂ E and the three relation types: vehicle-
vehicle relations, vehicle-lane relations and lane-lane relations.
• Every entity and relation holds several properties or attributes of the scene objects, e.g.,
absolute positions or relative velocities.
• This scene description combines low level attributes with high level relational knowledge in a
generic way. It is thus applicable to any traffic scene and vehicle sensor setup, making it a
beneficial state representation.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
Partially visualized entity-relationship model of the scene in the last figure (a). The vehicle topology is
modeled by vehicle-vehicle relations whereas the lane topology is modeled by lane-lane relations.
But the representation is of varying size and includes more aspects than are relevant for a given driving task. In
order to use this representation as the input to a neural network, transform it to a fixed-size relational grid that
includes only the most relevant relations.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
• The set of vehicles inside the vehicle scope Λ is denoted by VΛ ⊆ V. The different features of
vehicles and lanes are represented by separate layers of the grid, resulting in a semantic state
representation.
• The column vector of each grid cell holds attributes of the vehicle and the lane it belongs to.
While the ego vehicle’s features are absolute values, other vehicles’ features are relative to
the ego vehicle or the lane they are in (induced by the vehicle-vehicle and vehicle-lane
relations of the entity-relationship model):
• 1) Non-ego vehicle features: longitudinal pos & velocity, lateral alignment and heading;
• 2) Ego vehicle features: behavior adaptation function, longitudinal velocity, lane index;
• 3) Lane features: lane type and lane ending;
• Non-perceivable features or missing entities are indicated in the semantic state
representation by a dedicated value.
• The relational grid ensures a consistent representation of the environment, independent of
the road geometry or the number of surrounding vehicles.
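The sketch below shows one way such a fixed-size semantic state tensor could be assembled from the relational
grid, with a dedicated value for missing entities; the grid dimensions, feature ordering and dictionary-based
inputs are hypothetical choices made for illustration, not the paper's exact encoding.

```python
import numpy as np

MISSING = -1.0  # dedicated value for non-perceivable features / missing entities

def build_relational_grid(ego, vehicles, lanes, n_rows=3, n_cols=3, n_features=9):
    """Fill a fixed-size (rows x cols x features) grid around the ego vehicle.
    Rows index relative lanes, columns index ahead / same / behind slots."""
    grid = np.full((n_rows, n_cols, n_features), MISSING, dtype=np.float32)

    # ego features (absolute): behavior adaptation value, longitudinal velocity, lane index
    grid[1, 1, 0:3] = [ego["adaptation"], ego["v_lon"], ego["lane_index"]]

    for v in vehicles:                # vehicles inside the scope Lambda
        row = 1 + v["rel_lane"]       # -1 / 0 / +1 relative lane -> grid row
        col = 1 - v["rel_slot"]       # ahead = +1, behind = -1 -> grid column
        if 0 <= row < n_rows and 0 <= col < n_cols:
            # relative features: longitudinal position & velocity, lateral alignment, heading
            grid[row, col, 3:7] = [v["d_lon"], v["dv_lon"], v["lat_align"], v["heading"]]

    for k, lane in enumerate(lanes[:n_rows]):  # lane features per grid row: type, ending
        grid[k, :, 7] = lane["type"]
        grid[k, :, 8] = lane["ending"]
    return grid
```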
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
The relational grid contains one layer per feature. The vehicle features f^v_i and f^v_ego share layers and are placed in the
cells of v_i and v_ego respectively. The lane features f^l_k are on additional layers in the k-th row of the grid.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
• The vehicle’s actions space is defined by a set of semantic actions that
is deemed sufficient for most on-road driving tasks, excluding special
cases such as U-turns.
• The longitudinal movement of the vehicle is controlled by the actions
accelerate and decelerate.
• While executing these actions, the ego vehicle keeps its lane.
• Lateral movement is generated by the actions lane change left as well
as lane change right respectively.
• Only a single action is executed at a time and actions are executed in
their entirety; the vehicle is not able to prematurely abort an action.
• The default action results in no change of either velocity, lateral
alignment or heading.
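For reference, the semantic action set can be written down directly; the enum names below simply mirror the
actions listed above.

```python
from enum import Enum

class SemanticAction(Enum):
    """Mid-level action set; one action is executed at a time and run to completion."""
    DEFAULT = 0            # keep velocity, lateral alignment and heading
    ACCELERATE = 1         # longitudinal, the ego vehicle keeps its lane
    DECELERATE = 2         # longitudinal, the ego vehicle keeps its lane
    LANE_CHANGE_LEFT = 3   # lateral movement
    LANE_CHANGE_RIGHT = 4  # lateral movement
```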
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
• With the aim to generate adaptive behavior, extend the reward function R(s,a)
by a parameterization θ.
• This parameterization is used in the behavior adaptation function Ω(τ,θ), so
that the agent is able to learn different desired behaviors without the need
to train a new model for varying parameter values.
• Furthermore, the desired driving behavior consists of several individual goals,
modeled by separated rewards.
• These reward functions are ranked by three different priorities.
• Collision avoidance has the highest priority; rewards associated with traffic rules are
important but to a lesser extent; rewards connected to the driving style are
prioritized least.
• The overall reward function R(s, a, θ) can then be expressed as a combination of these
prioritized terms.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
• The subset Scollision ⊂ S consists of all states s describing a collision state of the ego
vehicle vego and another vehicle vi.
• In these states the agent only receives the immediate reward without any possibility
to earn any other future rewards.
• Additionally, attempting a lane change to a nonexistent adjacent lane is also treated
as a collision.
• The state dependent evaluation of the reward factors facilitates the learning process.
• As the reward for a state is independent of rewards with lower priority, the eligibility
trace is more concise for the agent being trained.
• For example, driving at the desired velocity does not mitigate the reward for collisions.
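A minimal sketch of how such a prioritized reward could be evaluated, with collision avoidance masking out
all lower-priority terms; the numeric penalties and the θ fields are illustrative assumptions, not the
paper's values.

```python
def reward(collided, broke_rule, v_lon, theta):
    """Prioritized reward sketch: a state is only scored by the highest-priority group
    that applies, so driving at the desired velocity never mitigates a collision penalty.
    Weights and thresholds here are illustrative."""
    if collided:                       # priority 1: collision avoidance (incl. invalid lane change)
        return -1.0
    if broke_rule:                     # priority 2: traffic-rule rewards
        return -0.5
    # priority 3: driving style, adapted via the parameterization theta
    return -theta["style_weight"] * abs(v_lon - theta["desired_velocity"])
```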
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
The current state from SUMO is retrieved and transformed into the semantic state representation. This
is the input to the RL agent, which is trained using the TensorForce library. Chosen actions are mapped
to a respective state change of the simulation. The agent’s reward is then computed based on the initial
state, the chosen action and the successor state.
While the concept is able to handle data from many preprocessing methods used in
autonomous vehicles, the approach was tested with the traffic simulator SUMO.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
To examine the agent's compliance with traffic rules, it is trained and evaluated on two different traffic
scenarios. In (a) the agent has the obligation to drive on the rightmost lane and must not pass others from
the right, amongst other constraints. In (b) the agent is allowed to accelerate while on the on-ramp and also
may overtake vehicles on its left, but it has to leave the on-ramp before it ends.
Results of the trained agents. The agents gH
(highway) and gM (merging ramp) were only
evaluated on their respective scenario, while gC
(both highway and merging ramp) was
evaluated on both. The results for gC are listed
separately for each scenario.
Adaptive Behavior Generation for Autonomous Driving using
Deep Reinforcement Learning with Compact Semantic States
Overtaking maneuver. The agent (blue) is driving behind a
slower vehicle (green). In this situation the action lane change left
has the highest estimated Q-value. After the lane change, the
agent accelerates and overtakes the slower vehicle.
Subsequently, the agent changes back to the rightmost lane.
The average speed of each agent given varying desired
velocities [m/s]. The agents have been evaluated on the
training scenarios with normal traffic density as well as
on an empty highway. While not every agent is able to
achieve the desired velocity precisely, their behavior
adapts to the different parameter values.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• To safely and efficiently interact with other road participants, AVs have to accurately predict
the behavior of surrounding vehicles and plan accordingly.
• Such prediction should be probabilistic, to address the uncertainties in human behavior.
• Such prediction should also be interactive, since the distribution over all possible trajectories
of the predicted vehicle depends not only on historical information, but also on future plans
of other vehicles that interact with it.
• To achieve such interaction-aware predictions, propose a probabilistic prediction approach
based on hierarchical inverse reinforcement learning (IRL).
• First, explicitly consider the hierarchical trajectory-generation process of human drivers
involving both discrete and continuous driving decisions.
• Based on this, the distribution over all future trajectories of the predicted vehicle is
formulated as a mixture of distributions partitioned by the discrete decisions.
• Then apply IRL hierarchically to learn the distributions from real human demonstrations.
• A case study for the ramp-merging driving scenario is provided.
• The quantitative results show that the proposed approach can accurately predict both the
discrete driving decisions such as yield or pass as well as the continuous trajectories.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• It focuses on predicting human drivers' interactive behavior between two vehicles: a host vehicle
(denoted by (·)_H) and a predicted vehicle (denoted by (·)_M).
• All other in-scene vehicles are treated as surrounding vehicles, denoted by (·)^i_O (i = 1, 2, · · ·,
N is the index of surrounding vehicles).
• It uses ξ to represent historical vehicle trajectories, and ξ̂ for future trajectories.
• The probability distribution over all possible trajectories of the predicted vehicle clearly
depends on its own historical trajectory and those of the surrounding vehicles.
• Mathematically, such time-domain state dependency can be modelled as a conditional
probability density function (PDF):
• The probability distribution over all possible trajectories of the predicted vehicle should also
be conditioned on potential plans of the host vehicle.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• The trajectory-generation process of human drivers is naturally probabilistic and
hierarchical.
• It involves both discrete and continuous driving decisions.
• The discrete driving decisions determine the “rough” pattern (or homotopy class) of the future
trajectory as a game-theoretic result (e.g., to yield or to pass), whereas the continuous
driving decisions influence details of the trajectory such as velocities, accelerations and
smoothness.
• The conditional distribution is formulated as a mixture of distributions, which explicitly
captures the influences of both discrete and continuous driving decisions of human drivers:
• Thus propose to apply inverse reinforcement learning (IRL) hierarchically to learn all the
models from observed demonstrations of human drivers.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Figure shows the probabilistic and hierarchical trajectory–
generation process for a lane changing scenario.
• The predicted vehicle (blue) is trying to merge into the lane
of the host vehicle (red).
• Given all observed historical trajectories ξ = {ξ^{1:N}_O, ξ_H, ξ_M}
and the belief about the host vehicle's future trajectory ξ̂_H,
the trajectory distribution of the predicted vehicle over all
the trajectory space is partitioned first by the discrete
decisions: merge behind (d^1_M) and merge front (d^2_M).
• Under different discrete decisions, the distributions of
continuous trajectories can be significantly different, and
each of them is represented via a probability distribution
model.
• The observed demonstrations are samples satisfying the
distributions.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Based on the principle of maximum entropy, assume that all drivers are exponentially more
likely to make decisions (both discrete and continuous) that lead to a lower cost.
• This introduces a family of exponential distributions depending on the cost functions, and the
goal is to find the optimal hierarchical cost functions that ultimately lead to trajectory
distributions matching that of the observed trajectories in a given demonstration set Ξ.
• Since the trajectory space is continuous and demonstrations are noisily locally optimal, use
Continuous Inverse Optimal Control with Locally Optimal Examples.
• Continuous-space IRL: the goal is to find the optimal parameter vector such that the given
demonstration set is most likely to happen.
• Under each discrete decision, assume that the cost of each trajectory can be linearly parametrized by a
group of selected features (see the sketch below).
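As a simplified stand-in for the learning step (the paper uses Continuous Inverse Optimal Control with
locally optimal examples), the sketch below shows a sampling-based maximum-entropy IRL update with a linearly
parametrized trajectory cost: trajectories with lower cost are exponentially more probable, and the weights
are moved so the demonstration becomes more likely relative to sampled alternatives.

```python
import numpy as np

def softmax(x):
    x = np.asarray(x, dtype=float)
    x = x - np.max(x)                                # numerical stabilization
    e = np.exp(x)
    return e / e.sum()

def irl_gradient_step(theta, demo_features, sampled_features, lr=0.01):
    """One sampling-based max-entropy IRL update for a linear cost c(xi) = theta . f(xi)."""
    demo = np.asarray(demo_features, dtype=float)          # feature vector of the demonstration
    samples = np.asarray(sampled_features, dtype=float)    # features of randomly sampled trajectories
    feats = np.vstack([demo[None, :], samples])
    probs = softmax(-feats @ theta)                        # P(xi) ~ exp(-cost)
    # gradient of the negative log-likelihood of the demonstration
    grad = demo - probs @ feats
    return theta - lr * grad
```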
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Features of Continuous Driving Decisions:
• Speed: relative to the speed limit
• Traffic: based on intelligent driver model (IDM)
• Control effort and smoothness: a set of kinematics-related features
• Clearance to other road participants: a distance-related feature
• Goal:
• Courtesy:
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Features of Discrete Driving Decisions:
• Rotation angle:
• Minimum cost:
• Discrete space IRL: assume that the decisions with lower cost are exponentially more
probable.
• The goal is to find the optimal parameter vector such that the likelihood of the demonstration set
is maximized
• The parameter vector is then updated via gradient steps on this likelihood (a minimal sketch follows below).
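A minimal sketch of the discrete-level step under the same maximum-entropy assumption: decision costs are
linear in the discrete features, decision probabilities follow a softmax over negative costs, and the
parameters are moved to make the demonstrated decision more likely. The paper's exact update rule may differ.

```python
import numpy as np

def discrete_decision_probs(theta_d, decision_features):
    """P(decision) ~ exp(-cost), with a linear cost over the discrete-level features
    (e.g., rotation angle and minimum continuous cost)."""
    costs = np.asarray(decision_features, dtype=float) @ theta_d
    e = np.exp(-(costs - costs.min()))               # stabilized softmax over negative costs
    return e / e.sum()

def discrete_irl_step(theta_d, decision_features, demo_index, lr=0.01):
    """One step that decreases the negative log-likelihood of the demonstrated decision."""
    feats = np.asarray(decision_features, dtype=float)
    probs = discrete_decision_probs(theta_d, feats)
    grad = feats[demo_index] - probs @ feats         # gradient of the negative log-likelihood
    return theta_d - lr * grad
```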
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Collect human driving data from the Next Generation SIMulation (NGSIM) dataset. It
captures highway driving behaviors/trajectories with cameras mounted on top of
surrounding buildings. The sampling time of the trajectories is Δt = 0.1 s.
• Choose 134 ramp-merging trajectories on Interstate 80 (near Emeryville, California), and
separate them into two sets: a training set of size 80 (denoted by Ξ, i.e., the human
demonstrations) and a test set with the remaining 54 trajectories.
Figure shows the road map and an example group of trajectories. There are four cars in the
scene: one merging vehicle (red), one lane-keeping vehicle (blue) and two surrounding
vehicles (black), with one ahead of the blue car and the other behind. The focus is on
the interactive driving behavior of both the merging vehicle and the lane-keeping vehicle.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Use the same hierarchical IRL approach to model the conditional probability distributions for
both the merging vehicle and the lane-keeping vehicle.
• Driving Decisions: In the ramp-merging scenario, the driving decisions are listed as in Table.
• As mentioned above, x = [x_1, · · · , x_L]^T and y = [y_1, · · · , y_L]^T are, respectively, the coordinate
vectors in the Frenet frame along the longitudinal and lateral directions.
• L is set to 50, i.e., in each demonstration, 5 s of trajectory are collected.
• Feature selection: Since the right of way for the merging vehicle and the lane-keeping
vehicle is different, define different features for each.
• Lane keeping: yield and pass;
• Merging: merge behind and merge front, with features for IDM and courtesy.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• Use Tensorflow to implement the
hierarchical IRL algorithm.
• Figure gives the training curves
with respect to the continuous and discrete
driving decisions.
• Due to the hierarchical structure, first
learn all four continuous distribution
models for both the merging vehicle
and the lane-keeping vehicle under
different discrete decisions.
• Randomly sample subsets of
trajectories from the training set and
perform multiple training runs.
• As seen, the parameters in each
continuous model converge
consistently with small variance.
Probabilistic Prediction of Interactive Driving Behavior
via Hierarchical Inverse Reinforcement Learning
• In this test, generate the most probable trajectories under different discrete driving decisions
by solving a finite horizon Model Predictive Control (MPC) problem using the learned
continuous cost functions.
• Three illustrative examples: the red dotted lines and blue solid lines represent, respectively,
the predicted most likely trajectories and the ground-truth trajectories; the thick black dash-
dot lines are trajectories of other vehicles. The predicted trajectories are very close to the
ground-truth ones.
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
• Although deep RL methods have lots of strengths that are favorable if applied to
autonomous driving, real deep RL applications in autonomous driving have been
slowed down by the modeling gap between the source (training) domain and the
target (deployment) domain.
• Unlike current policy transfer approaches, which are generally limited to using
uninterpretable neural network representations as the transferred features, this work proposes
to transfer concrete kinematic quantities in autonomous driving.
• The proposed robust-control-based (RC) generic transfer architecture, called
RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust
tracking controller based on a disturbance observer (DOB).
• The deep RL policies trained with a known nominal dynamics model are transferred
directly to the target domain, and DOB-based robust tracking control is applied to
tackle the modeling gap, including the vehicle dynamics errors and the external
disturbances such as side forces.
• Simulations achieve zero-shot transfer across multiple driving scenarios such as lane
keeping, lane changing and obstacle avoidance.
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
• The proposed RL-RC policy transfer
architecture consists of a deep-RL-
based high-level planning module and
an RC-based low-level tracking
controller.
• The overall methodology consists of
the off-line deep RL policy training
and the on-line policy transfer.
• In the off-line source domain, pre-
train a deep RL policy mapping the
perception input to control commands,
which can drive the source vehicle to
produce a trajectory completing the
control task.
For the on-line transfer in the target domain, it shows how the
system works to transfer the source driving policies.
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
• According to perception observation, the system constructs an imaginary source agent in the
same setting as the target agent.
• In the imaginary source domain, the pretrained policy is used to control the imaginary source
agent to perform the driving task for a finite horizon, resulting in a trajectory of kinematic
states, which serves as the reference for the target vehicle.
• Given the reference trajectory, the target agent uses a closed-loop robust tracking controller
to produce the actual control command for the target vehicle.
• As the target environment makes one step forward, the observation for the new timestep is
collected, and the system repeats the same procedure.
• In this architecture, the kinematic features are transferred without change, and the modeling
gap is compensated by RC.
• To generate the reference trajectory in the imaginary source domain, train a deep RL policy
network that directly maps perception input to trajectory of kinematic states.
• There are two key underlying assumptions for the RL-RC system:
• (i) the trajectory planned by the imaginary source agent is comparably satisfying for the target task;
• (ii) it is feasible for the target vehicle to track the trajectories produced by the source vehicle.
• These assumptions are reasonable as the source and target vehicles and settings are similar,
and the RL-RC is only effective for such cases.
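The on-line transfer procedure above can be summarized in pseudocode. The `source_policy`, `source_model`
and `tracker` interfaces below are hypothetical placeholders for the pretrained RL policy, the nominal
dynamics model and the DOB-based tracking controller, respectively.

```python
def rl_rc_step(observation, source_policy, source_model, tracker, horizon=20):
    """One on-line step of the RL-RC scheme (sketch): roll the pretrained policy out in an
    imaginary source domain under the nominal dynamics to obtain a reference trajectory of
    kinematic states, then let the robust tracking controller compute the actual command."""
    # 1) construct an imaginary source agent in the same setting as the target agent
    state = source_model.init_from_observation(observation)

    # 2) roll out the pretrained RL policy under the nominal dynamics for a finite horizon
    reference = []
    for _ in range(horizon):
        action = source_policy(state)
        state = source_model.step(state, action)
        reference.append(state.kinematics())        # positions, heading, velocity

    # 3) DOB-based robust tracking of the reference compensates the modeling gap
    return tracker.control(observation, reference)
```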
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
Illustration of the terminology for the simulated environment and the linear tracking model for the controller.
In both the source domain and the target domain, simulate the vehicle dynamics using a nonlinear
bicycle model discretized using the forward Euler method with Δt = 0.02 s.
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
• To learn driving policy using deep RL, define Markov Decision Process (MDP) for the driving tasks.
• Since different tasks have different observations, define clusters of observations as entities.
• Interfaces of lane keeping (LK), lane changing (LC) and obstacle avoidance (OA) are shown in Table I.
• In implementation, a hierarchical RL model that can modularize distinct driving attributes is applied.
• The attributes refer to driving behaviors such as obstacle detection, lane selection, and lane tracking.
• Table II and Figure give the detailed module interfaces and their usages.
Table I DEFINITION OF DRIVING TASKS Table II DEFINITION OF RL MODULES
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
• Theoretically, all the three basic modules can
be optimized using any kind of method
including deep RL.
• In implementation, both lane-selection and
obstacle-detection are rule-based and
reasonably optimized for the sake of simplicity.
• The lane-tracking is optimized using model-
free deep RL.
• The benefit of hierarchical implementation over
E2E training is that basic attribute modules are
easier to optimize compared to E2E training of
complicated high-level driving policies.
Usage of hierarchical RL modules to assemble
policies for the lane keeping (LK), lane changing
(LC) and obstacle avoidance (OA) tasks.
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
• To design the controller, first need to obtain an
approximate linear model for the tracking
problem.
• Adopt the constant speed linear bicycle model
for lateral dynamics
• Using the equations above and applying
forward Euler discretization, the overall tracking
model can be obtained as the block diagram:
• The robust controller is designed based on the
nominal model based on the source vehicle.
• Then a DOB is added to the nominal feedback
controller; the block diagram for the overall
closed-loop system is shown:
Block diagram of linear tracking model
Block diagram of closed-loop system
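A simplified state-space sketch of the DOB idea: the mismatch between the measured state change and the
nominal-model prediction is low-pass filtered into a disturbance estimate and cancelled in the control input.
A typical DOB is designed in the frequency domain with a Q-filter, so this is only a rough discrete-time
analogue, with illustrative names and filter choice.

```python
import numpy as np

class DisturbanceObserver:
    """First-order discrete DOB sketch built on the nominal (source-vehicle) tracking model."""
    def __init__(self, nominal_A, nominal_B, alpha=0.9):
        self.A, self.B = nominal_A, nominal_B   # nominal linear tracking model x+ = A x + B u
        self.alpha = alpha                      # low-pass strength (stand-in for the Q-filter)
        self.d_hat = None                       # disturbance estimate

    def update(self, x_prev, u_prev, x_meas):
        if self.d_hat is None:
            self.d_hat = np.zeros_like(x_meas)
        x_nominal = self.A @ x_prev + self.B @ u_prev
        residual = x_meas - x_nominal           # effect of modeling gap + side forces
        self.d_hat = self.alpha * self.d_hat + (1.0 - self.alpha) * residual
        return self.d_hat

def robust_control(u_nominal, d_hat, B_pinv):
    """Subtract the estimated disturbance (mapped back to the input) from the nominal command."""
    return u_nominal - B_pinv @ d_hat
```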
Zero-shot Deep Reinforcement Learning Driving Policy
Transfer for Autonomous Vehicles based on Robust Control
Comparison of the driving behaviors for the RL and the RL-RC with modeling gap in the Lane Changing (LC) task.
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
• Autonomous vehicles have to navigate the surrounding environment with partial
observability of other objects sharing the road.
• Sources of uncertainty in autonomous vehicle measurements include sensor fusion
errors, limited sensor range due to weather or object detection latency, occlusion,
and hidden parameters such as other human driver intentions.
• Behavior planning must consider all sources of uncertainty in deciding future vehicle
maneuvers.
• This paper presents a scalable framework for risk-averse behavior planning under
uncertainty by incorporating QMDP, unscented transform, and Monte Carlo tree
search (MCTS).
• It is shown that upper confidence bound (UCB) for expanding the tree results in
noisy Q-value estimates by the MCTS and a degraded performance of QMDP.
• A modification to action selection procedure in MCTS is proposed to achieve robust
performance.
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
• A modular stack is considered where a road world model (RWM) is inferred based on sensor
inputs and map information. Let s_t represent the state of the EV as well as other objects
sharing the road with the EV at time t, where the j-th object state is defined as
• which represent lateral and longitudinal location, velocity, and acceleration for the object,
respectively, and time indices are dropped for simplicity.
• A hierarchical behavior planner is considered where the action space, A, is defined as a
composition of a high level action set AHL = {LaneKeep, LaneChangeR, LaneChangeL, Yield,
Negotiatelanechange} and a set of parameters, Θ, specifying how to execute the action (safe
time headway, min dist from a lead vehicle, desired velocity, max acceleration/deceleration,
level of politeness, and direction and max time/distance of lane change).
• The behavior planner action is then passed to a motion planning and control module to
generate the trajectory planning and throttle/steering control. Based on a time series of
inputs from the RWM, the behavior planner selects an action based on a risk-averse metric:
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
• This is a partially observable Markov decision process (POMDP) problem and its exact
solution is computationally intractable. Instead, use a QMDP-based framework equipped
with the unscented transform to generate sigma-points that sample the belief distribution. In
particular, first select 2n_s + 1 sigma-points in a deterministic way as follows:
• In implementation, any sigma-point that is too close to x_0 or results in a physically infeasible
situation (e.g., a car is placed out of the road, overlaps with another object, or has
acceleration/deceleration/velocity beyond physical limits) is excluded.
• Therefore, the final value of n_s could be smaller than the dimension of the vector s. The mean
and variance of the Q-function are then calculated over the sigma-points (a sketch follows below).
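A minimal sketch of the two ingredients just described: deterministic sigma-point generation via a Cholesky
factor of the belief covariance, and a risk-averse score that penalizes the spread of Q across sigma-points.
The exact weighting and risk metric used in the paper may differ from this illustration.

```python
import numpy as np

def sigma_points(mean, cov, kappa=0.0):
    """2*ns + 1 deterministic sigma-points of the belief over the road-world state."""
    ns = mean.shape[0]
    L = np.linalg.cholesky((ns + kappa) * cov)
    pts = [mean] + [mean + L[:, i] for i in range(ns)] + [mean - L[:, i] for i in range(ns)]
    return np.array(pts)

def risk_averse_q(q_values_per_sigma, weights, alpha=0.01):
    """Risk-averse score: the mean Q over sigma-points minus a penalty on its spread,
    so actions whose value is uncertain look worse. alpha trades return against risk."""
    q = np.asarray(q_values_per_sigma, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.sum(w * q)
    var = np.sum(w * (q - mean) ** 2)
    return mean - alpha * np.sqrt(var)
```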
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
• To calculate QMDP(s, a) for a given state s, use an online planner, in particular a Monte Carlo
tree search (MCTS). In baseline MCTS, the tree is expanded using the UCT bound as follows:
• It should be noted that the main objective of the MCTS approach is to select the best action
a∗, and hence ranking of the Q-values across different actions is sufficient, and accuracy of
the value function is not very important, especially for actions with low values.
• A stronger exploration approach is incorporated into MCTS by combining the UCT bound with
ε-greedy exploration.
• The ε-greedy approach was implemented since it resulted in acceptable performance.
• The modified action selection criterion is then given by the combination sketched below.
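A sketch of the modified action selection inside the tree search, combining the UCT bound with ε-greedy
exploration; the node interface (`children`, `visits`, `q_value`) is a hypothetical placeholder for the
tree data structure.

```python
import math
import random

def select_action_uct_eps(node, c_uct=1.4, eps=0.1):
    """With probability eps pick a random action (epsilon-greedy exploration); otherwise
    pick the action maximizing the UCT bound Q(s,a) + c * sqrt(ln N(s) / N(s,a))."""
    actions = list(node.children.keys())
    if random.random() < eps:
        return random.choice(actions)
    total_visits = sum(child.visits for child in node.children.values())

    def uct(a):
        child = node.children[a]
        if child.visits == 0:
            return float("inf")                 # always try unvisited actions first
        return child.q_value + c_uct * math.sqrt(math.log(total_visits) / child.visits)

    return max(actions, key=uct)
```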
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
Highway ramp with limited field of view
Two driving scenarios with uncertainty are investigated: 1) a stationary object on the road
beyond sensor range, and 2) a highway ramp with limited field of view.
Assume there exists a stationary object on the EV's path, 400 m away from its current
location. Define the sensor range as the distance at which the probability of detecting a stationary
object is 10%.
Consider a highway ramp where EV experiences a limited field of view due to existing
obstacles (e.g., trees) or difference in the elevation.
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
• For baselines, consider two MCTS-based schemes: 1) MCTS-Genie, which receives noise-free
observations, and 2) MCTS-Noisy, which makes decisions based on noisy measurements.
• The initial velocity of both EV and MV is 20m/s.
• Figure compares EV’s velocity for the candidate algorithms, and Table provides further
information about the vehicles’ status at the merge point for a realization of the noise under
which initial measurements of MV velocity are lower than its actual value.
Risk-averse Behavior Planning for
Autonomous Driving under Uncertainty
• In the beginning, MCTS-Noisy assumes that MV is going slower than EV, and decides to
accelerate to pass MV before reaching the merge point.
• However, after a few seconds, it realizes that the initial measurements were off and it is in
fact not able to overtake MV.
• Although it applies aggressive deceleration (with jerk -6.8 m/s³) at that moment, there is not
enough time left to create a safe gap before the merge point (see distance to lead and time
headway reported in the Table).
• MCTS-Genie observes the actual velocity of MV and hence is able to make a correct
decision from the beginning.
• For RA-QMDP, select α = 0.01 and ε = 1.
• RA-QMDP takes three sigma-points one of which considers that MV is going faster than the
value that RWM reports.
• It then makes a risk-averse decision which forces the EV to slow down and increase the gap
with MV.
• Note that RA-QMDP has some delay in making that decision, since its evaluation is also
influenced by two other sigma-points that suggest MV is going slowly and there might be an
opportunity to overtake MV.
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
• The high uncertainty of traffic behavior and the large number of situations
that a self-driving vehicle may encounter on the roads make it
very difficult to create a fully generalizable system.
• To ensure safe and efficient operation, an autonomous vehicle is
required to account for this uncertainty and to anticipate a
multitude of possible behaviors of traffic actors in its surroundings.
• This is a method to predict multiple possible trajectories of actors
while also estimating their probabilities.
• The method encodes each actor’s surrounding context into a
raster image, used as input by deep convolutional networks to
automatically derive relevant features for the task.
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
First, rasterize an actor-specific BEV image encoding the actor's map surroundings and
neighboring actors (e.g., other vehicles and pedestrians).
Then, given the i-th actor's raster image and state estimate s_ij at time step t_j, use a CNN model
to predict a multitude of M possible future state sequences, as well as each sequence's
probability p_im such that Σ_m p_im = 1, where m indicates the mode index.
The task is simplified to inferring the i-th actor's future x- and y-positions, while the remaining states can
be derived by considering s_ij and the future position estimates. Both past and future positions
at time t_j are represented in the actor-centric coordinate system derived from the actor's state at
time t_j, where the forward direction is the x-axis, the left-hand direction is the y-axis, and the actor's bounding
box centroid is the origin.
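A small sketch of the actor-centric coordinate transform just described: subtract the actor's centroid and
rotate so that the actor's forward direction becomes +x and its left-hand direction +y.

```python
import numpy as np

def to_actor_frame(points_xy, actor_xy, actor_heading):
    """Express world-frame (x, y) points in the actor-centric frame: the actor's bounding-box
    centroid is the origin, forward direction is +x, left-hand direction is +y."""
    c, s = np.cos(actor_heading), np.sin(actor_heading)
    R = np.array([[c, s], [-s, c]])              # world -> actor rotation
    return (np.asarray(points_xy) - np.asarray(actor_xy)) @ R.T
```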
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
Modeling multimodality of future 6-second vehicle trajectories; predicted trajectories are marked in
blue, with their probabilities indicated at the end of the trajectories
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
It takes an actor-specific 300 × 300 RGB raster image with 0.2m resolution and actor’s current state
(velocity, acceleration, and heading change rate) as input, and outputs M modes of future x- and y-
positions (2H outputs per mode) along with their probabilities (one scalar per mode). This results in (2H
+ 1)M outputs per actor. Probability outputs are passed through a softmax layer to ensure they sum to 1.
Note that any CNN architecture can be used as the base network; here use MobileNet-v2.
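For clarity, the sketch below decodes a flat (2H + 1)·M output vector into M trajectories and
softmax-normalized mode probabilities; the per-mode ordering of outputs is an assumption made for
illustration, not necessarily the model's actual layout.

```python
import numpy as np

def decode_mtp_output(raw_output, num_modes, horizon):
    """Split the (2H + 1) * M network outputs into M trajectories of H (x, y) points
    plus M mode probabilities, normalized with a softmax."""
    per_mode = 2 * horizon + 1
    out = np.asarray(raw_output, dtype=float).reshape(num_modes, per_mode)
    trajectories = out[:, :2 * horizon].reshape(num_modes, horizon, 2)
    logits = out[:, -1]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return trajectories, probs
```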
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
It proposes to use a Multiple-Trajectory Prediction (MTP) loss that explicitly models the
multimodality of the trajectory space.
Mixture-of-Experts (ME) loss:
First, define a single-mode loss L of the i-th actor's m-th mode at time t_j as the average displacement
error (or ℓ2-norm) between the points of the ground-truth trajectory and the predicted trajectory of the m-th
mode as
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
Mode selection methods (modes shown in
blue): the ground truth (green) matches to the
right-turn mode when using displacement, but
the straight mode when using angle.
However, this distance function in the loss does not model the
multimodal behaviors well at intersections, as illustrated in
Figure. To address this problem, propose a function that
measures distance by considering an angle between the last
points of the two trajectories as seen from the actor position,
which improves handling of the intersection scenarios.
In practice, put a threshold of 5◦ on the angle difference.
Then, all modes which differ from the ground-truth trajectory
by less than 5◦ are considered to be potential matches, and
use displacement distance to break the tie among them. It
is found that this simple optimization makes the model
generalize well for both the intersection and straight-road
scenarios.
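A minimal sketch of the mode-matching rule just described: modes whose final point lies within the 5° angle
threshold of the ground truth (as seen from the actor at the origin) are candidates, and displacement error
breaks ties; the fallback when no mode is within the threshold is an assumption for robustness.

```python
import numpy as np

def match_mode(gt_traj, mode_trajs, angle_threshold_deg=5.0):
    """Select which predicted mode the ground-truth trajectory is matched to.
    gt_traj and each entry of mode_trajs are (H, 2) arrays in the actor frame."""
    gt_end = gt_traj[-1]
    candidates = []
    for m, traj in enumerate(mode_trajs):
        end = traj[-1]
        cos_a = np.dot(gt_end, end) / (np.linalg.norm(gt_end) * np.linalg.norm(end) + 1e-9)
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        if angle < angle_threshold_deg:
            candidates.append(m)
    if not candidates:                       # fallback: consider all modes
        candidates = list(range(len(mode_trajs)))
    # tie-break by average displacement error to the ground truth
    ade = [np.mean(np.linalg.norm(mode_trajs[m] - gt_traj, axis=1)) for m in candidates]
    return candidates[int(np.argmin(ade))]
```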
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
It shows rasters for the same scene but using two different following lanes marked in light pink,
one going straight and one turning left. The method outputs trajectories that follow the intended
paths well, and can be used to generate a multitude of trajectories for lane-following vehicles.
Assuming knowledge of the possible lanes that can be followed and a lane-scoring system
that filters unlikely lanes, add another rasterization layer that encodes this information and
train the network to output a lane-following trajectory. Then, for one scene, generate multiple
rasters with various lanes to be followed, effectively inferring multiple trajectories.
Multimodal Trajectory Predictions for Autonomous
Driving using Deep Convolutional Networks
The effect on output trajectories of varying number of modes M from 1 to 4, respectively from left to right image.
The figure visualizes outputs for an intersection scene with an actor going straight, while increasing
the number of modes M. For M = 1 (i.e., the STP model) the inferred trajectory is roughly an average of
the straight and right-turn modes. Incrementing the number of modes to 2 gives a clear separation
between modes going straight and turning right. Further, setting M to 3, the left-turn mode also
appears, albeit with lower probability. When M = 4 there is an interesting effect, where the go-straight
mode splits into "fast" and "slow" modes, modeling variability in the actor's longitudinal movement.
Exploring the Limitations of Behavior
Cloning for Autonomous Driving
• Driving requires reacting to a wide variety of complex environment conditions and agent
behaviors.
• Explicitly modeling each possible scenario is unrealistic.
• In contrast, imitation learning can, in theory, leverage data from large fleets of human-
driven cars.
• Behavior cloning in particular has been successfully used to learn simple visuomotor policies
end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved
problem.
• A benchmark to experimentally investigate the scalability and limitations of behavior cloning.
• Behavior cloning leads to state-of-the-art results, including in unseen environments,
executing complex lateral and longitudinal maneuvers without these reactions being
explicitly programmed.
• However, well-known limitations (due to dataset bias and overfitting), new generalization
issues (due to dynamic objects and the lack of a causal model), and training instability
require further research before behavior cloning can graduate to real-world driving.
Exploring the Limitations of Behavior
Cloning for Autonomous Driving
• Behavior cloning is a form of supervised learning that can learn sensorimotor policies from
off-line collected data.
• The only requirements are pairs of input sensory observations associated with expert actions.
• Use an expanded formulation for self-driving cars called Conditional Imitation Learning, CIL.
• It uses a high-level navigational command c that disambiguates imitation around multiple
types of intersections.
• Given an expert policy π∗(x) with access to the environment state x, one can execute this policy
to produce a dataset D = {⟨o_i, c_i, a_i⟩}_{i=1}^N, where o_i are sensor data observations, c_i are
high-level commands (e.g., take the next right, left, or stay in lane) and a_i = π∗(x_i) are the
resulting vehicle actions (low-level controls).
• Observations o_i = {i, v_m} contain a single image i and the ego-car speed v_m, added for the
system to properly react to dynamic objects on the road.
• Without the speed context, the model cannot learn if and when it should accelerate or brake
to reach a desired speed or stop.
Exploring the Limitations of Behavior
Cloning for Autonomous Driving
• The goal is to learn a policy π parametrized by θ that produces actions similar to π∗ based only on
observations o and high-level commands c.
• The best parameters θ∗ are obtained by minimizing an imitation cost ℓ: θ∗ = argmin_θ Σ_i ℓ(π(o_i, c_i; θ), a_i).
• In order to evaluate the performance of the learned policy π(oi, ci; θ) on-line at test time,
assume access to a score function giving a numeric value expressing the performance of the
policy π on a given benchmark.
• In addition to the distributional shift problem, the limitations of behavior cloning include:
• Bias in Naturalistic Driving Datasets.
• Causal Confusion.
• High variance.
Exploring the Limitations of Behavior
Cloning for Autonomous Driving
The proposed network architecture, called CILRS, for end-to-end urban driving based on CIL. A ResNet perception
module processes an input image into a latent space, followed by two prediction heads: one for controls and one
for speed.
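A rough PyTorch-style sketch of a CILRS-like architecture with a perception module, a speed-prediction head,
and command-conditioned control branches; the small convolutional stack below stands in for the ResNet
backbone and all layer sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CILRSSketch(nn.Module):
    """Image encoder (stand-in for ResNet), measured-speed encoder, speed head,
    and one control branch per high-level navigational command."""
    def __init__(self, num_commands=4, feat_dim=512):
        super().__init__()
        self.perception = nn.Sequential(             # placeholder for the ResNet module
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim), nn.ReLU())
        self.speed_encoder = nn.Sequential(nn.Linear(1, 64), nn.ReLU())
        self.speed_head = nn.Linear(feat_dim, 1)      # predicts the ego speed from the image features
        self.branches = nn.ModuleList(                # one control head per command
            [nn.Sequential(nn.Linear(feat_dim + 64, 128), nn.ReLU(), nn.Linear(128, 3))
             for _ in range(num_commands)])

    def forward(self, image, speed, command):
        feat = self.perception(image)
        joint = torch.cat([feat, self.speed_encoder(speed)], dim=1)
        controls = self.branches[command](joint)      # steer, throttle, brake
        return controls, self.speed_head(feat)
```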
Exploring the Limitations of Behavior
Cloning for Autonomous Driving
• The open source simulator CARLA provides a diverse, photo-realistic, and dynamic
environment with challenging driving conditions.
• A new larger scale CARLA driving benchmark, called NoCrash, designed to test the ability of
ego vehicles to handle complex events caused by changing traffic conditions (e.g., traffic
lights) and dynamic agents in the scene.
• Three different tasks, each one corresponding to 25 goal-directed episodes.
• In each episode, the agent starts at a random position and is directed by a high-level
planner into reaching some goal position.
• The three tasks as follows:
• Empty Town: no dynamic objects.
• Regular Traffic: moderate number of cars and pedestrians.
• Dense Traffic: large number of pedestrians and heavy traffic (dense urban scenario).
• Similar to the CARLA Benchmark, NoCrash has six different weather conditions, where four
were seen in training and two reserved for testing.
• Also give two towns, one that is seen during training, and the other reserved for testing.
Thanks
Comparative Analysis of Tuning Hyperparameters in Policy-Based DRL Algorithm ...
 
Self-Driving Cars With Convolutional Neural Networks (CNN.pptx
Self-Driving Cars With Convolutional Neural Networks (CNN.pptxSelf-Driving Cars With Convolutional Neural Networks (CNN.pptx
Self-Driving Cars With Convolutional Neural Networks (CNN.pptx
 
IRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
IRJET-To Analyze Calibration of Car-Following Behavior of VehiclesIRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
IRJET-To Analyze Calibration of Car-Following Behavior of Vehicles
 
SPLT Transformer.pptx
SPLT Transformer.pptxSPLT Transformer.pptx
SPLT Transformer.pptx
 
Imitation Learning and Direct Perception for Autonomous Driving
Imitation Learning and Direct Perception for Autonomous DrivingImitation Learning and Direct Perception for Autonomous Driving
Imitation Learning and Direct Perception for Autonomous Driving
 
Cy4301578585
Cy4301578585Cy4301578585
Cy4301578585
 
Hcs
HcsHcs
Hcs
 
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-PoolingNeural Approximate Dynamic Programming for On-Demand Ride-Pooling
Neural Approximate Dynamic Programming for On-Demand Ride-Pooling
 
Aimsun saturadion flow rate calibration
Aimsun saturadion flow rate calibrationAimsun saturadion flow rate calibration
Aimsun saturadion flow rate calibration
 
Scalable algorithms for nearest neighbor joins on big trajectory data
Scalable algorithms for nearest neighbor joins on big trajectory dataScalable algorithms for nearest neighbor joins on big trajectory data
Scalable algorithms for nearest neighbor joins on big trajectory data
 
deep-reinforcement-learning-framework.pdf
deep-reinforcement-learning-framework.pdfdeep-reinforcement-learning-framework.pdf
deep-reinforcement-learning-framework.pdf
 
Description of the project thesis at Fraunhofer ITWM
Description of the project thesis at Fraunhofer ITWMDescription of the project thesis at Fraunhofer ITWM
Description of the project thesis at Fraunhofer ITWM
 
Dave
DaveDave
Dave
 
Victor Eneji Peter.pdf
Victor Eneji Peter.pdfVictor Eneji Peter.pdf
Victor Eneji Peter.pdf
 
Agent based car following model for heterogeneities of platoon driving with v...
Agent based car following model for heterogeneities of platoon driving with v...Agent based car following model for heterogeneities of platoon driving with v...
Agent based car following model for heterogeneities of platoon driving with v...
 
The Future of Mixed-Autonomy Traffic (AIS302) - AWS re:Invent 2018
The Future of Mixed-Autonomy Traffic (AIS302) - AWS re:Invent 2018The Future of Mixed-Autonomy Traffic (AIS302) - AWS re:Invent 2018
The Future of Mixed-Autonomy Traffic (AIS302) - AWS re:Invent 2018
 
Autonomous Ground Vehicles The Darpa Grand Challenge
Autonomous Ground Vehicles The Darpa Grand ChallengeAutonomous Ground Vehicles The Darpa Grand Challenge
Autonomous Ground Vehicles The Darpa Grand Challenge
 
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
 

More from Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksYu Huang
 
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image VYu Huang
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IVYu Huang
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image IIIYu Huang
 

More from Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rain
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucks
 
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III
 

Recently uploaded

1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessorAshwiniTodkar4
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxMustafa Ahmed
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxNANDHAKUMARA10
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesRashidFaridChishti
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdfKamal Acharya
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information SystemsAnge Felix NSANZIYERA
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...HenryBriggs2
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...ppkakm
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 

Recently uploaded (20)

1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptx
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdf
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information Systems
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 

An Auto-tuning Framework for Autonomous Vehicles
Siamese network used in RC-IRL: the value networks of both the human and the sampled trajectories share the same network parameter settings. The loss function evaluates the difference between the sampled data and the generated trajectory via the value network outputs.
The value network inside the Siamese model is used to capture driving behavior based on encoded features. The network is a trainable linear combination of encoded rewards at different times. The weight of the encoded reward is a learnable time decay factor. The encoded reward includes an input layer with 21 raw features and a hidden layer with 15 nodes.
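The RC-IRL value network described above lends itself to a compact sketch. The following is a minimal PyTorch illustration, not the authors' implementation: an encoded-reward layer (21 raw features, 15 hidden nodes) shared by both Siamese branches, a learnable time-decay weighting that combines the encoded rewards over time, and a pairwise loss that prefers the human trajectory over a randomly sampled one. The tensor shapes, the margin loss, and the softplus parameterization of the decay are assumptions.

```python
import torch
import torch.nn as nn

class EncodedReward(nn.Module):
    """Per-timestep encoded reward: 21 raw features -> 15 hidden nodes -> scalar."""
    def __init__(self, n_features=21, n_hidden=15):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh(),
                                 nn.Linear(n_hidden, 1))

    def forward(self, traj):                 # traj: (batch, T, n_features)
        return self.net(traj).squeeze(-1)    # (batch, T)

class ValueNetwork(nn.Module):
    """Trainable linear combination of encoded rewards with a learnable time decay."""
    def __init__(self, n_features=21):
        super().__init__()
        self.reward = EncodedReward(n_features)
        self.log_decay = nn.Parameter(torch.zeros(()))   # learnable time-decay factor

    def forward(self, traj):
        r = self.reward(traj)                                   # (batch, T)
        t = torch.arange(r.shape[1], dtype=r.dtype)
        w = torch.exp(-nn.functional.softplus(self.log_decay) * t)
        return (w * r).sum(dim=1)                               # trajectory value

def siamese_loss(value_net, human_traj, sampled_traj, margin=1.0):
    # Both branches share parameters; the human trajectory should score a higher value.
    v_h, v_s = value_net(human_traj), value_net(sampled_traj)
    return torch.clamp(margin - (v_h - v_s), min=0.0).mean()
```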
An Auto-tuning Framework for Autonomous Vehicles
• To validate the general performance in terms of conditional IRL, consider a generative adversarial network (GAN) for comparison.
• In the GAN, each sampled trajectory is labeled as human driving (1) or random trajectory (0) and trained with a cross-entropy loss.
• The network is expected to distinguish human driving from a randomly generated trajectory.
• Since the GAN cannot distinguish different scenarios, the procedure compares the average performance of human driving trajectories and that of randomly generated trajectories.
Simulation Results of the Motion Planner
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• IEEE Intelligent Vehicles Symposium (IV), June 2018
• Making the right decision in traffic is a challenging task that is highly dependent on individual preferences as well as the surrounding environment. Therefore it is hard to model solely based on expert knowledge.
• It uses Deep Reinforcement Learning to learn maneuver decisions based on a compact semantic state representation.
• This ensures a consistent model of the environment across scenarios as well as a behavior adaptation function, enabling on-line changes of desired behaviors without re-training.
• The input for the neural network is a simulated object list similar to that of Radar or Lidar sensors, superimposed by a relational semantic scene description.
• The state and reward are extended by a behavior adaptation function and a parameterization, respectively.
• With little expert knowledge and a set of mid-level actions, the agent is capable of adhering to traffic rules and learns to drive safely in a variety of situations.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• A reinforcement learning process is commonly modeled as an MDP (S, A, R, δ, T), where S is the set of states, A the set of actions, R : S×A×S → R the reward function, δ : S×A → S the state transition model and T the set of terminal states. At timestep i an agent in state s ∈ S can choose an action a ∈ A according to a policy π and will progress into the successor state s′, receiving reward r. This is defined as a transition t = (s, a, s′, r).
• The aim of reinforcement learning is to maximize the future discounted return G_i = Σ_k γ^k r_{i+k}.
• A DQN uses Q-learning to learn Q-values for each action given input state s, based on past transitions. The predicted Q-values of the DQN are used to adapt the policy π and therefore change the agent's behavior.
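As a small illustration of the quantities just defined (not code from the paper), the discounted return and the one-step Q-learning target that a DQN would regress from stored transitions (s, a, s′, r) can be written as:

```python
import numpy as np

def discounted_return(rewards, gamma=0.99):
    """G_i = sum_k gamma^k * r_{i+k} over a list of future rewards."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

def q_target(reward, q_next, gamma=0.99, terminal=False):
    """Bootstrapped target r + gamma * max_a' Q(s', a') for a single transition."""
    return reward if terminal else reward + gamma * np.max(q_next)
```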
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
The initial traffic scene is transformed into a compact semantic state representation s and used as input for the RL agent. The agent estimates the action a with the highest return (Q-value) and executes it, e.g., changing lanes. Afterwards a reward r is collected and a new state s′ is reached. The transition (s, a, r, s′) is stored in the agent's replay memory.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
(a) The ego vehicle (blue) is driving on a two-lane road. Five other vehicles are in its sensor range. Based on a vehicle scope with Λ_lateral = Λ_ahead = Λ_behind = 1, only four vehicles (green) are considered in the semantic state representation.
(b) The relational grid resulting from the scene of (a) with Λ_lateral = Λ_ahead = Λ_behind = 1. Moreover, there is no adjacent lane to the right, which is represented in the grid as well.
An example scene (a) is transformed to a relational grid (b) using a vehicle scope Λ with Λ_lateral = Λ_ahead = Λ_behind = 1. The red vehicle, which is in sensor range, is not represented in the grid. Since there is no vehicle driving on the same lane in front of the ego vehicle, the respective relational position in the grid is empty.
• Define a relational grid, centered at the ego vehicle v_ego ∈ V: the rows correspond to the relational lane topology, whereas the columns correspond to the vehicle topology on these lanes.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• For the input state representation, adapt the ontology-based concept focusing on relations with other traffic participants as well as the road topology.
• Design the state representation to use high-level preprocessed sensory inputs such as object lists provided by common Radar and Lidar sensors and lane boundaries from visual sensors or map data. To generate the desired behavior, the reward is composed of different factors with varying priorities.
• A traffic scene τ is described by a semantic entity-relationship model, consisting of all scene objects and relations between them.
• The scene objects contain all static and dynamic objects, such as vehicles, pedestrians, lane segments, signs and traffic lights.
• This work focuses on vehicles V ⊂ E, lane segments L ⊂ E and the three relation types vehicle-vehicle relations, vehicle-lane relations and lane-lane relations.
• Every entity and relation holds several properties or attributes of the scene objects, such as absolute positions or relative velocities.
• This scene description combines low-level attributes with high-level relational knowledge in a generic way. It is thus applicable to any traffic scene and vehicle sensor setup, making it a beneficial state representation.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
Partially visualized entity-relationship model of the scene in the last figure (a). The vehicle topology is modeled by vehicle-vehicle relations, whereas the lane topology is modeled by lane-lane relations.
But the representation is of varying size and includes more aspects than are relevant for a given driving task. In order to use this representation as the input to a neural network, transform it to a fixed-size relational grid that includes only the most relevant relations.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• The set of vehicles inside the vehicle scope Λ is denoted by V_Λ ⊆ V. The different features of vehicles and lanes are represented by separate layers of the grid, resulting in a semantic state representation.
• The column vector of each grid cell holds attributes of the vehicle and the lane it belongs to. While the ego vehicle's features are absolute values, other vehicles' features are relative to the ego vehicle or the lane they are in (induced by the vehicle-vehicle and vehicle-lane relations of the entity-relationship model):
• 1) Non-ego vehicle features: longitudinal position and velocity, lateral alignment and heading;
• 2) Ego vehicle features: behavior adaptation function, longitudinal velocity, lane index;
• 3) Lane features: lane type and lane ending;
• Non-perceivable features or missing entities are indicated in the semantic state representation by a dedicated value.
• The relational grid ensures a consistent representation of the environment, independent of the road geometry or the number of surrounding vehicles.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
The relational grid contains one layer per feature. The vehicle features f_i^v and f_ego^v share layers and are placed in the cells of v_i and v_ego, respectively. The lane features f_k^l are on additional layers in the k-th row of the grid.
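A hypothetical sketch of how such a fixed-size relational grid could be assembled from an object list is given below, assuming Λ_lateral = Λ_ahead = Λ_behind = 1 (a 3x3 grid) and simple dictionary fields (lane_index, long_pos, long_vel, lat_align, heading) for each perceived vehicle. The ego-feature and lane-feature layers are omitted for brevity, and NaN stands in for the dedicated "missing entity" value.

```python
import numpy as np

FEATURES = ["long_pos", "long_vel", "lat_align", "heading"]   # non-ego vehicle features

def relational_grid(ego, vehicles):
    """Build a (feature layers, 3 lanes, 3 slots) grid centered at the ego vehicle."""
    grid = np.full((len(FEATURES), 3, 3), np.nan)
    for v in vehicles:
        d_lane = v["lane_index"] - ego["lane_index"]
        if abs(d_lane) > 1:
            continue                                   # outside the lateral vehicle scope
        row = d_lane + 1                               # relational lane: left / same / right
        col = 2 if v["long_pos"] > ego["long_pos"] else 0   # ahead / behind slot
        rel = [v["long_pos"] - ego["long_pos"],
               v["long_vel"] - ego["long_vel"],
               v["lat_align"], v["heading"]]
        # keep only the nearest vehicle per cell (the most relevant relation)
        if np.isnan(grid[0, row, col]) or abs(rel[0]) < abs(grid[0, row, col]):
            grid[:, row, col] = rel
    return grid
```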
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• The vehicle's action space is defined by a set of semantic actions that is deemed sufficient for most on-road driving tasks, excluding special cases such as U-turns.
• The longitudinal movement of the vehicle is controlled by the actions accelerate and decelerate.
• While executing these actions, the ego vehicle keeps its lane.
• Lateral movement is generated by the actions lane change left and lane change right, respectively.
• Only a single action is executed at a time and actions are executed in their entirety; the vehicle is not able to prematurely abort an action.
• The default action results in no change of either velocity, lateral alignment or heading.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• With the aim of generating adaptive behavior, extend the reward function R(s, a) by a parameterization θ.
• This parameterization is used in the behavior adaptation function Ω(τ, θ), so that the agent is able to learn different desired behaviors without the need to train a new model for varying parameter values.
• Furthermore, the desired driving behavior consists of several individual goals, modeled by separate rewards.
• These reward functions are ranked by three different priorities.
• Collision avoidance has the highest priority; rewards associated with traffic rules are important but to a lesser extent; rewards connected to the driving style are least prioritized.
• The overall reward function R(s, a, θ) can then be expressed by combining these prioritized terms.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
• The subset S_collision ⊂ S consists of all states s describing a collision state of the ego vehicle v_ego and another vehicle v_i.
• In these states the agent only receives the immediate reward, without any possibility of earning future rewards.
• Additionally, attempting a lane change to a nonexistent adjacent lane is also treated as a collision.
• The state-dependent evaluation of the reward factors facilitates the learning process.
• As the reward for a state is independent of rewards with lower priority, the eligibility trace is more concise for the agent being trained.
• For example, driving at the desired velocity does not mitigate the reward for collisions.
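The priority masking described above can be illustrated with a small sketch. This is an assumption about one way to compose the prioritized reward, not the paper's exact equation: reward terms are grouped from highest to lowest priority (collision avoidance, traffic rules, driving style), and a lower-priority group only contributes when no higher-priority penalty is active.

```python
def prioritized_reward(state, action, theta, reward_groups):
    """reward_groups: list of lists of reward functions, ordered from highest to lowest
    priority; each function maps (state, action, theta) -> float."""
    for group in reward_groups[:-1]:
        r = sum(f(state, action, theta) for f in group)
        if r < 0.0:
            # a higher-priority penalty (e.g. collision) masks all lower-priority rewards
            return r
    # only driving-style terms (parameterized by theta via the behavior adaptation function)
    return sum(f(state, action, theta) for f in reward_groups[-1])
```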
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
The current state from SUMO is retrieved and transformed into the semantic state representation. This is the input to the RL agent, which is trained using the TensorForce library. Chosen actions are mapped to a respective state change of the simulation. The agent's reward is then computed based on the initial state, the chosen action and the successor state.
While the concept is able to handle data from many preprocessing methods used in autonomous vehicles, the approach is tested with the traffic simulation SUMO.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
To examine the agent's compliance with traffic rules, it is trained and evaluated on two different traffic scenarios. In (a) the agent has the obligation to drive on the rightmost lane and must not pass others on the right, amongst other constraints. In (b) the agent is allowed to accelerate while on the on-ramp and may also overtake vehicles on its left, but it has to leave the on-ramp before it ends.
Results of the trained agents: the agents g_H (highway) and g_M (merging ramp) were only evaluated on their respective scenario, while g_C (both highway and merging ramp) was evaluated on both. The results for g_C are listed separately for each scenario.
Adaptive Behavior Generation for Autonomous Driving using Deep Reinforcement Learning with Compact Semantic States
Overtaking maneuver: the agent (blue) is driving behind a slower vehicle (green). In this situation the action lane change left has the highest estimated Q-value. After the lane change, the agent accelerates and overtakes the slower vehicle. Subsequently, the agent changes back to the rightmost lane.
The average speed of each agent given varying desired velocities [m/s]: the agents have been evaluated on the training scenarios with normal traffic density as well as on an empty highway. While not every agent is able to achieve the desired velocity precisely, their behavior adapts to the different parameter values.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• To safely and efficiently interact with other road participants, AVs have to accurately predict the behavior of surrounding vehicles and plan accordingly.
• Such prediction should be probabilistic, to address the uncertainties in human behavior.
• Such prediction should also be interactive, since the distribution over all possible trajectories of the predicted vehicle depends not only on historical information, but also on future plans of other vehicles that interact with it.
• To achieve such interaction-aware predictions, propose a probabilistic prediction approach based on hierarchical inverse reinforcement learning (IRL).
• First, explicitly consider the hierarchical trajectory-generation process of human drivers involving both discrete and continuous driving decisions.
• Based on this, the distribution over all future trajectories of the predicted vehicle is formulated as a mixture of distributions partitioned by the discrete decisions.
• Then apply IRL hierarchically to learn the distributions from real human demonstrations.
• A case study for the ramp-merging driving scenario is provided.
• The quantitative results show that the proposed approach can accurately predict both the discrete driving decisions, such as yield or pass, as well as the continuous trajectories.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• It focuses on predicting human drivers' interactive behavior between two vehicles: a host vehicle (denoted by (·)_H) and a predicted vehicle (denoted by (·)_M).
• All other in-scene vehicles are treated as surrounding vehicles, denoted by (·)_O^i (i = 1, 2, · · ·, N is the index for surrounding vehicles).
• It uses ξ to represent historical vehicle trajectories, and ξ̂ for future trajectories.
• The probability distribution over all possible trajectories of the predicted vehicle depends on its own historical trajectory and those of surrounding vehicles.
• Mathematically, such time-domain state dependency can be modelled as a conditional probability density function (PDF) over the predicted vehicle's future trajectory given the observed histories.
• The probability distribution over all possible trajectories of the predicted vehicle should also be conditioned on potential plans of the host vehicle.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• The trajectory-generation process of human drivers is naturally probabilistic and hierarchical.
• It involves both discrete and continuous driving decisions.
• The discrete driving decisions determine the "rough" pattern (or homotopy class) of the future trajectory as a game-theoretic result (e.g., to yield or to pass), whereas the continuous driving decisions influence details of the trajectory such as velocities, accelerations and smoothness.
• The conditional distribution is formulated as a mixture of distributions, which explicitly captures the influences of both discrete and continuous driving decisions of human drivers.
• Thus propose to apply inverse reinforcement learning (IRL) hierarchically to learn all the models from observed demonstrations of human drivers.
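The mixture formulation referenced above (the slide's equation is not reproduced in the transcript) can be reconstructed as follows; this is a sketch consistent with the notation defined earlier, not a quotation of the paper:

```latex
% Distribution over future trajectories \hat{\xi}_M of the predicted vehicle, conditioned on
% the observed histories \xi and the host vehicle's future plan \hat{\xi}_H, partitioned by
% the discrete decisions d^i_M (reconstruction, hedged):
p\big(\hat{\xi}_M \mid \xi, \hat{\xi}_H\big)
  = \sum_{i} \underbrace{P\big(d^i_M \mid \xi, \hat{\xi}_H\big)}_{\text{discrete decision}}
    \; \underbrace{p\big(\hat{\xi}_M \mid d^i_M, \xi, \hat{\xi}_H\big)}_{\text{continuous trajectory}}
```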
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• The figure shows the probabilistic and hierarchical trajectory-generation process for a lane-changing scenario.
• The predicted vehicle (blue) is trying to merge into the lane of the host vehicle (red).
• Given all observed historical trajectories ξ = {ξ_O^{1:N}, ξ_H, ξ_M} and its belief about the host vehicle's future trajectory ξ̂_H, the trajectory distribution of the predicted vehicle over the trajectory space is partitioned first by the discrete decisions: merge behind (d_M^1) and merge front (d_M^2).
• Under different discrete decisions, the distributions of continuous trajectories can be significantly different, and each of them is represented via a probability distribution model.
• The observed demonstrations are samples satisfying the distributions.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• Based on the principle of maximum entropy, assume that all drivers are exponentially more likely to make decisions (both discrete and continuous) that lead to a lower cost.
• This introduces a family of exponential distributions depending on the cost functions, and the goal is to find the optimal hierarchical cost functions that ultimately lead to trajectory distributions matching that of the observed trajectories in a given demonstration set Ξ.
• Since the trajectory space is continuous and demonstrations are noisily local-optimal, use Continuous Inverse Optimal Control with Locally Optimal Examples.
• Continuous-space IRL: the goal is to find the optimal parameter vector such that the given demonstration set is most likely to happen.
• Under each discrete decision, assume that the cost of each trajectory can be linearly parameterized by a group of selected features.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• Features of Continuous Driving Decisions:
• Speed: vs. the speed limit
• Traffic: based on the intelligent driver model (IDM)
• Control effort and smoothness: a set of kinematics-related features
• Clearance to other road participants: a distance-related feature
• Goal
• Courtesy
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• Features of Discrete Driving Decisions:
• Rotation angle
• Minimum cost
• Discrete-space IRL: assume that the decisions with lower cost are exponentially more probable.
• The goal is to find the optimal parameter vector such that the likelihood of the demonstration set is maximized.
• The parameter vector is updated via gradient ascent on this log-likelihood.
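A minimal NumPy sketch of this discrete-level IRL step is given below, under the maximum-entropy assumption that decisions with lower cost are exponentially more probable. The linear cost parameterization and the plain gradient-ascent update are assumptions, and the feature extraction (rotation angle, minimum continuous cost) is taken as given.

```python
import numpy as np

def decision_probs(omega, decision_features):
    """decision_features: (n_decisions, n_features); cost_i = omega . phi_i."""
    costs = decision_features @ omega
    p = np.exp(-(costs - costs.min()))          # softmax over negative costs, stabilized
    return p / p.sum()

def irl_update(omega, demos, lr=1e-2):
    """demos: list of (decision_features, chosen_index) pairs from human demonstrations."""
    grad = np.zeros_like(omega)
    for phi, k in demos:
        p = decision_probs(omega, phi)
        # d log P(d_k) / d omega = -phi_k + E_p[phi]
        grad += -phi[k] + p @ phi
    return omega + lr * grad / len(demos)       # gradient ascent on the log-likelihood
```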
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• Collect human driving data from the Next Generation SIMulation (NGSIM) dataset. It captures highway driving behaviors/trajectories by cameras mounted on top of surrounding buildings. The sampling time of the trajectories is ∆t = 0.01 s.
• Choose 134 ramp-merging trajectories on Interstate 80 (near Emeryville, California), and separate them into two sets: a training set of size 80 (denoted by Ξ, i.e., the human demonstrations), and the other 54 trajectories.
The figure shows the road map and an example group of trajectories. There are four cars in the scene: one merging vehicle (red), one lane-keeping vehicle (blue) and two surrounding vehicles (black), with one ahead of the blue car and the other behind. The interest focuses on the interactive driving behavior of both the merging vehicle and the lane-keeping vehicle.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• Use the same hierarchical IRL approach to model the conditional probability distributions for both the merging vehicle and the lane-keeping vehicle.
• Driving decisions: in the ramp-merging scenario, the driving decisions are listed in the table.
• As mentioned above, x = [x_1, · · ·, x_L]^T and y = [y_1, · · ·, y_L]^T are, respectively, the coordinate vectors in the Frenet frame along the longitudinal and lateral directions.
• L is set to be 50, i.e., in each demonstration, 5 s trajectories are collected.
• Feature selection: since the right of way for the merging vehicle and the lane-keeping vehicle is different, define different features for them.
• Lane keeping: yield and pass;
• Merge: merge back and front, with features for IDM and courtesy.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• Use TensorFlow to implement the hierarchical IRL algorithm.
• The figure gives the training curves for the continuous and discrete driving decisions.
• Due to the hierarchical structure, first learn all four continuous distribution models for both the merging vehicle and the lane-keeping vehicle under different discrete decisions.
• Randomly sample subsets of trajectories from the training set and perform multiple training runs.
• As seen, the parameters in each continuous model converge consistently with small variance.
Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning
• In this test, generate the most probable trajectories under different discrete driving decisions by solving a finite-horizon Model Predictive Control (MPC) problem using the learned continuous cost functions.
• Three illustrative examples: the red dotted lines and blue solid lines represent, respectively, the predicted most likely trajectories and the ground truth trajectories; the thick black dash-dot lines are trajectories of other vehicles; the predicted trajectories are very close to the ground truth ones.
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
• Although deep RL methods have many strengths that are favorable for autonomous driving, real deep RL applications in autonomous driving have been slowed down by the modeling gap between the source (training) domain and the target (deployment) domain.
• Unlike current policy transfer approaches, which are generally limited to using uninterpretable neural network representations as the transferred features, propose to transfer concrete kinematic quantities in autonomous driving.
• The proposed robust-control-based (RC) generic transfer architecture, called RL-RC, incorporates a transferable hierarchical RL trajectory planner and a robust tracking controller based on a disturbance observer (DOB).
• The deep RL policies trained with a known nominal dynamics model are transferred directly to the target domain; DOB-based robust tracking control is applied to tackle the modeling gap, including the vehicle dynamics errors and external disturbances such as side forces.
• Simulations achieve zero-shot transfer across multiple driving scenarios such as lane keeping, lane changing and obstacle avoidance.
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
• The proposed RL-RC policy transfer architecture consists of a deep RL-based high-level planning module and an RC-based low-level tracking controller.
• The overall methodology consists of the off-line deep RL policy training and the on-line policy transfer.
• In the off-line source domain, pre-train a deep RL policy mapping the perception input to control commands, which can drive the source vehicle to produce a trajectory completing the control task. For the on-line transfer in the target domain, the system works as follows to transfer the source driving policies.
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
• According to the perception observation, the system constructs an imaginary source agent in the same setting as the target agent.
• In the imaginary source domain, the pretrained policy is used to control the imaginary source agent to perform the driving task for a finite horizon, resulting in a trajectory of kinematic states, which serves as the reference for the target vehicle.
• Given the reference trajectory, the target agent uses a closed-loop robust tracking controller to produce the actual control command for the target vehicle.
• As the target environment makes one step forward, the observation for the new timestep is collected, and the system repeats the same procedure.
• In this architecture, the kinematic features are transferred without change, and the modeling gap is compensated by RC.
• To generate the reference trajectory in the imaginary source domain, train a deep RL policy network that directly maps the perception input to a trajectory of kinematic states.
• There are two key underlying assumptions for the RL-RC system:
• (i) the trajectory planned by the imaginary source agent is comparably satisfying for the target task;
• (ii) it is feasible for the target vehicle to track the trajectories produced by the source vehicle.
• These assumptions are reasonable as the source and target vehicles and settings are similar, and the RL-RC is only effective for such cases.
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
Illustration of the terminology for the simulated environment and the linear tracking model for the controller.
In both the source domain and the target domain, simulate the vehicle dynamics using a nonlinear bicycle model discretized with the forward Euler method with ∆t = 0.02 s.
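For illustration, a generic kinematic bicycle model with forward-Euler discretization at ∆t = 0.02 s is sketched below; the paper's exact nonlinear dynamic model and parameter values are not reproduced, and the axle distances lf/lr are placeholder numbers.

```python
import math

def bicycle_step(state, accel, steer, dt=0.02, lf=1.2, lr=1.6):
    """One forward-Euler step; state = (x, y, yaw, v), lf/lr = assumed CG-to-axle distances."""
    x, y, yaw, v = state
    beta = math.atan(lr / (lf + lr) * math.tan(steer))   # slip angle at the center of gravity
    x += v * math.cos(yaw + beta) * dt
    y += v * math.sin(yaw + beta) * dt
    yaw += v / lr * math.sin(beta) * dt
    v += accel * dt
    return (x, y, yaw, v)
```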
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
• To learn the driving policy using deep RL, define a Markov Decision Process (MDP) for the driving tasks.
• Since different tasks have different observations, define clusters of observations as entities.
• Interfaces of lane keeping (LK), lane changing (LC) and obstacle avoidance (OA) are shown in Table I.
• In implementation, a hierarchical RL model that can modularize distinct driving attributes is applied.
• The attributes refer to driving behaviors such as obstacle detection, lane selection, and lane tracking.
• Table II and the figure give the detailed module interfaces and their usages.
Table I: Definition of driving tasks. Table II: Definition of RL modules.
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
• Theoretically, all three basic modules can be optimized using any kind of method, including deep RL.
• In implementation, both lane-selection and obstacle-detection are rule-based and reasonably optimized for the sake of simplicity.
• The lane-tracking is optimized using model-free deep RL.
• The benefit of the hierarchical implementation over E2E training is that basic attribute modules are easier to optimize compared to E2E training of complicated high-level driving policies.
Usage of hierarchical RL modules to assemble policies for the lane keeping (LK), lane changing (LC) and obstacle avoidance (OA) tasks.
Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control
• To design the controller, first obtain an approximate linear model for the tracking problem.
• Adopt the constant-speed linear bicycle model for the lateral dynamics.
• Using the equations above and applying forward Euler discretization, the overall tracking model can be obtained as shown in the block diagram.
• The robust controller is designed based on the nominal model of the source vehicle.
• Then a DOB is added to the nominal feedback controller; the block diagram of the overall closed-loop system is shown.
Block diagram of the linear tracking model. Block diagram of the closed-loop system.
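A highly simplified discrete-time sketch of the DOB structure described above is given below: a nominal feedback controller plus a disturbance observer that low-pass filters the mismatch between the measured response and the nominal model's prediction and cancels it from the command. The scalar model, first-order Q filter and gains are illustrative placeholders, not the paper's design.

```python
class SimpleDOB:
    def __init__(self, nominal_gain, q_alpha=0.9):
        self.b_n = nominal_gain       # nominal input gain of the (scalar) tracking model
        self.q_alpha = q_alpha        # pole of a first-order low-pass Q filter
        self.d_hat = 0.0

    def update(self, y_rate_meas, u_applied):
        # disturbance estimate: measured output rate minus the nominal model prediction,
        # passed through the low-pass Q filter
        d_raw = y_rate_meas - self.b_n * u_applied
        self.d_hat = self.q_alpha * self.d_hat + (1.0 - self.q_alpha) * d_raw
        return self.d_hat

def tracking_control(error, dob, y_rate_meas, u_prev, kp=0.5):
    u_nominal = -kp * error                       # nominal feedback controller
    d_hat = dob.update(y_rate_meas, u_prev)       # estimate the modeling gap / side force
    return u_nominal - d_hat / dob.b_n            # cancel the estimated disturbance
```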
  • 41. Zero-shot Deep Reinforcement Learning Driving Policy Transfer for Autonomous Vehicles based on Robust Control Comparison of the driving behaviors for the RL and the RL-RC with modeling gap in the Lane Changing (LC) task.
• 42. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty • Autonomous vehicles have to navigate the surrounding environment with partial observability of other objects sharing the road. • Sources of uncertainty in autonomous vehicle measurements include sensor fusion errors, limited sensor range due to weather or object detection latency, occlusion, and hidden parameters such as other human drivers' intentions. • Behavior planning must consider all sources of uncertainty in deciding future vehicle maneuvers. • This paper presents a scalable framework for risk-averse behavior planning under uncertainty by incorporating QMDP, the unscented transform, and Monte Carlo tree search (MCTS). • It is shown that using the upper confidence bound (UCB) for expanding the tree results in noisy Q-value estimates by the MCTS and degraded performance of QMDP. • A modification to the action selection procedure in MCTS is proposed to achieve robust performance.
• 43. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty • A modular stack is considered where a road world model (RWM) is inferred based on sensor inputs and map information. Let st represent the state of the EV as well as other objects sharing the road with the EV at time t, where the j-th object state comprises its lateral and longitudinal location, velocity, and acceleration (time indices are dropped for simplicity). • A hierarchical behavior planner is considered where the action space, A, is defined as a composition of a high-level action set AHL = {LaneKeep, LaneChangeR, LaneChangeL, Yield, NegotiateLaneChange} and a set of parameters, Θ, specifying how to execute the action (safe time headway, minimum distance from a lead vehicle, desired velocity, maximum acceleration/deceleration, level of politeness, and direction and maximum time/distance of lane change); a small data-structure sketch follows below. • The behavior planner action is then passed to a motion planning and control module to generate the planned trajectory and throttle/steering control. Based on a time series of inputs from the RWM, the behavior planner selects an action based on a risk-averse metric.
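An illustrative encoding of this hierarchical action space; field names and types are assumptions for the sketch, not the paper's interface.

```python
from dataclasses import dataclass
from enum import Enum, auto

class HighLevelAction(Enum):
    LANE_KEEP = auto()
    LANE_CHANGE_R = auto()
    LANE_CHANGE_L = auto()
    YIELD = auto()
    NEGOTIATE_LANE_CHANGE = auto()

@dataclass
class ActionParameters:
    safe_time_headway: float         # s
    min_dist_to_lead: float          # m
    desired_velocity: float          # m/s
    max_accel: float                 # m/s^2
    max_decel: float                 # m/s^2
    politeness: float                # level of politeness
    lane_change_direction: int       # -1 = left, +1 = right, 0 = none
    max_lane_change_time: float      # s
    max_lane_change_distance: float  # m

@dataclass
class BehaviorAction:
    high_level: HighLevelAction
    params: ActionParameters
```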
• 44. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty • This is a partially observable Markov decision process (POMDP) problem and its exact solution is computationally intractable. Instead, use a QMDP-based framework equipped with the unscented transform to generate sigma-points that sample the belief distribution. In particular, first select 2ns + 1 sigma-points in a deterministic way. • In the implementation, any sigma-point that is too close to x0 or results in a physically infeasible situation (e.g., a car is placed off the road, overlaps with another object, or has acceleration/deceleration/velocity beyond physical limits) is excluded. • Therefore, the final value of ns could be smaller than the dimension of the vector s. The mean and variance of the Q-function are then calculated over the retained sigma-points; a sketch follows below.
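A minimal sketch of the deterministic sigma-point construction (2·ns + 1 points) and one plausible mean-variance risk metric; the scaling parameter kappa, the weights, and the exact risk-averse combination are assumptions, not the paper's formulas.

```python
import numpy as np

def sigma_points(mean, cov, kappa=0.0):
    """Standard unscented-transform sigma points around a state estimate."""
    ns = mean.shape[0]
    sqrt_cov = np.linalg.cholesky((ns + kappa) * cov)
    pts = [mean]
    for i in range(ns):
        pts.append(mean + sqrt_cov[:, i])
        pts.append(mean - sqrt_cov[:, i])
    # Infeasible points (off-road, overlapping objects, limits exceeded) would be filtered out here.
    return pts

def risk_averse_q(q_values, alpha):
    """Trade off the mean and variance of Q estimates across sigma-points."""
    q = np.asarray(q_values, dtype=float)
    return q.mean() - alpha * q.var()
```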
• 45. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty • To calculate QMDP(s, a) for a given state s, use an online planner, in particular Monte Carlo tree search (MCTS). In the baseline MCTS, the tree is expanded using the UCT bound. • It should be noted that the main objective of the MCTS approach is to select the best action a∗; hence, the ranking of the Q-values across different actions is sufficient, and the accuracy of the value function is not very important, especially for actions with low values. • A stronger exploration approach is incorporated into MCTS by combining the UCT bound with ε-greedy exploration. • The ε-greedy approach is implemented since it results in acceptable performance. • The modified action selection criterion is sketched below.
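A minimal sketch of the modified selection rule combining the UCT bound with ε-greedy exploration; the node interface, the exploration constant, and the exact way the two rules are blended are assumptions for illustration.

```python
import math
import random

def select_action(node, c_uct=1.4, eps=0.1):
    """node.children maps action -> (q_value, visit_count)."""
    if random.random() < eps:                       # ε-greedy: explore uniformly at random
        return random.choice(list(node.children))
    total_visits = sum(n for _, n in node.children.values())

    def uct_score(item):
        _, (q, n) = item
        if n == 0:
            return float("inf")                     # always try unvisited actions first
        return q + c_uct * math.sqrt(math.log(total_visits) / n)

    return max(node.children.items(), key=uct_score)[0]
```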
• 46. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty Highway ramp with limited field of view. Two driving scenarios with uncertainty are investigated: 1) a stationary object on the road beyond sensor range, and 2) a highway ramp with limited field of view. It is assumed that there is a stationary object on the EV's path, 400 m away from its current location. The sensor range is defined as the distance at which the probability of detecting a stationary object is 10%. Consider a highway ramp where the EV experiences a limited field of view due to existing obstacles (e.g., trees) or a difference in elevation.
• 47. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty • For the baseline, consider two MCTS-based schemes: 1) MCTS-Genie, which receives noise-free observations, and 2) MCTS-Noisy, which makes decisions based on noisy measurements. • The initial velocity of both the EV and the MV is 20 m/s. • The figure compares the EV's velocity for the candidate algorithms, and the table provides further information about the vehicles' status at the merge point for a realization of the noise under which initial measurements of the MV velocity are lower than its actual value.
• 48. Risk-averse Behavior Planning for Autonomous Driving under Uncertainty • In the beginning, MCTS-Noisy assumes that the MV is going slower than the EV, and decides to accelerate to pass the MV before reaching the merge point. • However, after a few seconds, it realizes that the initial measurements were off and that it is in fact not able to overtake the MV. • Although it applies aggressive deceleration (with a jerk of -6.8 m/s³) at that moment, there is not enough time left to create a safe gap before the merge point (see the distance to lead and time headway reported in the table). • MCTS-Genie observes the actual velocity of the MV and hence is able to make a correct decision from the beginning. • For RA-QMDP, select α = 0.01 and ε = 1. • RA-QMDP takes three sigma-points, one of which considers that the MV is going faster than the value that the RWM reports. • It then makes a risk-averse decision which forces the EV to slow down and increase the gap with the MV. • Note that RA-QMDP has some delay in making that decision since its evaluation is also influenced by two other sigma-points that suggest the MV is going slowly and there might be an opportunity to overtake the MV.
• 49. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks • The high uncertainty of traffic behavior and the large number of situations that a self-driving vehicle may encounter on the road make it very difficult to create a fully generalizable system. • To ensure safe and efficient operation, an autonomous vehicle is required to account for this uncertainty and to anticipate a multitude of possible behaviors of traffic actors in its surroundings. • This is a method to predict multiple possible trajectories of actors while also estimating their probabilities. • The method encodes each actor's surrounding context into a raster image, used as input by deep convolutional networks to automatically derive relevant features for the task.
• 50. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks First, rasterize an actor-specific BEV raster image encoding the actor's map surroundings and neighboring actors (e.g., other vehicles and pedestrians). Then, given the i-th actor's raster image and state estimate sij at time step tj, use a CNN model to predict a multitude of M possible future state sequences, as well as each sequence's probability pim such that Σm pim = 1, where m indicates the mode index. Simplify the task to infer the i-th actor's future x- and y-positions, while the remaining states can be derived by considering sij and the future position estimates. Both past and future positions at time tj are represented in the actor-centric coordinate system derived from the actor's state at time tj, where the forward direction is the x-axis, the left-hand direction is the y-axis, and the actor's bounding-box centroid is the origin (a transform sketch follows below).
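A minimal sketch of the actor-centric coordinate transform described above: the actor's forward direction becomes the x-axis and its bounding-box centroid the origin; the array shapes are assumptions.

```python
import numpy as np

def to_actor_frame(points_xy, actor_xy, actor_heading):
    """Transform world-frame (x, y) points into the actor-centric frame."""
    c, s = np.cos(-actor_heading), np.sin(-actor_heading)
    rot = np.array([[c, -s], [s, c]])                 # rotation by -heading
    return (np.asarray(points_xy) - np.asarray(actor_xy)) @ rot.T
```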
  • 51. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks Modeling multimodality of future 6-second vehicle trajectories; predicted trajectories are marked in blue, with their probabilities indicated at the end of the trajectories
• 52. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks It takes an actor-specific 300 × 300 RGB raster image with 0.2 m resolution and the actor's current state (velocity, acceleration, and heading change rate) as input, and outputs M modes of future x- and y-positions (2H outputs per mode) along with their probabilities (one scalar per mode). This results in (2H + 1)M outputs per actor. Probability outputs are passed through a softmax layer to ensure they sum to 1. Note that any CNN architecture can be used as the base network; here, MobileNet-v2 is used.
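A minimal PyTorch sketch of this output layout; the head sizes, horizon, and number of modes are assumptions, and only the (2H + 1)M outputs per actor and the softmax over mode probabilities follow the description above.

```python
import torch
import torch.nn as nn
import torchvision

class MultimodalTrajectoryNet(nn.Module):
    def __init__(self, horizon_h=30, num_modes=3, state_dim=3):
        super().__init__()
        self.h, self.m = horizon_h, num_modes
        backbone = torchvision.models.mobilenet_v2(weights=None)
        backbone.classifier = nn.Identity()            # keep the 1280-d raster feature
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(1280 + state_dim, 512), nn.ReLU(),
            nn.Linear(512, (2 * horizon_h + 1) * num_modes))

    def forward(self, raster, state):
        # raster: (B, 3, 300, 300) BEV image; state: (B, 3) velocity, acceleration, heading change rate
        feat = torch.cat([self.backbone(raster), state], dim=1)
        out = self.head(feat)
        traj = out[:, : 2 * self.h * self.m].view(-1, self.m, self.h, 2)   # M modes of H (x, y) points
        probs = torch.softmax(out[:, 2 * self.h * self.m:], dim=1)          # M probabilities summing to 1
        return traj, probs
```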
• 53. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks It proposes a Multiple-Trajectory Prediction (MTP) loss that explicitly models the multimodality of the trajectory space. Mixture-of-Experts (ME) loss: first, define a single-mode loss L of the i-th actor's m-th mode at time tj as the average displacement error (l2-norm) between the points of the ground-truth trajectory and the predicted trajectory of the m-th mode (a sketch follows below).
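A minimal sketch of the single-mode loss described above: the average displacement error (mean l2-norm) between ground-truth and predicted trajectory points.

```python
import torch

def single_mode_loss(pred_traj, gt_traj):
    """pred_traj, gt_traj: tensors of shape (H, 2) or (B, H, 2); returns the mean point-wise l2 distance."""
    return torch.norm(pred_traj - gt_traj, dim=-1).mean(dim=-1)
```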
• 54. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks Mode selection methods (modes shown in blue): the ground truth (green) matches the right-turn mode when using displacement, but the straight mode when using angle. However, this displacement-based distance function in the loss does not model multimodal behaviors well at intersections, as illustrated in the figure. To address this problem, propose a function that measures distance by considering the angle between the last points of the two trajectories as seen from the actor position, which improves handling of the intersection scenarios. In practice, put a threshold of 5° on the angle difference. Then, all modes which differ from the ground-truth trajectory by less than 5° are considered potential matches, and the displacement distance is used to break the tie among them (see the sketch below). It is found that this simple optimization makes the model generalize well for both the intersection and straight-road scenarios.
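A minimal sketch of this angle-based mode matching with a 5° threshold and displacement tie-breaking; the trajectory shapes and the angle wrap-around handling are assumptions for illustration.

```python
import numpy as np

def match_mode(pred_trajs, gt_traj, angle_thresh_deg=5.0):
    """pred_trajs: (M, H, 2) predicted modes; gt_traj: (H, 2) ground truth, both in the actor-centric frame."""
    gt_angle = np.arctan2(gt_traj[-1, 1], gt_traj[-1, 0])
    mode_angles = np.arctan2(pred_trajs[:, -1, 1], pred_trajs[:, -1, 0])
    diff = np.degrees(np.abs(np.angle(np.exp(1j * (mode_angles - gt_angle)))))  # wrapped angle difference
    candidates = np.where(diff < angle_thresh_deg)[0]
    if candidates.size == 0:
        candidates = np.array([np.argmin(diff)])       # fall back to the closest angle
    ade = np.linalg.norm(pred_trajs[candidates] - gt_traj, axis=-1).mean(axis=-1)
    return candidates[np.argmin(ade)]                  # break ties by displacement distance
```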
• 55. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks It shows rasters for the same scene but using two different following lanes marked in light pink, one going straight and one turning left. The method outputs trajectories that follow the intended paths well, and can be used to generate a multitude of trajectories for lane-following vehicles. Assuming knowledge of the possible lanes that can be followed and a lane-scoring system that filters unlikely lanes, add another rasterization layer that encodes this information and train the network to output a lane-following trajectory. Then, for one scene, generate multiple rasters with various lanes to be followed, effectively inferring multiple trajectories.
• 56. Multimodal Trajectory Predictions for Autonomous Driving using Deep Convolutional Networks The effect on output trajectories of varying the number of modes M from 1 to 4, from left to right. The figure visualizes outputs for an intersection scene with an actor going straight, as the number of modes M increases. For M = 1 (i.e., the STP model) the inferred trajectory is roughly an average of the straight and right-turn modes. Incrementing the number of modes to 2 gives a clear separation between the modes going straight and turning right. Further, setting M to 3, the left-turn mode also appears, albeit with lower probability. When M = 4, an interesting effect appears, where the go-straight mode splits into "fast" and "slow" modes, modeling variability in the actor's longitudinal movement.
• 57. Exploring the Limitations of Behavior Cloning for Autonomous Driving • Driving requires reacting to a wide variety of complex environment conditions and agent behaviors. • Explicitly modeling each possible scenario is unrealistic. • In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. • Behavior cloning in particular has been successfully used to learn simple visuomotor policies end-to-end, but scaling to the full spectrum of driving behaviors remains an unsolved problem. • A benchmark is proposed to experimentally investigate the scalability and limitations of behavior cloning. • Behavior cloning leads to state-of-the-art results, including in unseen environments, executing complex lateral and longitudinal maneuvers without these reactions being explicitly programmed. • However, well-known limitations (due to dataset bias and overfitting), new generalization issues (due to dynamic objects and the lack of a causal model), and training instability require further research before behavior cloning can graduate to real-world driving.
• 58. Exploring the Limitations of Behavior Cloning for Autonomous Driving • Behavior cloning is a form of supervised learning that can learn sensorimotor policies from offline collected data. • The only requirements are pairs of input sensory observations associated with expert actions. • Use an expanded formulation for self-driving cars called Conditional Imitation Learning (CIL). • It uses a high-level navigational command c that disambiguates imitation around multiple types of intersections. • Given an expert policy π*(x) with access to the environment state x, this policy can be executed to produce a dataset D = {⟨oi, ci, ai⟩}, i = 1, ..., N, where oi are sensor data observations, ci are high-level commands (e.g., take the next right, left, or stay in lane) and ai = π*(xi) are the resulting vehicle actions (low-level controls). • Observations oi = {i, vm} contain a single image i and the ego-car speed vm, added so that the system can properly react to dynamic objects on the road. • Without the speed context, the model cannot learn if and when it should accelerate or brake to reach a desired speed or stop.
• 59. Exploring the Limitations of Behavior Cloning for Autonomous Driving • The goal is to learn a policy π parametrized by θ that produces actions similar to π* based only on observations o and high-level commands c. • The best parameters θ* are obtained by minimizing an imitation cost ℓ, i.e., θ* = arg minθ Σi ℓ(π(oi, ci; θ), ai); a training-step sketch follows below. • In order to evaluate the performance of the learned policy π(oi, ci; θ) online at test time, assume access to a score function giving a numeric value expressing the performance of the policy π on a given benchmark. • In addition to the distributional shift problem, limitations of behavior cloning include: • Bias in naturalistic driving datasets. • Causal confusion. • High variance.
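A minimal sketch of one behavior-cloning training step under the CIL formulation; the batch layout, the L1 imitation cost, and the branch-selection mechanism inside `policy` are assumptions, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def cil_training_step(policy, batch, optimizer):
    """One supervised step: minimize the imitation cost between predicted and expert actions."""
    image, speed, command, expert_action = batch
    pred_action = policy(image, speed, command)    # the command selects the conditional branch
    loss = F.l1_loss(pred_action, expert_action)   # imitation cost l between predicted and expert actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```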
• 60. Exploring the Limitations of Behavior Cloning for Autonomous Driving The proposed network architecture, called CILRS, for end-to-end urban driving based on CIL. A ResNet perception module processes an input image into a latent space, followed by two prediction heads: one for controls and one for speed.
• 61. Exploring the Limitations of Behavior Cloning for Autonomous Driving • The open-source simulator CARLA provides a diverse, photo-realistic, and dynamic environment with challenging driving conditions. • A new larger-scale CARLA driving benchmark, called NoCrash, is designed to test the ability of ego vehicles to handle complex events caused by changing traffic conditions (e.g., traffic lights) and dynamic agents in the scene. • There are three different tasks, each one corresponding to 25 goal-directed episodes. • In each episode, the agent starts at a random position and is directed by a high-level planner to reach some goal position. • The three tasks are as follows: • Empty Town: no dynamic objects. • Regular Traffic: a moderate number of cars and pedestrians. • Dense Traffic: a large number of pedestrians and heavy traffic (dense urban scenario). • Similar to the original CARLA benchmark, NoCrash has six different weather conditions, four seen in training and two reserved for testing. • There are also two towns, one seen during training and the other reserved for testing.