Driving Behaviors for ADAS
and Autonomous Driving IX
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• GRIP: Graph-based Interaction-aware Trajectory Prediction
• Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory
Prediction and Traffic Rules
• NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory
Learning for Autonomous Vehicles
• Generic Prediction Architecture Considering both Rational and Irrational Driving
Behaviors
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• The motion planners used in self-driving vehicles need to generate trajectories that are safe,
comfortable, and obey the traffic rules.
• This is usually achieved by two modules: a behavior planner, which handles high-level decisions and
produces a coarse trajectory, and a trajectory planner, which generates a smooth, feasible trajectory
for the duration of the planning horizon.
• These planners, however, are typically developed separately, and changes in the behavior planner
might affect the trajectory planner in unexpected ways.
• Furthermore, the final trajectory outputted by the trajectory planner might differ significantly
from the one generated by the behavior planner, as they do not share the same objective.
• This paper proposes a jointly learnable behavior and trajectory planner, unlike most existing
learnable motion planners, which either address only behavior planning or use an uninterpretable
neural network to represent the entire logic from sensors to driving commands.
• This approach features an interpretable cost function on top of perception, prediction and vehicle
dynamics, and a joint learning algorithm that learns a shared cost function employed by the
behavior and trajectory components.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
The learnable motion planner has discrete and continuous components, minimizing
the same cost function with the same set of learned cost weights.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
A: Given a scenario, we generate a set of possible SDV (self-driving vehicle) behaviors. B:
Left and right lane boundaries and the driving path that are relevant to the intended
behavior are considered in the cost function. C: The SDV geometry for the spatiotemporal
overlapping cost is approximated using circles. D: The SDV yields to pedestrians through
stop lines on the driving paths.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• The objective of the planner is then to find a behavior and a trajectory that are safe, comfortable,
and make progress along the route.
• Given sets of candidate behaviors and trajectories, the cost function is used to choose the best pair (b, τ ).
• The cost function consists of sub-costs c that focus on different aspects of the trajectories such as
safety, comfort, feasibility, mission completion, and traffic rules.
• Obstacles: A safe trajectory for the SDV should not only be collision free, but also satisfy a safety-
distance to the surrounding obstacles, including both the static and dynamic objects such as
vehicles, pedestrians, cyclists, unknown objects, etc. The overlap cost coverlap is then 1 if a
trajectory violates the spatial occupancy of any obstacle in a given predicted trajectory, and is
averaged across all possible predicted trajectories weighted by their probabilities. The obstacle
cost cobstacle penalizes the squared distance of the violation of the safety-distance dsafe. This cost is
scaled by the speed of the SDV, making the distance violation more costly at higher speeds.
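As a rough illustration of this cost, a minimal sketch follows, assuming a circle decomposition of the SDV footprint and a simple squared-violation penalty; the function names and toy values are ours, not the paper's.

```python
import numpy as np

def obstacle_cost(ego_circles, obstacle_points, d_safe, ego_speed):
    """Hedged sketch: squared violation of the safety distance d_safe,
    scaled by ego speed so violations cost more at higher speeds."""
    cost = 0.0
    for cx, cy, r in ego_circles:            # SDV geometry approximated by circles
        for ox, oy in obstacle_points:       # sampled obstacle boundary points
            dist = np.hypot(ox - cx, oy - cy) - r
            violation = max(0.0, d_safe - dist)
            cost += violation ** 2
    return ego_speed * cost                  # speed scaling per the description above

# toy usage: two-circle SDV, one obstacle point
circles = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
obstacles = [(3.5, 0.5)]
print(obstacle_cost(circles, obstacles, d_safe=1.0, ego_speed=10.0))
```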
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• Driving-path and lane boundary: The SDV is expected to adhere to the structure of the road, i.e. ,
it should not go out of the lane boundary and should stay close to the center of the lane. The
driving-path cost cpath is the squared distance towards the driving path. The lane boundary cost
clane is the squared violation distance of a safety threshold.
• Headway: The SDV should keep a safe longitudinal distance, which depends on the speeds of the SDV
and the leading vehicle, for lane-following or lane-change behavior. The headway cost is computed as
the violation of the safety distance after applying a comfortable constant deceleration, assuming
that the leading vehicle applies a hard brake.
• Yield: When a pedestrian is predicted to reach the boundary of the SDV lane or cross it, they
impose a stopping point at a safe longitudinal distance and penalize any trajectory that violates it.
The yield cost cyield penalizes the squared longitudinal violation distance weighted by the
pedestrian prediction probability.
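A minimal sketch of the headway cost from the list above, assuming constant-deceleration stopping distances; the deceleration values and names are assumptions.

```python
def headway_cost(gap, v_ego, v_lead, d_safe, a_comfort=2.0, a_hard=8.0):
    """Hedged sketch of the headway cost: the safety-distance violation
    after the ego applies a comfortable constant deceleration while the
    leading vehicle hard-brakes. Parameter values are assumptions."""
    d_ego = v_ego ** 2 / (2.0 * a_comfort)   # ego stopping distance (comfortable)
    d_lead = v_lead ** 2 / (2.0 * a_hard)    # lead stopping distance (hard brake)
    final_gap = gap + d_lead - d_ego
    violation = max(0.0, d_safe - final_gap)
    return violation ** 2

print(headway_cost(gap=20.0, v_ego=15.0, v_lead=12.0, d_safe=5.0))
```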
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
Left: Headway cost penalizes unsafe distance to leading vehicles. Right:
for each sampled trajectory, a weight function determines how relevant
an obstacle is to the SDV in terms of its lateral offset.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• Route: A behavior is desirable if the goal lane is closer to the route than the current lane, so they
penalize the number of lane changes required to converge to the route.
• Cost-to-go: they compute the deceleration needed for slowing down to possible upcoming speed
limits and use the squared violation of the comfortable deceleration as the cost-to-go.
• Speed limit, travel distance and dynamics: Using the speed-limit of a lane, which is available in
the map data, they introduce a cost that penalizes a trajectory if it goes above the eligible speed.
The speed limit cost cspeed is the squared violation in speed. In order to favor trajectories that
advance in the route, they use the travelled longitudinal distance as a reward. Since the SDV is
physically limited to certain ranges of acceleration, curvature, etc., they prune trajectories that
violate such constraints. Additionally, they introduce costs that penalize aggressive motions to
promote comfortable driving. Specifically, the dynamics cost cdyn consists of the squared values of
jerk and violation thereof, acceleration and violation thereof, lateral acceleration and violation
thereof, lateral jerk and violation thereof, curvature, twist, and wrench.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• The inference process contains two stages of
optimization.
• In the behavioral planning stage, it adopts a
coarse-level parameterization for trajectory
generation. The resulting trajectory is found
by selecting the one with the lowest cost.
• In the trajectory planning stage, it uses a fine-
level parameterization where they model the
trajectory as a function of vehicle control
variables. The trajectory is initialized with the
output of the behavior planner, and optimized
through a continuous optimization solver.
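A hedged sketch of this two-stage inference, assuming trajectories are plain parameter vectors and using a generic off-the-shelf solver in place of the paper's optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def plan(candidate_params, cost_fn):
    """Hedged sketch: candidate_params is a list of coarsely parameterized
    trajectories (1-D arrays), cost_fn the shared learned cost."""
    # Stage 1: behavioral planning -- pick the lowest-cost coarse candidate.
    x0 = min(candidate_params, key=cost_fn)
    # Stage 2: trajectory planning -- continuous refinement, initialized
    # with the behavior planner's output.
    res = minimize(cost_fn, x0, method="Nelder-Mead")
    return res.x

# toy usage: quadratic cost, three coarse candidates
cost = lambda x: float(np.sum((x - 1.5) ** 2))
print(plan([np.zeros(2), np.ones(2), 2 * np.ones(2)], cost))
```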
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
Example trajectories in a nudging scenario. Behavioral decisions include obstacle side
assignment and lane information, which are sent through the behavioral-trajectory
interface.
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• It uses a combination of a max-margin objective and an imitation learning loss as the training
loss; the slide shows the max-margin loss, the imitation learning loss, and the gradient of the
learning objective.
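The loss formulas on the slide did not survive extraction; the sketch below shows what a combined max-margin plus imitation objective over a linear cost c(τ) = w·f(τ) typically looksks like. The margin term Δ, the L2 imitation term, and all names are assumptions, not the paper's exact definitions.

```python
import numpy as np

def combined_loss(w, f_expert, f_candidates, task_loss, xy_expert, xy_candidates, alpha=1.0):
    """Hedged sketch. w: learned cost weights; f_*: trajectory feature
    vectors; task_loss[i]: margin Delta between candidate i and the
    expert; xy_*: trajectory points for the imitation (L2) term."""
    c_expert = w @ f_expert
    # max-margin: the expert trajectory should beat every candidate by its margin
    margins = [max(0.0, c_expert - w @ f + d)
               for f, d in zip(f_candidates, task_loss)]
    l_margin = max(margins)
    # imitation: distance between the planner's best candidate and the expert
    best = np.argmin([w @ f for f in f_candidates])
    l_imit = np.sum((xy_candidates[best] - xy_expert) ** 2)
    return l_margin + alpha * l_imit
```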
Jointly Learnable Behavior and Trajectory Planning
for Self-Driving Vehicles
• Two real-world driving datasets: ManualDrive is a set of human driving recordings where the
drivers are instructed to drive smoothly and carefully, respecting all traffic rules; TOR-4D is
composed of very challenging scenarios.
GRIP: Graph-based Interaction-aware Trajectory Prediction
• Nowadays, autonomous driving cars have become commercially available.
• The safety of a self-driving car, however, is still a challenging problem that has not been well studied.
• Predicting the future trajectories of surrounding objects, e.g., vehicles, pedestrians,
bicycles, etc., is one of the intelligent capabilities such a car requires.
• This paper proposes a novel scheme called GRIP, which is designed to efficiently predict
trajectories for traffic agents around an autonomous car.
• In the autonomous driving scenario, the motion of an object is
profoundly impacted by the movements of its surrounding objects.
• This is highly similar to people's behavior on a social network (one person is usually
influenced by his/her friends).
• GRIP uses a graph to represent the interactions of close objects, applies several graph
convolutional blocks to extract features, and subsequently uses an encoder-decoder long short-
term memory (LSTM) model to make predictions.
GRIP: Graph-based Interaction-aware Trajectory Prediction
• This inspires them to represent the inter-object interaction using an undirected graph G = {V, E},
as researchers have done for social networks.
• The model consists of three components: (1) Input Preprocessing Model, (2) Graph Convolutional
Model, and (3) Trajectory Prediction Model.
• Before feeding the trajectory data of objects into the model, it converts the raw data into a
specific format for subsequent efficient computation.
• In this graph, each node in node set V corresponds to an object in a traffic scene.
• At each time step t, objects that have interactions should be connected with edges. The edge set
E is composed of two parts: (1) The first part describes the interaction information between two
objects in spatial space at time t. (2) The second part is the inter-frame edges, which represent
the historical information frame by frame in temporal space.
• The adjacency matrix A = {A0, A1}, where A0 is an identity matrix I representing self-connections
in temporal space, and A1 is a spatial connection adjacency matrix.
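A minimal sketch of constructing A = {A0, A1}, assuming a Euclidean distance threshold defines "close" objects; the threshold value is an assumption.

```python
import numpy as np

def build_adjacency(positions, d_close=25.0):
    """Hedged sketch of A = {A0, A1}: A0 is the identity matrix (temporal
    self-connections) and A1 connects objects closer than a threshold."""
    n = len(positions)
    A0 = np.eye(n)
    A1 = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and np.linalg.norm(positions[i] - positions[j]) < d_close:
                A1[i, j] = 1.0
    return A0, A1

# toy usage: the third object is too far away to interact
A0, A1 = build_adjacency(np.array([[0.0, 0.0], [5.0, 2.0], [100.0, 0.0]]))
print(A1)
```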
GRIP: Graph-based Interaction-aware Trajectory Prediction
The architecture of the proposed scheme.
GRIP: Graph-based Interaction-aware Trajectory Prediction
• The Graph Convolutional Model consists of several convolutional layers as well as graph
operations.
• These convolutional layers are designed to capture useful temporal features, e.g., the motion
pattern of one object, while the graph operations handle the inter-object interaction in spatial space.
• One graph operation layer is added to the end of each convolutional layer in this Graph
Convolutional Model to process the input data temporally and spatially alternately.
• The Trajectory Prediction Model is an LSTM encoder-decoder network that takes the computed
output of the Graph Convolutional Model fgraph as input.
• The output of the graph convolutional model is fed into the encoder LSTM at each time step.
• Then, the hidden feature of the encoder LSTM, and coordinates of objects at the previous time
step, are fed into a decoder LSTM to predict the position coordinates at the current time step.
• Such a decoding process is repeated several times until the model predicts positions for all
expected time steps (tf) in the future.
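A hedged PyTorch sketch of this encoder-decoder decoding loop; layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class Seq2SeqPredictor(nn.Module):
    """Hedged sketch of GRIP's trajectory prediction model: an LSTM encoder
    over graph features, and a decoder fed with the previously predicted
    coordinates, repeated for all tf future steps."""
    def __init__(self, feat_dim=64, hidden=128, coord_dim=2):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(coord_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, coord_dim)

    def forward(self, f_graph, last_xy, t_f=12):
        # f_graph: (batch, T_obs, feat_dim); last_xy: (batch, coord_dim)
        _, state = self.encoder(f_graph)      # encoder hidden state seeds the decoder
        preds, xy = [], last_xy
        for _ in range(t_f):                  # repeat until all future steps are predicted
            h, state = self.decoder(xy.unsqueeze(1), state)
            xy = self.out(h.squeeze(1))       # position at the current time step
            preds.append(xy)
        return torch.stack(preds, dim=1)      # (batch, t_f, coord_dim)
```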
GRIP: Graph-based Interaction-aware Trajectory Prediction
Blue rectangles are the cars located in the middle, i.e., the cars that CS-LSTM (convolutional social pooling)
tries to predict. Black boxes are surrounding cars. Black solid lines are the observed history, red dashed
lines are the ground truth in the future, yellow dashed lines are the predicted results (5 seconds) of GRIP,
and the green dashed lines are the predicted results (5 seconds) of CS-LSTM. The region from −90 to 90 feet
is the observed area.
Deep Predictive Autonomous Driving Using Multi-
Agent Joint Trajectory Prediction and Traffic Rules
• Autonomous driving is a challenging problem because the autonomous vehicle must understand a
complex and dynamic environment.
• This understanding consists of predicting future behavior of nearby vehicles and recognizing
predefined rules.
• It is observed that not all rules have equivalent values, and the priority of the rules may change
depending on the situation or the driver’s driving style.
• This work jointly reasons about both the future trajectories of vehicles and the degree of
satisfaction of each rule in a deep learning framework.
• Joint reasoning allows modeling interactions between vehicles, and leads to better prediction
results.
• A rule is represented as a signal temporal logic (STL) formula, and a robustness slackness, a
margin to the satisfaction of the rule, is predicted for both the autonomous vehicle and other
vehicles, in addition to future trajectories.
• The learned robustness slackness decides which rule should be prioritized for the given situation for
the autonomous vehicle, and filters out invalid predicted trajectories for surrounding vehicles.
• The predicted information from the deep learning framework is used in model predictive control
(MPC), which allows the autonomous vehicle to navigate efficiently and safely.
Deep Predictive Autonomous Driving Using Multi-
Agent Joint Trajectory Prediction and Traffic Rules
• Signal temporal logic (STL) is a logical formalism which is able to specify the properties of real-
value, dense-time signals.
• An STL formula is a composition of Boolean and temporal operations on the defined predicates.
• 5 different rules are defined with the corresponding STL formulas ϕ = [ϕ1, ..., ϕ5] , which have
some meaning in autonomous driving situations. The state vector is [xt, yt, θt, vt]T
• 1) Lane keeping (left): ϕ1 = yt ≤ yl,max
• 2) Lane keeping (right): ϕ2 = yt ≥ yl,min
• 3) Collision avoidance (Front vehicle): ϕ3 = (xt ≤ xc,min) ∨ (xt ≥ xc,max) ∨ (yt ≤ yc,min) ∨ (yt ≥ yc,max)
• 4) Speed limit: ϕ4 = vt ≤ vmax
• 5) Slow down before the front vehicle: ϕ5 = (vt ≤ vth)U[ta,tb] (xt ≤ xc,min)
• Also, let ϕc denote collision with vehicles other than the front vehicle.
• It denotes the robustness slackness of the STL rules ϕ as r.
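A minimal sketch of the quantitative (robustness) semantics for these predicates: robustness is a signed margin, disjunction takes the max and conjunction the min; treating the slackness r as a relaxation of the satisfaction margin is our assumption.

```python
# phi1: y_t <= y_l,max  ->  robustness is the signed margin y_l,max - y_t
def rho_lane_left(y, y_max):
    return y_max - y

# phi4: v_t <= v_max
def rho_speed(v, v_max):
    return v_max - v

# Disjunction takes the max of robustness values, conjunction the min.
def rho_or(*r):  return max(r)
def rho_and(*r): return min(r)

# phi3: stay outside the front vehicle's bounding box
def rho_collision(x, y, x_min, x_max, y_min, y_max):
    return rho_or(x_min - x, x - x_max, y_min - y, y - y_max)

# Negative robustness means the predicate is violated (here: inside the box).
print(rho_collision(0.0, 0.0, x_min=-1.0, x_max=1.0, y_min=-1.0, y_max=1.0))
```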
Deep Predictive Autonomous Driving Using Multi-
Agent Joint Trajectory Prediction and Traffic Rules
Vehicle and feature descriptions in the track driving scenarios with
respect to the ego vehicle (blue). Up to six nearby vehicles are
considered: the front and rear vehicles in the left, middle, and
right lanes with respect to the ego vehicle Vego.
They mark the vehicle to be controlled as Vego. The nearby vehicles are
denoted as Vnear = {Vlf , Vlr, Vcf , Vcr, Vrf , Vrr} where {lf, lr, cf, cr, rf, rr}
are used for the surrounding vehicles. A feature representation fego
consists of distances to nearby vehicles Vnear and lane deviate distance
fego = (dlf , dlr, dcf , dcr, drf , drr, ddev).
Deep Predictive Autonomous Driving Using Multi-
Agent Joint Trajectory Prediction and Traffic Rules
• The proposed method consists of four modules: encoder module, interaction module, prediction
module and control module.
• Through the deep neural networks including encoder module, interaction module and prediction
module, it jointly reasons about the future trajectories and robustness slackness of STL formulas
ϕ for both the ego vehicle Vego and near vehicles Vnear.
• Since the model makes joint predictions for vehicles Vego, Vnear, it has the capacity to model
interactions between vehicles.
• From predicted robustness slackness and trajectories, the ego vehicle is controlled through MPC
procedure under the rule constraints ϕ.
• Instead of the strict satisfaction of rule restrictions, the designed controller is able to decide
which rules should be prioritized and how satisfied they should be depending on the situation.
• They choose to use long short-term memory units (LSTMs) as the RNN in both the encoder and
decoder modules.
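A hedged PyTorch sketch of how the encoder, interaction, and prediction modules might jointly output trajectories and robustness slackness for the ego and the up-to-six nearby vehicles; the architecture details beyond what the slide states are assumptions.

```python
import torch
import torch.nn as nn

class JointPredictor(nn.Module):
    """Hedged sketch: per-vehicle LSTM encoders, a shared interaction
    vector, and heads that jointly output future trajectories and a
    robustness slackness for each of the 5 STL rules."""
    def __init__(self, in_dim=4, hidden=64, horizon=10, n_rules=5):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.interact = nn.Linear(7 * hidden, hidden)   # ego + 6 near vehicles
        self.traj_head = nn.Linear(2 * hidden, horizon * 2)
        self.slack_head = nn.Linear(2 * hidden, n_rules)
        self.horizon = horizon

    def forward(self, hist):                 # hist: (batch, 7, T, in_dim)
        b, n, t, d = hist.shape
        _, (h, _) = self.encoder(hist.reshape(b * n, t, d))
        h = h.squeeze(0).reshape(b, n, -1)                 # per-vehicle encodings
        z = self.interact(h.reshape(b, -1))                # shared interaction vector
        feat = torch.cat([h, z.unsqueeze(1).expand(-1, n, -1)], dim=-1)
        traj = self.traj_head(feat).reshape(b, n, self.horizon, 2)
        slack = self.slack_head(feat)                      # robustness slackness r
        return traj, slack
```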
Deep Predictive Autonomous Driving Using Multi-
Agent Joint Trajectory Prediction and Traffic Rules
Overall procedure for the proposed deep predictive control framework.
Deep Predictive Autonomous Driving Using Multi-
Agent Joint Trajectory Prediction and Traffic Rules
Simulation results on the US Highway 101 (a,b)
and Interstate 80 (c). The ego vehicle
Vego is marked as blue and the near vehicles
Vnear as red. The first four columns show the
movement of the ego vehicle over time, and
the last column represents the predicted
robustness slackness of the ego vehicle for one
of the first four selected columns (yellow box).
The computed trajectory of the proposed method
is shown as a green solid line. Blue lines are
generated trajectory prediction samples of the
near vehicles, while the red line is the ground-truth
trajectory of the near vehicles.
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
• Autonomous vehicles today are controlled either based on sequences of decoupled perception-planning-
action operations, or based on End2End or Deep Reinforcement Learning (DRL) systems.
• DL solutions for autonomous driving are subject to several limitations (e.g., they estimate driving actions
through a direct mapping of sensors to actuators, or require complex reward-shaping methods).
• Although the cost function used for training can aggregate multiple weighted objectives, the
gradient descent step is computed by the backpropagation algorithm using a single objective loss.
• NeuroTrajectory is a multi-objective neuroevolutionary approach to local state trajectory learning
for autonomous driving, estimated over a finite prediction horizon by a perception-planning deep NN.
• In comparison to DRL methods, which predict optimal actions for the upcoming sampling time,
they estimate a sequence of optimal states that can be used for motion control.
• They propose an approach which uses genetic algorithms for training a population of deep NNs,
where each network individual is evaluated based on a multi-objective fitness vector, with the
purpose of establishing a so-called Pareto front of optimal deep NNs.
• The performance of an individual is given by a fitness vector composed of 3 elements. Each
element describes the vehicle’s travel path, lateral velocity and longitudinal speed, respectively.
• They have benchmarked the system against a baseline Dynamic Window Approach (DWA), as
well as against an End2End supervised learning method.
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
From a modular pipeline to a perception-planning deep neural network approach for
autonomous vehicles. Green symbolizes learning components. (a) Mapping sensors to
actuators using a traditional pipeline. The output of each module provides input to the
adjoining component. (b) Monolithic deep network for direct mapping of sensory data to
control actions. (c) Perception-Planning deep neural network (this approach).
In comparison to DRL or End2End, which are used to estimate optimal driving actions,
this work focuses on estimating an optimal local state trajectory over a finite prediction
horizon. To implement motion control, the predicted states can be used as input to a
model predictive controller (MPC).
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
• The training data is paired sequences of Occupancy Grids (OGs) and vehicle trajectory labels.
• An element in a trajectory sequence encodes the vehicle’s position, steering angle and velocity.
• The vehicle is modeled based on the single-track kinematic model of a robot, with position state y
= (px , py) and no-slip assumptions.
• Note: The heading is not taken into consideration for trajectory estimation.
Local state trajectory estimation for autonomous
driving. Given the current position of the ego-vehicle
pego, a desired destination pdest and an input
sequence of occupancy grids X, the goal is to
estimate a driving trajectory Y, where each element in
the output sequence represents the desired position
of the ego-vehicle at that specific moment in time.
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
• The above problem can be modeled as a Markov Decision Process (MDP): M = (S, A, T, L).
• S represents a finite set of states for an agent at time t. To encode the location of the agent in the
driving OG space, each state set denotes an axis-aligned discrete grid sequence.
• A represents a finite set of trajectory sequences, allowing the agent to navigate through the
environment. A trajectory Y is defined as a collection of estimated trajectory state set-points,
which the agent should follow in the future time.
• T : S×A×S → [0,1] is a stochastic transition function, which describes the probability of arriving
in a new state after performing a motion along trajectory Y.
• L : S × A × S → R3 is a multi-objective fitness vector function which quantifies the trajectory quality
of the ego-vehicle.
• It learns an optimal state trajectory by combining Convolutional Neural Networks (CNN) with the
robust temporal predictions of Long Short-Term Memory (LSTM) networks.
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
Deep neural network architecture for estimating local driving trajectories.
An observation x is firstly processed by a CNN, implemented as a series of convolutional layers, aiming to
extract relevant spatial features from the input data. The CNN outputs a feature-space representation for each
observation in X. Each processed spatial observation in the input interval is flattened and passed through two
fully connected layers of 1024 and 512 units, respectively. The input sequence into an LSTM block is
represented by a sequence of spatially processed observations. The same network topology can be trained
separately on synthetic, as well as on real-world data.
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
• For computing the state trajectory of the ego-vehicle, the OG sequences are processed by a set of
convolutional layers before being fed to different LSTM network branches.
• Each LSTM branch is responsible for estimating the trajectory set-points along its time interval.
• The choice for a stack of LSTM branches over a single LSTM network that would predict all future
state set-points comes from the experiments with different network architectures.
• A single LSTM acts as a many-to-many, or sequence-to-sequence, mapping function, where the
input sequence is used to generate a time-dependent output sequence.
• In the case of LSTM branches, the original sequence-to-sequence problem is divided into a stack
of many-to-one subproblems, each LSTM branch providing a many-to-one solution, thus
simplifying the search space.
• Single LSTMs behave well in natural language processing (NLP) applications, where the input and
output domains are represented by text sequences, whereas the input-output to this proposed
NeuroTrajectory method is represented by sequences of occupancy grids and state set-points,
respectively.
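A hedged PyTorch sketch of this LSTM-branch design, one many-to-one branch per future set-point; sizes are assumptions. (In the paper the weights are evolved by the genetic algorithm rather than trained by backpropagation.)

```python
import torch
import torch.nn as nn

class LSTMBranches(nn.Module):
    """Hedged sketch of the branch design: one many-to-one LSTM per future
    set-point instead of a single sequence-to-sequence LSTM."""
    def __init__(self, feat_dim=512, hidden=128, horizon=5, state_dim=2):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.LSTM(feat_dim, hidden, batch_first=True) for _ in range(horizon)])
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, state_dim) for _ in range(horizon)])

    def forward(self, feats):             # feats: (batch, T_obs, feat_dim) from the CNN
        outs = []
        for lstm, head in zip(self.branches, self.heads):
            _, (h, _) = lstm(feats)       # many-to-one: use the final hidden state
            outs.append(head(h.squeeze(0)))
        return torch.stack(outs, dim=1)   # (batch, horizon, state_dim)
```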
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
Neuroevolutionary training for learning local
ego-vehicle state trajectories. The training
data is used to evolve a population of deep
neural networks Φ = [ϕ1(Θ1),...,ϕK(ΘK)] by
learning their weights [Θ1,...,ΘK] using
genetic algorithms. The training aims to
optimize the multi-objective fitness vector L
= [l1,...,lW ], in the multidimensional objective
space L, where each coordinate axis
represents a fitness value. In this illustration,
L is composed of two fitness values, l1 and l2.
The best performing networks ϕ∗ (Θ∗) lie on
the so-called Pareto front in objective space.
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
Mapping of solution vectors Θ from
the decision space S to objective
space L. Each solution Θ in decision
space corresponds to a coordinate in
objective space. The red marked
coordinates are the set of Pareto
optimal solutions Θ∗ for a multi-
objective minimization problem,
located on the Pareto front, drawn as
a thick black line.
The solutions form the so-called feasible decision variable space S, or simply decision space. A core
difference between single and multi-objective optimization is that, in the latter case, the objective
functions make up a W-dimensional space entitled objective space L. For each solution Θ in decision
variable space, there exists a coordinate in objective space. In Pareto optimization, there exists a set of
optimal solutions Θ∗, none of them usually minimizing, or maximizing, all objective functions
simultaneously. Optimal solutions are called Pareto optimal, meaning that they cannot be improved in
any of the objectives without degrading at least one objective. The set of Pareto optimal solutions is
entitled Pareto boundary, or Pareto front.
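A minimal sketch of extracting the Pareto front from a population's fitness vectors, assuming all objectives are minimized as in the figure.

```python
import numpy as np

def pareto_front(fitness):
    """Hedged sketch: return the indices of non-dominated solutions for a
    minimization problem. fitness: (K, W) array, one row per network."""
    fitness = np.asarray(fitness)
    front = []
    for i, f in enumerate(fitness):
        # f is dominated if some row is <= in all objectives and < in at least one
        dominated = np.any(
            np.all(fitness <= f, axis=1) & np.any(fitness < f, axis=1))
        if not dominated:
            front.append(i)
    return front

print(pareto_front([[1, 4], [2, 2], [3, 3], [4, 1]]))  # -> [0, 1, 3]
```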
NeuroTrajectory: A Neuroevolutionary Approach to Local
State Trajectory Learning for Autonomous Vehicles
• They tested the NeuroTrajectory algorithm in two different environments: (I) the GridSim simulator
and (II) a full-scale autonomous driving test car.
• GridSim is an autonomous driving simulation engine that uses kinematic models to generate synthetic
occupancy grids from simulated sensors.
• Using the test car, they ran experiments on sequences of occupancy grids acquired over 50 km of highway
and 50 km of inner-city driving.
• The Dynamic Window Approach (DWA) is an online collision avoidance strategy for mobile
robots, which uses robot dynamics and constraints imposed on the robot’s velocities and
accelerations to calculate a collision free path in the 2D plane.
• In the case of End2End learning, it mapped sequences of OGs directly to the discrete driving
commands of the vehicle, as commonly encountered in deep learning based autonomous driving.
• Its network topology is based on the same deep network architecture, having the same configuration
for the number of layers and processing units.
• The difference is that the final layer of an LSTM branch is configured to output discrete steering
commands (turn left, turn right, accelerate, decelerate), instead of a continuous state trajectory.
Generic Prediction Architecture Considering both
Rational and Irrational Driving Behaviors
• The prediction module is expected to generate reasonable results in the presence of unseen and
corner scenarios.
• Two prediction models are typically used: learning-based models and planning-based models.
• Learning-based models utilize real driving data to model human behaviors. Depending on the
structure of the data, learning-based models can predict both rational and irrational behaviors.
• Planning-based models, on the other hand, usually assume humans are rational agents, i.e., they
anticipate only rational behavior of human drivers.
• This paper proposes a generic prediction architecture to address various rationalities in human
behavior.
• It leverages the advantages from both learning-based and planning-based prediction models.
• It is able to predict continuous trajectories that reflect possible future situations of other drivers.
• Moreover, the prediction performance remains stable under various unseen driving scenarios.
Generic Prediction Architecture Considering both
Rational and Irrational Driving Behaviors
• In this problem, it considers the interactive behavior between two vehicles: the ego autonomous
vehicle and the predicted human-driven vehicle.
• They aim at predicting the behavior of the selected vehicle while considering the potential
influence of the ego vehicle's own future behavior.
• It uses the conditional probability density function (PDF) to represent the correlated future
trajectories of two interactive vehicles.
• Instead of Cartesian coordinates, they utilize the Frenet frame to represent the vehicle state.
• The vehicle motion in the Frenet frame can be represented by the longitudinal position along
the path s(t) and the lateral deviation from the path d(t).
• Therefore the vehicle state at time step t can be defined as xt = (s(t), d(t)).
• Note that the reference path of a vehicle changes according to the road it is currently driving on;
the origin is defined as the cross point of the two vehicles' reference paths.
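A minimal sketch of the Cartesian-to-Frenet conversion described above, assuming the reference path is a polyline; this is an illustration, not the paper's implementation.

```python
import numpy as np

def to_frenet(point, path):
    """Hedged sketch: project a Cartesian point onto a polyline reference
    path, returning longitudinal position s(t) and lateral deviation d(t)."""
    path = np.asarray(path, dtype=float)
    seg = np.diff(path, axis=0)
    seg_len = np.linalg.norm(seg, axis=1)
    cum_s = np.concatenate([[0.0], np.cumsum(seg_len)])
    best = (np.inf, 0.0, 0.0)               # (distance, s, d)
    for i in range(len(seg)):
        t = np.clip(np.dot(point - path[i], seg[i]) / seg_len[i] ** 2, 0.0, 1.0)
        proj = path[i] + t * seg[i]
        d_vec = point - proj
        dist = np.linalg.norm(d_vec)
        if dist < best[0]:
            # sign of the 2-D cross product gives the side of the path
            side = np.sign(seg[i][0] * d_vec[1] - seg[i][1] * d_vec[0])
            best = (dist, cum_s[i] + t * seg_len[i], side * dist)
    return best[1], best[2]                  # (s, d)

# toy usage: straight path along x, point 1 m to the left at x = 3
print(to_frenet(np.array([3.0, 1.0]), [[0, 0], [10, 0]]))  # ~ (3.0, 1.0)
```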
Generic Prediction Architecture Considering both
Rational and Irrational Driving Behaviors
• The proposed generic prediction framework contains the following six steps:
• 1) Sample Joint Trajectories: Given the observed historical trajectories, the future joint
trajectories can be sampled from the predicted joint distribution by a learning-based method.
• 2) Convert to Conditional Distribution: Since they are interested in predicting the trajectories of
the predicted vehicle, they convert the predicted joint distribution into a conditional distribution.
• 3) Optimal Trajectory Generation: It generates the most probable trajectory by solving a finite
horizon MPC problem using the learned continuous cost function via a planning-based method.
• 4) Weight Ratio Update: The next step is to determine how many optimal trajectory pairs they
should add to the trajectory set so that the resampling results are reasonable.
• 5) Distribution Reweighting: They are able to re-evaluate each trajectory’s probabilities using the
learned cost function by the principle of maximum entropy.
• 6) Resampling: Finally, they resample trajectories from the sample set according to the updated
conditional distribution obtained from the previous step.
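A minimal sketch of steps 5 and 6, assuming the maximum-entropy reweighting takes the form p ∝ exp(−β · cost); β and the names are assumptions.

```python
import numpy as np

def reweight_and_resample(trajectories, costs, n_samples=100, beta=1.0):
    """Hedged sketch: re-evaluate sampled trajectories with the learned
    cost under the maximum-entropy principle, then resample from the
    updated distribution."""
    costs = np.asarray(costs, dtype=float)
    w = np.exp(-beta * (costs - costs.min()))   # subtract the min for numerical stability
    p = w / w.sum()                             # reweighted probabilities
    idx = np.random.choice(len(trajectories), size=n_samples, p=p)
    return [trajectories[i] for i in idx]
```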
Generic Prediction Architecture Considering both
Rational and Irrational Driving Behaviors
This plot shows the process of the proposed architecture. Note that trajectories directly sampled from the
learning-based method can have three different degrees of rationality: (a) fully rational; (b) partially rational;
(c) fully irrational. Here, an intuitive illustration of the reweighting process is given for each of these cases,
where the original probability density function (PDF) of all sampled trajectories is shown in blue, the PDF after
the reweighting process is shown in red, and the shaded area contains infeasible sample points.
Generic Prediction Architecture Considering both
Rational and Irrational Driving Behaviors
The learning-based trajectory prediction method it
applies is called the conditional variational
autoencoder (CVAE). The goal is to model the
underlying probability distribution of the data using
a factored, low-dimensional representation. In this
problem, its objective is to estimate the probability
distribution of joint trajectories by utilizing the
encoder-decoder structure of CVAE.
The planning-based trajectory prediction stems from
Theory of Mind which describes the prediction
process of humans. It lets the ego vehicle simulate
what the other vehicle will do, assuming it is an
approximately optimal planner with respect to
some reward or cost function. It adopts the
continuous domain maximum entropy inverse
reinforcement learning (IRL).
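A hedged PyTorch sketch of a CVAE over joint future trajectories conditioned on the observed history, matching the encoder-decoder structure described; all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryCVAE(nn.Module):
    """Hedged sketch: encode (history, future) into a low-dimensional latent z,
    and decode (history, z) back into a joint future trajectory."""
    def __init__(self, hist_dim=32, fut_dim=40, z_dim=8, hidden=64):
        super().__init__()
        self.z_dim = z_dim
        self.enc = nn.Sequential(nn.Linear(hist_dim + fut_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(hist_dim + z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, fut_dim))

    def forward(self, hist, fut):
        h = self.enc(torch.cat([hist, fut], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(torch.cat([hist, z], dim=-1)), mu, logvar

    def sample(self, hist):
        """Draw joint-trajectory samples from the prior (step 1 of the framework)."""
        z = torch.randn(hist.shape[0], self.z_dim)
        return self.dec(torch.cat([hist, z], dim=-1))
```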
Generic Prediction Architecture Considering both
Rational and Irrational Driving Behaviors
• They conduct experiments on a roundabout scenario included in the INTERACTION dataset.
• It is an 8-way roundabout, and each branch has one entry lane and one exit lane.
Selected artificial test scenarios and the corresponding results. The pink dot represents
the cross point of the two vehicles' ground-truth reference paths. The number on the top
right corner denotes the percentage of rational or irrational behaviors.
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 

Driving Behaviors for ADAS and Autonomous Driving IX

Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• The objective of the planner is to find a behavior and a trajectory that is safe, comfortable, and progresses along the route.
• Given sets of candidate behaviors and trajectories, the cost function is used to choose the best pair (b, τ).
• The cost function consists of sub-costs c that focus on different aspects of the trajectories, such as safety, comfort, feasibility, mission completion, and traffic rules.
• Obstacles: A safe trajectory for the SDV should not only be collision free, but also keep a safety distance to the surrounding obstacles, including both static and dynamic objects such as vehicles, pedestrians, cyclists, and unknown objects. The overlap cost coverlap is 1 if a trajectory violates the spatial occupancy of any obstacle in a given predicted trajectory, averaged across all possible predicted trajectories weighted by their probabilities. The obstacle cost cobstacle penalizes the squared violation of the safety distance dsafe, scaled by the speed of the SDV so that distance violations are more costly at higher speeds.
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• Driving-path and lane boundary: The SDV is expected to adhere to the structure of the road, i.e., it should not go out of the lane boundary and should stay close to the center of the lane. The driving-path cost cpath is the squared distance to the driving path. The lane boundary cost clane is the squared violation distance of a safety threshold.
• Headway: The SDV should keep a safe longitudinal distance that depends on the speed of the SDV and the leading vehicle, for lane-following or lane-change behavior. The headway cost is computed as the violation of the safety distance after applying a comfortable constant deceleration, assuming that the leading vehicle applies a hard brake.
• Yield: When a pedestrian is predicted to reach the boundary of the SDV lane or to cross it, a stopping point is imposed at a safe longitudinal distance, and any trajectory that violates it is penalized. The yield cost cyield penalizes the squared longitudinal violation distance, weighted by the pedestrian prediction probability.
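Several of these sub-costs share one pattern: the squared violation of a safety threshold, possibly scaled or weighted. A minimal sketch of that pattern, with illustrative thresholds and scalar inputs rather than the paper's actual geometry:

```python
def squared_violation(value, limit):
    """Squared amount by which value exceeds limit (0 when within the limit)."""
    return max(value - limit, 0.0) ** 2

def obstacle_cost(gaps, speeds, d_safe=2.0):
    """c_obstacle: squared violation of the safety distance d_safe to the
    nearest obstacle at each step, scaled by SDV speed (d_safe is assumed)."""
    return sum(v * squared_violation(d_safe - g, 0.0) for g, v in zip(gaps, speeds))

def lane_cost(boundary_margins, threshold=0.3):
    """c_lane: squared violation of a lateral safety threshold to the boundary."""
    return sum(squared_violation(threshold - m, 0.0) for m in boundary_margins)

def yield_cost(stop_violations, ped_prob):
    """c_yield: squared longitudinal violation of a pedestrian stop line,
    weighted by the pedestrian prediction probability."""
    return ped_prob * sum(v ** 2 for v in stop_violations)
```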
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
Left: Headway cost penalizes unsafe distances to leading vehicles. Right: For each sampled trajectory, a weight function determines how relevant an obstacle is to the SDV in terms of its lateral offset.
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• Route: A behavior is desirable if the goal lane is closer to the route than the current lane, so they penalize the number of lane changes required to converge to the route.
• Cost-to-go: They compute the deceleration needed for slowing down to possible upcoming speed limits and use the squared violation of the comfortable deceleration as the cost-to-go.
• Speed limit, travel distance and dynamics: Using the speed limit of a lane, which is available in the map data, they introduce a cost that penalizes a trajectory if it goes above the eligible speed. The speed-limit cost cspeed is the squared violation in speed. To favor trajectories that advance along the route, they use the travelled longitudinal distance as a reward. Since the SDV is physically limited to certain ranges of acceleration, curvature, etc., they prune trajectories that violate such constraints. Additionally, they introduce costs that penalize aggressive motions to promote comfortable driving. Specifically, the dynamics cost cdyn consists of the squared values of jerk and its violation, acceleration and its violation, lateral acceleration and its violation, lateral jerk and its violation, curvature, twist, and wrench.
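Both planning stages score candidates with the same linear combination of these sub-costs under a shared set of learned weights. A sketch of that scoring, where the weight and sub-cost values are placeholders rather than learned quantities:

```python
def total_cost(subcosts, weights):
    """c(b, tau; w) = sum_i w_i * c_i(b, tau): one learned weight per sub-cost."""
    return sum(weights[name] * value for name, value in subcosts.items())

# Placeholder weights; in the paper these are learned jointly for both stages.
w = {"obstacle": 1.0, "path": 0.2, "lane": 0.5, "headway": 0.8,
     "yield": 1.5, "speed": 0.7, "dyn": 0.3}
# Sub-cost values of one candidate (b, tau), evaluated as in the sketches above.
candidate = {"obstacle": 0.0, "path": 1.2, "lane": 0.0, "headway": 0.4,
             "yield": 0.0, "speed": 0.1, "dyn": 2.5}
print(total_cost(candidate, w))  # scalar score used to rank candidates
```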
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• The inference process contains two stages of optimization.
• In the behavioral planning stage, it adopts a coarse-level parameterization for trajectory generation, and the resulting trajectory is found by selecting the candidate with the lowest cost.
• In the trajectory planning stage, it uses a fine-level parameterization, where the trajectory is modeled as a function of vehicle control variables. The trajectory is initialized with the output of the behavior planner and optimized through a continuous optimization solver.
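A toy sketch of this two-stage pattern, using a stand-in cost and scipy's general-purpose solver in place of the paper's planner and parameterizations:

```python
import numpy as np
from scipy.optimize import minimize

def cost_fn(traj):
    # Toy stand-in for the learned cost: penalize lateral offset and jerk.
    jerk = np.diff(traj, n=3)
    return float(np.sum(traj ** 2) + 10.0 * np.sum(jerk ** 2))

# Stage 1 (behavior planning): coarse candidate set, select the lowest cost.
candidates = [np.linspace(0.0, end, 20) for end in (-1.0, 0.0, 1.0)]
coarse_best = min(candidates, key=cost_fn)

# Stage 2 (trajectory planning): continuous refinement of the same objective,
# initialized with the behavior planner's output.
refined = minimize(cost_fn, x0=coarse_best).x
```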
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
Example trajectories in a nudging scenario. Behavioral decisions include obstacle side assignment and lane information, which are sent through the behavioral trajectory interface.
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• It uses a combination of a max-margin objective and an imitation learning objective as the loss function.
(The slide presents three equations that did not survive text extraction: the max-margin loss, the imitation learning loss, and the gradient of the learning objective.)
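A common structured form of such a combined objective, written here as an assumption rather than the slide's exact equations, where Δ is a task loss and τ* is the planner's minimum-cost trajectory:

```latex
\mathcal{L}(\mathbf{w}) =
  \underbrace{\sum_i \Big[ \max_{(b,\tau)}
    \big( c(x_i, b_i^{gt}, \tau_i^{gt}; \mathbf{w})
        - c(x_i, b, \tau; \mathbf{w})
        + \Delta(\tau, \tau_i^{gt}) \big) \Big]_+}_{\text{max-margin loss}}
  + \lambda \underbrace{\sum_i \big\lVert \tau_i^{*}(\mathbf{w}) - \tau_i^{gt} \big\rVert^2}_{\text{imitation loss}}
```

The max-margin term pushes the cost of the human-driven example below the cost of every other candidate by a margin, which keeps the learned weights interpretable; the imitation term pulls the planner's output toward the demonstrated trajectory.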
Jointly Learnable Behavior and Trajectory Planning for Self-Driving Vehicles
• Two real-world driving datasets are used: ManualDrive, a set of human driving recordings in which the drivers are instructed to drive smoothly and carefully, respecting all traffic rules; and TOR-4D, which is composed of very challenging scenarios.
GRIP: Graph-based Interaction-aware Trajectory Prediction
• Nowadays, autonomous driving cars have become commercially available.
• The safety of a self-driving car is still a challenging problem that has not been well studied.
• Prediction of the future trajectories of the surrounding objects, e.g., vehicles, pedestrians, bicycles, etc., is one such intelligent algorithm.
• This paper proposes a novel scheme called GRIP, designed to efficiently predict trajectories for the traffic agents around an autonomous car.
• In the autonomous driving application scenario, the motion of an object is profoundly impacted by the movements of its surrounding objects.
• This is highly similar to people's behavior on a social network (a person is usually influenced by his/her friends).
• GRIP uses a graph to represent the interactions of close objects, applies several graph convolutional blocks to extract features, and subsequently uses an encoder-decoder long short-term memory (LSTM) model to make predictions.
GRIP: Graph-based Interaction-aware Trajectory Prediction
• The inter-object interaction is represented using an undirected graph G = {V, E}, as researchers have done for social networks.
• The model consists of three components: (1) Input Preprocessing Model, (2) Graph Convolutional Model, and (3) Trajectory Prediction Model.
• Before feeding the trajectory data of objects into the model, the raw data is converted into a specific format for subsequent efficient computation.
• In this graph, each node in node set V corresponds to an object in a traffic scene.
• At each time step t, objects that have interactions should be connected with edges. The edge set E is composed of two parts: (1) the first part describes the interaction information between two objects in spatial space at time t; (2) the second part is the inter-frame edges, which represent the historical information frame by frame in temporal space.
• The adjacency matrix A = {A0, A1}, where A0 is an identity matrix I representing self-connections in temporal space, and A1 is a spatial-connection adjacency matrix.
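A minimal construction of A0 and A1 for one frame; the interaction radius is an assumed value, not the paper's setting:

```python
import numpy as np

def build_adjacency(positions, close_dist=25.0):
    """Build GRIP-style adjacency matrices for one frame.
    positions: (n, 2) array of object coordinates at time t.
    Returns A0 (identity, temporal self-connections) and A1 (spatial edges)."""
    n = len(positions)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    a1 = (dist < close_dist).astype(float)
    np.fill_diagonal(a1, 0.0)   # spatial edges connect distinct objects only
    a0 = np.eye(n)              # self-connections in temporal space
    return a0, a1

a0, a1 = build_adjacency(np.array([[0.0, 0.0], [5.0, 1.0], [60.0, 2.0]]))
# The third object is farther than close_dist, so it has no spatial edges.
```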
GRIP: Graph-based Interaction-aware Trajectory Prediction
The architecture of the proposed scheme.
GRIP: Graph-based Interaction-aware Trajectory Prediction
• The Graph Convolutional Model consists of several convolutional layers as well as graph operations.
• The convolutional layers are designed to capture useful temporal features, e.g., the motion pattern of one object, while the graph operations handle the inter-object interaction in spatial space.
• One graph operation layer is added to the end of each convolutional layer in the Graph Convolutional Model, so the input data is processed temporally and spatially in alternation.
• The Trajectory Prediction Model is an LSTM encoder-decoder network that takes the computed output of the Graph Convolutional Model, fgraph, as input.
• The output of the graph convolutional model is fed into the encoder LSTM at each time step.
• Then, the hidden feature of the encoder LSTM and the coordinates of objects at the previous time step are fed into a decoder LSTM to predict the position coordinates at the current time step.
• Such a decoding process is repeated until the model has predicted positions for all expected time steps (tf) in the future.
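A sketch of the encoder-decoder stage in PyTorch; the layer sizes and the exact way the encoder state seeds the decoder are illustrative rather than GRIP's published configuration:

```python
import torch
import torch.nn as nn

class Seq2SeqPredictor(nn.Module):
    """Encoder consumes graph features; decoder feeds its own predictions back."""
    def __init__(self, feat_dim=64, hidden=128, coord_dim=2):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(coord_dim, hidden)
        self.out = nn.Linear(hidden, coord_dim)

    def forward(self, fgraph, last_pos, tf=10):
        # fgraph: (batch, t_hist, feat_dim), output of the graph conv model.
        _, (h, c) = self.encoder(fgraph)
        h, c = h[-1], c[-1]                 # decoder starts from encoder state
        pos, preds = last_pos, []
        for _ in range(tf):                 # repeat for all future time steps
            h, c = self.decoder(pos, (h, c))
            pos = self.out(h)               # predicted coordinates at this step
            preds.append(pos)
        return torch.stack(preds, dim=1)    # (batch, tf, coord_dim)

# Usage: y = Seq2SeqPredictor()(torch.randn(8, 6, 64), torch.zeros(8, 2))
```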
GRIP: Graph-based Interaction-aware Trajectory Prediction
Blue rectangles are the cars located in the middle, i.e., the cars that CS-LSTM (convolutional social pooling) tries to predict. Black boxes are surrounding cars. Black solid lines are the observed history, red dashed lines are the ground truth in the future, yellow dashed lines are the predicted results (5 seconds) of GRIP, and green dashed lines are the predicted results (5 seconds) of CS-LSTM. The region from −90 to 90 feet is the observed area.
Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory Prediction and Traffic Rules
• Autonomous driving is a challenging problem because the autonomous vehicle must understand a complex and dynamic environment.
• This understanding consists of predicting the future behavior of nearby vehicles and recognizing predefined rules.
• It is observed that not all rules have equivalent values, and the priority of the rules may change depending on the situation or the driver's driving style.
• This work jointly reasons about both the future trajectories of vehicles and the degree of satisfaction of each rule in a deep learning framework.
• Joint reasoning allows modeling interactions between vehicles and leads to better prediction results.
• A rule is represented as a signal temporal logic (STL) formula, and a robustness slackness, a margin to the satisfaction of the rule, is predicted for both the autonomous vehicle and other vehicles, in addition to future trajectories.
• The learned robustness slackness decides which rule should be prioritized in the given situation for the autonomous vehicle, and filters out invalid predicted trajectories for surrounding vehicles.
• The predicted information from the deep learning framework is used in model predictive control (MPC), which allows the autonomous vehicle to navigate efficiently and safely.
Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory Prediction and Traffic Rules
• Signal temporal logic (STL) is a logical formalism able to specify the properties of real-valued, dense-time signals.
• An STL formula is a composition of Boolean and temporal operations on the defined predicates.
• Five rules with meaning in autonomous driving situations are defined with the corresponding STL formulas ϕ = [ϕ1, ..., ϕ5]. The state vector is [xt, yt, θt, vt]T.
• 1) Lane keeping (left): ϕ1 = yt ≤ yl,max
• 2) Lane keeping (right): ϕ2 = yt ≥ yl,min
• 3) Collision avoidance (front vehicle): ϕ3 = (xt ≤ xc,min) ∨ (xt ≥ xc,max) ∨ (yt ≤ yc,min) ∨ (yt ≥ yc,max)
• 4) Speed limit: ϕ4 = vt ≤ vmax
• 5) Slow down before the front vehicle: ϕ5 = (vt ≤ vth) U[ta,tb] (xt ≤ xc,min)
• Also, let ϕc denote collision with vehicles other than the front vehicle.
• The robustness slackness of the STL rules ϕ is denoted as r.
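Under standard STL quantitative semantics, the robustness of an "always predicate" rule is the worst-case margin of its predicate over the horizon; disjunctions such as ϕ3 take a pointwise max, and the until operator in ϕ5 needs more machinery and is omitted here. A minimal sketch with illustrative limits, not the paper's values:

```python
import numpy as np

def always(margins):
    """Robustness of 'always(predicate)': the worst-case margin over time."""
    return float(np.min(margins))

def robustness(traj, y_min=-1.75, y_max=1.75, v_max=30.0):
    """Quantitative STL semantics for the simple rules above.
    traj: (T, 4) array of states [x, y, theta, v]. A positive value means
    the rule holds with that margin (the 'robustness slackness')."""
    y, v = traj[:, 1], traj[:, 3]
    return {
        "phi1_lane_left":  always(y_max - y),   # y_t <= y_max
        "phi2_lane_right": always(y - y_min),   # y_t >= y_min
        "phi4_speed":      always(v_max - v),   # v_t <= v_max
    }
```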
Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory Prediction and Traffic Rules
Vehicle and feature descriptions in the track driving scenarios, with respect to the ego vehicle (blue). Up to six nearby vehicles are considered: the front and rear vehicles in the left, middle and right lanes with respect to the ego vehicle. The vehicle to be controlled is marked as Vego. The nearby vehicles are denoted as Vnear = {Vlf, Vlr, Vcf, Vcr, Vrf, Vrr}, where {lf, lr, cf, cr, rf, rr} index the surrounding vehicles. A feature representation fego consists of the distances to the nearby vehicles Vnear and the lane-deviation distance: fego = (dlf, dlr, dcf, dcr, drf, drr, ddev).
Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory Prediction and Traffic Rules
• The proposed method consists of four modules: an encoder module, an interaction module, a prediction module and a control module.
• Through the deep neural networks comprising the encoder, interaction and prediction modules, it jointly reasons about the future trajectories and the robustness slackness of the STL formulas ϕ for both the ego vehicle Vego and the near vehicles Vnear.
• Since the model makes joint predictions for the vehicles Vego and Vnear, it has the capacity to model interactions between vehicles.
• From the predicted robustness slackness and trajectories, the ego vehicle is controlled through an MPC procedure under the rule constraints ϕ.
• Instead of requiring strict satisfaction of the rule restrictions, the designed controller is able to decide which rules should be prioritized, and how well they should be satisfied, depending on the situation.
• They use long short-term memory units (LSTMs) as the RNN in both the encoder and decoder modules.
Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory Prediction and Traffic Rules
Overall procedure of the proposed deep predictive control framework.
Deep Predictive Autonomous Driving Using Multi-Agent Joint Trajectory Prediction and Traffic Rules
Simulation results on the US Highway 101 (a, b) and Interstate 80 (c) scenarios. The ego vehicle Vego is marked in blue and the near vehicles Vnear in red. The first four columns show the movement of the ego vehicle over time, and the last column represents the predicted robustness slackness of the ego vehicle for one of the first four columns (yellow box). The computed trajectory of the proposed method is shown as a green solid line. Blue lines are generated trajectory-prediction samples for the near vehicles, while red lines are the ground-truth trajectories of the near vehicles.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
• Autonomous vehicles are controlled today either based on sequences of decoupled perception-planning-action operations, or based on End2End or Deep Reinforcement Learning (DRL) systems.
• Deep learning solutions for autonomous driving are subject to several limitations (e.g., they estimate driving actions through a direct mapping of sensors to actuators, or require complex reward-shaping methods).
• Although the cost function used for training can aggregate multiple weighted objectives, the gradient descent step is computed by the backpropagation algorithm using a single-objective loss.
• NeuroTrajectory is a multi-objective neuroevolutionary approach to local state trajectory learning for autonomous driving, estimated over a finite prediction horizon by a perception-planning deep NN.
• In comparison to DRL methods, which predict optimal actions for the upcoming sampling time, they estimate a sequence of optimal states that can be used for motion control.
• They propose an approach which uses genetic algorithms to train a population of deep NNs, where each network individual is evaluated based on a multi-objective fitness vector, with the purpose of establishing a so-called Pareto front of optimal deep NNs.
• The performance of an individual is given by a fitness vector composed of three elements, describing the vehicle's travel path, lateral velocity and longitudinal speed, respectively.
• They benchmark the system against a baseline Dynamic Window Approach (DWA), as well as against an End2End supervised learning method.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
From a modular pipeline to a perception-planning deep neural network approach for autonomous vehicles. Green symbolizes learning components. (a) Mapping sensors to actuators using a traditional pipeline; the output of each module provides input to the adjoining component. (b) Monolithic deep network for direct mapping of sensory data to control actions. (c) Perception-planning deep neural network (this approach). In comparison to DRL or End2End, which are used to estimate optimal driving actions, this work focuses on estimating an optimal local state trajectory over a finite prediction horizon. To implement motion control, the predicted states can be used as input to a model predictive controller (MPC).
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
• The training data consists of paired sequences of Occupancy Grids (OGs) and vehicle trajectory labels.
• An element in a trajectory sequence encodes the vehicle's position, steering angle and velocity.
• The vehicle is modeled based on the single-track kinematic model of a robot, with position state y = (px, py) and no-slip assumptions.
• Note: The heading is not taken into consideration for trajectory estimation.
Local state trajectory estimation for autonomous driving: given the current position of the ego-vehicle pego, a desired destination pdest and an input sequence of occupancy grids X, the goal is to estimate a driving trajectory Y, where each element in the output sequence represents the desired position of the ego-vehicle at that specific moment in time.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
• The above problem can be modeled as a Markov Decision Process (MDP): M = (S, A, T, L).
• S represents a finite set of states of the agent over time. To encode the location of the agent in the driving OG space, each state set denotes an axis-aligned discrete grid sequence.
• A represents a finite set of trajectory sequences, allowing the agent to navigate through the environment. A trajectory Y is defined as a collection of estimated trajectory state set-points which the agent should follow in the future.
• T : S × A × S → [0, 1] is a stochastic transition function, which describes the probability of arriving in a given state after performing a motion along trajectory Y.
• L : S × A × S → R3 is a multi-objective fitness vector function which quantifies the trajectory quality of the ego-vehicle.
• An optimal state trajectory is learned by combining Convolutional Neural Networks (CNNs) with the robust temporal predictions of Long Short-Term Memory (LSTM) networks.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
Deep neural network architecture for estimating local driving trajectories. An observation x is first processed by a CNN, implemented as a series of convolutional layers, aiming to extract relevant spatial features from the input data. The CNN outputs a feature-space representation for each observation in X. Each processed spatial observation in the input interval is flattened and passed through two fully connected layers of 1024 and 512 units, respectively. The input sequence to an LSTM block is represented by a sequence of spatially processed observations. The same network topology can be trained separately on synthetic as well as real-world data.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
• For computing the state trajectory of the ego-vehicle, the OG sequences are processed by a set of convolutional layers before being fed to different LSTM network branches.
• Each LSTM branch is responsible for estimating trajectory set-points along its time interval.
• The choice of a stack of LSTM branches over a single LSTM network that would predict all future state set-points comes from experiments with different network architectures.
• A single LSTM acts as a many-to-many, or sequence-to-sequence, mapping function, where the input sequence is used to generate a time-dependent output sequence.
• In the case of LSTM branches, the original sequence-to-sequence problem is divided into a stack of many-to-one subproblems, with each LSTM branch providing a many-to-one solution, thus simplifying the search space.
• Single LSTMs behave well in natural language processing (NLP) applications, where the input and output domains are text sequences, whereas the input and output of the proposed NeuroTrajectory method are sequences of occupancy grids and state set-points, respectively.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
Neuroevolutionary training for learning local ego-vehicle state trajectories. The training data is used to evolve a population of deep neural networks Φ = [ϕ1(Θ1), ..., ϕK(ΘK)] by learning their weights [Θ1, ..., ΘK] using genetic algorithms. The training aims to optimize the multi-objective fitness vector L = [l1, ..., lW] in the multidimensional objective space L, where each coordinate axis represents a fitness value. In this illustration, L is composed of two fitness values, l1 and l2. The best-performing networks ϕ∗(Θ∗) lie on the so-called Pareto front in objective space.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
Mapping of solution vectors Θ from the decision space S to the objective space L. Each solution Θ in decision space corresponds to a coordinate in objective space. The red-marked coordinates are the set of Pareto-optimal solutions Θ∗ for a multi-objective minimization problem, located on the Pareto front drawn with a thick black line. The solutions form the so-called feasible decision variable space S, or simply decision space. A core difference between single- and multi-objective optimization is that, in the latter case, the objective functions make up a W-dimensional space called the objective space L. For each solution Θ in decision variable space, there exists a coordinate in objective space. In Pareto optimization, there exists a set of optimal solutions Θ∗, none of which usually minimizes, or maximizes, all objective functions simultaneously. Optimal solutions are called Pareto optimal, meaning that they cannot be improved in any of the objectives without degrading at least one other objective. The set of Pareto-optimal solutions is called the Pareto boundary, or Pareto front.
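A non-dominated filter for a minimization problem, matching the Pareto-front definition above; this is a generic sketch, not the paper's genetic-algorithm implementation:

```python
import numpy as np

def pareto_front(fitness):
    """Return indices of non-dominated solutions (minimization).
    fitness: (K, W) array, one W-dimensional fitness vector per individual."""
    keep = np.ones(len(fitness), dtype=bool)
    for i in range(len(fitness)):
        # j dominates i if it is no worse in every objective and better in one.
        dominated = np.all(fitness <= fitness[i], axis=1) & \
                    np.any(fitness < fitness[i], axis=1)
        if np.any(dominated):
            keep[i] = False
    return np.where(keep)[0]

front = pareto_front(np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]]))
# -> indices 0, 1, 3; the point [3.0, 3.0] is dominated by [2.0, 2.0]
```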
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
• The NeuroTrajectory algorithm is tested in two different environments: (I) the GridSim simulator and (II) a full-scale autonomous driving test car.
• GridSim is an autonomous driving simulation engine that uses kinematic models to generate synthetic occupancy grids from simulated sensors.
• Using the test car, they run experiments on sequences of occupancy grids acquired on 50 km of highway and 50 km of inner-city driving.
• The Dynamic Window Approach (DWA) is an online collision-avoidance strategy for mobile robots, which uses robot dynamics and constraints imposed on the robot's velocities and accelerations to calculate a collision-free path in the 2D plane.
• In the case of End2End learning, sequences of OGs are mapped directly to the discrete driving commands of the vehicle, as commonly encountered in deep-learning-based autonomous driving.
• Its network topology is based on the same deep network architecture, with the same configuration for the number of layers and processing units.
• The difference is that the final layer of an LSTM branch is configured to output discrete steering commands (turn left, turn right, accelerate, decelerate) instead of a continuous state trajectory.
NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles
Generic Prediction Architecture Considering both Rational and Irrational Driving Behaviors
• The prediction module is expected to generate reasonable results in the presence of unseen and corner scenarios.
• Two types of prediction models are typically used: learning-based models and planning-based models.
• A learning-based model utilizes real driving data to model human behaviors. Depending on the structure of the data, learning-based models can predict both rational and irrational behaviors.
• A planning-based model, on the other hand, usually assumes the human is a rational agent, i.e., it anticipates only rational behavior of human drivers.
• This paper proposes a generic prediction architecture to address various rationalities in human behavior.
• It leverages the advantages of both learning-based and planning-based prediction models.
• It is able to predict continuous trajectories that reflect possible future situations of other drivers.
• Moreover, the prediction performance remains stable under various unseen driving scenarios.
Generic Prediction Architecture Considering both Rational and Irrational Driving Behaviors
• The problem considers the interactive behavior between two vehicles: the ego autonomous vehicle and the predicted human-driven vehicle.
• They aim at predicting the behavior of the predicted vehicle while considering the potential influence of the ego vehicle's future behavior on it.
• A conditional probability density function (PDF) is used to represent the correlated future trajectories of the two interactive vehicles.
• Instead of Cartesian coordinates, they utilize the Frenet frame to represent the vehicle state.
• The vehicle motion in the Frenet frame can be represented by the longitudinal position along the path s(t) and the lateral deviation from the path d(t).
• Therefore, the vehicle state at time step t can be defined as xt = (s(t), d(t)).
• Note that the reference path of a vehicle changes according to the road it is currently driving on; the origin is defined as the cross point of the two vehicles' reference paths.
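A minimal Cartesian-to-Frenet projection against a polyline reference path; production stacks use smooth path representations and continuous projection, and the waypoints below are purely illustrative:

```python
import numpy as np

def to_frenet(path, point):
    """Project a Cartesian point onto a polyline reference path.
    path: (N, 2) waypoints, point: (2,).
    Returns (s, d): arc length along the path and signed lateral deviation."""
    seg = np.diff(path, axis=0)                       # segment vectors
    seg_len = np.linalg.norm(seg, axis=1)
    cum_s = np.concatenate([[0.0], np.cumsum(seg_len)])
    best = (np.inf, 0.0, 0.0)                         # (distance, s, d)
    for i, (p0, v, L) in enumerate(zip(path[:-1], seg, seg_len)):
        t = np.clip(np.dot(point - p0, v) / (L * L), 0.0, 1.0)
        proj = p0 + t * v                             # closest point on segment
        r = point - proj
        dist = np.linalg.norm(r)
        if dist < best[0]:
            sign = np.sign(v[0] * r[1] - v[1] * r[0])  # left of path is positive
            best = (dist, cum_s[i] + t * L, sign * dist)
    return best[1], best[2]

s, d = to_frenet(np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 5.0]]),
                 np.array([12.0, 1.0]))
```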
Generic Prediction Architecture Considering both Rational and Irrational Driving Behaviors
• The proposed generic prediction framework contains the following six steps (a sketch of steps 5 and 6 follows the list):
• 1) Sample joint trajectories: Given the observed historical trajectories, future joint trajectories are sampled from the predicted joint distribution by a learning-based method.
• 2) Convert to conditional distribution: Since the interest is in predicting the trajectories of the predicted vehicle, the predicted joint distribution is converted into a conditional distribution.
• 3) Optimal trajectory generation: The most probable trajectory is generated by solving a finite-horizon MPC problem using the continuous cost function learned via a planning-based method.
• 4) Weight ratio update: The next step is to determine how many optimal trajectory pairs should be added to the trajectory set so that the resampling results are reasonable.
• 5) Distribution reweighting: Each trajectory's probability is re-evaluated using the learned cost function, following the principle of maximum entropy.
• 6) Resampling: Finally, trajectories are resampled from the sample set according to the updated conditional distribution obtained from the previous step.
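Steps 5 and 6 can be sketched as maximum-entropy reweighting followed by resampling. Weighting by exp(−cost) follows the maximum-entropy principle the paper cites, while the temperature beta and the stand-in cost function are assumptions:

```python
import numpy as np

def reweight_and_resample(samples, cost_fn, n_out=100, beta=1.0, rng=None):
    """Reweight sampled trajectories with p(tau) ~ exp(-beta * cost(tau))
    under a learned cost, then resample from the updated distribution."""
    rng = rng or np.random.default_rng()
    costs = np.array([cost_fn(s) for s in samples])
    logits = -beta * (costs - costs.min())   # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    idx = rng.choice(len(samples), size=n_out, p=probs)
    return [samples[i] for i in idx]

# Usage with a toy cost: low-cost (more "rational") samples dominate the output.
samples = [np.full(10, v) for v in (0.0, 0.5, 5.0)]
kept = reweight_and_resample(samples, lambda s: float(np.sum(s ** 2)), n_out=10)
```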
Generic Prediction Architecture Considering both Rational and Irrational Driving Behaviors
This plot shows the process of the proposed architecture. Note that trajectories sampled directly from the learning-based method can have three different degrees of rationality: (a) fully rational; (b) partially rational; (c) fully irrational. An intuitive illustration of the reweighting process is given for each of these cases, where the original probability density function (PDF) of all sampled trajectories is shown in blue, the PDF after the reweighting process is shown in red, and the shaded area contains infeasible sample points.
Generic Prediction Architecture Considering both Rational and Irrational Driving Behaviors
• The learning-based trajectory prediction method applied here is the conditional variational autoencoder (CVAE). The goal is to model the underlying probability distribution of the data using a factored, low-dimensional representation. In this problem, the objective is to estimate the probability distribution of joint trajectories by utilizing the encoder-decoder structure of the CVAE.
• The planning-based trajectory prediction stems from the Theory of Mind, which describes the prediction process of humans: the ego vehicle simulates what the other vehicle will do, assuming that it is an approximately optimal planner with respect to some reward or cost function. It adopts continuous-domain maximum entropy inverse reinforcement learning (IRL).
Generic Prediction Architecture Considering both Rational and Irrational Driving Behaviors
• They conduct experiments on a roundabout scenario included in the INTERACTION dataset.
• It is an 8-way roundabout, and each branch has one entry lane and one exit lane.
Selected artificial test scenarios and the corresponding results. The pink dot represents the cross point of the two vehicles' ground-truth reference paths. The number in the top right corner denotes the percentage of rational or irrational behaviors.