1. Driving Behavior for ADAS
and Autonomous Driving III
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
2. Outline
• Learning and Generalizing Motion Primitives from Driving Data for Path-Tracking Applications;
• A Tempt to Unify Heterogeneous Driving Databases using Traffic Primitives;
• Predictions of short-term driving intention using RNN on sequential data;
• Driving Policy Transfer via Modularity and Abstraction;
• Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning;
• Identifying Driver Behaviors using Trajectory Features for Vehicle Navigation;
• Courteous Autonomous Cars.
3. Learning and Generalizing Motion Primitives from
Driving Data for Path-Tracking Applications
• Incorporating driving habits learned from naturalistic driving data into the path-tracking system can significantly improve the acceptance of intelligent vehicles.
• The goal is to generate predictions of lateral commands, with confidence regions, for a given reference path, based on the learned motion primitives.
• A two-level structure for learning and generalizing motion primitives through demonstrations.
• The lower-level motion primitives are generated under the path segmentation and clustering
layer in the upper-level.
• The Gaussian Mixture Model (GMM) is utilized to represent the primitives and Gaussian
Mixture Regression (GMR) is selected to generalize the motion primitives.
• The model is trained and validated by using the driving data collected from the Beijing
Institute of Technology (BIT) intelligent vehicle platform.
4. Learning and Generalizing Motion Primitives from
Driving Data for Path-Tracking Applications
(a) 81 hours of driving data are collected by the platform. (b) Sampled data of one typical demonstration.
5. Learning and Generalizing Motion Primitives from
Driving Data for Path-Tracking Applications
(c) The path types are identified by using segmentation
and clustering method in the upper-level.
(d) The motion primitives take into account the previous, current, and future path and operation commands.
6. Learning and Generalizing Motion Primitives from
Driving Data for Path-Tracking Applications
• In the upper-level framework, the path primitives are defined by the cluster labels.
• The path primitives can be segmented and represented by a set of features, whereby the
features are computed from the raw data of path point sequences.
• Finding an appropriate feature set to segment and represent the path primitive is essential
for the clustering algorithm.
• One suitable way to solve this problem is to use the zero crossing course deviation as an
intuitive criterion to obtain the segmentation of paths, and apply GMM to cluster the
selected features of the segmented path.
• The switch between different path primitives is considered discrete: only one path primitive is active at a time.
• The motion primitives are learned and generalized based on the chosen path primitive in the lower-level framework.
• Three kinds of features are selected to train the motion primitive models: the previous,
current and future state of path point sequences and steering wheel angle sequences.
• The motion primitives are trained by using GMM and generalized by applying GMR.
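As a concrete sketch of the GMM/GMR step, the snippet below conditions a hand-specified two-component joint Gaussian mixture over (path feature x, steering command y) to obtain a predicted command with a variance. The component parameters are illustrative assumptions, not values learned from the BIT data.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmr_predict(x, weights, means, covs):
    """Gaussian Mixture Regression: condition a joint GMM over (x, y) on x
    to obtain the mean and variance of y given x."""
    h = np.array([w * gaussian_pdf(x, m[0], C[0, 0])
                  for w, m, C in zip(weights, means, covs)])
    h = h / h.sum()                                        # responsibilities
    mu = np.array([m[1] + C[1, 0] / C[0, 0] * (x - m[0])   # per-component E[y|x]
                   for m, C in zip(means, covs)])
    var = np.array([C[1, 1] - C[1, 0] ** 2 / C[0, 0] for C in covs])
    y_mean = np.sum(h * mu)
    y_var = np.sum(h * (var + mu ** 2)) - y_mean ** 2      # mixture variance
    return y_mean, y_var

# Two illustrative primitives: "straight" (small command) and "turn" (large command).
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([1.0, 30.0])]
covs = [np.array([[0.05, 0.1], [0.1, 4.0]])] * 2

y_straight, _ = gmr_predict(0.0, weights, means, covs)
y_turn, var_turn = gmr_predict(1.0, weights, means, covs)
```

The returned variance is what provides the confidence region around the predicted lateral command mentioned above.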
7. A Tempt to Unify Heterogeneous Driving
Databases using Traffic Primitives
• A data unification framework based on traffic primitives, with the ability to automatically unify and label heterogeneous traffic data.
• This is achieved by two steps:
• 1) Carefully arrange raw multidimensional time series driving data into a relational database;
• 2) Automatically extract labeled and indexed traffic primitives from traffic data through a Bayesian
nonparametric learning method.
Architecture of auto-recognizing
heterogeneous naturalistic driving datasets.
8. A Tempt to Unify Heterogeneous Driving
Databases using Traffic Primitives
• In order to provide well-labeled multi-dimensional traffic data, traffic primitives that represent the principal compositions of driving scenarios should be recognized and extracted.
• Powerful unsupervised learning-based technologies have been developed to achieve this.
• The entire driving process can be treated as a logic combination of primitives and hence the
dynamic process among primitives can be treated as a probabilistic process.
• Traffic is modeled as a dynamic process of primitives based on a hidden Markov model (HMM), which consists of two layers: a hidden layer and an observation layer.
• The hidden layer represents the traffic primitives and the observation layer represents the collected data.
• If the priors for the observation and transition distributions are learned correctly, the full-conditional posteriors can be computed using the Gibbs sampling method.
• The goal is to parse long-term multi-dimensional time-series data into finite primitives and simultaneously cluster them, without requiring prior knowledge of the traffic primitives.
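A minimal sketch of the Gibbs-sampling step: the transition matrix and emission means are assumed already known (the "priors learned correctly" case) and the hidden primitive labels are resampled from their full conditionals. The two-primitive toy signal is fabricated; the actual method is Bayesian nonparametric (the number of primitives is not fixed), which this sketch does not capture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed-known model: 2 primitives with sticky transitions, Gaussian emissions.
A = np.array([[0.95, 0.05], [0.05, 0.95]])  # transition matrix
mu = np.array([0.0, 5.0])                   # emission mean per primitive
sigma = 0.5

# Fabricated drive: 100 steps of primitive 0 followed by 100 steps of primitive 1.
z_true = np.r_[np.zeros(100, int), np.ones(100, int)]
x = rng.normal(mu[z_true], sigma)

def gibbs_segment(x, A, mu, sigma, sweeps=20):
    """Sample hidden primitive labels z_t from p(z_t | z_{t-1}, z_{t+1}, x_t)."""
    T, K = len(x), len(mu)
    z = rng.integers(K, size=T)
    loglik = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
    for _ in range(sweeps):
        for t in range(T):
            logp = loglik[t].copy()
            if t > 0:
                logp += np.log(A[z[t - 1]])      # p(z_t | z_{t-1})
            if t < T - 1:
                logp += np.log(A[:, z[t + 1]])   # p(z_{t+1} | z_t)
            p = np.exp(logp - logp.max())
            z[t] = rng.choice(K, p=p / p.sum())
    return z

accuracy = (gibbs_segment(x, A, mu, sigma) == z_true).mean()
```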
9. A Tempt to Unify Heterogeneous Driving
Databases using Traffic Primitives
Scenario-based database entity relation (ER) diagrams: (a) database ER diagram; (b) modified ER diagram.
10. A Tempt to Unify Heterogeneous Driving
Databases using Traffic Primitives
The MKZ platform with
the equipped sensors
and devices.
11. A Tempt to Unify Heterogeneous Driving
Databases using Traffic Primitives
Result of the nonparametric Bayesian learning clustering.
12. Predictions of short-term driving intention using
recurrent neural network on sequential data
• Predicting drivers’ intentions and behaviors on the road is of great importance for the planning and decision-making processes of autonomous driving vehicles.
• In particular, relatively short-term driving intentions are the fundamental units that constitute more sophisticated driving goals and behaviors, such as overtaking a slow vehicle in front, or exiting or merging onto a highway.
• While human drivers can, most of the time, rationalize in advance various on-road behaviors and intentions, as well as the associated risks, aggressiveness, and reciprocity characteristics, such reasoning skills can be challenging and difficult for an autonomous driving system to learn.
• Here is a disciplined methodology for building and training a predictive driving system that learns the on-road characteristics above. It includes components such as traffic data, a traffic scene generator, a simulation and experimentation platform, a supervised learning framework for sequential data using an RNN approach, and validation of the model using both quantitative and qualitative methods.
13. Predictions of short-term driving intention using
recurrent neural network on sequential data
• The simulation environment is crucial for modeling driving intention, behavior, and collision risk, since collecting a statistically significant amount of such data, and running such experiments, in the real world can be extremely time- and resource-consuming. In the simulator one can parameterize and configure relatively challenging traffic scenes, customize vehicle physics and controls for various types of vehicles, test and utilize the HD map of the road model in algorithms, and generate sensor data from LiDAR and cameras for training DNNs.
• Experiment with different recurrent structures, such as LSTM and GRU, to predict the sequence of intentions; instead of projecting the cell hidden states into the future for sequence generation, the task is re-formalized as a classification problem.
• Subtleties of this classification problem reside in taking into account the temporal
correlations within the sequence, i.e., classifying the intention labels at each time step of a
sequence is necessary for a better capture of the intention transitions.
• Implement time dependent classifications by aggregating and comparing the output state
vector of the recurrent cell in RNN with the training labels at each time step of a sequence.
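A minimal numpy sketch of this time-dependent classification: each input frame is embedded with a fully connected layer and ReLU, a simple recurrent cell (standing in for the LSTM/GRU cells) updates the hidden state, and the per-step class scores are compared with the label at every time step. All dimensions and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def per_step_intention_loss(X, labels, params):
    """Embed each frame (FC + ReLU), update the recurrent state, and compare the
    per-step class scores with the intention label at every time step."""
    We, be, Wx, Wh, b, Wo, bo = params
    h = np.zeros(Wh.shape[0])
    total, logits_seq = 0.0, []
    for x_t, y_t in zip(X, labels):
        e = relu(x_t @ We + be)           # input embedding
        h = np.tanh(e @ Wx + h @ Wh + b)  # simple recurrent cell
        logits = h @ Wo + bo              # per-step intention scores
        p = np.exp(logits - logits.max())
        p = p / p.sum()
        total += -np.log(p[y_t])          # cross-entropy at this time step
        logits_seq.append(logits)
    return total / len(X), np.array(logits_seq)

# Toy setup: 12-frame sequence (the baseline length), 8 features, 5 intention classes.
T, d_in, d_emb, d_h, n_cls = 12, 8, 16, 32, 5
params = (rng.normal(0, 0.1, (d_in, d_emb)), np.zeros(d_emb),
          rng.normal(0, 0.1, (d_emb, d_h)), rng.normal(0, 0.1, (d_h, d_h)),
          np.zeros(d_h),
          rng.normal(0, 0.1, (d_h, n_cls)), np.zeros(n_cls))
loss, logits = per_step_intention_loss(rng.normal(size=(T, d_in)),
                                       rng.integers(n_cls, size=T), params)
```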
14. Predictions of short-term driving intention using
recurrent neural network on sequential data
Basic maneuvers such as lane keeping, car following, stopping, etc. in the highway scene. Left: the rendering of the simulated scene. Right: the “skeleton” version of the scene.
Scene of urban driving both in regular rendering and “skeleton”
mode with traffic lights, HD map information provided.
A 3D simulator, “maya”, built using the Unity3D development suite.
15. Predictions of short-term driving intention using
recurrent neural network on sequential data
• The other main purpose of “maya” is to generate large volumes of training data for deep learning and reinforcement learning algorithms.
• Maya can generate the basic formats of autonomous driving vehicle data, including the LIDAR point cloud, camera images, and HD map data of road and lane geometries, traffic light status, etc., and send them out either as an offline training dataset or for online processing or training in systems such as reinforcement learning.
• The so-called “ego vehicle” is the car with camouflage color, equipped with a simulated LIDAR device and three cameras: two on the sides and one on the dashboard facing front.
• The LIDAR point cloud consists of a number (~10,000) of 3D position measurements generated using the “ray cast” method in Unity3D; each point is a measurement of the collision between a ray traced from the center of the simulated LIDAR device and surrounding objects such as the road, trees, posts, and opponent vehicles.
• Image data is in the format of RGB intensities with a parametrizable number of pixels.
• HD map data includes features on the road surface such as road segment polygons, lane
marker polygons, road boundaries, road divider positions, and the traffic light status.
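A toy version of the "ray cast" idea: rays from a roof-mounted sensor are intersected with a flat ground plane (real Unity3D ray casting hits arbitrary scene geometry; the sensor height and beam pitch angles here are assumptions).

```python
import numpy as np

def simulate_lidar_ground(sensor_height=2.0, n_azimuth=360,
                          pitches_deg=(-2.0, -5.0, -10.0, -15.0)):
    """Cast rays from a roof-mounted sensor and intersect the ground plane z = 0."""
    origin = np.array([0.0, 0.0, sensor_height])
    points = []
    for pitch in np.deg2rad(pitches_deg):
        for az in np.linspace(0.0, 2 * np.pi, n_azimuth, endpoint=False):
            d = np.array([np.cos(pitch) * np.cos(az),
                          np.cos(pitch) * np.sin(az),
                          np.sin(pitch)])          # unit ray direction
            t = -sensor_height / d[2]              # ray parameter where z hits 0
            points.append(origin + t * d)
    return np.array(points)

cloud = simulate_lidar_ground()   # 4 beams x 360 azimuth steps = 1440 points
```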
16. Predictions of short-term driving intention using
recurrent neural network on sequential data
• The simulator “maya” provides perceived information such as the 3D bounding-box coordinates in both the LIDAR point cloud and the image, along with each vehicle’s velocity, tire angle, and control signals, including throttle, brake, steering wheel, hand brake, and left and right signal lights.
• All of these data are available through either inter-process communication (IPC) using
shared memory mechanism, or TCP/IP socket communication.
• The simulation environment can be visualized in either normal graphics rendering mode or the so-called “skeleton” mode, which is dedicated to debugging the path planner and prediction algorithms.
• In skeleton mode, only the skeletons of the vehicles, the HD map information, and the traffic lights are displayed in a lightweight manner.
• The path planning routine is used to generate the “ground truth” training labels for
intention predictions.
17. Predictions of short-term driving intention using
recurrent neural network on sequential data
Unity3D-provided car physics, wheel physics, and a slip-based tire friction model are used to instantiate vehicles, where characteristics such as engine torque, brake torque, wheel suspension, slip, collisions, etc. are all taken into account and parametrizable to a certain extent.
In order to generate training labels for short-term road intentions, a basic path planning system is implemented which can perform simple collision avoidance and generate paths for short-term intentions such as lane change, decelerate, accelerate, car follow, and lane keep.
The paths for intentions such as lane keep, car follow, accelerate, and decelerate are generated using the car location, longitudinal velocity, acceleration, deceleration, and HD map information such as the road segment polygons, lane marker polygons, etc., whereas for the lane-change intention the vehicle’s lateral velocity, acceleration, and deceleration are also considered.
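The label generation can be caricatured as thresholding the planner's kinematic quantities; the threshold values and the sign convention (positive lateral velocity = left) are hypothetical.

```python
def label_intention(v_lat, a_long, thr_lat=0.5, thr_acc=0.3):
    """Threshold planner kinematics into short-term intention labels.
    Thresholds (m/s, m/s^2) and 'positive = left' convention are hypothetical."""
    if v_lat > thr_lat:
        return "lane_change_left"
    if v_lat < -thr_lat:
        return "lane_change_right"
    if a_long > thr_acc:
        return "accelerate"
    if a_long < -thr_acc:
        return "decelerate"
    return "lane_keep"
```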
18. Predictions of short-term driving intention using
recurrent neural network on sequential data
Examples of the intention of changing to the left lane, and the planned path, in the highway scene, shown as blue dots and an arrow. Left: the rendering of the simulated scene. Right: the “skeleton” version of the scene.
19. Predictions of short-term driving intention using
recurrent neural network on sequential data
Classifying the intention labels at each time
step of a sequence is necessary for a better
capture of the intention transitions. A standard RNN, denoted as seer net, takes measurements of sequence data as input vectors at each time step and outputs a sequence of intention classes as predictions. The input vectors are embedded using fully connected layers with a ReLU activation, then fed into recurrent cells of either the LSTM or GRU type.
The architecture of the seer net where the baseline
configuration uses 12 frames to construct a sequence.
20. Predictions of short-term driving intention using
recurrent neural network on sequential data
Demonstration of the prediction results
21. Predictions of short-term driving intention using
recurrent neural network on sequential data
Demonstration of the prediction of potential risks
22. Predictions of short-term driving intention using
recurrent neural network on sequential data
Some of the configuration parameters
for maya simulation: the simulator can
be run either in graphics mode for
visualization, or in batch mode for data
collection purposes.
23. Driving Policy Transfer via Modularity and Abstraction
• Driving policies trained in simulation need to be transferred to the real world.
• The approach transfers driving policies from simulation to reality via modularity and abstraction.
• This approach is inspired by classic driving systems and aims to combine the benefits of
modular architectures and end-to-end deep learning approaches.
• The key idea is to encapsulate the driving policy such that it is not directly exposed to raw
perceptual input or low-level vehicle dynamics.
• The architecture is organized into 3 stages: perception, driving policy, and low-level control.
• Perception maps raw sensor readings to a per-pixel semantic segmentation of the scene.
• The driving policy maps from the semantic segmentation to a local trajectory plan, specified by
waypoints that the car should drive through;
• A low-level motion controller actuates the vehicle towards the waypoints.
• Both the perception system and the driving policy are learned, while the low-level controller
can be either learned or hand-designed.
• The driving policy is trained on the output of the actual perception system, as opposed to
perfect ground-truth segmentation.
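The three-stage architecture can be sketched with stand-in implementations: a toy segmenter, a table-lookup policy, and a proportional controller in place of the full PID. Every function body here is a placeholder; only the interfaces (image → segmentation → waypoints → actuation) reflect the text.

```python
import numpy as np

def perceive(image):
    """Stand-in perception: threshold brightness into a 2-class per-pixel map."""
    return (image.mean(axis=2) > 0.5).astype(int)

def driving_policy(segmentation, command):
    """Stand-in policy: map the segmentation and a high-level command to two
    waypoints, each encoded as (distance, relative angle)."""
    bias = {"left": 0.3, "straight": 0.0, "right": -0.3}[command]
    return [(5.0, bias), (10.0, bias)]

def low_level_control(waypoints, speed, target_speed=8.0,
                      kp_steer=0.8, kp_throttle=0.2):
    """Stand-in controller: proportional terms toward the first waypoint
    (a full PID would add integral and derivative terms)."""
    _, phi = waypoints[0]
    return {"steer": kp_steer * phi,
            "throttle": kp_throttle * (target_speed - speed)}

seg = perceive(np.zeros((4, 4, 3)))
waypoints = driving_policy(seg, "left")
controls = low_level_control(waypoints, speed=5.0)
```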
24. Driving Policy Transfer via Modularity and Abstraction
• The combination of learning and modularization brings several benefits.
• It enables direct transfer of the driving policy from simulation to reality. This is made possible by
abstracting both the appearance of the environment (handled by the perception system) and the
vehicle dynamics (handled by the low-level controller).
• The driving policy is still learned from data, and can adapt to the complex noise characteristics of
the perception system, which are not captured well by analytical uncertainty models.
• The interfaces between the modules (semantic map, waypoints) are easy to analyze and interpret,
which can help training and maintenance.
The autonomous driving system comprises three modules: a perception module
implemented by an encoder-decoder network, a command-conditional driving policy
implemented by a branched convolutional network, and a low-level PID controller.
25. Driving Policy Transfer via Modularity and Abstraction
Waypoints are encoded by the distance to the vehicle
and the relative angle to the vehicle’s heading.
At every frame, two waypoints are predicted. One would be sufficient to control steering, but the second can be useful for longer-term maneuvers, such as controlling the throttle ahead of a turn. Each waypoint w_j is encoded by its distance r_j and (oriented) angle φ_j with respect to the heading direction v of the car.
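The (r_j, φ_j) encoding can be computed as below; the 2-D pose convention (heading measured from the +x axis) is an assumption.

```python
import numpy as np

def encode_waypoint(waypoint, car_pos, heading):
    """Encode a waypoint as (r, phi): distance and oriented angle w.r.t. the
    car heading. Heading is measured from the +x axis (assumed convention)."""
    dx, dy = np.asarray(waypoint, float) - np.asarray(car_pos, float)
    r = np.hypot(dx, dy)
    phi = np.arctan2(dy, dx) - heading
    phi = (phi + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return r, phi

r1, phi1 = encode_waypoint((3.0, 0.0), (0.0, 0.0), 0.0)   # dead ahead
r2, phi2 = encode_waypoint((0.0, 2.0), (0.0, 0.0), 0.0)   # to the left
```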
Train the driving policy in simulation using conditional
imitation learning (CIL) – a variant of imitation learning
that enables the driving policy to be conditioned on high-
level commands, such as turning left or right at an
upcoming intersection.
Given the training dataset of observation–command–action tuples (o_i, c_i, a_i), a function approximator f with learnable parameters θ is trained to predict actions from observations and commands by minimizing Σ_i ℓ(f(o_i, c_i; θ), a_i).
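A minimal sketch of the command-conditional ("branched") idea: one output head per high-level command, and an imitation loss against the expert actions. Linear heads and squared error are simplifications of the paper's network and loss.

```python
import numpy as np

def branched_policy(obs_feat, command, branches):
    """One output branch per high-level command (hypothetical linear heads)."""
    W, b = branches[command]
    return obs_feat @ W + b

def cil_loss(batch, branches):
    """Imitation loss: squared error between the commanded branch's output
    and the expert action, averaged over the batch."""
    return np.mean([np.sum((branched_policy(o, c, branches) - a) ** 2)
                    for o, c, a in batch])

branches = {"left": (np.eye(2), np.zeros(2)),
            "right": (-np.eye(2), np.zeros(2))}
batch = [(np.array([1.0, 0.0]), "left",  np.array([1.0, 0.0])),
         (np.array([0.0, 2.0]), "right", np.array([0.0, -2.0]))]
```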
26. Driving Policy Transfer via Modularity and Abstraction
Simulation environment. Maps of the two towns, along with example images that show the towns in
two conditions: clear daytime (Weather 1) and cloudy daytime after rain (Weather 2). Use Town
1/Weather 1 during training. The other three combinations (Town 1/Weather 2, Town 2/Weather 1,
and Town 2/Weather 2) are used to evaluate generalization in simulation.
27. Driving Policy Transfer via Modularity and Abstraction
Some of the environments the truck was tested in. Note the variation in scene structure, weather, and lighting.
Hardware setup for the robotic vehicle.
28. Toward Driving Scene Understanding: A Dataset for
Learning Driver Behavior and Causal Reasoning
• Honda Research Institute Driving Dataset (HDD), a challenging dataset to enable research on
learning driver behavior in real-life environments.
• https://usa.honda-ri.com/HDD;
• The dataset includes 104 hours of real human driving in the San Francisco Bay Area collected
using an instrumented vehicle equipped with different sensors.
• An annotation methodology is introduced to enable research on driver behavior understanding from untrimmed data sequences.
• A set of baseline algorithms for driver behavior detection are trained and tested.
• Detecting unevenly (but naturally) distributed driver behaviors in untrimmed videos is a
challenging research problem.
• Interactions between drivers and traffic participants can be explored from cause and effect labels.
• A multi-task learning framework for learning driving control can be explored.
• With the predefined driver behavior labels, the annotations can be used as an auxiliary task (i.e.,
classification of behavior labels) to improve the prediction of future driver actions.
• A multimodal fusion for driver behavior detection can be studied.
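The auxiliary-task idea above can be sketched as a weighted sum of the main action-prediction loss and a behavior-classification cross-entropy; the choice of losses and the weighting factor are illustrative, not from the paper.

```python
import numpy as np

def multitask_loss(action_pred, action_true, behavior_logits, behavior_label,
                   aux_weight=0.5):
    """Main action-prediction loss plus an auxiliary behavior-classification loss."""
    mse = np.mean((action_pred - action_true) ** 2)    # future-action regression
    p = np.exp(behavior_logits - behavior_logits.max())
    p = p / p.sum()
    ce = -np.log(p[behavior_label])                    # behavior-label cross-entropy
    return mse + aux_weight * ce

# Perfect action prediction + confident correct behavior label -> near-zero loss.
loss = multitask_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]),
                      np.array([10.0, 0.0, 0.0]), 0)
```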
29. Toward Driving Scene Understanding: A Dataset for
Learning Driver Behavior and Causal Reasoning
An example illustrating different driver
behaviors in traffic scenes. The yellow
trajectory indicates GPS positions of
the instrumented vehicle. The driver
performs actions and reasons about
the scenes. To understand driver
behavior, we define a 4-layer
annotation scheme: Goal-oriented
action, Stimulus-driven action, Cause
and Attention. In Cause and Attention,
use bounding boxes to indicate when
the traffic participant causes a stop or
is attended by the driver.
30. Toward Driving Scene Understanding: A Dataset for
Learning Driver Behavior and Causal Reasoning
(i) 3 × Point Grey Grasshopper 3 video cameras, resolution: 1920×1200 pixels, frame rate: 30 Hz, field of view (FOV): 80 degrees × 1 (center) and 90 degrees × 2 (left and right).
(ii) 1 × Velodyne HDL-64E S2 3D LiDAR sensor, spin rate: 10 Hz, number of laser channels: 64, range: 100 m, horizontal FOV: 360 degrees, vertical FOV: 26.9 degrees.
(iii) 1 × GeneSys Elektronik GmbH Automotive Dynamic Motion Analyzer with DGPS, which outputs gyro, accelerometer, and GPS signals at 120 Hz.
(iv) A vehicle Controller Area Network (CAN) that provides various signals from around the vehicle: throttle angle, brake pressure, steering angle, yaw rate, and speed at 100 Hz.
31. Toward Driving Scene Understanding: A Dataset for
Learning Driver Behavior and Causal Reasoning
All sensors on the vehicle were logged using a PC running Ubuntu Linux 14.04 with two Intel i5-6600K 3.5 GHz quad-core processors, 16 GB DDR3 memory, and a RAID 0 array of four 2 TB SSDs, for a total capacity of 8 TB.
The sensor data are synchronized and time-stamped using ROS2 and customized hardware and software designed for multimodal data analysis.
32. Toward Driving Scene Understanding: A Dataset for
Learning Driver Behavior and Causal Reasoning
33. Identifying Driver Behaviors using Trajectory
Features for Vehicle Navigation
• An approach to automatically identify driver behaviors from vehicle trajectories and use them
for safe navigation of autonomous vehicles.
• Trajectory to Driver Behavior Mapping (TDBM);
• TDBM enables a navigation algorithm to automatically classify the driving behavior of other vehicles.
• A set of proposed features can be easily extracted from car trajectories.
• A data-driven mapping between these features and 6 driver behaviors is learned via a web-based user study;
• Factor analysis on the 6 behaviors, derived from two commonly studied behaviors:
• Aggressiveness and Carefulness;
• There exists a latent variable, a summarized score, indicating the level of awareness while driving next to other vehicles.
• The navigation algorithm identifies potentially dangerous drivers in real time and, according to the neighboring drivers’ behavior, chooses a path that avoids them.
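Features "easily extracted from car trajectories" might look like the following sketch; this particular feature subset (speed, acceleration, and jerk statistics) is hypothetical and not the TDBM feature list itself.

```python
import numpy as np

def trajectory_features(xy, dt=0.1):
    """A hypothetical feature subset from a 2-D trajectory sampled at interval dt."""
    speed = np.linalg.norm(np.diff(xy, axis=0), axis=1) / dt
    accel = np.diff(speed) / dt
    jerk = np.diff(accel) / dt
    return {"mean_speed": speed.mean(),
            "max_abs_accel": np.abs(accel).max(),
            "mean_abs_jerk": np.abs(jerk).mean()}

# Constant-velocity trajectory: 1 m per step along x, sampled at 1 s.
feats = trajectory_features(np.array([[i, 0.0] for i in range(6)], float), dt=1.0)
```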
35. Identifying Driver Behaviors using Trajectory
Features for Vehicle Navigation
6 Driving Behavior metrics and 4 Attention metrics used in TDBM
Lasso analysis performs regularization and feature selection by eliminating weak subsets of features.
The objective function for the Lasso analysis conducted on each behavior score b_i is the standard ℓ1-regularized least squares: minimize ‖b_i − Xw‖² + λ‖w‖₁, which drives the weights of weak features to zero.
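A self-contained coordinate-descent Lasso solver matching this kind of objective; the data and regularization strength below are synthetic, chosen so the two informative features are recovered.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]              # residual excluding feature j
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([2.0, 0.0, 0.0, -1.0, 0.0])           # only two informative features
w_hat = lasso_cd(X, X @ w_true, lam=0.1)
```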
36. Identifying Driver Behaviors using Trajectory
Features for Vehicle Navigation
Navigation decision.
Videos in the user study used to rate the 6 driving behavior metrics and 4 attention metrics of the target car, colored in red.
37. Courteous Autonomous Cars
• Autonomous cars optimize for a combination of safety, efficiency, and driving quality.
• As we get better at this, we start seeing behavior go from conservative to aggressive.
• The car’s behavior exposes the incentives in its cost function.
• Cars should not optimize a purely selfish cost, but should also try to be courteous to other drivers.
• Formalize courtesy as a term in the objective that measures the increase in another driver’s
cost induced by the autonomous car’s behavior.
• Such a courtesy term enables the robot car to be aware of possible irrationality of the
human behavior, and plan accordingly.
• Courteous robot cars leave more space when merging in front of a human driver.
• Such a courtesy term can help explain real human driver behavior on the NGSIM dataset.
38. Courteous Autonomous Cars
The task is to enable a courteous robot car that cares about the potential inconvenience to the human driver’s utility and generates trajectories that are socially predictable and acceptable.
39. Courteous Autonomous Cars
• A courteous planning strategy is based on one key observation: humans are not perfectly rational, and one such irrationality is that they weigh losses more heavily than gains when evaluating their actions.
• A courteous robot car should balance the
minimization of its own cost function and the
inconvenience (loss) it brings to the human.
We define the courtesy term based on the difference between the cost the human actually incurs and the cost they would have had in the alternative.
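The courtesy term and the combined planning objective can be sketched directly; the clipping at zero (only penalizing increases in the human's cost) and the scalar weight lam are modeling assumptions in this sketch.

```python
def courtesy_term(cost_human_actual, cost_human_alternative):
    """Increase in the human driver's cost induced by the robot's behavior,
    relative to what the human would have incurred in the alternative."""
    return max(cost_human_actual - cost_human_alternative, 0.0)

def courteous_objective(robot_cost, cost_human_actual, cost_human_alternative,
                        lam=1.0):
    """Robot's own cost plus a weighted courtesy penalty (lam is illustrative)."""
    return robot_cost + lam * courtesy_term(cost_human_actual,
                                            cost_human_alternative)
```

A courteous planner trades a small increase in its own cost against the inconvenience it imposes, e.g. leaving more space when merging in front of a human driver.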
Human Data Collection: The human data is collected
from the Next Generation SIMulation (NGSIM)
dataset, which captures the highway driving
behaviors/trajectories by digital video cameras
mounted on top of surrounding buildings.
41. Courteous Autonomous Cars
An example pair of simulated trajectories with
courteous (top) and selfish (bottom) cost functions.
A blocking-area overtaking scenario: (a) a selfish
cost function; (b)(c) a courtesy-aware robot car.