Driving Behavior for ADAS and Autonomous Driving VII
1. Driving Behavior for ADAS
and Autonomous Driving VII
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
2. Outline
• DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents
• INFER: INtermediate representations for FuturE pRediction
• Deep Imitative Models for Flexible Inference, Planning, and Control
• Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• AGen: Adaptable Generative Prediction Networks for Autonomous Driving
• Conditional Generative Neural System for Probabilistic Trajectory Prediction
• Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian
Generative Modeling
• Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction
via Adversarial Learning
3. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• DESIRE is a Deep Stochastic IOC (Inverse Optimal Control) RNN Encoder-Decoder framework
for predicting the futures of interacting agents in dynamic scenes.
• DESIRE predicts future locations of objects in multiple scenes by 1) accounting for the
multi-modal nature of prediction (i.e., given the same context, the future may vary), 2) foreseeing
potential future outcomes and making a strategic prediction, and 3) reasoning not only from the past
motion history, but also from the scene context as well as the interactions among the agents.
• DESIRE achieves all of this in a computationally efficient way within a single end-to-end (E2E)
trainable neural network model.
• The model first obtains a diverse set of hypothetical future prediction samples using a
conditional variational auto-encoder (CVAE); these samples are then ranked and refined by the
following RNN scoring-regression module.
• Samples are scored by accounting for accumulated future rewards, which enables better
long-term strategic decisions similar to IOC frameworks.
• An RNN scene context fusion module jointly captures past motion histories, the semantic
scene context and interactions among multiple agents.
• A feedback mechanism iterates over ranking and refinement to boost prediction accuracy.
CVPR2017
4. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
(a) A driving scenario: The white van may steer left or right while trying to avoid a collision with other
dynamic agents. DESIRE produces accurate future predictions (shown as blue paths) by tackling the
multi-modality of future prediction while accounting for a rich set of both static and dynamic scene contexts. (b)
DESIRE generates a diverse set of hypothetical prediction samples, and then ranks and refines them
through a deep IOC network.
5. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• Sample Generation Module
• Future prediction can be inherently ambiguous and has uncertainties as multiple plausible
scenarios can be explained under the same past situation (e.g., a vehicle heading toward an
intersection can make different turns);
• Thus, learning a deterministic function f that directly maps past trajectories to future trajectories
will under-represent potential prediction space and easily over-fit to training data.
• Moreover, a naively trained network with a simple loss will produce predictions that average out
all possible outcomes.
• This sample generation module produces a set of diverse hypotheses, critical to capturing the
multimodality of the prediction task, through an effective combination of CVAE and RNN
encoder-decoder.
• RNNs are implemented using gated recurrent units (GRU) to learn long-term dependencies, yet
they can be easily replaced with other popular RNNs like long short-term memory units (LSTM).
• The CVAE module generates a diverse set of future trajectories based on a past trajectory.
• Two loss terms: reconstruction loss and KLD loss.
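The two loss terms above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the reconstruction term is a mean Euclidean distance over waypoints and that the latent posterior is a diagonal Gaussian compared against a standard normal prior. The function names and the `kld_weight` parameter are hypothetical.

```python
import math

def reconstruction_loss(pred, truth):
    """Mean Euclidean distance between predicted and true (x, y) waypoints."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def kld_loss(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))

def cvae_loss(pred, truth, mu, logvar, kld_weight=1.0):
    """Sum of both terms; kld_weight is an assumed knob, not from the slides."""
    return reconstruction_loss(pred, truth) + kld_weight * kld_loss(mu, logvar)
```

The KLD term pulls the latent distribution toward the prior, so that sampling from the prior at test time yields plausible, diverse futures; the reconstruction term keeps each decoded sample close to the observed future.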
6. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
The overview of DESIRE. First, DESIRE generates multiple plausible prediction samples Ŷ via a CVAE-based
RNN encoder-decoder (Sample Generation Module). Then the following module assigns a reward to the
prediction samples at each time-step sequentially, as in IOC frameworks, and learns a displacement vector ΔŶ to
regress the prediction hypotheses (Ranking and Refinement Module). The regressed prediction samples are
refined by iterative feedback. The final prediction is the sample with the maximum accumulated future
reward. Note that the flow via aquamarine-colored paths is only available during the training phase.
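The ranking and refinement steps described in this overview can be illustrated with a small sketch. It assumes per-time-step rewards and displacement vectors are already available (in DESIRE they come from learned networks); `rank_samples` and `refine` are hypothetical helper names.

```python
def rank_samples(step_rewards):
    """Rank samples by accumulated reward (best first).

    step_rewards[k][t] is the reward assigned to sample k at time-step t.
    """
    totals = [sum(rewards) for rewards in step_rewards]
    order = sorted(range(len(totals)), key=lambda k: totals[k], reverse=True)
    return order, totals

def refine(sample, displacement):
    """Shift every waypoint of a sample by its regressed displacement."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(sample, displacement)]
```

The final prediction would then be the refined sample at `order[0]`, i.e. the one with the maximum accumulated reward.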
7. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• Ranking and Refinement Module
• Predicting a distant future can be far more challenging than predicting one close by.
• To tackle this, the module adopts the decision-making concept from reinforcement learning (RL), where
an agent is trained to choose actions that maximize long-term rewards to achieve its goal.
• Instead of designing the reward function manually, however, IOC learns the unknown reward function.
• It designs an RNN model that assigns rewards to each prediction hypothesis and measures their
goodness based on the accumulated long-term rewards.
• Thereafter, it also directly refines prediction hypotheses by learning displacements to the actual
prediction through another FC layer.
• Lastly, the module receives iterative feedback from regressed predictions and keeps adjusting so
that it produces precise predictions at the end.
• There are two loss terms in training the IOC ranking and refinement module:
• Cross-entropy loss;
• Regression loss.
• The total loss:
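The four loss terms named on these slides (reconstruction and KLD for sample generation, cross-entropy and regression for ranking and refinement) could be combined as below. This is a sketch assuming a plain unweighted sum; the symbols and the absence of weighting factors are assumptions, not taken from the slide.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{recon}}
  + \mathcal{L}_{\text{KLD}}
  + \mathcal{L}_{\text{CE}}
  + \mathcal{L}_{\text{reg}}
```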
8. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
• Scene Context Fusion
• To provide proper hidden representations that can
score and refine a prediction, the RNN must contain
information about 1) individual past motion
context, 2) semantic scene context and 3) the
interaction between multiple agents.
• It implements a spatial grid based pooling layer
similar to the social pooling (SP) layer in Social LSTM.
• Instead of max pooling over
rectangular grids, it adopts log-polar grids with
average pooling.
• Combined with CNN features, the SCF module
provides the RNN decoder with both static and
dynamic scene information.
• It learns consistency between semantics of
agents and scenes for reliable prediction.
Details of Scene Context Fusion (SCF) unit in
RNN Decoder2. Note that the input to the
GRU cell at each time-step integrates multiple
cues (i.e., the dynamics of agents, scene
context and interaction between agents).
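Log-polar average pooling of neighbor states can be sketched as follows. This is an illustration under assumed parameters: the radial limits, the number of radial and angular bins, and the function names are hypothetical and not taken from the slides.

```python
import math

def log_polar_bin(dx, dy, r_min=0.5, r_max=4.0, n_radial=3, n_angular=4):
    """Return the (radial, angular) bin of an offset, or None if out of range."""
    r = math.hypot(dx, dy)
    if r < r_min or r >= r_max:
        return None
    # Radial bins are uniform in log(r), so nearby agents get finer resolution.
    frac = math.log(r / r_min) / math.log(r_max / r_min)
    ri = min(int(n_radial * frac), n_radial - 1)
    theta = math.atan2(dy, dx) % (2 * math.pi)
    ai = int(theta / (2 * math.pi) * n_angular) % n_angular
    return ri, ai

def pool_neighbors(ego_xy, neighbors, hidden_dim, **grid):
    """Average the hidden vectors of neighbors falling in each log-polar bin.

    neighbors is a list of ((x, y), hidden_vector) pairs.
    """
    sums, counts = {}, {}
    for (x, y), h in neighbors:
        b = log_polar_bin(x - ego_xy[0], y - ego_xy[1], **grid)
        if b is None:
            continue  # neighbor lies outside the pooling region
        acc = sums.setdefault(b, [0.0] * hidden_dim)
        for i, v in enumerate(h):
            acc[i] += v
        counts[b] = counts.get(b, 0) + 1
    return {b: [v / counts[b] for v in acc] for b, acc in sums.items()}
```

Compared with a rectangular grid, the log-polar layout spends more bins on close neighbors, and averaging (rather than taking the maximum) keeps a contribution from every agent in a bin.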
9. DESIRE: Distant Future Prediction in
Dynamic Scenes with Interacting Agents
KITTI results (left 3 rows): Rows 1 and 2 in (b) show the highly reactive nature of RNN ED-SI (i.e., the prediction turns only after it
nears a non-drivable area). In contrast, DESIRE shows its long-term prediction capability by considering potential future
rewards. DESIRE-SI also produces more convincing predictions in the presence of other vehicles. Stanford Drone Dataset
results (right 3 rows): Row 1 shows the multi-modal nature of the prediction problem. While the cyclist is making a
right turn, it is also possible that he turns around the roundabout (denoted with an arrow). DESIRE-SI predicts this equally
possible future as the top prediction, while covering the ground truth future within the top 10 predictions. Rows 2 and 3 also
show that DESIRE-SI provides superior predictions by reasoning about both static and dynamic scene contexts.
10. INFER: INtermediate representations for
FuturE pRediction
• 2019.3
• In urban driving scenarios, forecasting future trajectories of surrounding vehicles is of
paramount importance.
• While several approaches for the problem have been proposed, the best-performing ones
tend to require extremely detailed input representations (e.g. image sequences).
• However, such methods do not generalize to datasets they have not been trained on.
• This work proposes intermediate representations that are particularly well-suited for future prediction.
• As opposed to using texture (color) information, it relies on semantics and trains an autoregressive (AR)
model to accurately predict future trajectories of traffic participants (vehicles).
• Using semantics provides a significant boost over techniques that operate over raw pixel
intensities/disparities.
• Unlike typical state-of-the-art approaches, these representations and models generalize to
completely different datasets, collected across several cities, and also across countries where
people drive on opposite sides of the road (left-handed vs right-handed driving).
• Code and data: https://rebrand.ly/INFER-results.
11. INFER: INtermediate representations for
FuturE pRediction
• The design philosophy is based on the following three desired characteristics that
knowledge representation systems must possess:
• 1) Representational adequacy: to adequately represent task-relevant information.
• 2) Inferential adequacy: to infer traits that cannot be readily inferred from the original unprocessed data.
• 3) Generalizability: to generalize to other data distributions (for the same task).
• The model takes as input an intermediate representation of the scene semantics
(intermediate, because it is neither too primitive, e.g. raw pixel intensities, nor too abstract
e.g. velocities, steering angles).
• Using these intermediate representations, predict the plausible future locations of the
Vehicles of Interest (VoI).
• The proposed representation does not rely heavily on the camera viewing angle, as camera
mounting parameters (height, viewing angle, etc.) vary across datasets, and this approach
hopes to be robust to such variations.
12. INFER: INtermediate representations for
FuturE pRediction
Intermediate representations are first generated by fusing monocular images with depth information (from either stereo
or Lidar) and obtaining semantic and instance segmentation from the monocular image, followed by an orthographic
projection to bird’s-eye view. The generated intermediate representations are fed through the network, which
outputs a prediction of the target vehicle’s trajectory registered in the sensor coordinate frame.
13. INFER: INtermediate representations for
FuturE pRediction
• It formulates trajectory prediction as a per-cell regression over an occupancy grid.
• It uses the intermediate representations to simplify the objective and help the network
generalize better.
• It trains an autoregressive model that outputs the VoI’s position on an occupancy grid,
conditioned on the previous intermediate representations.
• It uses a simple encoder-decoder model connected by a convolutional LSTM to learn
temporal dynamics.
• It adds skip connections between corresponding encoder and decoder branches.
• The proposed trajectory prediction scheme takes as input a sequence of intermediate
representations and produces a single channel output occupancy grid.
• The training objective comprises two terms: reconstruction loss term, and safety loss term.
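The occupancy-grid formulation can be illustrated with a small sketch: a vehicle position is encoded as a single-channel grid, and a predicted grid is decoded back to a cell by taking its maximum. The grid size, metric extent, and the use of mean squared error for the reconstruction term are assumptions made for illustration; the safety loss term is not sketched because it is not specified here.

```python
def position_to_cell(x, y, grid_size=8, extent=40.0):
    """Map a metric position in [0, extent) x [0, extent) to a (row, col) cell."""
    if not (0.0 <= x < extent and 0.0 <= y < extent):
        raise ValueError("position lies outside the grid")
    cell = extent / grid_size
    return int(y // cell), int(x // cell)

def make_target(x, y, grid_size=8, extent=40.0):
    """Single-channel target grid with a 1 at the vehicle's cell."""
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    r, c = position_to_cell(x, y, grid_size, extent)
    grid[r][c] = 1.0
    return grid

def decode(grid):
    """Return the (row, col) of the highest-valued cell of a predicted grid."""
    _, r, c = max((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    return r, c

def reconstruction_loss(pred, target):
    """Mean squared error over all cells (an assumed choice of per-cell loss)."""
    n = sum(len(row) for row in target)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n
```

In an autoregressive rollout, the decoded cell at one step would be written back into the input representation for the next step.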
14. INFER: INtermediate representations for
FuturE pRediction
The qualitative results from the validation fold of KITTI showcase the efficacy of INFER-Skip in using the intermediate
representation to predict complex trajectories. For example, in the leftmost plot, the network is able to accurately
predict the unseen second curve in the trajectory (predicted and ground truth trajectories are shown in red and blue
color, respectively). The green and red 3D bounding boxes indicate start of preconditioning and start of prediction of the
vehicle of interest (VoI), respectively. It is worth noting that the predicted trajectories are well within the lane (dark gray)
and road region (cyan), while avoiding collisions with the obstacles (magenta).
15. Deep Imitative Models for Flexible Inference,
Planning, and Control
• Imitation learning produces behavioral policies with limited flexibility to accommodate new
goals at test-time.
• In contrast, model-based reinforcement learning (MBRL) can plan to arbitrary goals using a
predictive dynamics model learned from data.
• It proposes “imitative models” to combine the benefits of imitation learning and MBRL.
• Imitative models are probabilistic predictive models able to plan interpretable expert-like
trajectories to achieve arbitrary goals.
• Inference with them resembles trajectory optimization in model-based reinforcement
learning, and learning them resembles imitation learning.
• This method substantially outperforms six direct imitation learning approaches (five of them
prior work) and an MBRL approach in a dynamic simulated autonomous driving task, and
can be learned efficiently from a fixed set of expert demonstrations without additional online
data collection.
2019.6
16. Deep Imitative Models for Flexible Inference,
Planning, and Control
Applying the algorithm to navigation in CARLA. Left: Image depicting the current scene, in which the light recently
turned from green to red. Left-Middle: Plot showing LIDAR observations of the agent, the goals it received from a
route planner, and the plan produced by the method. The model smoothly chooses between goals based on its
prior of expert behavior. Here, the stationary agent chooses to accelerate to follow the vehicle ahead. Right-Middle:
Image depicting an intersection scene. Right: LIDAR observations, goals, cost map of simulated potholes, and a
variety of plans the method produces, colored by the planner’s preference. Although the imitative model never
observed pothole-avoidance behavior, it is able to plan a reasonable, on-road path around the potholes with a test-time
cost map. Its preferred plan enters the intersection and steers around a pothole.
17. Deep Imitative Models for Flexible Inference,
Planning, and Control
• Reinforcement learning (RL) algorithms automatically learn desirable behaviors from raw
sensory inputs with minimal engineering. However, RL generally requires online learning: the
agent must collect more data with its latest strategy, use it to update a model, and repeat.
• Deploying a partially-trained policy on a real-world autonomous system can be dangerous.
• Behavior should therefore be learned offline from expert demonstrations.
• How can such demonstrations be incorporated into an autonomous car to perform a variety of tasks?
• One option is imitation learning (IL), which learns policies that stay near the expert’s distribution.
• Another is model-based methods, which can use the data to fit a dynamics model, and in
principle can be used with planning algorithms to achieve any user-specified goal.
• However, model-based (MB) and model-free RL algorithms are vulnerable to distributional
drift: when acting according to the learned model or policy, the agent visits states different
from those in training, and in those states it is unlikely to determine an effective course of action.
• This is problematic when the data intentionally excludes adverse events such as crashes.
• Therefore, MBRL algorithms usually require online collection and training.
18. Deep Imitative Models for Flexible Inference,
Planning, and Control
Imitative planning to goals: multi-goal
waypoint planning enables fine-grained
control of the plans.
Costs can be assigned to “potholes” only seen at test-time;
expert demonstrations with potholes were never observed.
The planner prefers routes around the potholes.
19. Deep Imitative Models for Flexible Inference,
Planning, and Control
• IL algorithms use expert demonstration data and, despite similar drift shortcomings, can
sometimes learn effective policies without online data collection. However, standard IL offers
little task flexibility since it only predicts low-level behavior.
• While several works augmented IL with goal conditioning, these goals must be specified in
advance during training, and are typically simple (e.g., turning left or right).
• The goal is to devise an algorithm that combines the advantages of IL and MBRL by offering
the flexibility to achieve new user-specified goals and the ability to learn from offline data.
• By learning a deep conditional probabilistic forecasting model from expert data, it captures
the distribution of expert behaviors without using manually designed reward functions.
• To plan to a goal, this method infers the most probable expert state trajectory under a
posterior distribution induced by the model and a task-specifying goal distribution.
• By incorporating a model-based representation, it can easily plan to previously unseen
user-specified goals while behaving similarly to the expert, and can be flexibly repurposed to
perform a variety of test-time tasks without any additional training.
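The planning objective described above (expert-likeness under the learned model plus a goal likelihood, optionally with a test-time cost) can be sketched as below. Note the simplification: this sketch scores a finite set of candidate trajectories rather than optimizing a trajectory under the posterior as the method does. The `log_prior` callable stands in for the learned imitative model, and the Gaussian goal likelihood, `sigma`, and `cost` are all assumptions for illustration.

```python
def goal_log_likelihood(trajectory, goal, sigma=1.0):
    """Log of an isotropic Gaussian on the final state, up to a constant."""
    x, y = trajectory[-1]
    d2 = (x - goal[0]) ** 2 + (y - goal[1]) ** 2
    return -0.5 * d2 / sigma ** 2

def plan(candidates, log_prior, goal, cost=None):
    """Pick the candidate maximizing log prior + goal log-likelihood - cost."""
    def score(traj):
        s = log_prior(traj) + goal_log_likelihood(traj, goal)
        if cost is not None:
            s -= sum(cost(p) for p in traj)  # test-time cost, e.g. potholes
        return s
    return max(candidates, key=score)

# Toy stand-in for the imitative model: the "expert" stays near y = 0.
def toy_log_prior(traj):
    return -sum(y * y for _, y in traj)

# Toy test-time cost: a pothole at (2, 0) never seen in the demonstrations.
def pothole_cost(p):
    return 5.0 if (p[0] - 2.0) ** 2 + p[1] ** 2 < 0.25 else 0.0

straight = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
swerve = [(1.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
offroad = [(1.0, 3.0), (2.0, 3.0), (3.0, 3.0)]
```

Without the cost, `plan([straight, swerve, offroad], toy_log_prior, (3.0, 0.0))` selects `straight`; passing `cost=pothole_cost` makes `swerve` the preferred plan, with no retraining of the prior.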
20. Deep Imitative Models for Flexible Inference,
Planning, and Control
Illustration of the method applied to autonomous driving. This method trains an imitative model
from a dataset of expert examples. After training, the model is repurposed as an imitative planner.
At test-time, a route planner provides waypoints to the imitative planner, which computes expert-
like paths to each goal. The best plan is chosen according to the planning objective and provided
to a low-level PID controller, which produces steering and throttle actions.
21. Deep Imitative Models for Flexible Inference,
Planning, and Control
Tolerating bad waypoints. The planner prefers waypoints in the distribution of expert
behavior (on the road at a reasonable distance). Columns 1, 2: Planning with ½ decoy
waypoints. Columns 3, 4: Planning with all waypoints on the wrong side of the road.
22. Deep Imitative Models for Flexible Inference,
Planning, and Control
Test-time plans prefer steering around potholes.
Table: Robustness to waypoint noise and test-time pothole adaptation.
The method is robust to waypoints on the wrong side of the road, and fairly
robust to decoy waypoints. The method is flexible enough to safely produce
behavior not demonstrated (pothole avoidance) by incorporating a test-
time cost. Ten episodes are collected in each Town.
23. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• Accurate prediction of others’ trajectories is essential for autonomous driving.
• Trajectory prediction is challenging because it requires reasoning about agents’ past
movements, social interactions among varying numbers and kinds of agents, constraints
from the scene context, and the stochasticity of human behavior.
• This approach models these interactions and constraints jointly within a Multi-Agent
Tensor Fusion (MATF) network.
• Specifically, the model encodes multiple agents’ past trajectories and the scene context
into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent
interactions while retaining the spatial structure of agents and the scene context.
• The model decodes recurrently to multiple agents’ future trajectories, using adversarial
loss to learn stochastic predictions.
• Experiments on both highway driving and pedestrian crowd datasets show that the model
achieves state-of-the-art prediction accuracy.
2019.7
24. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• There are two parallel encoding streams in the MATF architecture.
• One encodes the past trajectories of each individual agent xi independently using single agent
LSTM encoders, and another encodes the static scene context image c with a CNN.
• Each LSTM encoder shares the same set of parameters, so the architecture is invariant to the
number of agents in the scene.
• The outputs of the LSTM encoders are 1-D agent state vectors {x′1, x′2, ..., x′n} without
temporal structure.
• The output of the scene context encoder CNN is a scaled feature map c′ retaining the spatial
structure of the bird’s-eye view static scene context image.
• Next, the two encoding streams are concatenated spatially into a Multi-Agent Tensor.
• Agent encodings {x′1, x′2, ..., x′n} are placed into one bird’s-eye view spatial tensor, which is
initialized to 0 and is of the same shape (width and height) as the encoded scene image c′.
25. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• The dimension axis of the encodings fits into the channel axis of the tensor.
• The agent encodings are placed into the spatial tensor with respect to their positions at the
last time step of their past trajectories.
• This tensor is then concatenated with the encoded scene image in the channel dimension to
get a combined tensor. If multiple agents are placed into the same cell in the tensor due to
discretization, element-wise max pooling is performed.
• The Multi-Agent Tensor is fed into fully convolutional layers, which learn to represent
interactions among multiple agents and between agents and the scene context, while
retaining spatial locality, to produce a fused Multi-Agent Tensor.
• Specifically, these layers operate at multiple spatial resolution scale levels by adopting U-
Net-like architectures to model interaction at different spatial scales.
• The output feature map of this fused model c′′ has exactly the same shape as c′ in width and
height to retain the spatial structure of the encoding.
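The construction of the Multi-Agent Tensor described above can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: the function name `build_multi_agent_tensor`, the channel-first layout and the integer cell coordinates are assumptions made for the example, and the convolutional fusion layers are omitted.

```python
import numpy as np

def build_multi_agent_tensor(agent_enc, agent_cells, scene_feat):
    """Place agent encodings on a bird's-eye-view grid and stack scene features.

    agent_enc:   (n, d) encoded agent vectors x'_i
    agent_cells: (n, 2) integer (row, col) cell of each agent's last position
    scene_feat:  (c, H, W) encoded scene feature map c'
    Returns a (d + c, H, W) tensor.
    """
    n, d = agent_enc.shape
    _, height, width = scene_feat.shape
    agent_map = np.zeros((d, height, width), dtype=float)
    occupied = np.zeros((height, width), dtype=bool)
    for enc, (r, c) in zip(agent_enc, agent_cells):
        if occupied[r, c]:
            # Several agents fall into the same cell: element-wise max pooling.
            agent_map[:, r, c] = np.maximum(agent_map[:, r, c], enc)
        else:
            agent_map[:, r, c] = enc
            occupied[r, c] = True
    # Concatenate agent channels and context channels along the channel axis.
    return np.concatenate([agent_map, scene_feat], axis=0)

agent_enc = np.array([[1.0, -2.0], [-3.0, -1.0], [0.5, 0.5]])
agent_cells = np.array([[1, 1], [1, 1], [0, 2]])
scene_feat = np.ones((4, 3, 3))
tensor = build_multi_agent_tensor(agent_enc, agent_cells, scene_feat)
```

The `occupied` mask is a design choice of this sketch: it keeps negative encoding values from being clipped by the zero initialization when only one agent occupies a cell.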
26. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
The Multi-Agent Tensor encoding is a spatial
feature map of the scene context and multiple
agents from an overhead perspective, including
agent channels (above) and context channels
(below). Agents’ feature vectors (red) output
from single-agent LSTM encoders are placed
spatially w.r.t. agents’ coordinates to form the
agent channels. The agent channels are aligned
spatially with the context channels (a context
feature map) output from scene context
encoding layers to retain the spatial structure.
27. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• To decode each agent’s predicted trajectory, agent-specific representations with fused
interaction features for each agent {x′′1, x′′2, ..., x′′n} are sliced out according to their
coordinates from the fused Multi-Agent Tensor output c′′.
• These agent-specific representations are then added as a residual to the original encoded
agent vectors to form final agent encoding vectors {x′1 + x′′1, x′2 + x′′2, ..., x′n + x′′n}, which
encode all the information from the past trajectories of the agents themselves, the static
scene context, and the interaction features among multiple agents.
• In this way, this approach allows each agent to get a different social and contextual
embedding focused on itself.
• Importantly, the model gets these embeddings for multiple agents using shared feature
extractors instead of operating n times for n agents.
• Finally, for each agent in the scene, its final vector x′i + x′′i is decoded to a future trajectory
prediction ŷi by LSTM decoders.
• Similar to the encoders for each agent, parameters are shared to guarantee that the network
can generalize well when the number of agents in the scene varies.
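The slicing and residual step described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the fused tensor c′′ is taken to be channel-first with as many channels as the agent encodings have dimensions, and `final_agent_encodings` is a hypothetical helper name. The LSTM decoders are not shown.

```python
import numpy as np

def final_agent_encodings(agent_enc, agent_cells, fused):
    """Slice per-agent features out of the fused tensor and add them as a residual.

    agent_enc:   (n, d) original encoded agent vectors x'_i
    agent_cells: (n, 2) integer (row, col) cell of each agent
    fused:       (d, H, W) fused Multi-Agent Tensor c''
    Returns (n, d) vectors x'_i + x''_i.
    """
    rows, cols = agent_cells[:, 0], agent_cells[:, 1]
    # Advanced indexing picks one (d,) feature column per agent: result is (d, n).
    sliced = fused[:, rows, cols].T
    return agent_enc + sliced

fused = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
agent_enc = np.ones((2, 2))
agent_cells = np.array([[0, 1], [2, 2]])
final_enc = final_agent_encodings(agent_enc, agent_cells, fused)
```

All agents are handled in one indexing operation, which reflects the point above that the shared feature extractor runs once rather than n times.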
28. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
Illustration of the Multi-Agent Tensor Fusion (MATF) architecture.
29. Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
Qualitative results from the Massachusetts driving dataset. Past trajectories are shown in different colors for each
vehicle, followed by 100 sampled future trajectories. Ground truth future trajectories are shown in black, and lane
centers are shown in gray. (a) A complex scenario involving five vehicles; MATF accurately predicts the trajectory and
velocity profile for all. (b) MATF correctly predicts that the red vehicle will complete a lane change. (c) MATF
captures the uncertainty over whether the red vehicle will take the highway exit. (d) As soon as the purple vehicle
passes a highway exit, MATF predicts it will not take that exit. (e) Here, MATF fails to predict the precise ground truth
trajectory; however, the red vehicle is predicted to initiate a lane change maneuver in a very small number of
sampled trajectories.
30. AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
• In highly interactive driving scenarios, accurate prediction of other road participants is critical
for safe and efficient navigation of autonomous cars.
• Prediction is challenging because diverse driving behaviors are difficult to model by hand
and difficult to learn from data.
• The model should be interactive and reflect individual differences.
• Imitation learning methods, such as parameter sharing generative adversarial imitation
learning (PS-GAIL), are able to learn interactive models.
• However, the learned models average out individual differences.
• When used to predict trajectories of individual vehicles, these models are biased.
• An adaptable generative prediction framework (AGen) performs online adaptation of the
offline learned models to recover individual differences for better prediction.
• In particular, AGen combines the recursive least squares parameter adaptation algorithm
(RLS-PAA) with the offline learned model from PS-GAIL.
• RLS-PAA has analytical solutions and is able to adapt the model for every single vehicle
efficiently online.
IVS 2019
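The online adaptation idea above can be illustrated with a generic recursive least squares update with a forgetting factor. This is a sketch, not the AGen implementation: it assumes that adaptation acts on a linear output layer applied to frozen features from the offline model, and the class name `RLSAdapter` and its parameters `forgetting` and `p0` are hypothetical.

```python
import numpy as np

class RLSAdapter:
    """Recursive least squares for a linear map y ~ theta^T phi (assumed setup)."""

    def __init__(self, n_features, n_outputs, forgetting=0.99, p0=1e3):
        self.theta = np.zeros((n_features, n_outputs))  # adapted weights
        self.P = np.eye(n_features) * p0                # inverse-covariance-like matrix
        self.lam = forgetting                           # forgetting factor in (0, 1]

    def update(self, phi, y):
        """One closed-form update from features phi (d,) and observed output y (m,)."""
        phi = np.asarray(phi, dtype=float).reshape(-1, 1)
        err = np.asarray(y, dtype=float) - (phi.T @ self.theta).ravel()
        denom = self.lam + (phi.T @ self.P @ phi).item()
        gain = self.P @ phi / denom
        self.theta += gain @ err.reshape(1, -1)
        self.P = (self.P - gain @ phi.T @ self.P) / self.lam
        return err

# Toy usage: recover the weights of one "driver" from streamed observations.
rng = np.random.default_rng(0)
true_theta = np.array([[0.5, -1.0], [2.0, 0.3], [-0.7, 1.5]])
adapter = RLSAdapter(n_features=3, n_outputs=2)
for _ in range(100):
    phi = rng.normal(size=3)
    adapter.update(phi, phi @ true_theta)
```

Each update is a closed-form expression with no iterative optimization, which is consistent with the statement above that the adaptation has analytical solutions and is cheap enough to run per vehicle.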
31. AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
Offline model learning extracts features for
average driving behavior. Online model adaptation
can perturb the average model to fit the behavior
of a specific driver at a specific time. In particular,
the offline pretrained policy network of PS-GAIL
serves as the feature extractor for averaged driving
behavior, while RLS-PAA adapts individual vehicle
behavior online.
Heterogeneity among drivers needs to be explicitly
accounted for to improve prediction accuracy in
real-world scenarios. As mentioned earlier, it is
intractable to fit a policy network for every
individual vehicle. To make heterogeneous
prediction scalable, AGen combines offline model
learning with online model adaptation.
32. AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
(a) Offline training using PS-GAIL. The critic computes
the difference between the expert trajectory and the roll-out
trajectory from the policy network. PS-GAIL
iteratively updates the policy to minimize the difference
and the critic to maximize it.
(b) Online adaptation using RLS-PAA. The critic computes the 2-norm
difference between the expert trajectory and the roll-out
trajectory from the policy network. RLS-PAA updates the policy
network to minimize the difference, using either 1-step or 2-step adaptation.
33. AGen: Adaptable Generative Prediction
Networks for Autonomous Driving
Predicted 2 s trajectories for 22 agents after 3 s of adaptation.
Average position RMSE over time
in the 22-agent scenario
34. Conditional Generative Neural System for
Probabilistic Trajectory Prediction
• Effective understanding of the environment and accurate trajectory prediction of
surrounding dynamic obstacles are critical for intelligent systems, such as autonomous
vehicles and wheeled mobile robots navigating in complex scenarios, to achieve safe and
high-quality decision making, motion planning and control.
• Because the future is inherently uncertain, it is desirable to make inferences from a
probabilistic perspective rather than to produce deterministic predictions.
• They propose a conditional generative neural system (CGNS) for probabilistic trajectory
prediction to approximate the data distribution, with which realistic, feasible and diverse
future trajectory hypotheses can be sampled.
• The system combines the strengths of conditional latent space learning and variational
divergence minimization, and leverages both static context and interaction information with
soft attention mechanisms.
• They also propose a regularization method for incorporating soft constraints into deep neural
networks with differentiable barrier functions, which can regulate and push the generated
samples into the feasible regions.
2019.7
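The soft-constraint regularization mentioned above can be illustrated with a smooth penalty on a kinematic limit. The slides do not say which barrier function is used, so the softplus relaxation below is an assumed stand-in, and the names `acceleration_barrier_penalty`, `a_max` and `sharpness` are hypothetical. The penalty is near zero inside the feasible region and grows smoothly once the acceleration limit is exceeded, so it could be added to a training loss.

```python
import numpy as np

def acceleration_barrier_penalty(traj, dt, a_max, sharpness=10.0):
    """Smooth penalty for accelerations above a_max along a sampled trajectory.

    traj: (T, 2) positions sampled every dt seconds.
    Assumption: a softplus relaxation stands in for the barrier function.
    """
    # Second finite difference approximates acceleration at interior steps.
    acc = np.diff(traj, n=2, axis=0) / dt**2
    violation = np.linalg.norm(acc, axis=1) - a_max
    # softplus(k * v) / k is differentiable and approaches max(v, 0) as k grows.
    return float(np.mean(np.logaddexp(0.0, sharpness * violation)) / sharpness)

t = np.arange(10.0)
smooth = np.stack([t, np.zeros(10)], axis=1)  # constant velocity, feasible
jerky = smooth.copy()
jerky[5, 1] = 5.0                             # sudden lateral jump, infeasible
penalty_smooth = acceleration_barrier_penalty(smooth, dt=1.0, a_max=3.0)
penalty_jerky = acceleration_barrier_penalty(jerky, dt=1.0, a_max=3.0)
```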
35. Conditional Generative Neural System for
Probabilistic Trajectory Prediction
Typical urban traffic scenarios with large uncertainty and interactions
among multiple entities. The shaded areas represent the reachable
sets of possible trajectories. (a) Unsignalized roundabout with four-
way yield signs; (b) Unsignalized intersection with four-way stop signs.
36. Conditional Generative Neural System for
Probabilistic Trajectory Prediction
• Requirements to generate diverse, realistic future trajectories:
• 1) Context-aware: The system should be able to forecast trajectories which are inside the
traversable regions and collision-free with static obstacles in the environment. For instance,
when the vehicles navigate in a roundabout they need to advance along the curves and
avoid collisions with road boundaries.
• 2) Interaction-aware: The system needs to generate reasonable trajectories compliant with
traffic or social rules, taking into account interactions and reactions among multiple
entities. For instance, when the vehicles approach an unsignalized intersection, they need to
anticipate others’ possible intentions and motions as well as the influences of their own
behaviors on surrounding entities.
• 3) Feasibility-aware: The system should anticipate naturalistic and physically feasible
trajectories which are compliant with vehicle kinematics or dynamics constraints, although
these constraints can be ignored for pedestrians due to the large flexibility of their motions.
• 4) Probabilistic prediction: Since the future is full of uncertainty, the system should be able to
learn an approximated distribution of future trajectories close to data distribution and
generate diverse samples which represent various possible behavior patterns.
37. Conditional Generative Neural System for
Probabilistic Trajectory Prediction
Overview of the proposed conditional generative neural system (CGNS), which consists of four key components:
(a) a deep feature extractor with soft attention mechanism, which extracts multi-level features from scene
context image sequences and trajectories; (b) an encoder to learn conditional latent space representations; (c) a
generator (decoder) to sample future trajectory hypotheses; (d) a discriminator to distinguish predicted
trajectories from ground truth.
38. Conditional Generative Neural System for
Probabilistic Trajectory Prediction
Fig. 3. The visualization of the context image masks and trajectory block attention masks. Particularly, in the
trajectory masks, there are four rows representing 4 historical time steps and 6 columns representing 6
vehicles in the scene. The first column corresponds to the predicted vehicle and the others correspond to
surrounding ones. Brighter colors indicate larger attention weights. The predicted vehicles are indicated with
red bounding boxes. In all the cases, the image masks have a large weight around the predicted vehicle and the
area of its heading direction. In the first three cases, only the historical trajectories of the predicted vehicle are
assigned large attention weights, which implies that the other vehicles have little effect in these situations.
However, in the last three cases, more attention is paid to other vehicles since there exist strong interactions, which
increase the inter-dependency.
39. Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
• Coordination recognition and subtle pattern prediction of future trajectories play a significant
role when modeling interactive behaviors of multiple agents.
• Because the future evolution is inherently uncertain, deterministic predictors are
not sufficiently safe and robust.
• To tackle the task of probabilistic prediction for multiple interactive entities, the authors propose a
coordination and trajectory prediction system (CTPS), which has a hierarchical structure
including a macro-level coordination recognition module and a micro-level subtle pattern
prediction module which solves a probabilistic generation task.
• Two types of representation of the coordination variable: categorized and real-valued.
• Bayesian deep learning is incorporated into generative models to generate diversified prediction hypotheses.
• The proposed system is tested on multiple driving datasets in various traffic scenarios, where it
achieves better performance than baseline approaches in terms of a set of evaluation metrics.
• Using the categorized coordination can better capture multi-modality and generate more
diversified samples than the real-valued coordination, while the latter can generate prediction
hypotheses with smaller errors with a sacrifice of sample diversity.
• NNs with weight uncertainty are able to generate samples with larger variance and diversity.
2019.5
40. Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
Typical highway and urban driving scenarios
where two or more entities coordinate and
interact with each other. The shaded areas
represent possible future motions which
consider multi-modality. (a) Ramp merging and
lane change behaviors on highway scenarios;
(b) Unsignalized roundabout with yield signs;
(c) Unsignalized intersection with stop signs.
Although the contexts are different, they can
be treated as generalized merging scenarios.
41. Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
• The multi-modal conditional distribution of future trajectories for interactive agents can be
factorized through a coordination variable c, which is either categorized or real-valued.
• This factorization naturally divides the system into a coordination recognition module
(macro-level) and a subtle pattern prediction module (micro-level).
• The coordination c can either be categorized to represent meaningful semantics or be a
real-valued vector that encodes the underlying representation.
• If c is categorized, the micro-level module takes c in as an indicator through one-hot
encoding; if c is a real-valued variable, the micro-level module takes c in as an additional
input feature.
• The macro-level module is based on a variational recurrent neural network (VRNN)
followed by a probabilistic classifier.
• The micro-level module is based on a Coordination-Bayesian Conditional Generative
Adversarial Network (C-BCGAN).
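The two ways of feeding the coordination variable c into the micro-level module can be sketched as below. This only illustrates the input conditioning, under the assumption that conditioning is done by concatenation with a feature vector of the observed history; `condition_on_coordination` and `history_feat` are hypothetical names, and the generator itself is not shown.

```python
import numpy as np

def condition_on_coordination(history_feat, c, n_categories=None):
    """Build the conditioned input for a trajectory generator.

    If n_categories is given, c is a category index and is one-hot encoded.
    Otherwise c is treated as a real-valued vector and appended as extra features.
    """
    history_feat = np.asarray(history_feat, dtype=float)
    if n_categories is not None:
        indicator = np.zeros(n_categories)
        indicator[int(c)] = 1.0  # one-hot indicator of the coordination category
        return np.concatenate([history_feat, indicator])
    return np.concatenate([history_feat, np.atleast_1d(np.asarray(c, dtype=float))])

history_feat = np.array([0.1, 0.2])
categorized_input = condition_on_coordination(history_feat, 2, n_categories=3)
real_valued_input = condition_on_coordination(history_feat, np.array([0.5, -0.5]))
```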
42. Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
Overview of CTPS: (a) Coordination recognition module: The coordination variable can be discrete categories or continuous
real-valued vectors. The discrete distribution of categorized coordination is obtained by a probabilistic classifier based on latent
features extracted by VRNN. The continuous distribution of real-valued coordination is obtained by maximizing mutual information based
on a VAE-style model. Either formulation can be chosen according to the objective and emphasis of a particular task; (b) Subtle pattern
prediction module: The model is based on the proposed C-BCGAN, in which the generator takes as input the historical information, the
coordination indicator and a noise vector sampled from the normal distribution. Weight uncertainties are incorporated in both the generator and
the discriminator network.
43. Coordination and Trajectory Prediction for Vehicle
Interactions via Bayesian Generative Modeling
The visualization of prediction results in the highway scenario. (a) Generation with learned coordination; (b)
Generation with real-valued coordination. Note that only the longitudinal motions are predicted for surrounding
vehicles, whereas both longitudinal and lateral motions are predicted for the center vehicle. This is why the predicted
trajectories of surrounding vehicles do not have lateral deviation.
44. Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
• In order to enable high-quality decision making and motion planning of intelligent systems such as
robots and autonomous vehicles, accurate probabilistic predictions for surrounding interactive
objects are a crucial prerequisite.
• Although many research studies have been devoted to making predictions on a single entity, it remains
an open challenge to forecast future behaviors for multiple interactive agents simultaneously.
• This work takes advantage of the Generative Adversarial Network (GAN) due to its capability of
distribution learning and proposes a generic multi-agent probabilistic prediction and tracking
framework which takes the interactions among multiple entities into account, in which all the entities
are treated as a whole.
• However, since GANs are very hard to train, the authors conduct an empirical study and present the relationship
between training performance and hyperparameter values with a numerical case study.
• The results imply that the proposed model can capture the mean, variance and multi-modality
of the ground truth distribution.
• Moreover, the proposed approach is applied to a real-world task of vehicle behavior prediction to
demonstrate its effectiveness and accuracy.
• The proposed model trained by adversarial learning can achieve a better prediction performance
than other SoA models trained by traditional supervised learning which maximizes the data likelihood.
• The well-trained model can also be utilized as an implicit proposal distribution for
particle-filter-based Bayesian state estimation.
2019.4
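The use of a trained generative model as a proposal for particle filtering can be sketched as one filter step. This is a toy illustration under stated assumptions: `sample_next` is a hypothetical stand-in for sampling from the trained generator, the observation model is an isotropic Gaussian, and the proposal is treated as the transition model so that the weights reduce to the observation likelihood (the bootstrap form of the filter).

```python
import numpy as np

def particle_filter_step(particles, observation, sample_next, obs_std, rng):
    """Propagate particles with a sampling-based proposal, reweight, resample.

    particles:   (N, D) current state samples
    observation: (D,) noisy measurement of the next state
    sample_next: callable (particles, rng) -> (N, D); assumed generator stand-in
    Returns the resampled particles and the weighted-mean state estimate.
    """
    proposed = sample_next(particles, rng)
    sq_err = np.sum((proposed - observation) ** 2, axis=1)
    log_w = -0.5 * sq_err / obs_std**2
    weights = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    weights /= weights.sum()
    estimate = weights @ proposed
    idx = rng.choice(len(proposed), size=len(proposed), p=weights)
    return proposed[idx], estimate

# Toy usage: a vehicle advancing one unit per step along x.
def sample_next(particles, rng):
    return particles + np.array([1.0, 0.0]) + 0.1 * rng.normal(size=particles.shape)

rng = np.random.default_rng(0)
particles = rng.normal(size=(500, 2))
true_state = np.zeros(2)
for _ in range(5):
    true_state = true_state + np.array([1.0, 0.0])
    particles, estimate = particle_filter_step(
        particles, true_state, sample_next, obs_std=0.2, rng=rng
    )
```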
45. Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
The general diagram of the proposed model, which consists of a generator network and a discriminator network.
46. Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
• The proposed approach is applied to a trajectory prediction task for interactive on-road
vehicles as an illustrative example, although it can be utilized to solve many other tasks such
as interactive pedestrian trajectory prediction and human-robot interactions.
A typical highway scenario is investigated where the gray car is the ego vehicle which aims
at forecasting future motions of its surrounding vehicles (red, green and yellow ones). The
observations of environment can be obtained by on-board sensors. The approach can also
be adopted in overhead traffic surveillance systems with camera-based monitors.
47. Interaction-aware Multi-agent Tracking and Probabilistic
Behavior Prediction via Adversarial Learning
Visualization of cases. (a) lane change left; (b) lane change right. The red dashed lines are ground truth trajectories.