Techniques and Challenges in
Autonomous Driving
Yu Huang
Chief Scientist of Autonomous Driving
Outline
1. Introduction
2. Perception
3. Mapping & Localization
4. Prediction
5. Planning & Control
6. V2X
7. Safety
8. Data Closed Loop
9. Annotation
10. Simulation
11. Scenario-based Development
12. Summary
Introduction
• DARPA Grand (Rural) Challenge (2004-2005): Stanford
• DARPA Urban Challenge (2007): CMU
Introduction
• Levels of Automation (SAE): L0 - L1 - L2 - L3 - L4 - L5
• ODD (Operational Design Domain)
• Robotaxi / driverless cargo delivery / autonomous commercial truck or bus /
Highway pilot / Urban pilot / Traffic Jam pilot / Autonomous valet parking
• Popular Development Paths:
• Gradual method: L2 -> L2+ -> L3 -> L4
• One-stop method: L4 -> L5
• Dimension reduction method: L2+ <- L4
• Problems:
• Techniques: Long tailed, Corner cases
• Safety: ISO26262, SOTIF
• Mass production: Monetization, Cost, Closed loop, OTA (over-the-air)
Introduction
• Platform Architecture:
• SW: hierarchical structure
• Modular: a pipeline
• End-to-End: fully or partially
Perception
• Collect info from sensors and discover
relevant knowledge from the environment;
• Calibration: sensor coordinate systems
• Detection, Segmentation, Tracking
• Camera: RGB image for 2-D/3-D detection
• Pseudo-LiDAR
• Radar: All-weather
• LiDAR: 3-D point cloud
• Multiple object tracking (MOT)
• Sensor fusion
• End-to-end perception
• Spatio-temporal fusion
• BEV (bird's-eye view)
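As a toy illustration of the data-association step at the heart of MOT, the sketch below greedily matches detections to tracks by 2-D IoU; the boxes and the 0.3 threshold are made-up values, not from any production tracker.

```python
# Toy MOT data association: match current detections to existing
# tracks greedily by 2-D IoU. Boxes are [x1, y1, x2, y2].

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def associate(tracks, dets, thresh=0.3):
    """Greedy matching: best IoU pairs first, each used at most once."""
    pairs = sorted(((iou(t, d), ti, di) for ti, t in enumerate(tracks)
                    for di, d in enumerate(dets)), reverse=True)
    matches, used_t, used_d = [], set(), set()
    for s, ti, di in pairs:
        if s >= thresh and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti); used_d.add(di)
    return matches

tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets   = [[21, 21, 31, 31], [1, 0, 11, 10]]
print(associate(tracks, dets))  # -> [(0, 1), (1, 0)]
```

Production trackers replace the greedy loop with Hungarian assignment and add motion prediction, but the matching objective is the same.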
Perception
• Tesla’s E2E NN framework
• Virtual camera
• rectify
• RegNet
• BiFPN
• Transformer
• BEV vector space
• Feature queue
• Kinematics: IMU
• Video module
• Spatial RNN
Mapping
[Figure: HD map examples from NavInfo (四维图新)]
Mapping
• HD map is a priori knowledge for perception and localization
• Semantic layer: road and lane topology
• Lanes, road boundaries, road marks, crosswalks, walkway
• Traffic signs, traffic light, pole-like objects, stop line
• Geometric layer:
• LiDAR point cloud alignment/Visual reconstruction
• SLAM
• Front end: odometry, ego-motion estimation
• Back end: global optimization, Pose Graph or Bundle Adjustment
• Visual /LiDAR/Radar SLAM
• Map update/Online mapping
• Crowd sourcing
• Deep learning plays a role: learn to build the map
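The front-end/back-end split can be made concrete with a toy example: the back end below solves a 1-D pose graph (three odometry edges plus one loop-closure edge, all measurements invented) as a linear least-squares problem.

```python
import numpy as np

# Toy 1-D pose-graph back end. Poses x0..x3 lie on a line; odometry
# edges measure relative displacement, and one loop-closure edge ties
# x3 back to x0. Solving the linear least squares spreads the
# loop-closure residual over the whole chain.

# Edge list: (i, j, measured displacement x_j - x_i); last is the loop closure.
edges = [(0, 1, 1.0), (1, 2, 1.1), (2, 3, 0.9), (0, 3, 3.2)]

n = 4
A = np.zeros((len(edges) + 1, n))
b = np.zeros(len(edges) + 1)
for k, (i, j, z) in enumerate(edges):
    A[k, i], A[k, j], b[k] = -1.0, 1.0, z
A[-1, 0] = 1.0  # anchor x0 = 0 to remove the gauge freedom
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x, 3))  # x ≈ [0, 1.05, 2.2, 3.15]
```

Real SLAM back ends (Pose Graph / Bundle Adjustment) solve the same kind of problem nonlinearly over SE(3) poses, iterating a linearized step like this one.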
Mapping
Q Li, Y Wang, Y-L Wang, “HDMapNet: An Online HD Map Construction and Evaluation Framework”, arXiv July, 2021
J Philion, S Fidler, “Lift-Splat-Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D”, ECCV, 2020
Localization
• Determine ego location w.r.t. the environment
• Global/Local (incremental) localization
• Loop closure, failure recovery
• Localization by feature matching
• 2D-to-3D matching: PnP
• 2D-to-2D matching: Visual correspondence
• 3D-to-3D matching: Point cloud
• Localization by semantic matching:
• Lanes (lateral info), signs (longitudinal info.)
• Sensor fusion:
• GNSS, IMU, LiDAR, Camera, Wheel encoders,…
• State space estimation
• Deep learning is promising: learn to localize (locally or globally)
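A minimal sketch of the 3D-to-3D matching step, assuming correspondences are already known (the inner step of an ICP-style LiDAR localizer): recover the rigid transform with the Kabsch algorithm. The synthetic cloud and pose below are illustrative.

```python
import numpy as np

# 3D-to-3D point cloud alignment with known correspondences
# (Kabsch / orthogonal Procrustes): find R, t such that P ≈ R Q + t.

def align(P, Q):
    """Least-squares rigid transform from matched point sets Q -> P."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (Q - cq).T @ (P - cp)               # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cp - R @ cq
    return R, t

# Synthetic check: rotate/translate a cloud, then recover the pose.
rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([2.0, -1.0, 0.5])
P = Q @ R_true.T + t_true
R_est, t_est = align(P, Q)
```

Full ICP alternates this closed-form step with nearest-neighbor correspondence search.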
Localization
X Wei, I A Barsan, S Wang, J Martinez, R Urtasun, “Learning to Localize Through Compressed Binary Maps”, arXiv, 2020
Prediction
• Anticipate the motion of surrounding traffic participants
• Prediction for pedestrians: articulated motion and social rules
• Prediction for vehicles: driving limited by roads and traffic rules
• Physics-based: state estimation
• Maneuver-based: clustering and classification (self supervised)
• Interaction-aware: learning by imitation and reasoning by planning
• Challenges:
• Interaction modeling
• Multimodal uncertainty
“Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations”, arXiv, Aug. 2020
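The physics-based family above can be as simple as a constant-velocity rollout; the sketch below uses an illustrative dt and horizon.

```python
import numpy as np

# Simplest physics-based predictor: a constant-velocity (CV) model
# rolling an observed state forward. state = [x, y, vx, vy].

def predict_cv(state, dt=0.1, horizon=30):
    """Return the predicted (x, y) positions over the horizon."""
    x, y, vx, vy = state
    steps = np.arange(1, horizon + 1) * dt
    return np.stack([x + vx * steps, y + vy * steps], axis=1)

traj = predict_cv([0.0, 0.0, 10.0, 0.5])  # 3 s ahead at 10 m/s
print(traj[-1])  # final predicted position ≈ [30.0, 1.5]
```

Maneuver-based and interaction-aware predictors replace this single deterministic rollout with learned, multimodal distributions over trajectories.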
Prediction
• Cruise.AI’s Prediction NN model
• Encoder-decoder architecture
• Encode object history and scenes
together (HD map)
• Attention for interaction and social
• Mixture of experts for behavioral variety
• Decode in a two-stage way
• Initialization and refinement
• Multi-modal uncertainty
• Auxiliary tasks in MTL
• Joint prediction
• Self supervision
Planning
• Perform decision making based on the outputs of the
localization, perception and prediction modules
• Partition and organize the planning task into a hierarchical structure.
• Route (mission) planning
• Select an appropriate macro-level route
• Behavior planning (decision making)
• Interact with other agents and follow traffic-rule restrictions
• Motion (path) planning
• Generate appropriate paths and/or sets of actions
• Sampling-based: discrete search
• Imitation learning: deep learning
• Game theoretical: reinforcement learning
“A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles”, arXiv, April, 2016
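A minimal instance of discrete search for motion planning: A* over a toy occupancy grid with a Manhattan-distance heuristic (grid, costs and coordinates are illustrative).

```python
import heapq

# A* search on a small occupancy grid (1 = obstacle), the discrete
# cousin of sampling-based motion planners. Unit step cost,
# 4-connected moves, admissible Manhattan heuristic.

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                heapq.heappush(open_set,
                               (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return None  # no route exists

grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
print(len(path) - 1)  # steps around the wall -> 8
```

Lattice and hybrid-A* planners used on vehicles extend this with kinematically feasible motion primitives instead of grid moves.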
Planning
S Casas, A Sadat, R Urtasun, “MP3: A Unified Model to Map, Perceive, Predict and Plan”, CVPR, 2021
Control
• Execute the planned maneuvers, accounting for error / uncertainty
• Closed loop feedback control
• PID
• Linear Quadratic Regulator
• MPC with feedforward control
• Robustness and stability
• Path/Trajectory tracking
• Geometric
• Model-based
• Joint/Separate lateral and longitudinal control
• Deep learning for control
• Imitation learning
• Reinforcement learning
“A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles”, arXiv, April, 2016
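A minimal sketch of the PID feedback loop above, driving a toy first-order speed plant; the gains, drag term and target are invented for illustration.

```python
# PID closed-loop feedback control on a toy speed-tracking problem.
# The plant integrates commanded acceleration minus a small drag.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

dt = 0.05
pid = PID(kp=1.5, ki=0.4, kd=0.05, dt=dt)
speed, target = 0.0, 15.0
for _ in range(400):                      # 20 s of simulation
    accel = pid.step(target - speed)
    speed += (accel - 0.1 * speed) * dt   # plant with mild drag
print(round(speed, 2))  # settles near the 15 m/s target
```

The integral term is what removes the steady-state error the drag would otherwise cause; LQR and MPC replace these hand-tuned gains with model-derived optimal ones.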
Control
“Learning Robust Control Policies for End-to-End Autonomous Driving from Data-Driven Simulation”, IEEE RAL, 2020
• Vista: a data-driven simulator;
• It trains end-to-end control policies
with reinforcement learning entirely
within the simulation space;
• Trained agents can be deployed
directly in the real world.
V2X
• V2X (vehicle-to-everything): vehicles communicate with the traffic and the environment
around them, e.g. vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I).
• By accumulating detailed info from peers, drawbacks of the ego vehicle such as
limited sensing range, blind spots and insufficient planning may be alleviated.
• V2X helps increase safety and traffic efficiency.
• Collaborative perception;
• Collaborative localization;
• Collaborative planning:
• Centralized
• Decentralized
• Collaborative computing:
• Training and inference: cloud-edge-vehicle
V2X
• V2VNet: Build and send/receive compressed intermediate representations;
• Aggregates the information received from multiple nearby vehicles with a
spatially aware GNN, which observes the same scene from different viewpoints.
“V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction”, arXiv, Aug. 2020
Safety
• Safety is a system-level concept to minimize the risk of hazards due to malfunctioning
of system components;
• AI safety is an emerging issue addressing a variety of ML vulnerabilities;
• Functional safety standards (ISO26262):
• Identify safety needs, define safety requirements, and finally verify the design accordingly;
• SOTIF (Safety Of Intended Functionality):
• Addresses the absence of unreasonable risk due to functional insufficiencies of the intended functionality (as opposed to malfunctions);
• Safety models:
• Mobileye’s RSS (Responsibility-Sensitive Safety) model
• Nvidia’s SFF (safety force field) model
• Main issues:
• Corner cases, adversarial attack, interpretability, uncertainty, verification, ...
“Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety”, arXiv, April, 2021
“A Survey of Safety and Trustworthiness of Deep Neural Networks: Verification, Testing, Adversarial Attack
and Defence, and Interpretability”, arXiv, 2019
Safety
• “Scenario manager” coordinates
the simulator and AI agent to run a
driving scenario and monitor the
state and the safety of the EV.
• It is bundled with a “campaign
manager” that takes a config file
as input to select a fault model,
SW or HW module sites, the
number of faults, and a scenario;
• “Campaign manager” uses the
specified config to inject one or
more transient faults per run into
the ADS system;
• “Event-driven synchronization”
module helps coordinate.
“ML-based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection”, arXiv July, 2019
• DriveFI: an ML-based fault injection engine that mines situations and
faults that maximally impact AV safety;
• It uses a DBN, specifically a 3-Temporal Bayesian Network (TBN);
Data Closed Loop
[Figures: data closed-loop infrastructure examples from Tesla and Waymo]
Data Closed Loop
• ADS development faces significant data challenges;
• Long tailed distribution with rare corner cases;
• Data driven model development is the competitive power;
• Build infrastructure to support data closed loop in ADS development;
• Data capture with “smart” selection;
• Active learning with uncertainty estimation, corner case/out-of-distribution detection;
• Efficient data annotation;
• Fully automated labeling tools: offline, large NN models on servers.
• Incremental model training;
• Adversarial augmentation, domain adaptation, open world learning.
• Simulation platform with scenario-based testing & validation;
• MIL (model-in-loop), SIL (SW-in-loop), HIL (HW-in-loop) and VIL (vehicle-in-loop);
• Deployment with OTA: shadow mode (Tesla)
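"Smart" selection via active learning can be sketched as entropy-based ranking: score each captured frame by the predictive entropy of the model's softmax output and label only the most uncertain ones. The probabilities and budget below are placeholders.

```python
import numpy as np

# Entropy-based active learning for "smart" data capture: rank frames
# by predictive entropy and keep the most uncertain for annotation.

def entropy(probs):
    """Shannon entropy per row of a batch of class distributions."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

# Each row: one captured frame's class probabilities (made-up values).
preds = np.array([[0.98, 0.01, 0.01],   # confident -> skip
                  [0.40, 0.35, 0.25],   # very uncertain -> label
                  [0.55, 0.44, 0.01]])  # borderline two-way
scores = entropy(preds)
budget = 2
selected = np.argsort(-scores)[:budget]
print(selected)  # indices of the most uncertain frames -> [1 2]
```

Out-of-distribution and corner-case detectors play the same role as this entropy score, just with better-calibrated notions of "uncertain".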
Data Closed Loop
• A fully differentiable AV stack trainable from human demonstrations;
• Closed-loop data-driven reactive simulation;
• Large-scale, low-cost data collection to address scalability issues;
A Jain, L D Pero, H Grimmett, P Ondruska, “Autonomy 2.0: Why is self-driving always 5 years away?” arXiv, July 2021
Annotation
• Annotation is time consuming and labor intensive;
• Automatic labeling
• Offline, not real time, on server instead of vehicle client;
• Higher performance
• May need more data input
• Semi-automatic labeling
• Interactive with human-in-the-loop
• Relies on solid algorithms that outperform manual operation
• Integrated platform with label transfer across different sensors
• 2D-3D space
• HD map building is a special case
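Label transfer between 2D and 3D reduces to projecting annotated 3D points into the image plane; the sketch below assumes a pinhole intrinsic matrix K and an identity extrinsic, purely illustrative values rather than a real sensor rig.

```python
import numpy as np

# 2D-3D label transfer: project LiDAR-frame 3D points into a camera
# image via extrinsics [R|t] and a pinhole intrinsic matrix K.
# All numbers are illustrative, not a calibrated rig.

K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                  # camera aligned with LiDAR (toy case)
t = np.zeros(3)

def project(points_3d):
    """Project Nx3 LiDAR points to Nx2 pixel coordinates."""
    cam = points_3d @ R.T + t  # LiDAR frame -> camera frame
    uv = cam @ K.T             # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective divide

corners = np.array([[1.0, 0.5, 10.0]])  # one 3D box corner, 10 m ahead
print(project(corners))  # -> [[740. 410.]]
```

Projecting all eight corners of a 3D box and taking their bounding rectangle yields a transferred 2D box label.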
Annotation
“Offboard 3D Object Detection from Point Cloud Sequences”, CVPR, 2021
Simulation
• Simulating a driving environment reduces cost for testing
• Sensor simulation:
• Image/video rendering
• LiDAR/radar
• Traffic simulation
• Road network simulation
• Road actors simulation
• Vehicles, pedestrians, cyclists, motorcyclists, …
• Kinematic/dynamic models
• Neural rendering: real-to-simulation by deep learning (not ray tracing)
• Style transfer: GAN
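A kinematic bicycle model is the simplest of the kinematic/dynamic actor models mentioned above; the wheelbase and control inputs below are illustrative.

```python
import math

# Kinematic bicycle model: the minimal motion model for simulated
# road actors. State is (x, y, yaw, v); controls are steering angle
# and acceleration. Wheelbase L and inputs are illustrative.

def bicycle_step(x, y, yaw, v, steer, accel, dt=0.1, L=2.7):
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / L * math.tan(steer) * dt
    v += accel * dt
    return x, y, yaw, v

# Drive straight for 1 s at 10 m/s, then check the pose.
state = (0.0, 0.0, 0.0, 10.0)
for _ in range(10):
    state = bicycle_step(*state, steer=0.0, accel=0.0)
print([round(s, 2) for s in state])  # -> [10.0, 0.0, 0.0, 10.0]
```

Dynamic models add tire slip and load transfer on top of this, which matters once simulated actors drive near the friction limits.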
[Figures: simulation examples from Cruise.AI and Tesla]
Simulation
• Representing scenes as compositional generative neural feature fields;
• Combining this scene representation with a neural rendering pipeline yields a fast and
realistic image synthesis model;
• Neural Radiance Fields (NeRFs) combine an implicit neural model with
volume rendering for novel view synthesis of complex scenes.
M Niemeyer, A Geiger, “GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields”, CVPR’21
“NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, ECCV, 2020
Scenario-based Development
• A scenario is the dynamic description of the components of the autonomous
vehicle and its driving environment over a period of time;
• Extracting interesting scenarios from real world data as well as generating
failure cases is important for the testing;
• A Hazard Based Testing (HBT) approach selects “smart miles” that reflect
(safety-critical) hazard-based scenarios in which the ADS fails;
• Pegasus project:
• Functional scenario -> Logical scenario -> Concrete scenario
• Methods to generate concrete scenarios:
• Knowledge-driven: human experts define;
• Data-driven: clustering for patterns.
• Adversarial attack: automatic generation of safety-critical scenarios
“Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review”, arXiv, Oct. 2021
Scenario-based Development
“AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles”, arXiv, April 2021
• Perturb the maneuvers of interactive actors in an existing scenario with adversarial
behaviors that cause realistic autonomy system failures;
• Given an existing scenario and its original sensor data, perturb the scenario and
update accordingly how the SDV would observe the LiDAR sensor data based on
the new scene configuration;
• Then evaluate the ADS on the modified scenario, compute an adversarial objective,
and update the proposed perturbation using a search algorithm.
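The perturb-and-search loop can be caricatured in a few lines: randomly perturb an actor's lateral offset and keep the perturbation that minimizes the ego's closest approach. The objective, the fixed straight-line ego trajectory, and all the numbers are stand-in assumptions for illustration, not the AdvSim system.

```python
import numpy as np

# Toy adversarial scenario search: perturb one actor's lateral offset
# and pick the perturbation minimizing the ego's closest approach
# (a stand-in adversarial objective). The "planner" is a fixed
# straight-line ego trajectory, purely illustrative.

rng = np.random.default_rng(1)
ego = np.stack([np.linspace(0, 30, 31), np.zeros(31)], axis=1)
actor_base = np.stack([np.linspace(30, 0, 31), np.full(31, 3.0)], axis=1)

def objective(lateral_shift):
    """Closest ego-actor approach over the episode (smaller = riskier)."""
    actor = actor_base + np.array([0.0, lateral_shift])
    return np.min(np.linalg.norm(ego - actor, axis=1))

# Random search over candidate perturbations; keep the riskiest one.
best = min((rng.uniform(-3, 3) for _ in range(200)), key=objective)
print(round(objective(best), 2))  # smallest gap found across samples
```

Real systems replace random search with gradient-free optimizers or learned proposals, and replace this objective with autonomy-stack failure signals, but the loop structure (perturb, re-simulate, score, update) is the same.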
Summary
• Autonomous driving development is challenging work;
• Deep learning is the core in algorithm development;
• Data closed loop is the competitive power in ADS;
• Safety-critical scenarios are “gold” sources for ADS upgrade;
• New sensor development is also a driving force;
• ODD (Operational Design Domain) and mass production are important.