3. Autonomous Driving is one of the most challenging AI applications in the world, defined from L2 to L5,
with Operational Design Domains like Highway Pilot, Urban Pilot, Traffic Jam Pilot, Robotaxi/bus/truck, etc.
4. A solution could be modular, i.e. a pipeline of perception, mapping & localization, prediction, planning
and control; or end-to-end (E2E); or partially E2E;
5. • There are roughly two research & development routes: progressive, step by step (L2 -> L4), or by leaps
and bounds (aiming directly at L4), with the latter additionally acting like dimension reduction (L4 -> L2+);
• Challenging problems in AV: a long-tailed distribution with corner cases, safety-critical scenarios, and mass
production requirements (a closed loop).
6. BEV Network
The Bird’s-Eye-View (BEV) is a natural view to serve as a unified representation for 3-D environment
understanding in the perception module of autonomous driving;
7. BEV contains rich semantic info, precise localization, and absolute scales, which can be directly consumed
by many downstream real-world applications such as behavior prediction, motion planning, etc.
BEVerse for 3D detection/map segmentation/motion prediction
8. BEV provides a physics-interpretable way to fuse information from different views, modalities, time
series, and agents:
• Spatial and temporal fusion: BEVFormer for multi-camera spatial-temporal fusion;
• Sensor fusion: the multi-task fusion framework in BEVFusion;
• V2X collaboration: UniBEV.
11. View transformation plays a vital role in camera-only 3D perception, from Perspective View (PV) to BEV.
Copied from the survey paper “Delving into the Devils of Bird’s-eye-view Perception”
12. Current BEV approaches can be divided into two main categories based on view transformation:
geometry-based and network-based;
13. In geometry-based methods, earlier work tries homography based on the flat-ground constraint.
Sim2Real for BEV Segmentation
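The flat-ground homography can be made concrete: for a planar road (z = 0), pixels and ground points are related by a 3x3 homography built from the camera intrinsics and extrinsics. A minimal NumPy sketch, with hypothetical calibration values and not tied to any specific IPM implementation:

```python
import numpy as np

def ipm_homography(K, R, t):
    """Homography mapping ground-plane (z = 0) world points to pixels.
    A ground point (X, Y, 0) projects as pixel ~ K @ [r1 | r2 | t] @ [X, Y, 1]^T."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def image_to_ground(H, uv):
    """Inverse perspective mapping: back-project pixels (N, 2) to (X, Y) on the road plane."""
    pts = np.column_stack((uv, np.ones(len(uv))))   # homogeneous pixel coordinates
    g = (np.linalg.inv(H) @ pts.T).T                # apply the inverse homography
    return g[:, :2] / g[:, 2:3]                     # dehomogenize to metric (X, Y)

# hypothetical calibration: focal length 500 px, principal point (320, 240)
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
H = ipm_homography(K, np.eye(3), np.array([0., 0., 5.]))
```

Projecting BEV grid corners through H (or pixels through its inverse) is exactly the image/feature projection the slide refers to.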
14. The state-of-the-art solution among geometry-based approaches is lifting 2D features to 3D space by explicit or
implicit depth estimation, i.e. depth-based (point-based or voxel-based).
Lift, Splat, Shoot (LSS)
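The lift-then-splat idea can be sketched in a few lines: each pixel predicts a categorical distribution over depth bins, its outer product with the pixel feature forms a frustum of 3D features, and those are sum-pooled into BEV cells. The toy pinhole geometry and all shapes below are illustrative assumptions, not the actual LSS implementation:

```python
import numpy as np

def lift_splat(feats, depth_logits, depth_bins, bev_shape, cell):
    """Minimal single-camera Lift-Splat sketch (hypothetical shapes/geometry).
    feats: (H, W, C) image features; depth_logits: (H, W, D) per-pixel depth scores;
    depth_bins: (D,) metric depths; returns a BEV grid of shape (X, Y, C)."""
    H, W, C = feats.shape
    D = len(depth_bins)
    # Lift: softmax over depth bins, outer product with features -> (H, W, D, C)
    p = np.exp(depth_logits - depth_logits.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)
    frustum = p[..., None] * feats[:, :, None, :]
    # Splat: drop each (pixel, depth) point into a BEV cell by pillar-sum pooling
    bev = np.zeros((*bev_shape, C))
    for v in range(H):
        for u in range(W):
            for d in range(D):
                # toy geometry: x grows with depth, y with lateral pixel offset
                x = int(depth_bins[d] / cell)
                y = int(((u - W / 2) * depth_bins[d] * 0.1) / cell) + bev_shape[1] // 2
                if 0 <= x < bev_shape[0] and 0 <= y < bev_shape[1]:
                    bev[x, y] += frustum[v, u, d]
    return bev
```

The real method vectorizes the splat and uses calibrated camera rays instead of this toy mapping, but the lift/splat structure is the same.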
15. In network-based methods, the straightforward idea is to use an MLP in a bottom-up strategy to project the
PV features to BEV;
Fishing Net for Semantic Segmentation
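As a rough illustration of the bottom-up MLP strategy, each image column can be flattened and mapped by a learned MLP to a ray of BEV cells; the layer sizes and random weights below are purely hypothetical stand-ins for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: image features (H, W, C), polar BEV with Z depth cells per column
H, W, C, Z = 8, 16, 4, 10
W1 = rng.normal(0, 0.1, (H * C, 64))   # stand-in for learned MLP weights
W2 = rng.normal(0, 0.1, (64, Z * C))

def pv_to_bev_mlp(pv):
    """Map each vertical image column (H*C values) to a BEV ray (Z*C values)."""
    cols = pv.transpose(1, 0, 2).reshape(W, H * C)     # one vector per image column
    hidden = np.maximum(cols @ W1, 0)                  # ReLU hidden layer
    rays = hidden @ W2                                 # (W, Z*C)
    return rays.reshape(W, Z, C).transpose(1, 0, 2)    # polar BEV grid (Z, W, C)
```

The column-to-ray pairing exploits the fact that a vertical scanline in PV roughly corresponds to one viewing ray in BEV.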
16. Another framework in network-based BEV employs a top-down strategy by directly constructing BEV
queries and searching for corresponding features on PV images via the cross-attention mechanism, i.e.
a transformer (with either sparse queries or dense queries).
Ego3RT: Ego 3D Representation
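The top-down query mechanism reduces, in its simplest single-head form, to cross attention between learnable BEV queries and flattened PV features. A minimal sketch, ignoring the deformable sampling, positional encodings and multi-head structure used in practice (weights here are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8                                            # feature dim (illustrative)
Wq, Wk, Wv = (rng.normal(0, 0.3, (C, C)) for _ in range(3))

def bev_cross_attention(queries, pv_feats):
    """Single-head cross attention: BEV queries (N, C) aggregate from
    flattened PV features (M, C); Wq/Wk/Wv stand in for learned projections."""
    Q, K, V = queries @ Wq, pv_feats @ Wk, pv_feats @ Wv
    scores = Q @ K.T / np.sqrt(C)                # (N, M) query-to-pixel relevance
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)          # softmax over PV locations
    return attn @ V                              # (N, C) features per BEV query
```

Sparse variants keep only a few hundred object queries; dense variants allocate one query per BEV cell, which is why map segmentation also becomes possible.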
17. Though built on a hard flat-ground assumption, homography-based methods have good
interpretability, where IPM (inverse perspective mapping) plays a role in image
projection or feature projection for downstream perception tasks;
Depth-based methods are usually built on an explicit 3D representation, quantized
voxels or point clouds (like pseudo-LiDAR) scattered in continuous 3D space.
• Point-based methods suffer from model complexity and lower performance;
• Voxel-based methods are popular due to computational efficiency and flexibility.
MLP-based view transform is hard due to the lack of depth info, occlusion, etc.;
A transformer, with either sparse (detection) or dense (map segmentation as well) queries,
gains impressive performance through strong relation modeling and its data-dependent
property, but efficiency is still a problem.
19. To apply BEV for autonomous driving, a data closed loop is required to build:
• Data selection is performed at both the vehicle and server side: data is selected from the vehicles based
on rough rules, like shadow modes, abnormal driving operations or specific scenario detection, and then the
collected data at the server selectively goes to annotation and training based on AI rules, such as active learning;
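One common AI rule for server-side selection is predictive uncertainty: send the frames the current model is least sure about to annotation. A minimal sketch of such an active-learning filter (the entropy criterion is an assumed example, not a specific production rule):

```python
import numpy as np

def select_for_annotation(probs, budget):
    """Pick the `budget` frames with the highest predictive entropy,
    i.e. the ones the current model is least certain about.
    probs: (N, K) per-frame class probabilities; returns frame indices."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:budget]   # most uncertain frames first
```

Frames passing this filter would then flow to annotation and retraining, closing the loop.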
20. To apply BEV for autonomous driving, a data closed loop is required to build:
• A big model (offline, non-real-time) for BEV works only at the server, where a transformer network with dense
queries is used for the view transform;
HAOMO.AI
21. To apply BEV for autonomous driving, a data closed loop is required to build:
• A light model (real-time, online) for BEV is deployed on the vehicle, where a voxel-based view
transform with depth supervision is designed;
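Depth supervision of this kind is often implemented by projecting LiDAR points into the image and treating the nearest depth bin as a per-pixel classification target; a minimal sketch under that assumption (shapes and bin layout are illustrative):

```python
import numpy as np

def depth_supervision_loss(depth_logits, lidar_depth, depth_bins):
    """Cross-entropy between predicted per-pixel depth distributions and
    one-hot targets derived from projected LiDAR depths.
    depth_logits: (N, D); lidar_depth: (N,) metric depths; depth_bins: (D,)."""
    target = np.abs(lidar_depth[:, None] - depth_bins[None, :]).argmin(1)  # nearest bin
    z = depth_logits - depth_logits.max(1, keepdims=True)                  # stable log-softmax
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))
    return -logp[np.arange(len(target)), target].mean()
```

LiDAR is needed only at training time; the deployed camera-only model keeps the learned depth head.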
22. To apply BEV for autonomous driving, a data closed loop is required to build:
• BEV data annotation is specific due to the innate 3-D structure, captured either from a 3-D sensor (LiDAR,
as in NuScenes) or from 3-D visual reconstruction of cameras;
[Figure (Tesla): auto-labeling pipeline in which images, IMU, odometry and GPS feed a big neural-net model
that outputs segmentation, depth and flow, reconstructed into the static background & ego trajectory,
moving objects & kinematics, and elevation.]
24. To apply BEV for autonomous driving, a data closed loop is required to build:
• A simulation platform is used for photo-realistic image data synthesis, digital twin (from real-to-sim), scenario
generalization and style transfer (from sim-to-real);
Google Block-NeRF; Carla Simulator (simulation with ground truth); Nvidia Omniverse
25. To apply BEV for autonomous driving, a data closed loop is required to build:
• A teacher-student training framework assists the knowledge distillation in BEV model training and deployment.
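The teacher-student idea can be sketched as a standard distillation loss: the light on-board student matches the softened outputs of the big server-side teacher while still fitting the ground-truth labels. The temperature T and weight alpha below are hypothetical hyperparameters:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Distillation sketch: KL to the teacher's temperature-softened outputs
    plus cross-entropy to ground truth, mixed by alpha."""
    ps, pt = softmax(student_logits, T), softmax(teacher_logits, T)
    kl = (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(-1).mean() * T * T
    p = softmax(student_logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kl + (1 - alpha) * ce
```

The T*T factor keeps the gradient scale of the softened KL term comparable to the hard-label term, a common convention in distillation.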
26. The BEV network is the new paradigm for computer vision, showing its strong potential in
autonomous driving applications;
BEV's network design depends on the computing platform, either at the server side or the client
side (the vehicle in ADS);
The data closed loop is a must for autonomous driving R&D, where BEV needs to pay more attention
to data selection and annotation;
A simulation platform can relieve the burden of BEV data annotation with state-of-the-art techniques
like photorealistic rendering, digital twin, scenario generalization and style transfer, etc.;
To optimize the deployment of BEV, knowledge distillation helps in the trade-off between
performance and computational complexity.