The New Perception Framework in Autonomous Driving:
An Introduction of BEV Network
Yu Huang
Chief Scientist
• Autonomous driving is one of the most challenging AI applications. It is defined by automation levels from
L2 to L5 and by an Operational Design Domain (ODD), such as highway pilot, urban pilot, traffic jam pilot, or
robotaxi/bus/truck. A solution can be modular, i.e. a pipeline of perception, mapping & localization,
prediction, planning and control, or end-to-end (E2E), or partially E2E;
• There are roughly two research & development routes: progressing step by step (L2->L4) or leaping directly
to L4, where the latter can additionally be scaled down to L2+ products ("dimension reduction", L4->L2+);
• Challenging problems in AV: the long tail of corner cases, safety-critical scenarios, and mass
production requirements (closed loop).
BEV Network
The bird’s-eye view (BEV) is a natural view to serve as a unified representation for 3-D environment
understanding in the perception module of autonomous driving;
BEV contains rich semantic information, precise localization, and absolute scale, which can be used directly
by many downstream real-world applications such as behavior prediction and motion planning.
BEVerse for 3D detection/map segmentation/motion prediction
• BEV provides a physically interpretable way to fuse information from different views, modalities, time
series, and agents.
  • Spatial and temporal fusion
BEVFormer for multiple cameras’ spatial-temporal fusion
  • Sensor fusion
Multi-task Fusion framework in BEVFusion
  • V2X collaboration
UniBEV
View transformation, from perspective view (PV) to BEV, plays a vital role in camera-only 3D perception.
Copied from the survey paper “Delving into the Devils of Bird’s-eye-view Perception”
Current BEV approaches can be divided into two main categories based on view transformation:
geometry-based and network-based;
Among geometry-based methods, earlier work applies a homography under the flat-ground assumption.
Sim2Real for BEV Segmentation
The state-of-the-art solution among geometry-based approaches lifts 2D features to 3D space by explicit or
implicit depth estimation, i.e. depth-based methods (point-based or voxel-based).
Lift, Splat, Shoot (LSS)
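The "lift" step of a depth-based view transform can be sketched as follows. This is a minimal illustration in plain Python, assuming a categorical depth distribution per pixel in the spirit of LSS; the function names `softmax` and `lift_pixel` and the toy sizes are hypothetical and not taken from any released code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lift_pixel(feature, depth_logits):
    """Lift one pixel's C-dim feature onto D depth bins.

    Returns a D x C list: the outer product of the predicted depth
    distribution and the feature vector, i.e. one weighted copy of the
    feature for every candidate depth along the camera ray.
    """
    depth_prob = softmax(depth_logits)
    return [[p * f for f in feature] for p in depth_prob]

# Toy example (assumed values): a 2-channel feature and 3 depth bins
# with equal logits, so each bin receives one third of the feature.
frustum = lift_pixel([1.0, -2.0], [0.0, 0.0, 0.0])
```

Because the depth weights sum to one, summing the lifted features over the depth bins recovers the original pixel feature; an uncertain depth simply spreads the feature along the ray.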
In network-based methods, the straightforward idea is to use an MLP in a bottom-up strategy to project the
PV features to BEV;
Fishing Net for Semantic Segmentation
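The bottom-up MLP idea can be illustrated with a small sketch. It assumes a single fully connected layer that maps one flattened PV feature channel (H*W values) to one flattened BEV channel (X*Y values); the names, sizes, and random weights are hypothetical, and a trained network would learn the weights and usually stack several layers.

```python
import random

def linear(x, weight, bias):
    """y = W x + b, with the weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def mlp_view_transform(pv_channel, weight, bias, bev_shape):
    """Map one H x W PV feature channel to an X x Y BEV channel."""
    flat = [v for row in pv_channel for v in row]   # flatten H x W -> H*W
    out = relu(linear(flat, weight, bias))           # H*W -> X*Y
    bev_x, bev_y = bev_shape
    return [out[i * bev_y:(i + 1) * bev_y] for i in range(bev_x)]

# Toy sizes (assumed): a 2 x 3 PV feature map and a 4 x 4 BEV grid.
random.seed(0)
H, W, X, Y = 2, 3, 4, 4
weight = [[random.uniform(-1.0, 1.0) for _ in range(H * W)]
          for _ in range(X * Y)]
bias = [0.0] * (X * Y)
pv = [[0.1 * (r * W + c) for c in range(W)] for r in range(H)]
bev = mlp_view_transform(pv, weight, bias, (X, Y))
```

Every BEV cell depends on every PV pixel through a fixed learned weight, so the mapping carries no explicit depth or calibration information.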
Another framework in network-based BEV employs a top-down strategy: it directly constructs BEV
queries and searches for the corresponding features on PV images via the cross-attention mechanism, i.e.
a transformer (with either sparse queries or dense queries).
Ego3RT: Ego 3D Representation
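The top-down strategy can be sketched with plain single-head scaled dot-product cross-attention, in which BEV queries attend to PV image features. This is a simplified assumption for illustration: published BEV transformers typically add positional or geometric encodings and often use sparse or deformable attention, and all names and numbers below are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: N_bev x d   (one query per BEV cell or object)
    keys:    N_pv  x d   (PV image features)
    values:  N_pv  x d_v
    Returns N_bev x d_v BEV features.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        attn = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(attn, values))
                    for j in range(len(values[0]))])
    return out

# Toy data (assumed): two BEV queries, three PV feature locations.
bev_queries = [[1.0, 0.0], [0.0, 1.0]]
pv_keys = [[10.0, 0.0], [0.0, 10.0], [0.0, 0.0]]
pv_values = [[1.0], [2.0], [3.0]]
bev_features = cross_attention(bev_queries, pv_keys, pv_values)
```

Each query pulls mostly from the PV location whose key it matches best, which is the data-dependent behaviour that distinguishes this approach from a fixed MLP mapping. Dense attention costs N_bev times N_pv score computations, which is one reason efficiency remains a concern.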
Although they rely on a hard flat-ground assumption, homography-based methods have good
interpretability; IPM (inverse perspective mapping) is used for image
projection or feature projection in downstream perception tasks;
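A minimal sketch of IPM under the flat-ground assumption is given below. For ground points with z = 0, the pinhole projection K[R|t] reduces to the homography H = K[r1 r2 t]; each BEV cell is projected into the image and sampled with nearest-neighbour lookup. The camera parameters, axis conventions (x forward, y left, z up), and function names are assumptions chosen for illustration.

```python
def ground_homography(K, R, t):
    """Return H = K [r1 r2 t], mapping ground points (x, y, 1), z = 0, to pixels."""
    rt = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return [[sum(K[i][k] * rt[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def project(Hm, x, y):
    """Project a ground point to pixel (u, v); None if it is behind the camera."""
    u, v, w = (Hm[i][0] * x + Hm[i][1] * y + Hm[i][2] for i in range(3))
    if w <= 0.0:
        return None
    return u / w, v / w

def ipm(image, Hm, xs, ys, fill=0.0):
    """Sample the image at the projection of every BEV cell centre (x, y)."""
    height, width = len(image), len(image[0])
    bev = []
    for x in xs:
        row = []
        for y in ys:
            value = fill
            pix = project(Hm, x, y)
            if pix is not None:
                col, r = int(round(pix[0])), int(round(pix[1]))
                if 0 <= r < height and 0 <= col < width:
                    value = image[r][col]
            row.append(value)
        bev.append(row)
    return bev

# Assumed setup: forward-looking camera 2 m above the ground, zero pitch.
K = [[100.0, 0.0, 64.0], [0.0, 100.0, 48.0], [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]]  # world -> camera
t = [0.0, 2.0, 0.0]
Hm = ground_homography(K, R, t)

# Synthetic 96 x 128 image whose pixel value equals its row index.
image = [[float(r) for _ in range(128)] for r in range(96)]
bev = ipm(image, Hm, xs=[10.0, 20.0], ys=[0.0])
```

Points farther ahead project closer to the horizon row, so distant BEV cells are filled from only a few image rows. Anything that rises above the ground plane violates the assumption and appears smeared in the resulting BEV image.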
Depth-based methods are usually built on an explicit 3D representation: quantized
voxels or point clouds (like pseudo-LiDAR) scattered in continuous 3D space.
• Point-based methods suffer from model complexity and lower performance;
• Voxel-based methods are popular due to their computational efficiency and flexibility.
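The voxel-based variant can be illustrated by pooling lifted 3D points into a regular BEV grid. The sketch below assumes sum pooling with the height axis collapsed; the function name, grid ranges, and sample points are hypothetical.

```python
def splat_to_bev(points, features, x_range, y_range, cell):
    """Sum-pool per-point features into a BEV grid (height is collapsed).

    points:   list of (x, y, z) in metres
    features: list of C-dim feature lists, one per point
    Returns an nx x ny x C nested list.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    channels = len(features[0])
    grid = [[[0.0] * channels for _ in range(ny)] for _ in range(nx)]
    for (x, y, _z), feat in zip(points, features):
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        if 0 <= ix < nx and 0 <= iy < ny:      # drop points outside the grid
            for c in range(channels):
                grid[ix][iy][c] += feat[c]
    return grid

# Assumed toy data: four points, one feature channel, a 2 x 1 grid of 1 m cells.
points = [(0.2, 0.2, 0.0), (0.4, 0.1, 1.5), (1.5, 0.5, 0.3), (9.0, 9.0, 0.0)]
features = [[1.0], [2.0], [4.0], [8.0]]
grid = splat_to_bev(points, features, (0.0, 2.0), (0.0, 1.0), 1.0)
```

The grid has a fixed size regardless of how many points fall into it, which is one reason the voxel form maps well onto dense convolutional BEV heads.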
The MLP-based view transform is hard to learn due to the lack of depth information, occlusion, etc.;
Transformers with either sparse queries (detection) or dense queries (map segmentation as well)
achieve impressive performance thanks to strong relation modeling and their data-dependent
property, but efficiency is still a problem.
• Backbone (RegNet)/Bottleneck (FPN)
• Shared backbone or not?
• Auxiliary task design, multi-stage training
To apply BEV to autonomous driving, a data closed loop must be built:
• Data selection is performed on both the vehicle side and the server side. On the vehicle, data is selected by
coarse rules, such as shadow mode, abnormal driving operations, or specific scenario detection; on the server,
the collected data is selectively sent to annotation and training based on AI rules, such as active learning;
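Server-side selection by active learning can be sketched with a simple uncertainty criterion. The example assumes that each collected clip comes with a class-probability vector from the current model and ranks clips by predictive entropy; the criterion, clip names, and budget are illustrative assumptions, not the selection rule of any particular system.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` most uncertain samples."""
    ranked = sorted(predictions,
                    key=lambda sample_id: entropy(predictions[sample_id]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical model outputs for three collected clips.
predictions = {
    "clip_a": [0.98, 0.01, 0.01],   # confident prediction, low entropy
    "clip_b": [0.40, 0.30, 0.30],   # uncertain prediction, high entropy
    "clip_c": [0.70, 0.20, 0.10],
}
selected = select_for_annotation(predictions, budget=2)
```

Only the selected clips would be sent to annotation and training, which keeps labeling cost proportional to what the model does not yet handle well.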
• A big model (offline, non-real-time) for BEV runs only on the server, where a transformer network with dense
queries is used for the view transform;
Haomo (毫末)
• A light model (real-time, online) for BEV is deployed only on board the vehicle, where a voxel-based view
transform with depth supervision is used;
• BEV data annotation is specific due to its innate 3-D structure, which is captured either from a 3-D sensor
(LiDAR) or from 3-D visual reconstruction from cameras;
NuScenes
Figure (Tesla): inputs (Images, IMU, Odometry, GPS); Big Neural Net Model; outputs (Segment, Depth, Flow,
Elevation); Static BG & Ego Trajectory; Moving Objects & Kinematics
• A simulation platform is used for photo-realistic image data synthesis, digital twins (real-to-sim), scenario
generalization, and style transfer (sim-to-real);
Google Block-NeRF
Simulation with ground truth
Carla Simulator
Nvidia OmniVerse
• A teacher-student training framework supports knowledge distillation in BEV model training and deployment.
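One common form of teacher-student training is the temperature-softened distillation loss on output logits, sketched below. It is an illustrative assumption rather than the specific scheme used for BEV models, which may instead (or also) distill intermediate BEV features; the function names and example logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (higher = softer)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0.0)
    return temperature ** 2 * kl

# Hypothetical logits from a large server-side teacher and a light student.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(student, teacher)
```

In training, this term would typically be added to the student's ordinary task loss, so the light on-board model learns from both the labels and the big model's softened predictions.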
The BEV network is a new paradigm for computer vision and shows strong potential in
autonomous driving applications;
BEV network design depends on the computing platform, either on the server side or on the client
side (the vehicle in an ADS);
The data closed loop is a must for autonomous driving R&D, where BEV requires more attention
to data selection and annotation;
A simulation platform can relieve the burden of BEV data annotation with state-of-the-art techniques
such as photorealistic rendering, digital twins, scenario generalization, and style transfer;
To optimize the deployment of BEV, knowledge distillation helps trade off
performance against computational complexity.
Questions?

More Related Content

What's hot

Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Hwa Pyung Kim
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Visual Object Tracking: review
Visual Object Tracking: reviewVisual Object Tracking: review
Visual Object Tracking: reviewDmytro Mishkin
 
CHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTION
CHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTIONCHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTION
CHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTIONiQHub
 
Hybrid electric vehicle
Hybrid electric vehicleHybrid electric vehicle
Hybrid electric vehicleganeshbehera6
 
Lecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxLecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxKarimdabbabi
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIYu Huang
 
Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421穗碧 陳
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Domain adaptation for Image Segmentation
Domain adaptation for Image SegmentationDomain adaptation for Image Segmentation
Domain adaptation for Image SegmentationDeepak Thukral
 
FCN to DeepLab.v3+
FCN to DeepLab.v3+FCN to DeepLab.v3+
FCN to DeepLab.v3+Whi Kwon
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving IYu Huang
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkNader Karimi
 
Deep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentationDeep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentation경훈 김
 
Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...Universitat Politècnica de Catalunya
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving IIYu Huang
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)Yu Huang
 
Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)Hsing-chuan Hsieh
 

What's hot (20)

Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Visual Object Tracking: review
Visual Object Tracking: reviewVisual Object Tracking: review
Visual Object Tracking: review
 
CHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTION
CHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTIONCHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTION
CHARGING INNOVATIONS AND INFRASTRUCTURE DEPLOYMENT FOR 2030 EV ADOPTION
 
Hybrid electric vehicle
Hybrid electric vehicleHybrid electric vehicle
Hybrid electric vehicle
 
Lecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptxLecture_16_Self-supervised_Learning.pptx
Lecture_16_Self-supervised_Learning.pptx
 
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving II
 
Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421Yolo v2 ai_tech_20190421
Yolo v2 ai_tech_20190421
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Domain adaptation for Image Segmentation
Domain adaptation for Image SegmentationDomain adaptation for Image Segmentation
Domain adaptation for Image Segmentation
 
FCN to DeepLab.v3+
FCN to DeepLab.v3+FCN to DeepLab.v3+
FCN to DeepLab.v3+
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving I
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Deep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentationDeep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentation
 
Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...Deep Learning for Computer Vision: Generative models and adversarial training...
Deep Learning for Computer Vision: Generative models and adversarial training...
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)
 

Similar to The New Perception Framework in Autonomous Driving: An Introduction of BEV Network

Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
An Experimental Analysis on Self Driving Car Using CNN
An Experimental Analysis on Self Driving Car Using CNNAn Experimental Analysis on Self Driving Car Using CNN
An Experimental Analysis on Self Driving Car Using CNNIRJET Journal
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingYu Huang
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET Journal
 
Fisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIFisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIYu Huang
 
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET Journal
 
Car Steering Angle Prediction Using Deep Learning
Car Steering Angle Prediction Using Deep LearningCar Steering Angle Prediction Using Deep Learning
Car Steering Angle Prediction Using Deep LearningIRJET Journal
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsIRJET Journal
 
20100117US001c-3DVisualizationOfRailroadWheelFlaws
20100117US001c-3DVisualizationOfRailroadWheelFlaws20100117US001c-3DVisualizationOfRailroadWheelFlaws
20100117US001c-3DVisualizationOfRailroadWheelFlawsBen Rayner
 
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET Journal
 
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...Edge AI and Vision Alliance
 
IRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep LearningIRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep LearningIRJET Journal
 
IRJET- Front View Identification of Vehicles by using Machine Learning Te...
IRJET-  	  Front View Identification of Vehicles by using Machine Learning Te...IRJET-  	  Front View Identification of Vehicles by using Machine Learning Te...
IRJET- Front View Identification of Vehicles by using Machine Learning Te...IRJET Journal
 
Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...
Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...
Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...SBGC
 
Bandit framework for systematic learning in wireless video based face recogni...
Bandit framework for systematic learning in wireless video based face recogni...Bandit framework for systematic learning in wireless video based face recogni...
Bandit framework for systematic learning in wireless video based face recogni...ieeepondy
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Real Time Object Identification for Intelligent Video Surveillance Applications
Real Time Object Identification for Intelligent Video Surveillance ApplicationsReal Time Object Identification for Intelligent Video Surveillance Applications
Real Time Object Identification for Intelligent Video Surveillance ApplicationsEditor IJCATR
 
Vision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
Vision-Based Motorcycle Crash Detection and Reporting Using Deep LearningVision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
Vision-Based Motorcycle Crash Detection and Reporting Using Deep LearningIRJET Journal
 
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...IRJET Journal
 
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
 BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC... BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...Nexgen Technology
 

Similar to The New Perception Framework in Autonomous Driving: An Introduction of BEV Network (20)

Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
An Experimental Analysis on Self Driving Car Using CNN
An Experimental Analysis on Self Driving Car Using CNNAn Experimental Analysis on Self Driving Car Using CNN
An Experimental Analysis on Self Driving Car Using CNN
 
Unsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object trackingUnsupervised/Self-supervvised visual object tracking
Unsupervised/Self-supervvised visual object tracking
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep Learning
 
Fisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving IIIFisheye-Omnidirectional View in Autonomous Driving III
Fisheye-Omnidirectional View in Autonomous Driving III
 
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
IRJET - Multi-Label Road Scene Prediction for Autonomous Vehicles using Deep ...
 
Car Steering Angle Prediction Using Deep Learning
Car Steering Angle Prediction Using Deep LearningCar Steering Angle Prediction Using Deep Learning
Car Steering Angle Prediction Using Deep Learning
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction Algorithms
 
20100117US001c-3DVisualizationOfRailroadWheelFlaws
20100117US001c-3DVisualizationOfRailroadWheelFlaws20100117US001c-3DVisualizationOfRailroadWheelFlaws
20100117US001c-3DVisualizationOfRailroadWheelFlaws
 
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNNIRJET- Automatic Traffic Sign Detection and Recognition using CNN
IRJET- Automatic Traffic Sign Detection and Recognition using CNN
 
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
 
IRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep LearningIRJET- Traffic Sign Classification and Detection using Deep Learning
IRJET- Traffic Sign Classification and Detection using Deep Learning
 
IRJET- Front View Identification of Vehicles by using Machine Learning Te...
IRJET-  	  Front View Identification of Vehicles by using Machine Learning Te...IRJET-  	  Front View Identification of Vehicles by using Machine Learning Te...
IRJET- Front View Identification of Vehicles by using Machine Learning Te...
 
Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...
Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...
Java image processing ieee projects 2012 @ Seabirds ( Chennai, Bangalore, Hyd...
 
Bandit framework for systematic learning in wireless video based face recogni...
Bandit framework for systematic learning in wireless video based face recogni...Bandit framework for systematic learning in wireless video based face recogni...
Bandit framework for systematic learning in wireless video based face recogni...
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Real Time Object Identification for Intelligent Video Surveillance Applications
Real Time Object Identification for Intelligent Video Surveillance ApplicationsReal Time Object Identification for Intelligent Video Surveillance Applications
Real Time Object Identification for Intelligent Video Surveillance Applications
 
Vision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
Vision-Based Motorcycle Crash Detection and Reporting Using Deep LearningVision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
Vision-Based Motorcycle Crash Detection and Reporting Using Deep Learning
 
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
 
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
 BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC... BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
BULK IEEE PROJECTS IN MATLAB ,BULK IEEE PROJECTS, IEEE 2015-16 MATLAB PROJEC...
 

More from Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingYu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksYu Huang
 
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image V3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image VYu Huang
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IVYu Huang
 
3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image III3-d interpretation from single 2-d image III
3-d interpretation from single 2-d image IIIYu Huang
 
BEV Semantic Segmentation
BEV Semantic SegmentationBEV Semantic Segmentation
BEV Semantic SegmentationYu Huang
 

More from Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
Scenario-Based Development & Testing for Autonomous Driving
How to Build a Data Closed-loop Platform for Autonomous Driving?
Annotation tools for ADAS & Autonomous Driving
Simulation for autonomous driving at uber atg
Multi sensor calibration by deep learning
Jointly mapping, localization, perception, prediction and planning
Data pipeline and data lake for autonomous driving
Open Source codes of trajectory prediction & behavior planning
Lidar in the adverse weather: dust, fog, snow and rain
Autonomous Driving of L3/L4 Commercial trucks
3-d interpretation from single 2-d image V
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image III
BEV Semantic Segmentation

The New Perception Framework in Autonomous Driving: An Introduction of BEV Network

  • 1. The New Perception Framework in Autonomous Driving: An Introduction of BEV Network. Yu Huang, Chief Scientist
  • 3. Autonomous driving is one of the most challenging AI applications. Systems are defined from L2 to L5 and scoped by an Operational Design Domain (ODD), such as highway pilot, urban pilot, traffic jam pilot, robotaxi/bus/truck, etc.
  • 4. A solution can be modular, i.e. a pipeline of perception, mapping & localization, prediction, planning and control, or end-to-end (E2E), or partially E2E.
  • 5. There are roughly two research & development routes: progressive, step by step (L2->L4), or leaps and bounds (directly L4), where the latter can additionally act as "dimension reduction" (L4->L2+). Challenging problems in AV: the long tail of corner cases, safety-critical scenarios, and mass-production requirements (closed loop).
  • 6. BEV Network: The bird's-eye view (BEV) is a natural view to serve as a unified representation for 3-D environment understanding in the perception module of autonomous driving.
  • 7. BEV contains rich semantic information, precise localization, and absolute scale, which can be used directly by many downstream applications such as behavior prediction and motion planning. (Figure: BEVerse for 3D detection/map segmentation/motion prediction)
  • 8. BEV provides a physically interpretable way to fuse information from different views, modalities, time series, and agents. Spatial and temporal fusion. (Figure: BEVFormer for multi-camera spatial-temporal fusion)
  • 9. BEV provides a physically interpretable way to fuse information from different views, modalities, time series, and agents. Sensor fusion. (Figure: multi-task fusion framework in BEVFusion)
  • 10. BEV provides a physically interpretable way to fuse information from different views, modalities, time series, and agents. V2X collaboration. (Figure: UniBEV)
  • 11. View transformation, from perspective view (PV) to BEV, plays a vital role in camera-only 3D perception. (Figure copied from the survey paper “Delving into the Devils of Bird’s-eye-view Perception”)
  • 12. Current BEV approaches can be divided into two main categories based on view transformation: geometry-based and network-based. (Figure copied from the survey paper “Delving into the Devils of Bird’s-eye-view Perception”; panels: geometry-based, network-based)
  • 13. In geometry-based methods, earlier work uses a homography based on the flat-ground constraint. (Figure: Sim2Real for BEV Segmentation)
  • 14. The state-of-the-art solution among geometry-based approaches lifts 2D features to 3D space by explicit or implicit depth estimation, i.e. depth-based methods (point-based or voxel-based). (Figure: Lift, Splat, Shoot (LSS))
  • 15. In network-based methods, the straightforward idea is to use an MLP in a bottom-up strategy to project the PV features to BEV. (Figure: Fishing Net for Semantic Segmentation)
  • 16. Another framework in network-based BEV employs a top-down strategy: it directly constructs BEV queries and searches for the corresponding features on PV images with the cross-attention mechanism, i.e. a transformer (with either sparse or dense queries). (Figure: Ego3RT: Ego 3D Representation)
  • 17. Although they rely on a hard flat-ground assumption, homography-based methods have good interpretability; IPM (inverse perspective mapping) is used for image projection or feature projection in downstream perception tasks. Depth-based methods are usually built on an explicit 3D representation: quantized voxels, or point clouds (like pseudo-LiDAR) scattered in continuous 3D space. Point-based methods suffer from model complexity and lower performance; voxel-based methods are popular due to their computational efficiency and flexibility. MLP-based view transform is hard due to the lack of depth information, occlusion, etc. Transformers with either sparse queries (detection) or dense queries (map segmentation as well) gain impressive performance from strong relation modeling and their data-dependent property, but efficiency is still a problem.
  • 18. • Backbone (RegNet)/Bottleneck (FPN) • Shared backbone or not? • Auxiliary task design, multiple-stage training
  • 19. To apply BEV to autonomous driving, a data closed loop must be built: • Data selection is performed on both the vehicle side and the server side: data is first selected on the vehicles by rough rules, such as shadow mode, abnormal driving operations or specific scenario detection, and the data collected at the server then goes selectively to annotation and training based on AI rules, such as active learning;
  • 20. To apply BEV to autonomous driving, a data closed loop must be built: • A big model (offline, non-real-time) for BEV runs only on the server, where a transformer network with dense queries is used for the view transform; (Figure: 毫末 (Haomo))
  • 21. To apply BEV to autonomous driving, a data closed loop must be built: • A light model (real-time, online) for BEV is deployed only on board the vehicle, where a voxel-based view transform with depth supervision is used;
  • 22. To apply BEV to autonomous driving, a data closed loop must be built: • BEV data annotation is specific due to its innate 3-D structure, captured either from a 3-D sensor (LiDAR) (Figure: NuScenes)
  • 23. To apply BEV to autonomous driving, a data closed loop must be built: • BEV data annotation is specific due to its innate 3-D structure, captured either from a 3-D sensor (LiDAR) or from 3-D visual reconstruction from cameras; (Figure, Tesla; labels: Images, IMU, Odometry, GPS; Big Neural Net Model; Segment, Depth, Flow; Static BG & Ego Traject; Moving Objects & Kine; Elevation)
  • 24. To apply BEV to autonomous driving, a data closed loop must be built: • A simulation platform is used for photo-realistic image data synthesis, digital twins (real-to-sim), scenario generalization and style transfer (sim-to-real); (Figures: Google Block-NeRF; simulation with ground truth; Carla Simulator; Nvidia OmniVerse)
  • 25. To apply BEV to autonomous driving, a data closed loop must be built: • A teacher-student training framework supports knowledge distillation in BEV model training and deployment.
  • 26. The BEV network is a new paradigm for computer vision and shows strong potential in autonomous driving applications. BEV network design depends on the computing platform, either on the server side or on the client side (the vehicle in an ADS). The data closed loop is a must for autonomous driving R&D, where BEV needs to pay more attention to data selection and annotation. A simulation platform can relieve the burden of BEV data annotation with state-of-the-art techniques such as photorealistic rendering, digital twins, scenario generalization and style transfer. To optimize the deployment of BEV, knowledge distillation helps in the trade-off between performance and computational complexity.
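
The flat-ground homography of slide 13 can be sketched as follows. This is a minimal illustration, assuming a pinhole camera with intrinsics `K` and world-to-camera extrinsics `(R, t)`, and assuming every pixel of interest lies on the ground plane Z = 0. The function names and the example camera (1.5 m above the ground, looking along the world X axis) are hypothetical and not taken from the slides.

```python
import numpy as np

def ground_to_image_homography(K, R, t):
    """Homography taking ground points (X, Y, 1) with Z = 0 to homogeneous pixels.

    For a point on the plane Z = 0, K (R [X, Y, 0]^T + t) = K [r1 r2 t] [X, Y, 1]^T,
    where r1 and r2 are the first two columns of R.
    """
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def pixel_to_ground(H, u, v):
    """Inverse perspective mapping: pixel (u, v) -> ground coordinates (X, Y)."""
    p = np.linalg.solve(H, np.array([u, v, 1.0]))
    return p[:2] / p[2]

# Assumed example camera: world axes X forward, Y left, Z up;
# camera axes x right, y down, z forward; camera centre at height 1.5 m.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 0.0]])
t = -R @ np.array([0.0, 0.0, 1.5])

H = ground_to_image_homography(K, R, t)
ground_xy = pixel_to_ground(H, 640.0, 510.0)  # ground point seen at this pixel
```

Anything that rises above the ground plane (vehicles, pedestrians) violates the assumption and is smeared in the resulting BEV image, which is the limitation the depth-based and network-based methods address.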
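
The depth-based lifting of slide 14 can be illustrated in the spirit of Lift, Splat, Shoot. The sketch below is not the authors' implementation: it assumes a single camera, a fixed set of depth bins, a BEV grid laid out directly in the camera frame (x right, z forward) and sum pooling per cell. All names and sizes are illustrative.

```python
import numpy as np

def lift(feat, depth_logits):
    """Weight each pixel feature by its depth distribution.

    feat: (H, W, C) image features; depth_logits: (H, W, D) per-pixel depth scores.
    Returns frustum features of shape (H, W, D, C).
    """
    e = np.exp(depth_logits - depth_logits.max(axis=-1, keepdims=True))
    prob = e / e.sum(axis=-1, keepdims=True)      # softmax over depth bins
    return prob[..., :, None] * feat[..., None, :]

def splat(frustum, K, depths, x_range, z_range, cell):
    """Sum-pool frustum features into a BEV grid in the camera frame."""
    H, W, D, C = frustum.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    rays = np.linalg.inv(K) @ pix                 # (3, H*W), z component is 1
    pts = rays.T[:, None, :] * depths[None, :, None]   # (H*W, D, 3) 3-D points
    ix = np.floor((pts[..., 0] - x_range[0]) / cell).astype(int)
    iz = np.floor((pts[..., 2] - z_range[0]) / cell).astype(int)
    nx = int(round((x_range[1] - x_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / cell))
    valid = (ix >= 0) & (ix < nx) & (iz >= 0) & (iz < nz)
    bev = np.zeros((nz, nx, C))
    # np.add.at accumulates correctly when several points fall into one cell.
    np.add.at(bev, (iz[valid], ix[valid]), frustum.reshape(H * W, D, C)[valid])
    return bev

rng = np.random.default_rng(0)
feat = rng.normal(size=(2, 3, 4))                 # toy (H, W, C) feature map
depth_logits = rng.normal(size=(2, 3, 5))         # toy (H, W, D) depth scores
K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 0.5], [0.0, 0.0, 1.0]])
depths = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # assumed depth bin centres
bev = splat(lift(feat, depth_logits), K, depths, (-4.0, 4.0), (0.0, 8.0), 1.0)
```

A multi-camera system would additionally transform the points into a common ego frame with each camera's extrinsics before pooling; that step is omitted here.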
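
The top-down strategy of slide 16, where BEV queries look up PV features through cross attention, can be reduced to a few lines. This is a simplified sketch under stated assumptions: a single attention head, no positional encoding, and random projection matrices standing in for learned weights. The names `bev_cross_attention`, `w_q`, `w_k` and `w_v` are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def bev_cross_attention(bev_queries, pv_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross attention.

    bev_queries: (N_bev, C), one query per BEV cell (dense) or per object (sparse).
    pv_feats:    (N_pv, C), flattened perspective-view image features.
    Returns the attended BEV features (N_bev, C_out) and the attention weights.
    """
    q = bev_queries @ w_q
    k = pv_feats @ w_k
    v = pv_feats @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N_bev, N_pv)
    return weights @ v, weights

rng = np.random.default_rng(0)
bev_queries = rng.normal(size=(16, 8))   # assumed 4 x 4 BEV grid, flattened
pv_feats = rng.normal(size=(30, 8))      # assumed 5 x 6 PV feature map, flattened
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
bev_feats, attn = bev_cross_attention(bev_queries, pv_feats, w_q, w_k, w_v)
```

The attention matrix has one row per BEV query and one column per PV location, so dense queries over a large grid scale the cost accordingly, which is consistent with the efficiency concern raised on slide 17.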
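
Slide 25 does not say which distillation loss is used in the teacher-student framework. As one common choice, assumed here purely for illustration, the student's per-cell class logits can be matched to the teacher's with a temperature-softened KL divergence; the function names and the toy BEV grid are hypothetical.

```python
import numpy as np

def log_softmax(logits, temperature):
    """Log-probabilities of temperature-softened logits over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over BEV cells, scaled by temperature squared.

    Both inputs have shape (..., num_classes), e.g. (H_bev, W_bev, num_classes).
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(temperature ** 2 * kl.mean())

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 4, 3))   # big offline model, toy 4 x 4 BEV grid
student = rng.normal(size=(4, 4, 3))   # light on-board model, same grid
loss = distillation_loss(student, teacher)
```

In training this term would typically be added to the ordinary task loss on annotated data; distilling intermediate BEV feature maps instead of logits is another option the slides leave open.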