BEV OBJECT DETECTION AND PREDICTION
Yu Huang
Sunnyvale, California
Yu.huang07@gmail.com
OUTLINE
• DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
• BEVDet: High-Performance Multi-Camera 3D Object Detection in BEV
• BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection
• PETR: Position Embedding Transformation for Multi-View 3D Object Detection
• FIERY: Future Instance Prediction in Bird’s-Eye View from Surround
Monocular Cameras
• BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection
• PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
• ST-P3: End-to-End Vision-based Autonomous Driving via Spatial-Temporal Feature Learning
DETR3D: 3D OBJECT DETECTION FROM MULTI-
VIEW IMAGES VIA 3D-TO-2D QUERIES
• This method manipulates predictions directly in 3D space: the architecture extracts 2D features from multiple camera images and then uses a sparse set of 3D object queries to index into these 2D features, linking 3D positions to multi-view images via camera transformation matrices (a minimal sketch of this sampling step follows this list).
• Finally, the model makes a bounding box prediction per object query, using a set-to-set loss to
measure the discrepancy between the ground-truth and the prediction.
• This top-down approach outperforms its bottom-up counterpart in which object bounding box
prediction follows per-pixel depth estimation, since it does not suffer from the compounding error
introduced by a depth prediction model.
• Moreover, it does not require post-processing such as non-maximum suppression, dramatically
improving inference speed.
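The 3D-to-2D query step can be made concrete: project each query's 3D reference point into every camera with its projection matrix, bilinearly sample the 2D feature maps there, and average over the views where the point is visible. Below is a minimal PyTorch sketch under simplifying assumptions (a single feature level, precomputed 3×4 projection matrices); the function and argument names are illustrative, not from the official DETR3D code.

```python
import torch
import torch.nn.functional as F

def sample_query_features(ref_points, feats, cam_projs, img_size):
    """Sketch of DETR3D-style 3D-to-2D feature sampling.

    ref_points: (Q, 3) 3D reference points decoded from object queries.
    feats:      (N, C, H, W) 2D feature maps from N camera views.
    cam_projs:  (N, 3, 4) projection matrices (intrinsics @ extrinsics).
    img_size:   (img_h, img_w) of the original images.
    Returns (Q, C) features averaged over views where each point is visible.
    """
    homo = torch.cat([ref_points, torch.ones_like(ref_points[:, :1])], -1)  # (Q, 4)
    pts = torch.einsum('nij,qj->nqi', cam_projs, homo)   # (N, Q, 3) per camera
    depth = pts[..., 2:3].clamp(min=1e-5)
    uv = pts[..., :2] / depth                            # pixel coordinates
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([uv[..., 0] / img_size[1] * 2 - 1,
                        uv[..., 1] / img_size[0] * 2 - 1], dim=-1)  # (N, Q, 2)
    sampled = F.grid_sample(feats, grid.unsqueeze(2),
                            align_corners=False).squeeze(-1)        # (N, C, Q)
    # ignore points behind the camera or projecting outside the image
    valid = ((pts[..., 2] > 0) & (grid.abs() <= 1).all(-1)).float() # (N, Q)
    valid = valid.unsqueeze(1)                                      # (N, 1, Q)
    fused = (sampled * valid).sum(0) / valid.sum(0).clamp(min=1)    # (C, Q)
    return fused.t()
```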
BEVDET: HIGH-PERFORMANCE MULTI-CAMERA 3D
OBJECT DETECTION IN BIRD-EYE-VIEW
• BEVDet is developed following the principle of detecting 3D objects in Bird-Eye-View (BEV), where route planning can be conveniently performed.
• In this paradigm, four modules run in succession with different roles: an image-view encoder for encoding features in image view, a view transformer for transforming features from image view to BEV, a BEV encoder for further encoding features in BEV, and a task-specific head for predicting the targets in BEV (a data-flow sketch follows this list).
• BEVDet reuses existing modules and is made feasible for multi-camera 3D object detection by an exclusive data augmentation strategy.
• The proposed paradigm works well in multi-camera 3D object detection and offers a good trade-off between computing budget and performance.
• BEVDet with a 704×256 image size (1/8 that of its competitors) scores 29.4% mAP and 38.4% NDS on the nuScenes val set, comparable with FCOS3D (2008.2 GFLOPs, 1.7 FPS, 29.5% mAP, and 37.2% NDS), while requiring just 12% of the computing budget (239.4 GFLOPs) and running 4.3 times faster.
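As a rough illustration of this four-module data flow, here is a structural PyTorch sketch; every layer is a placeholder, and naive pooling stands in for the geometric lift-splat view transform, so it shows the pipeline shape only, not the actual BEVDet implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BEVDetSketch(nn.Module):
    """Structural sketch of BEVDet's four modules (all layers are placeholders)."""

    def __init__(self, c_img=256, c_bev=64, depth_bins=59,
                 num_classes=10, bev_size=128):
        super().__init__()
        self.bev_size = bev_size
        # 1) image-view encoder: encodes features in image view
        self.img_encoder = nn.Conv2d(3, c_img, 3, stride=4, padding=1)
        # 2) view transformer: predicts a depth distribution and context features
        self.depth_head = nn.Conv2d(c_img, depth_bins, 1)
        self.feat_head = nn.Conv2d(c_img, c_bev, 1)
        # 3) BEV encoder: further encodes features in BEV
        self.bev_encoder = nn.Conv2d(c_bev, c_bev, 3, padding=1)
        # 4) task-specific head: predicts the targets in BEV
        self.head = nn.Conv2d(c_bev, num_classes, 1)

    def forward(self, imgs):                              # imgs: (N_cam, 3, H, W)
        x = self.img_encoder(imgs)
        depth = self.depth_head(x).softmax(dim=1)         # per-pixel depth distribution
        feat = self.feat_head(x)
        # the outer product "lifts" features into a camera frustum (pseudo point cloud)
        frustum = depth.unsqueeze(1) * feat.unsqueeze(2)  # (N_cam, C, D, h, w)
        # BEVDet splats the frustum into the BEV grid using camera geometry;
        # naive pooling over cameras and depth stands in for that here
        bev = frustum.mean(dim=(0, 2)).unsqueeze(0)       # (1, C, h, w)
        bev = F.interpolate(bev, size=(self.bev_size, self.bev_size))
        return self.head(self.bev_encoder(bev))          # targets in BEV
```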
BEVDET4D: EXPLOIT TEMPORAL CUES IN MULTI-
CAMERA 3D OBJECT DETECTION
• To fundamentally push the performance boundary in this area, BEVDet4D is proposed to lift the scalable BEVDet paradigm from the spatial-only 3D space to the spatial-temporal 4D space.
• It upgrades the framework with a few modifications just for fusing the feature from the previous frame with the corresponding one in the current frame.
• In this way, with negligible extra computing budget, the algorithm can access temporal cues by querying and comparing the two candidate features (see the fusion sketch after this list).
• Beyond this, it also simplifies the velocity learning task by removing the factors of ego-motion and time, which equips BEVDet4D with robust generalization performance and reduces the velocity error by 52.8%.
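A minimal sketch of this fusion: warp the previous frame's BEV feature into the current ego frame, then concatenate it with the current BEV feature and fuse. The 2D affine warp and the single fusion convolution are illustrative assumptions, not the official BEVDet4D code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def align_prev_bev(prev_bev, ego_motion):
    """prev_bev:  (B, C, H, W) BEV feature from the previous frame.
    ego_motion: (B, 2, 3) affine transform mapping the current BEV grid
    into the previous frame (ego rotation + translation)."""
    grid = F.affine_grid(ego_motion, prev_bev.shape, align_corners=False)
    return F.grid_sample(prev_bev, grid, align_corners=False)

class TemporalFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.fuse = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, cur_bev, prev_bev, ego_motion):
        aligned = align_prev_bev(prev_bev, ego_motion)
        # concatenation lets later layers query and compare the two frames;
        # with ego-motion removed by the alignment, the velocity target
        # reduces to the object's positional offset between adjacent frames
        return self.fuse(torch.cat([cur_bev, aligned], dim=1))
```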
PETR: POSITION EMBEDDING TRANSFORMATION
FOR MULTI-VIEW 3D OBJECT DETECTION
• This paper develops position embedding transformation (PETR) for multi-view 3D object detection.
• PETR encodes the position information of 3D coordinates into image features, producing 3D position-aware features (a minimal encoder sketch follows this list).
• Object queries can perceive the 3D position-aware features and perform end-to-end object detection.
• PETR achieves state-of-the-art performance (50.4% NDS and 44.1% mAP) on the standard nuScenes dataset and ranks 1st on the benchmark.
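A minimal sketch of the 3D position encoder: a small network maps each pixel's camera-frustum 3D coordinates to a 3D position embedding, which is added to the 2D image features to yield 3D position-aware features. Shapes and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class PositionEncoder3D(nn.Module):
    """Sketch of PETR's 3D position encoder (layer sizes are illustrative)."""

    def __init__(self, depth_bins=64, channels=256):
        super().__init__()
        # each pixel carries depth_bins 3D points -> depth_bins * 3 inputs
        self.mlp = nn.Sequential(
            nn.Conv2d(depth_bins * 3, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, feat2d, coords3d):
        """feat2d:   (N, C, H, W) 2D image features.
        coords3d: (N, depth_bins * 3, H, W) normalized 3D coordinates of
        the camera-frustum points associated with each pixel."""
        pos3d = self.mlp(coords3d)   # 3D position embedding
        return feat2d + pos3d        # 3D position-aware features
```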
PETR: POSITION EMBEDDING TRANSFORMATION
FOR MULTI-VIEW 3D OBJECT DETECTION
(a) In DETR, the object queries interact with 2D features to perform 2D detection. (b) DETR3D repeatedly projects the generated 3D reference points into the image plane and samples the 2D features to interact with object queries in the decoder. (c) PETR generates the 3D position-aware features by encoding the 3D position embedding into 2D image features. The object queries directly interact with the 3D position-aware features and output 3D detection results.
PETR: POSITION EMBEDDING TRANSFORMATION
FOR MULTI-VIEW 3D OBJECT DETECTION
3D Position Encoder
FIERY: FUTURE INSTANCE PREDICTION IN BIRD’S-EYE
VIEW FROM SURROUND MONOCULAR CAMERAS
• Driving requires interacting with road agents and predicting their future behaviour in order to
navigate safely.
• FIERY: a probabilistic future prediction model in bird’s-eye view from monocular cameras.
• The model predicts future instance segmentation and motion of dynamic agents that can be
transformed into non-parametric future trajectories.
• The approach combines the perception, sensor fusion and prediction components of a traditional
autonomous driving stack by estimating bird’s-eye-view prediction directly from surround RGB
monocular camera inputs.
• FIERY learns to model the inherently stochastic nature of the future solely from camera driving data in an end-to-end manner, without relying on HD maps, and predicts multimodal future trajectories (a simplified sketch follows this list).
• The code and trained models are available at https://github.com/wayveai/fiery.
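A highly simplified sketch of the probabilistic future module: sample a latent from a distribution predicted from the present BEV state, roll future states out with a crude convolutional recurrence, and decode per-timestep segmentation logits. FIERY's actual present/future distributions, convolutional GRU core, and instance decoding are richer; everything below is illustrative only.

```python
import torch
import torch.nn as nn

class FutureSketch(nn.Module):
    """Toy stand-in for FIERY's probabilistic future prediction."""

    def __init__(self, c=64, latent=32, horizon=4, n_classes=2):
        super().__init__()
        self.horizon = horizon
        self.dist = nn.Conv2d(c, 2 * latent, 1)             # mean and log-variance
        self.step = nn.Conv2d(c + latent, c, 3, padding=1)  # crude recurrent cell
        self.decoder = nn.Conv2d(c, n_classes, 1)           # segmentation logits

    def forward(self, present_bev):                         # (B, C, H, W)
        # predict a diagonal Gaussian from the pooled present state
        mu, logvar = self.dist(present_bev.mean((2, 3), keepdim=True)).chunk(2, 1)
        # one sample = one possible future (reparameterization trick)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        z = z.expand(-1, -1, *present_bev.shape[2:])        # broadcast over BEV grid
        state, outputs = present_bev, []
        for _ in range(self.horizon):
            state = torch.relu(self.step(torch.cat([state, z], dim=1)))
            outputs.append(self.decoder(state))
        return torch.stack(outputs, dim=1)  # (B, T, n_classes, H, W)
```

Sampling several latents and decoding each gives the multimodal futures mentioned above.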
BEVDEPTH: ACQUISITION OF RELIABLE DEPTH
FOR MULTI-VIEW 3D OBJECT DETECTION
• This research proposes BEVDepth, a new 3D object detector with trustworthy depth estimation for camera-based Bird’s-Eye-View (BEV) 3D object detection.
• In prior work, depth estimation is implicitly learned without camera information, making it de-facto fake depth for creating the subsequent pseudo point cloud.
• BEVDepth instead gets explicit depth supervision by utilizing encoded intrinsic and extrinsic parameters (see the sketch after this list).
• A depth correction sub-network is further introduced to counteract projection-induced disturbances in the depth ground truth.
• To reduce the speed bottleneck of projecting features from image view into BEV using estimated depth, a quick view-transform operation is also proposed.
• Besides, BEVDepth can be easily extended to multi-frame input.
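A minimal sketch of the two ingredients above: a depth net conditioned on encoded camera parameters (here a simple SE-style reweighting, which is an assumption) and explicit depth supervision against ground-truth depth bins obtained by projecting LiDAR points into the images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraAwareDepthNet(nn.Module):
    """Sketch of camera-conditioned depth estimation (sizes are illustrative)."""

    def __init__(self, c=256, depth_bins=112, cam_params=16):
        super().__init__()
        # encode flattened intrinsics/extrinsics into channel weights
        self.cam_mlp = nn.Sequential(nn.Linear(cam_params, c), nn.Sigmoid())
        self.depth_head = nn.Conv2d(c, depth_bins, 1)

    def forward(self, feat, cam_vec):
        """feat: (N, C, H, W) image features; cam_vec: (N, cam_params)."""
        scale = self.cam_mlp(cam_vec)[..., None, None]  # SE-style reweighting
        return self.depth_head(feat * scale)            # depth-bin logits

def depth_loss(logits, gt_depth_bins, ignore=-1):
    """Explicit supervision: cross-entropy against depth bins derived from
    LiDAR points projected into each image (pixels without points ignored)."""
    return F.cross_entropy(logits, gt_depth_bins, ignore_index=ignore)
```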
PETRV2: A UNIFIED FRAMEWORK FOR 3D
PERCEPTION FROM MULTI-CAMERA IMAGES
• PETRv2 is a unified framework for 3D perception from multi-view images.
• Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection.
• More specifically, it extends the 3D position embedding (3D PE) in PETR for temporal modeling.
• The 3D PE achieves temporal alignment of object positions across frames (see the alignment sketch after this list).
• A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE.
• To support high-quality BEV segmentation, PETRv2 provides a simple yet effective solution by adding a set of segmentation queries.
• Each segmentation query is responsible for segmenting one specific patch of the BEV map.
• Code is available at https://github.com/megvii-research/PETR.
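A minimal sketch of the temporal alignment behind the extended 3D PE: 3D points generated in the previous frame's coordinates are transformed into the current ego frame before position encoding, so the embeddings of both frames share one coordinate system. The 4×4 homogeneous pose convention is an assumption.

```python
import torch

def align_coords_to_current(coords_prev, pose_prev_to_cur):
    """coords_prev:     (..., 3) frustum points in the previous ego frame.
    pose_prev_to_cur: (4, 4) ego transform from previous to current frame.
    Returns the same points expressed in the current ego frame; feeding them
    to the (feature-guided) position encoder aligns the two frames' 3D PE."""
    homo = torch.cat([coords_prev, torch.ones_like(coords_prev[..., :1])], -1)
    return (homo @ pose_prev_to_cur.T)[..., :3]
```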
PETRV2: A UNIFIED FRAMEWORK FOR 3D
PERCEPTION FROM MULTI-CAMERA IMAGES
Coordinate system transformation; feature-guided position encoder
ST-P3: END-TO-END VISION-BASED AUTONOMOUS
DRIVING VIA SPATIAL-TEMPORAL FEATURE LEARNING
• While there are some pioneering works on LiDAR-based input or implicit designs, this paper formulates the problem in an interpretable vision-based setting.
• In particular, it proposes a spatial-temporal feature learning scheme, called ST-P3, towards a set of more representative features for perception, prediction and planning tasks simultaneously.
• Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird’s-eye-view transformation for perception (a warp-and-accumulate sketch follows this list); a dual pathway model is devised to take past motion variations into account for future prediction; and a temporal-based refinement unit is introduced to compensate for the recognition of vision-based elements for planning.
• Source code is available at https://github.com/OpenPerceptionX/ST-P3.
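A minimal sketch of egocentric-aligned accumulation: each past BEV feature is warped into the current ego frame and accumulated, so geometry stays consistent across time before further temporal modeling. The 2D affine warp here is a simplified stand-in for the paper's alignment in 3D space.

```python
import torch
import torch.nn.functional as F

def accumulate_aligned(bev_seq, poses_to_cur):
    """bev_seq:      list of (B, C, H, W) past-to-present BEV features.
    poses_to_cur: list of (B, 2, 3) affine transforms aligning each frame's
    BEV grid with the current ego frame.
    Returns the egocentric-aligned accumulation of the sequence."""
    acc = torch.zeros_like(bev_seq[-1])
    for bev, pose in zip(bev_seq, poses_to_cur):
        grid = F.affine_grid(pose, bev.shape, align_corners=False)
        acc = acc + F.grid_sample(bev, grid, align_corners=False)
    return acc / len(bev_seq)
```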
ST-P3: END-TO-END VISION-BASED AUTONOMOUS
DRIVING VIA SPATIAL-TEMPORAL FEATURE LEARNING
Egocentric aligned accumulation for Perception.
Dual pathway modelling for Prediction.
Prior knowledge integration and refinement for Planning.