3D Interpretation from Stereo Images
for Autonomous Driving
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• Object-Centric Stereo Matching for 3D Object Detection
• Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
• Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object
Detection for Autonomous Driving
• Stereo R-CNN based 3D Object Detection for Autonomous Driving
• 3D Object Proposals for Accurate Object Class Detection
Object-Centric Stereo Matching for 3D
Object Detection
• The current SoA for stereo 3D object detection takes the existing PSMNet stereo matching
network with no modifications, converts the estimated disparities into a 3D point cloud,
and feeds this point cloud into a LiDAR-based 3D object detector.
• The issue with existing stereo matching networks is that they are designed for disparity
estimation, not 3D object detection; the shape and accuracy of object point clouds are not
the focus.
• Stereo matching networks commonly suffer from inaccurate depth estimates at object
boundaries, which this method terms streaking, because background (BG) and foreground (FG)
points are estimated jointly.
• Existing networks also penalize disparity errors in their loss functions rather than errors in
the estimated 3D positions of object point clouds.
• To address these two issues, this work proposes a 2D box association and object-centric stereo
matching method that estimates disparities only for the objects of interest.
Object-Centric Stereo Matching for 3D
Object Detection
First, a 2D detector generates 2D boxes in the left and right images Il and Ir. Next, a box association algorithm matches object detections
across both images. Each matched detection pair is passed into the object-centric stereo network, which jointly
produces a disparity map and instance segmentation mask for each object. Together, these form a disparity
map containing only the objects of interest. Lastly, the disparity map is transformed into a point cloud that can
be used by any LiDAR-based 3D object detection network to predict the 3D bounding boxes.
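To make the last step concrete, here is a minimal NumPy sketch of back-projecting an object-centric disparity map into a camera-frame point cloud; the function name, the intrinsics (fx, fy, cx, cy), and the baseline are hypothetical placeholders, and the depth formula z = f·b/d assumes a rectified stereo pair.

```python
import numpy as np

def disparity_to_points(disparity, mask, fx, fy, cx, cy, baseline):
    """Back-project an object-centric disparity map into a 3D point cloud.

    disparity: HxW disparity map (pixels), valid where the instance mask is True.
    mask:      HxW boolean instance segmentation mask for one object.
    fx, fy:    focal lengths (pixels); cx, cy: principal point (pixels).
    baseline:  stereo baseline (metres).
    Returns an Nx3 array of (x, y, z) points in the left-camera frame.
    """
    v, u = np.nonzero(mask & (disparity > 0))   # pixel coordinates of the object's points
    d = disparity[v, u]
    z = fx * baseline / d                       # depth from disparity (rectified pair)
    x = (u - cx) * z / fx                       # back-project along image x
    y = (v - cy) * z / fy                       # back-project along image y
    return np.stack([x, y, z], axis=-1)
```

The resulting object point cloud can then be expressed in the LiDAR/ego frame and passed to any LiDAR-based 3D detector.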
Object-Centric Stereo Matching for 3D
Object Detection
Qualitative results on KITTI. Ground truth and predictions are in red and green,
respectively. Colored points are predicted by the stereo matching network,
while LiDAR points are shown in black for visualization purposes only.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
• For 3D object detection from stereo images, the key challenge is how to effectively utilize
stereo information.
• Different from previous methods that use pixel-level depth maps, this method employs 3D
anchors to explicitly construct object-level correspondences between the RoIs in the stereo
images, from which a DNN learns to detect and triangulate the targeted object in 3D space.
• It introduces a cost-efficient channel reweighting strategy that enhances representational
features and weakens noisy signals to facilitate the learning process.
• All of these are flexibly integrated into a solid baseline detector that uses monocular images.
• It is demonstrated that both the monocular baseline and the stereo triangulation learning
network outperform the prior SoA in 3D object detection and localization on the KITTI dataset.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Overview of the 3D detection pipeline. The baseline monocular network is indicated with a
blue background and can be easily extended to stereo inputs by duplicating the baseline
and integrating it with the TLNet (Triangulation Learning Network).
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
• The baseline network, which takes a monocular image as input, is composed of a backbone and
three subsequent modules: front-view anchor generation, 3D box proposal, and refinement.
• The three-stage pipeline progressively reduces the search space by selecting confident
anchors, which greatly reduces computational complexity.
• The stereo 3D detection is performed by integrating a triangulation learning network (TL-
Net) into the baseline model.
• In classical geometry, triangulation refers to localizing 3D points from multi-view images;
here the objective is to localize a 3D object and estimate its size and orientation from
stereo images.
• To achieve this, an anchor triangulation scheme is introduced, in which the network uses 3D
anchors as references to triangulate the targets.
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Front-view anchor generation. Potential anchors are those with high objectness in the front view. Only
these potential anchors are fed into the RPN, which reduces the search space and saves computational cost.
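A hedged sketch of this filtering step is given below; the threshold-plus-top-K rule and its parameter values are illustrative assumptions, not the paper's exact selection criterion.

```python
import numpy as np

def select_potential_anchors(anchors, objectness, score_thresh=0.3, top_k=1024):
    """Keep only high-objectness front-view anchors to shrink the RPN search space.

    anchors:    Nx6 array of 3D anchors (e.g., x, y, z, length, width, height).
    objectness: N array of front-view objectness scores in [0, 1].
    """
    keep = np.nonzero(objectness >= score_thresh)[0]      # drop low-scoring anchors
    keep = keep[np.argsort(-objectness[keep])][:top_k]    # keep the best-scoring remainder
    return anchors[keep], objectness[keep]
```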
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Anchor triangulation. By projecting the 3D
anchor box onto the stereo images, a pair of
RoIs is obtained. The left RoI establishes a geometric
correspondence with the right one via the
anchor box. A nearby target is present in
both RoIs with slight positional differences.
The TLNet takes the RoI pair as input and
uses the 3D anchor as a reference to localize
the targeted object.
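The projection step itself can be sketched as follows, assuming rectified stereo, a pinhole camera, and an anchor parameterized by center, size, and yaw (these conventions are assumptions for illustration):

```python
import numpy as np

def anchor_to_stereo_rois(center, size, yaw, fx, fy, cx, cy, baseline):
    """Project a 3D anchor box into the left and right images and return 2D RoIs."""
    l, w, h = size
    # Eight corners of the box in the object frame (x right, y down, z forward assumed).
    xs = np.array([ l,  l,  l,  l, -l, -l, -l, -l]) / 2
    ys = np.array([ h,  h, -h, -h,  h,  h, -h, -h]) / 2
    zs = np.array([ w, -w,  w, -w,  w, -w,  w, -w]) / 2
    rot = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                    [ 0,           1, 0          ],
                    [-np.sin(yaw), 0, np.cos(yaw)]])
    corners = rot @ np.stack([xs, ys, zs]) + np.asarray(center).reshape(3, 1)

    def project(tx):
        # Pinhole projection; the right camera is shifted by the baseline along x.
        u = fx * (corners[0] - tx) / corners[2] + cx
        v = fy * corners[1] / corners[2] + cy
        return np.array([u.min(), v.min(), u.max(), v.max()])  # axis-aligned RoI

    return project(0.0), project(baseline)
```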
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
The TLNet takes as input a pair of left-
right RoI features Fl and Fr, with Croi
channels and size Hroi × Wroi, which are
obtained by applying RoIAlign to the
projections of the same 3D anchor onto the
left and right frames. Left-right coherence
scores are computed to reweight each
channel. The reweighted features are fused
by element-wise addition and passed to
task-specific fully-connected layers to
predict the objectness confidence and the
3D bounding box offsets, i.e., the 3D
geometric difference between the anchor
and the target.
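A minimal PyTorch-style sketch of coherence-based reweighting and addition fusion is shown below; using per-channel cosine similarity as the coherence score and a sigmoid gate is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reweight_and_fuse(feat_l, feat_r):
    """Reweight left/right RoI feature channels by left-right coherence, then fuse.

    feat_l, feat_r: (N, C, H, W) tensors from RoIAlign on the left/right
    projections of the same 3D anchor.
    """
    n, c, _, _ = feat_l.shape
    fl = feat_l.reshape(n, c, -1)
    fr = feat_r.reshape(n, c, -1)
    coherence = F.cosine_similarity(fl, fr, dim=-1)         # per-channel score, (N, C)
    weights = torch.sigmoid(coherence).reshape(n, c, 1, 1)  # gate each channel
    return feat_l * weights + feat_r * weights              # element-wise addition fusion
```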
Triangulation Learning Network: from
Monocular to Stereo 3D Object Detection
Orange bounding boxes are detection results, while the green boxes are ground truths. For the main method, the
projected 3D bounding boxes are also visualized in the image, i.e., in the first and fourth rows. The LiDAR point
clouds are visualized for reference but are not used in either training or evaluation. The triangulation learning
method reduces missed detections and improves depth prediction in distant regions.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
• Approaches based on cheaper monocular or stereo imagery data have, until now, resulted in
drastically lower accuracies — a gap that is commonly attributed to poor image-based
depth estimation.
• However, it is not the quality of the data but its representation that accounts for the majority
of the difference.
• Taking the inner workings of CNNs into consideration, image-based depth maps are converted to
a pseudo-LiDAR representation — essentially mimicking the LiDAR signal.
• With this representation, different existing LiDAR-based detection algorithms can be applied.
• On the popular KITTI benchmark, this approach achieves impressive improvements over the
existing state-of-the-art in image-based performance — raising the detection accuracy of
objects within the 30m range from the previous state-of-the-art of 22% to an
unprecedented 74%.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
The pipeline for image-based 3D object detection. Given stereo or monocular images, first predict the depth map,
then back-project it into a 3D point cloud in the LiDAR coordinate system. This representation is referred to as pseudo-
LiDAR and is processed exactly like LiDAR — any LiDAR-based detection algorithm can be applied.
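A minimal sketch of this back-projection is shown below; the intrinsics and the camera-to-LiDAR axis convention (x forward, y left, z up) are assumptions for illustration.

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a predicted depth map into a pseudo-LiDAR point cloud.

    depth: HxW array of metric depths estimated from stereo or monocular images.
    Returns an Nx3 array in a LiDAR-style frame (x forward, y left, z up).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z_cam = depth
    x_cam = (u - cx) * z_cam / fx
    y_cam = (v - cy) * z_cam / fy
    # Camera axes (x right, y down, z forward) -> LiDAR axes (x forward, y left, z up).
    points = np.stack([z_cam, -x_cam, -y_cam], axis=-1).reshape(-1, 3)
    return points[points[:, 0] > 0]   # keep points in front of the camera
```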
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
Applying a single 2D convolution with a uniform kernel to the frontal-view depth map (top-left) yields the depth
map at top-right; after being back-projected into pseudo-LiDAR and viewed from the bird's-eye view (bottom-right),
it reveals a large depth distortion compared to the original pseudo-LiDAR representation (bottom-left),
especially for far-away objects. Points of each car instance are marked with a color. The boxes are superimposed and
contain all points of the green and cyan cars, respectively.
Pseudo-LiDAR from Visual Depth Estimation: Bridging the
Gap in 3D Object Detection for Autonomous Driving
Qualitative comparison of AVOD with LiDAR, pseudo-LiDAR, and frontal-view (stereo) inputs.
Ground-truth boxes are in red, predicted boxes in green; the observer in the pseudo-LiDAR
plots (bottom row) is on the very left side looking to the right. The frontal-view approach
(right) even miscalculates the depths of nearby objects and misses far-away objects entirely.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
• A 3D object detection method for autonomous driving that fully exploits the sparse and
dense, semantic and geometric information in stereo imagery.
• This method, called Stereo R-CNN, extends Faster R-CNN to stereo inputs to
simultaneously detect and associate objects in the left and right images.
• Extra branches are added after the stereo Region Proposal Network (RPN) to predict sparse keypoints,
viewpoints, and object dimensions, which are combined with the 2D left-right boxes to calculate
a coarse 3D object bounding box.
• The accurate 3D bounding box is then recovered by a region-based dense photometric alignment
using the left and right RoIs (see the sketch after this list).
• This method requires neither depth input nor 3D position supervision, yet it
outperforms all existing fully supervised image-based methods.
• Code released at https://github.com/HKUST-Aerial-Robotics/Stereo-RCNN.
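As a rough illustration of the dense alignment idea, the sketch below searches over the object's center depth and picks the value that minimizes the photometric error between the left RoI and the disparity-shifted right image; the uniform-depth assumption inside the RoI, the search range, and the mean-squared-error metric are simplifications, not the paper's exact module.

```python
import numpy as np

def align_center_depth(left_roi, right_img, roi_origin, fx, baseline,
                       z_init, search=2.0, step=0.05):
    """Refine an object's center depth by minimizing left-right photometric error.

    left_roi:   HxW grayscale patch cropped from the left image.
    right_img:  full grayscale right image.
    roi_origin: (u0, v0) top-left pixel of the RoI in the left image.
    z_init:     coarse center depth (metres) from the 3D box estimation.
    """
    u0, v0 = roi_origin
    h, w = left_roi.shape
    best_z, best_err = z_init, np.inf
    for z in np.arange(max(step, z_init - search), z_init + search, step):
        d = fx * baseline / z                      # disparity implied by this depth
        u_r = int(round(u0 - d))                   # the RoI shifts left in the right image
        if u_r < 0 or u_r + w > right_img.shape[1]:
            continue
        right_patch = right_img[v0:v0 + h, u_r:u_r + w].astype(np.float32)
        err = np.mean((left_roi.astype(np.float32) - right_patch) ** 2)
        if err < best_err:
            best_z, best_err = z, err
    return best_z
```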
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
The stereo R-CNN outputs stereo boxes, keypoints, dimensions, and the viewpoint angle,
followed by the 3D box estimation and the dense 3D box alignment module.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
Relations between the object orientation θ,
the azimuth β, and the viewpoint θ + β. Only the same
viewpoint leads to the same local projection.
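For reference, if the object center lies at (x, z) in the camera frame, the azimuth can be written as β = arctan(x / z) (the sign convention is assumed here, not taken from the slide), and the viewpoint angle regressed by the network is α = θ + β; two objects with equal θ + β therefore produce the same local image appearance.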
Different target assignments for RPN classification and
regression.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
3D semantic keypoints, the 2D perspective keypoint, and boundary keypoints.
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
Sparse constraints for the 3D box estimation
Stereo R-CNN based 3D Object Detection
for Autonomous Driving
From top to bottom: detections on left image, right image, and bird’s eye view image.