3D Interpretation from Stereo Images for Autonomous Driving
1. 3D Interpretation from Stereo Images for Autonomous Driving
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
2. Outline
• Object-Centric Stereo Matching for 3D Object Detection
• Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
• Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
• Stereo R-CNN based 3D Object Detection for Autonomous Driving
• 3D Object Proposals for Accurate Object Class Detection
3. Object-Centric Stereo Matching for 3D Object Detection
• The current SoA for stereo 3D object detection takes an existing stereo matching network (PSMNet), with no modifications, converts the estimated disparities into a 3D point cloud, and feeds this point cloud into a LiDAR-based 3D object detector.
• The issue with existing stereo matching networks is that they are designed for disparity estimation, not 3D object detection; the shape and accuracy of object point clouds are not their focus.
• Stereo matching networks commonly suffer from inaccurate depth estimates at object boundaries, which this method terms streaking, because background and foreground points are estimated jointly.
• Existing networks also penalize disparity errors, rather than the estimated positions of object point clouds, in their loss functions.
• This work proposes a 2D box association and object-centric stereo matching method that estimates disparities only for the objects of interest, addressing both issues.
4. Object-Centric Stereo Matching for 3D Object Detection
First, a 2D detector generates 2D boxes in the left image I_l and the right image I_r. Next, a box association algorithm matches object detections across both images. Each matched detection pair is passed into the object-centric stereo network, which jointly produces a disparity map and an instance segmentation mask for each object. Together, these form a disparity map containing only the objects of interest. Lastly, the disparity map is transformed into a point cloud that can be used by any LiDAR-based 3D object detection network to predict the 3D bounding boxes.
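The two steps above (associate 2D boxes across the rectified pair, then triangulate the matched objects) can be sketched as follows. The greedy matching rule, the thresholds, and the focal/baseline values are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def box_association(left_boxes, right_boxes, max_dy=10.0):
    """Greedily match 2D boxes (x1, y1, x2, y2) across a rectified stereo pair.

    Rectification shifts a box only horizontally (leftwards in the right
    image), so a match must share roughly the same vertical extent and have
    a non-negative horizontal shift; the smallest such shift wins.
    """
    pairs, used = [], set()
    for i, lb in enumerate(left_boxes):
        best, best_dx = None, float("inf")
        for j, rb in enumerate(right_boxes):
            if j in used:
                continue
            dy = abs(lb[1] - rb[1]) + abs(lb[3] - rb[3])  # vertical mismatch
            dx = lb[0] - rb[0]                            # candidate disparity
            if dy < max_dy and 0 <= dx < best_dx:
                best, best_dx = j, dx
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard stereo triangulation: z = f * B / d."""
    return focal_px * baseline_m / np.maximum(disparity_px, 1e-6)
```

With the per-object disparities in hand, `disparity_to_depth` plus the camera intrinsics yields the object point cloud that is handed to the LiDAR-based detector.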
5. Object-Centric Stereo Matching for 3D Object Detection
Qualitative results on KITTI. Ground truth and predictions are shown in red and green, respectively. Colored points are predicted by the stereo matching network, while LiDAR points are shown in black for visualization purposes only.
6. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
• For 3D object detection from stereo images, the key challenge is how to effectively utilize stereo information.
• Unlike previous methods that use pixel-level depth maps, this method employs 3D anchors to explicitly construct object-level correspondences between the RoIs in the stereo images, from which a DNN learns to detect and triangulate the targeted object in 3D space.
• It introduces a cost-efficient channel reweighting strategy that enhances representational features and weakens noisy signals to facilitate the learning process.
• All of these are flexibly integrated into a solid baseline detector that uses monocular images.
• Both the monocular baseline and the stereo triangulation learning network are demonstrated to outperform the prior SoA in 3D object detection and localization on the KITTI dataset.
7. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
Overview of the 3D detection pipeline. The baseline monocular network is indicated with a blue background and can be easily extended to stereo inputs by duplicating the baseline and integrating it with the TLNet (Triangulation Learning Network).
8. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
• The baseline network, taking a monocular image as input, is composed of a backbone and three subsequent modules: front-view anchor generation, 3D box proposal, and refinement.
• The three-stage pipeline progressively reduces the search space by selecting confident anchors, which greatly reduces computational complexity.
• Stereo 3D detection is performed by integrating a triangulation learning network (TLNet) into the baseline model.
• In classical geometry, triangulation refers to localizing 3D points from multi-view images; the objective here is to localize a 3D object and estimate its size and orientation from stereo images.
• To achieve this, an anchor triangulation scheme is introduced, in which the network uses 3D anchors as references to triangulate the targets.
9. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
Front-view anchor generation. Potential anchors have high objectness in the front view. Only the potential anchors are fed into the RPN, reducing the search space and saving computational cost.
10. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
Anchor triangulation. Projecting the 3D anchor box to the stereo images yields a pair of RoIs. The left RoI establishes a geometric correspondence with the right one via the anchor box. A nearby target appears in both RoIs with slight positional differences. The TLNet takes the RoI pair as input and uses the 3D anchor as a reference to localize the targeted object.
11. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
The TLNet takes as input a pair of left and right RoI features F_l and F_r with C_roi channels and size H_roi × W_roi, obtained by projecting the same 3D anchor to the left and right frames and applying RoIAlign. Left-right coherence scores are used to reweight each channel. The reweighted features are fused by element-wise addition and passed to task-specific fully-connected layers that predict the objectness confidence and the 3D bounding box offsets, i.e., the 3D geometric variance between the anchor and the target.
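One plausible reading of the channel-reweighting step is to score each channel by the coherence of its left and right RoI responses. The use of per-channel cosine similarity here is an assumption; the paper's exact scoring function may differ:

```python
import numpy as np

def reweight_channels(feat_l, feat_r, eps=1e-8):
    """Reweight stereo RoI features (each of shape (C, H, W)) by left-right coherence.

    Each channel's score is the cosine similarity between its left and right
    responses, clipped to [0, 1]: coherent channels keep their magnitude,
    noisy or mismatched channels are suppressed.
    """
    c = feat_l.shape[0]
    fl = feat_l.reshape(c, -1)
    fr = feat_r.reshape(c, -1)
    num = (fl * fr).sum(axis=1)
    den = np.linalg.norm(fl, axis=1) * np.linalg.norm(fr, axis=1) + eps
    scores = np.clip(num / den, 0.0, 1.0)
    w = scores[:, None, None]
    return feat_l * w, feat_r * w, scores
```

The reweighted pair can then be fused by element-wise addition, as the slide describes, before the fully-connected prediction heads.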
12. Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
Orange bounding boxes are detection results, while green boxes are ground truths. For the main method, the projected 3D bounding boxes are also visualized in the images, i.e., in the first and fourth rows. The LiDAR point clouds are visualized for reference but are not used in either training or evaluation. The triangulation learning method reduces missed detections and improves depth prediction at distant regions.
13. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
• Approaches based on cheaper monocular or stereo imagery have, until now, resulted in drastically lower accuracies, a gap commonly attributed to poor image-based depth estimation.
• However, it is not the quality of the data but its representation that accounts for the majority of the difference.
• Taking the inner workings of CNNs into consideration, image-based depth maps are converted to pseudo-LiDAR representations, essentially mimicking the LiDAR signal.
• With this representation, different existing LiDAR-based detection algorithms can be applied.
• On the popular KITTI benchmark, this approach achieves impressive improvements over the existing state of the art in image-based performance, raising the detection accuracy of objects within the 30 m range from the previous state of the art of 22% to an unprecedented 74%.
14. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
The pipeline for image-based 3D object detection. Given stereo or monocular images, first predict the depth map, then back-project it into a 3D point cloud in the LiDAR coordinate system. This representation is referred to as pseudo-LiDAR and is processed exactly like LiDAR, so any LiDAR-based detection algorithm can be applied.
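The back-projection step follows the pinhole model: a pixel (u, v) with depth z maps to x = (u - cx) * z / fx and y = (v - cy) * z / fy. A minimal sketch, with illustrative intrinsics rather than KITTI's actual calibration:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) metric depth map into an (H*W, 3) point cloud.

    Points are expressed in the camera frame; a real pipeline would apply a
    further rigid transform into the LiDAR coordinate system.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

The resulting array has the same (x, y, z)-per-point layout a LiDAR scan would, which is what lets off-the-shelf LiDAR-based detectors consume it unchanged.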
15. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
Applying a single 2D convolution with a uniform kernel to the frontal-view depth map (top-left) yields a depth map (top-right) that, after back-projection into pseudo-LiDAR and displayed from the bird's-eye view (bottom-right), reveals a large depth distortion compared to the original pseudo-LiDAR representation (bottom-left), especially for far-away objects. Points of each car instance are marked in a color; the boxes are superimposed and contain all points of the green and cyan cars, respectively.
16. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
Qualitative comparison of AVOD with LiDAR, pseudo-LiDAR, and frontal-view (stereo) inputs. Ground-truth boxes are in red, predicted boxes in green; the observer in the pseudo-LiDAR plots (bottom row) is at the far left looking to the right. The frontal-view approach (right) even miscalculates the depths of nearby objects and misses far-away objects entirely.
17. Stereo R-CNN based 3D Object Detection for Autonomous Driving
• A 3D object detection method for autonomous driving that fully exploits the sparse and dense, semantic and geometric information in stereo imagery.
• This method, called Stereo R-CNN, extends Faster R-CNN to stereo inputs to simultaneously detect and associate objects in the left and right images.
• Extra branches are added after the stereo Region Proposal Network (RPN) to predict sparse keypoints, viewpoints, and object dimensions, which are combined with the 2D left-right boxes to calculate a coarse 3D object bounding box.
• The accurate 3D bounding box is then recovered by region-based photometric alignment using the left and right RoIs.
• This method requires neither depth input nor 3D position supervision, yet outperforms all existing fully supervised image-based methods.
• Code released at https://github.com/HKUST-Aerial-Robotics/Stereo-RCNN.
18. Stereo R-CNN based 3D Object Detection for Autonomous Driving
Stereo R-CNN outputs stereo boxes, keypoints, dimensions, and the viewpoint angle, followed by the 3D box estimation and the dense 3D box alignment module.
19. Stereo R-CNN based 3D Object Detection for Autonomous Driving
Relations between the object orientation θ, azimuth β, and viewpoint θ + β; only identical viewpoints lead to identical projections. Different target assignments for RPN classification and regression.
20. Stereo R-CNN based 3D Object Detection for Autonomous Driving
3D semantic keypoints, the 2D perspective keypoint, and boundary keypoints.
21. Stereo R-CNN based 3D Object Detection for Autonomous Driving
Sparse constraints for the 3D box estimation.
22. Stereo R-CNN based 3D Object Detection for Autonomous Driving
From top to bottom: detections on the left image, the right image, and the bird's-eye-view image.