1. DEPTH FUSION FROM RGB AND
DEPTH SENSORS IV
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
2. Outline
■ Single-Photon 3D Imaging with Deep Sensor Fusion
■ Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion
■ Confidence Propagation through CNNs for Guided Sparse Depth
Regression
■ Learning Guided Convolutional Network for Depth Completion
■ DFineNet: Ego-Motion Estimation and Depth Refinement from Sparse,
Noisy Depth Input with RGB Guidance
■ PLIN: A Network for Pseudo-LiDAR Point Cloud Interpolation
■ Depth Completion from Sparse LiDAR Data with Depth-Normal
Constraints
3. Single-Photon 3D Imaging with Deep
Sensor Fusion
■ Active illumination time-of-flight sensors in particular have become widely used to
estimate a 3D representation of a scene.
■ However, the maximum range, density of acquired spatial samples, and overall
acquisition time of these sensors are fundamentally limited by the minimum signal
required to estimate depth reliably.
■ A data-driven method for photon-efficient 3D imaging which leverages sensor fusion
and computational reconstruction to rapidly and robustly estimate a dense depth map
from low photon counts.
■ This sensor fusion approach uses measurements of single photon arrival times from a
LR single-photon detector array and an intensity image from a conventional HR camera.
■ Using a multi-scale deep convolutional network, it jointly processes the raw
measurements from both sensors and outputs a high-resolution depth map.
2018
4. Single-Photon 3D Imaging with Deep
Sensor Fusion
Single-photon 3D imaging systems measure a spatio-temporal volume containing photon counts (left) that
include ambient light, noise, and photons emitted by a pulsed laser into the scene and reflected back to the
detector. Conventional depth estimation techniques, such as log-matched filtering (center left), estimate a depth
map from these counts. However, depth estimation is a non-convex and challenging problem, especially for
extremely low photon counts observed in fast or long-range 3D imaging systems. Here is a data-driven approach
to solve this depth estimation problem and explore deep sensor fusion approaches that use an intensity image of
the scene to optimize the robustness (center right) and resolution (right) of the depth estimation.
5. Single-Photon 3D Imaging with Deep
Sensor Fusion
The denoising branch (left) takes as input the 3D volume of photon counts and processes it at multiple scales using
a series of 3D conv layers. The resulting features from each resolution scale are concatenated together and
optionally concatenated with additional features from an intensity image in a sensor fusion approach. A further set
of 3D conv layers regresses a normalized illumination pulse, censoring the BG photon events. A differentiable
argmax operator is used to localize the ToF of the estimated illumination pulse and determine the depth. In the
image-guided upsampling branch (right), the network predicts HF differences between an upsampled LF depth map
and the HR depth map using multi-scale guidance from HF features of the intensity image. The entire network is
trainable end-to-end for depth estimation and upsampling from raw photon counts and an intensity image.
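The differentiable argmax can be implemented as a soft-argmax along the time axis; a minimal PyTorch sketch, assuming (B, T, H, W) per-bin pulse scores and a temperature parameter tau (both are illustrative assumptions, not from the paper):

```python
import torch

def soft_argmax_tof(pulse_logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable time-of-flight localization (sketch).

    pulse_logits: (B, T, H, W) per-bin scores for the estimated
    illumination pulse; returns (B, H, W) sub-bin ToF indices.
    """
    B, T, H, W = pulse_logits.shape
    # Softmax over the time axis turns scores into a per-pixel distribution.
    probs = torch.softmax(pulse_logits / tau, dim=1)
    # Expected bin index = sum_t t * p(t); differentiable w.r.t. the logits.
    bins = torch.arange(T, dtype=probs.dtype, device=probs.device).view(1, T, 1, 1)
    return (probs * bins).sum(dim=1)
```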
6. Single-Photon 3D Imaging with Deep
Sensor Fusion
(a) photo of setup (b) imaging optics (c) illumination optics
Single-photon imaging prototype. (a) both the imaging optics (bottom) and illumination optics (top). The
illumination and imaging optics are aligned in a rectified setup to perform energy-efficient epipolar scanning.
(b) A dichroic short-pass filter reflects light above 500 nm to a PointGrey vision camera, and transmits light of
all remaining wavelengths through a 450 nm laser line filter and onto a 1D array of 256 SPAD pixels. The galvo
mirror angle controls the scanline imaging the scene. (c) A cylindrical lens creates a vertical laser line, and the
galvo mirror determines the position of this laser line within the scene.
7. Single-Photon 3D Imaging with Deep
Sensor Fusion
Reconstruction results for four scenes: checkerboard, elephant, lamp, and bouncing ball.
8. Deep RGB-D Canonical Correlation
Analysis For Sparse Depth Completion
■ Correlation For Completion Network (CFCNet), an end-to-end deep model to do the sparse
depth completion task with RGB info.
■ A 2D deep canonical correlation analysis as network constraints to ensure encoders of RGB
and depth capture the most similar semantics.
■ It transforms the RGB features to the depth domain, and the complementary RGB info is
used to complete the missing depth info.
■ A completed dense depth map is viewed as composed of two parts.
■ One is the sparse depth, which is observable and used as the input; the other is
non-observable and recovered by the task.
■ Also, the corresponding full RGB image of the depth map can be decomposed into two parts,
one is called the sparse RGB, which holds the corresponding RGB values at the observable
locations in the sparse depth, and the other part is complementary RGB, which is the
subtraction of the sparse RGB from the full RGB images.
■ During training, CFCNet learns the relationship between sparse depth and sparse RGB and
uses the learned knowledge to recover non-observable depth from complementary RGB.
2019,6
9. Deep RGB-D Canonical Correlation
Analysis For Sparse Depth Completion
The input 0-1 sparse mask represents the sparse pattern of the depth measurements. The complementary mask is the complement of the
sparse mask. The full image is separated into a sparse RGB and a complementary RGB by the mask, and these are fed with the masks into the networks.
10. Deep RGB-D Canonical Correlation
Analysis For Sparse Depth Completion
■ CFCNet takes in sparse depth map, sparse RGB, and complementary RGB.
■ Sparsity-aware Attentional Convolutions (SAConv) are used in VGG16-like encoders.
■ SAConv is inspired by the local attention mask, which introduces a segmentation-aware
mask to let the convolution "focus" on signals consistent with the segmentation mask.
■ In order to propagate info from reliable sources, sparsity masks make the convolution
operations attend to signals from reliable locations.
■ The difference from the local attention mask is that SAConv does not apply mask normalization.
■ Mask normalization is avoided because it affects the stability of the later 2D2CCA
calculations: repeated normalization produces numerically small extracted features.
■ Also, a max-pooling operation is applied to the masks after every SAConv to keep track of visibility.
■ If there is at least one nonzero value visible to a convolutional kernel, the max-pooling
evaluates the value at that position to 1.
11. Deep RGB-D Canonical Correlation
Analysis For Sparse Depth Completion
SAConv. The ⊙ is for Hadamard product. The
⊗ is for convolution. The + is for elementwise
addition. The kernel size is 3 × 3 and stride is 1
for both convolution and max-pooling.
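A minimal PyTorch sketch of an SAConv-style layer consistent with this caption (Hadamard product with the 0-1 mask, a 3 × 3 stride-1 convolution with no mask normalization, and max-pooling on the mask to track visibility); the class and argument names are illustrative, not from the paper:

```python
import torch
import torch.nn as nn

class SAConv(nn.Module):
    """Sparsity-aware attentional convolution (sketch).

    Applies the Hadamard product of features and a 0-1 visibility mask,
    convolves the result (no mask normalization), and max-pools the mask
    so a position stays visible if any input under the kernel was visible.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x: torch.Tensor, mask: torch.Tensor):
        # Hadamard product keeps only signals from reliable (observed) locations.
        y = self.conv(x * mask)
        # Max-pooling on the 0-1 mask propagates visibility to the next layer.
        return y, self.pool(mask)
```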
2D Deep Canonical Correlation Analysis (2D2CCA): the features of the two branches are summarized by
full-rank covariance matrices, the cross-covariance between the branches gives their correlation, and the
total loss function combines the 2D2CCA correlation term with the transformer loss (sketched below).
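A hedged reconstruction of the 2D2CCA objective in the standard trace-correlation form (the regularizer r that keeps the covariances full-rank and the loss weight λ are assumptions, not from the slide):

```latex
% Covariance matrices of RGB features X and depth features Y (regularized to stay full-rank)
\Sigma_{XX} = \tfrac{1}{m-1}\sum_i (X_i-\bar{X})(X_i-\bar{X})^{\top} + rI, \quad
\Sigma_{YY} = \tfrac{1}{m-1}\sum_i (Y_i-\bar{Y})(Y_i-\bar{Y})^{\top} + rI
% Cross-covariance and trace correlation
\Sigma_{XY} = \tfrac{1}{m-1}\sum_i (X_i-\bar{X})(Y_i-\bar{Y})^{\top}, \qquad
\mathrm{corr}(X,Y) = \frac{\operatorname{tr}(\Sigma_{XY})}{\sqrt{\operatorname{tr}(\Sigma_{XX})\,\operatorname{tr}(\Sigma_{YY})}}
% Total loss: maximize correlation, plus the transformer (domain transfer) loss
\mathcal{L} = -\,\mathrm{corr}(X,Y) + \lambda\,\mathcal{L}_{trans}
```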
12. Deep RGB-D Canonical Correlation
Analysis For Sparse Depth Completion
■ Most multi-modal deep learning approaches simply concatenate or element-wise add
bottleneck features.
■ However, when the extracted semantics and the ranges of feature values differ between
modalities, direct concatenation or addition of multi-modal data sources does not
always yield better performance than a single-modal data source.
■ To avoid this problem, use encoders to extract higher-level semantics from two branches.
■ 2D2CCA ensures the extracted features from two branches are maximally correlated.
■ The intuition is to capture the same semantics from the RGB and depth domains.
■ Next, use a transformer network to transform extracted features from RGB domain to
depth domain, making extracted features from different sources share the same
numerical range.
■ During the training phase, use features of sparse depth and corresponding sparse RGB
image to calculate the 2D2CCA loss and transformer loss.
13. Deep RGB-D Canonical Correlation
Analysis For Sparse Depth Completion
(a) RGB image and (b) 500-point sparse depth as inputs. (c) Completed depth maps. (d) Results from the MIT method.
14. Learning Guided Convolutional Network for
Depth Completion
■ Dense depth perception is critical for autonomous driving and other robotics applications.
■ It is thus necessary to complete the sparse LiDAR data, where a synchronized guidance
RGB image is often used to facilitate this completion.
■ Inspired by guided image filtering, a guided network predicts kernel weights from the
guidance image.
■ These predicted kernels are then applied to extract the depth image features.
■ In this way, the network generates content-dependent and spatially-variant kernels for
multi-modal feature fusion.
■ Dynamically generated spatially-variant kernels could lead to prohibitive GPU memory
consumption and computation overhead.
■ A convolution factorization is designed to reduce computation and memory consumption.
■ GPU memory reduction makes it possible for feature fusion to work in multi-stage scheme.
2019,8
15. Learning Guided Convolutional Network for
Depth Completion
The network architecture includes two sub-networks: GuideNet in orange and DepthNet in blue. A convolution
layer is added at the beginning of both GuideNet and DepthNet, as well as at the end of DepthNet. The light orange and blue are
the encoder stages, while the corresponding dark ones are the decoder stages of GuideNet and DepthNet, respectively. The
ResBlock represents the basic residual block structure with two sequential 3 × 3 convolutional layers.
16. Learning Guided Convolutional Network for
Depth Completion
Guided Convolution Module. (a) The overall pipeline of the guided convolution module. Given image features as input,
the filter generation layer dynamically produces guided kernels, which are further applied to the input depth features
to output new depth features. (b) The details of the convolution between guided kernels and input depth features.
The convolution is factorized into two stages, a channel-wise convolution and a cross-channel convolution, as in the sketch below.
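A minimal PyTorch sketch of this factorization under stated assumptions (per-pixel channel-wise 3 × 3 kernels generated from guidance features, followed by an ordinary 1 × 1 cross-channel convolution); names and shapes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedConv(nn.Module):
    """Content-dependent, spatially-variant kernel fusion (sketch)."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # Stage 1: generate one k*k kernel per channel and per pixel from guidance.
        self.kernel_gen = nn.Conv2d(channels, channels * k * k, kernel_size=1)
        # Stage 2: cross-channel mixing with an ordinary 1x1 convolution.
        self.cross = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, guide: torch.Tensor, depth_feat: torch.Tensor):
        B, C, H, W = depth_feat.shape
        kernels = self.kernel_gen(guide).view(B, C, self.k * self.k, H, W)
        # Unfold depth features into k*k neighborhoods at every pixel.
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)
        patches = patches.view(B, C, self.k * self.k, H, W)
        # Channel-wise convolution: weighted sum over each neighborhood.
        out = (kernels * patches).sum(dim=2)
        return self.cross(out)
```

Keeping the dynamically generated weights channel-wise (rather than full C × C × k × k kernels per pixel) is what reduces the GPU memory and computation overhead.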
17. Learning Guided Convolutional Network for
Depth Completion
Qualitative comparison with state-of-the-art methods on KITTI test set
18. DFineNet: Ego-Motion Estimation and Depth Refinement
from Sparse, Noisy Depth Input with RGB Guidance
■ Depth estimation is an important capability for autonomous vehicles to understand
and reconstruct 3D environments as well as avoid obstacles during the execution.
■ Accurate depth sensors such as LiDARs are often heavy and expensive and can only
provide sparse depth, while lighter depth sensors such as stereo cameras are noisier
in comparison.
■ This is an end-to-end learning algorithm that is capable of using sparse, noisy input
depth for refinement and depth completion.
■ The model also produces the camera pose as a byproduct, making it a great solution
for autonomous systems.
■ The approach is evaluated on both indoor and outdoor datasets.
■ 2019,8.
19. DFineNet: Ego-Motion Estimation and Depth
Refinement from Sparse, Noisy Depth Input with RGB
Guidance
An example of sparse, noisy depth input (1st row), the
3D visualization of the ground-truth depth (2nd row)
and the 3D visualization of the output of the model
(bottom). The RGB image (1st row) is overlaid with the
sparse, noisy depth input for visualization.
20. DFineNet: Ego-Motion Estimation and Depth Refinement
from Sparse, Noisy Depth Input with RGB Guidance
It refines sparse & noisy depth input (the 3rd row) to output dense depth of high quality (bottom row).
21. DFineNet: Ego-Motion Estimation and Depth Refinement
from Sparse, Noisy Depth Input with RGB Guidance
Network Architecture
The network consists of two branches: one CNN to learn the function that estimates the depth (ψd),
and one CNN to learn the function that estimates the pose (θp). This network takes as input the image
sequence and corresponding sparse depth maps and outputs the transformation as well as the dense
depth map. During training, the two sets of parameters are simultaneously updated by the training
signal detailed below. The depth CNN is the revised depth network of Ma et al. from MIT.
22. DFineNet: Ego-Motion Estimation and Depth Refinement
from Sparse, Noisy Depth Input with RGB Guidance
■ Supervised Loss
■ Photometric Loss
■ Masked Photometric Loss
■ Smoothness Loss
– Derived from SfM-Net
■ Total Loss: a weighted combination of the above terms (see the sketch below)
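A hedged reconstruction of these terms in the usual self-supervised depth-refinement form (the projection notation, the validity masks V and M, and the weights λ are assumptions, not copied from the slide):

```latex
% Supervised loss on the set V of pixels with valid sparse depth
\mathcal{L}_{sup} = \frac{1}{|V|} \sum_{p \in V} \lVert \hat{D}(p) - D^{*}(p) \rVert_2^2
% Photometric loss from warping a source view I_s into the target view I_t
\mathcal{L}_{ph} = \frac{1}{N} \sum_{p} \big\lvert I_t(p) - I_s\!\big(\mathrm{proj}(p, \hat{D}, \hat{T}_{t \to s})\big) \big\rvert
% Masked photometric loss: the same term restricted by a validity mask M
\mathcal{L}_{mph} = \frac{1}{\sum_p M(p)} \sum_{p} M(p)\, \big\lvert I_t(p) - I_s\!\big(\mathrm{proj}(p, \hat{D}, \hat{T}_{t \to s})\big) \big\rvert
% Edge-aware smoothness on the predicted depth (in the spirit of SfM-Net)
\mathcal{L}_{sm} = \sum_{p} \lvert \nabla^{2} \hat{D}(p) \rvert
% Total loss (weights assumed)
\mathcal{L} = \lambda_{1}\mathcal{L}_{sup} + \lambda_{2}\mathcal{L}_{mph} + \lambda_{3}\mathcal{L}_{sm}
```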
23. DFineNet: Ego-Motion Estimation and Depth Refinement
from Sparse, Noisy Depth Input with RGB Guidance
Qualitative results of this method (left), of RGB guide & certainty (middle), ranked 1st, and of MIT’s Ma et al. (right), ranked 7th.
24. Confidence Propagation through CNNs
for Guided Sparse Depth Regression
■ 2019,8
■ Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g. data
generated by ordinary cameras.
■ Designing CNNs for sparse and irregularly spaced input data is still an open research
problem with numerous applications in autonomous driving, robotics, and surveillance.
■ An algebraically-constrained normalized convolution layer for CNNs with highly sparse
input that has a smaller number of network parameters compared to related work.
■ Strategies for determining the confidence from the convolution operation and
propagating it to consecutive layers.
■ An objective function that simultaneously minimizes the data error while maximizing the
output confidence.
■ To integrate structural information, fusion strategies combine depth and RGB
information in the normalized convolution network framework.
■ In addition, the output confidence is used as auxiliary information to improve the results.
25. Confidence Propagation through CNNs
for Guided Sparse Depth Regression
Scene depth completion pipeline on an example image. The input to the pipeline is a very sparse projected LiDAR
point cloud, an input confidence map which has zeros at missing pixels and ones otherwise, and an RGB image. The
sparse point cloud input and the input confidence are fed to a multi-scale unguided network that acts as a generic
estimator for the data. Afterwards, the continuous output confidence map is concatenated with the RGB image and
fed to a feature extraction network. The output from the unguided network and the RGB feature extraction networks
are concatenated and fed to a fusion network which produces the final dense depth map.
26. Confidence Propagation through CNNs
for Guided Sparse Depth Regression
The standard convolution layer in CNN frameworks can be replaced by a normalized convolution layer
with minor modifications. First, the layer takes in two inputs simultaneously, the data and its confidence.
The forward pass is then modified, and the back-propagation is modified to include a derivative term for
the non-negativity enforcement function. To propagate the confidence to consecutive layers, the already-
calculated denominator term is normalized by the sum of the filter elements.
Normalized Convolution layer that takes in two inputs, i.e. data
and confidence and outputs a data term and a confidence term.
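A minimal PyTorch sketch of such a normalized convolution layer with confidence propagation, assuming a softplus non-negativity enforcement on the filter weights (layer and argument names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedConv2d(nn.Module):
    """Normalized convolution: data weighted by its confidence (sketch)."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, eps: float = 1e-8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.k, self.eps = k, eps

    def forward(self, x: torch.Tensor, conf: torch.Tensor):
        # Non-negativity enforcement on the applicability (filter) weights.
        w = F.softplus(self.weight)
        num = F.conv2d(x * conf, w, padding=self.k // 2)
        den = F.conv2d(conf, w, padding=self.k // 2)
        out = num / (den + self.eps) + self.bias.view(1, -1, 1, 1)
        # Propagated confidence: the denominator normalized by the filter sum.
        conf_out = den / w.sum(dim=(1, 2, 3)).view(1, -1, 1, 1)
        return out, conf_out
```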
27. Confidence Propagation through CNNs
for Guided Sparse Depth Regression
The multi-scale architecture for the task of unguided scene depth completion that utilizes normalized convolution
layers. Downsampling is performed using max pooling on confidence maps and the indices of the pooled pixels are
used to select the pixels with highest confidences from the feature maps. Different scales are fused by upsampling
the coarser scale and concatenating it with the finer scale. A normalized convolution layer is then used to fuse the
feature maps based on the confidence information. Finally, a 1 × 1 normalized convolution layer is used to merge
different channels into one channel and produce a dense output and an output confidence map.
28. Confidence Propagation through CNNs
for Guided Sparse Depth Regression
(a) A multi-stream architecture that
contains a stream for depth and
another stream for RGB + Output
Confidence feature extraction.
Afterwards, a fusion network combines
both streams to produce the final
dense output. (b) A multi-scale
encoder-decoder architecture where
depth is fed to the unguided network
followed by an encoder, and the output
confidence and RGB image are
concatenated and then fed to a similar
encoder. Both streams have skip-
connections to the decoder between
the corresponding scales. (c) is similar
to (a), but with early fusion, and (d) is
similar to (b), but with early fusion.
29. Confidence Propagation through CNNs
for Guided Sparse Depth Regression
(a) RGB input, (b) method MS-Net[LF]-L2 (gd), (c) Sparse-
to-Dense (gd) and (d) HMS-Net (gd). For each one, the top
row shows the prediction. Method MS-Net[LF]-L2 (gd) performs
slightly better, while Sparse-to-Dense produces smoother
edges due to the use of a smoothness loss.
30. PLIN: A Network for Pseudo-LiDAR Point
Cloud Interpolation
■ LiDAR can provide dependable 3D spatial information at a low frequency (around 10 Hz)
and has been widely applied in the fields of autonomous driving and UAVs.
■ However, the camera, which runs at a higher frequency (around 20-30 Hz), has to be slowed
down to match the LiDAR in a multi-sensor system.
■ A Pseudo-LiDAR interpolation network (PLIN) increases the effective frequency of LiDAR sensors.
■ PLIN can generate temporally and spatially high-quality point cloud sequences to match
the high frequency of cameras.
■ For this goal, it uses a coarse interpolation stage guided by consecutive sparse depth maps
and the motion relationship, and a refined interpolation stage guided by the realistic scene.
■ Using this coarse-to-fine cascade structure, the method can progressively perceive
multi-modal info and generate accurate intermediate point clouds.
■ This is the first deep framework for Pseudo-LiDAR point cloud interpolation, which shows
appealing applications in navigation systems equipped with LiDAR and cameras.
2019,9
31. PLIN: A Network for Pseudo-LiDAR Point
Cloud Interpolation
Overall pipeline of the proposed method.
PLIN aims to address the frequency
mismatch between camera and LiDAR
sensors, generating both
temporally and spatially high-quality
point cloud sequences. This method
takes three consecutive color images
and two sparse depth maps as inputs,
and interpolates an intermediate dense
depth map, which is further transformed
into a Pseudo-LiDAR point cloud using
camera intrinsic parameters.
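Back-projecting the interpolated dense depth map into a Pseudo-LiDAR point cloud needs only the pinhole intrinsics; a minimal NumPy sketch (the function name and the intrinsic parameters are placeholders):

```python
import numpy as np

def depth_to_pseudo_lidar(depth: np.ndarray, fx: float, fy: float,
                          cx: float, cy: float) -> np.ndarray:
    """Back-project a dense depth map (H, W) to an (N, 3) point cloud."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    # Pinhole inverse projection: x = (u - cx) z / fx, y = (v - cy) z / fy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # keep pixels with valid (positive) depth
```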
32. PLIN: A Network for Pseudo-LiDAR Point
Cloud Interpolation
Overview of the Pseudo-LiDAR interpolation network (PLIN). The whole architecture consists of three modules,
including the motion guidance module, scene guidance module and transformation module.
33. PLIN: A Network for Pseudo-LiDAR Point
Cloud Interpolation
Results of the interpolated depth map obtained by PLIN. For each example, it shows the intermediate color image,
sparse depth map, dense depth map, and the result. This method can recover the original depth information and
generate much denser distributions.
34. PLIN: A Network for Pseudo-LiDAR Point
Cloud Interpolation
It shows the color image, the interpolated dense depth map, two views of the generated Pseudo-LiDAR, and
enlarged areas. The complete network produces a more accurate depth map, and the distribution and shape of
the Pseudo-LiDAR are more similar to those of the GT point cloud.
35. Depth Completion from Sparse LiDAR
Data with Depth-Normal Constraints
■ Depth completion aims to recover dense depth maps from sparse depth
measurements.
■ It is of increasing importance for autonomous driving and draws growing attention
from the vision community.
■ Most existing methods directly train a network to learn a mapping from sparse
depth inputs to dense depth maps, which has difficulties in utilizing the 3D
geometric constraints and handling practical sensor noise.
■ To regularize the depth completion and improve the robustness against noise, a
unified CNN framework 1) models the geometric constraints between depth and
surface normal in a diffusion module and 2) predicts the confidence of sparse
LiDAR measurements to mitigate the impact of noise.
■ Specifically, the encoder-decoder backbone predicts surface normals, coarse depth
and confidence of LiDAR inputs simultaneously, which are subsequently inputted
into the diffusion refinement module to obtain the final completion results.
2019,10
36. Depth Completion from Sparse LiDAR
Data with Depth-Normal Constraints
From sparse LiDAR measurements and color images (a-b), this model first infers the maps of coarse depth
and normal (c-d), and then recurrently refines the initial depth estimation by enforcing the constraints
between depth and normals. Moreover, to address the noise in practical LiDAR measurements (g), a decoder
branch is employed to predict the confidences (h) of the sparse inputs for better regularization.
37. Depth Completion from Sparse LiDAR
Data with Depth-Normal Constraints
The prediction network first predicts maps of surface normal N, coarse depth D and confidence M of the sparse depth input with a
shared-weight encoder and independent decoders. Then, the sparse depth input D̄ and coarse depth D are transformed to the
plane-origin distance space as P̄ and P. Next, the refinement network, an anisotropic diffusion module, refines the coarse depth
map D in the plane-origin distance subspace to enforce the constraints between depth and normal and to incorporate info from
the confident sparse depth inputs. During the refinement, the diffusion conductance depends on the similarity in the guidance feature
map G. Finally, the refined P is inversely transformed back to obtain the refined depth map Dr when the diffusion is finished.
38. Depth Completion from Sparse LiDAR
Data with Depth-Normal Constraints
Differentiable diffusion block. In each
refinement iteration, high-dimensional feature
vectors (e.g., of dimension 64) in guidance
feature map G are independently transformed via
two different functions f and g (modeled as two
convolution layers followed by normalization).
Then, the conductance from each location xi (in
plane-origin distance map P) to its neighboring K
pixels (xj ∈ Ni) are calculated. Finally, the diffusion
is performed through a convolution operation
with the kernels defined by the previously
computed conductance. Through such diffusion,
depth completion results are regularized by the
constraint between depth and normal.
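A hedged sketch of one diffusion iteration consistent with this caption; the Gaussian/softmax form of the conductance is an assumption (the slide only says it is computed from the transformed guidance features f and g):

```latex
% Conductance from location x_i to a neighboring pixel x_j (sketch)
c_{ij} = \frac{\exp\!\big(-\lVert f(G(x_i)) - g(G(x_j)) \rVert_2^2\big)}
              {\sum_{x_k \in \mathcal{N}_i} \exp\!\big(-\lVert f(G(x_i)) - g(G(x_k)) \rVert_2^2\big)}
% One diffusion step on the plane-origin distance map P
P^{(t+1)}(x_i) = \sum_{x_j \in \mathcal{N}_i} c_{ij}\, P^{(t)}(x_j)
```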
39. Depth Completion from Sparse LiDAR
Data with Depth-Normal Constraints
The loss terms:
■ negative cosine loss (on predicted surface normals)
■ L2 reconstruction loss
■ L2 depth loss
■ L2 refinement reconstruction loss
The overall loss function is a weighted sum of these terms; the relation between depth and normal is
established via the tangent plane equation (see the sketch below).
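A hedged reconstruction of the depth-normal relation and the loss composition (the weights λ are assumptions): with camera intrinsics K and homogeneous pixel coordinate x̃, the 3D point X = D(x) K⁻¹ x̃ lies on the tangent plane defined by the normal N(x) and the plane-origin distance P(x):

```latex
% Tangent plane equation relating depth and normal
N(x)^{\top} X = P(x), \qquad X = D(x)\, K^{-1} \tilde{x}
% hence the plane-origin distance used in the refinement:
P(x) = D(x)\, N(x)^{\top} K^{-1} \tilde{x}
% Negative cosine loss on normals against the ground truth N* (sketch)
\mathcal{L}_{N} = -\frac{1}{n} \sum_{x} N(x)^{\top} N^{*}(x)
% Overall loss: weighted sum of the listed terms (weights assumed)
\mathcal{L} = \mathcal{L}_{N} + \lambda_{1}\mathcal{L}_{depth} + \lambda_{2}\mathcal{L}_{recon} + \lambda_{3}\mathcal{L}_{refine}
```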
40. Depth Completion from Sparse LiDAR
Data with Depth-Normal Constraints
Qualitative comparison with other
methods. For each method, the whole
completion result is provided, as well
as zoom-in views of details and
error maps for better comparison.
The normal prediction and confidence
prediction of this method are also
provided for better illustration.