Depth Fusion from RGB
and Depth Sensors
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• 1. Sparsity Invariant CNNs
• 2. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image
• 3. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera
• 4. Fusion of stereo and still monocular depth estimates in a self-supervised learning context
• 5. Deep Depth Completion of a Single RGB-D Image
• 6. Estimating Depth from RGB and Sparse Sensing
• Appendix: InterpoNet, a brain inspired NN for optic flow dense interpolation
Sparsity Invariant CNNs
• CNNs operating on sparse inputs for depth completion from sparse laser scan data.
• Traditional convolutional networks perform poorly when applied to sparse data even when
the location of missing data is provided to the network.
• This is a simple yet effective sparse convolution layer that explicitly considers the location of
missing data during the convolution operation.
• The network architecture is evaluated in synthetic and real experiments against various baseline approaches.
• Compared to dense baselines, the sparse convolution network generalizes well to novel
datasets and is invariant to the level of sparsity in the data.
• A dataset from the KITTI benchmark, comprising over 94k depth-annotated RGB images.
• The dataset allows for training and evaluating depth completion and depth prediction
techniques in challenging real-world settings.
Sparsity Invariant CNNs
Using the sparse depth (a) as input leads to noisy results when processed with standard CNNs (c). In contrast, the sparse conv. network
(d) predicts smooth and accurate depth maps by explicitly considering sparsity during convolution.
Panels: (a) Input (visually enhanced), (b) Ground truth, (c) Standard ConvNet, (d) Sparse conv. network.
Sparsity Invariant CNNs
Sparse Convolutional Network. (a) The input to the network is a sparse depth map (yellow) and a binary
observation mask (red). It passes through several sparse convolution layers (dashed) with decreasing kernel
sizes from 11×11 to 3×3. (b) Schematic of the sparse convolution operation. Here, ⊙ denotes elementwise
multiplication, ∗ convolution, 1/x inversion, and “max pool” the max pooling operation. The input feature can be
single channel or multi-channel.
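A minimal PyTorch-style sketch of this sparse convolution, assuming one shared observation mask per feature map: the features are convolved after masking, normalized by the per-window count of observed pixels (the 1/x block in the schematic), and the mask is propagated with max pooling. Kernel size, channel counts, and the epsilon below are illustrative, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

def sparse_conv(x, mask, weight, bias=None, eps=1e-8):
    """x: (B, C_in, H, W) sparse input features; mask: (B, 1, H, W) float binary observation mask;
    weight: (C_out, C_in, k, k) convolution kernel with odd k."""
    k = weight.shape[-1]
    pad = k // 2
    # feature branch: convolve only the observed (masked) values
    num = F.conv2d(x * mask, weight, padding=pad)
    # normalizer: number of observed pixels under each kernel window
    den = F.conv2d(mask, torch.ones(1, 1, k, k, device=x.device), padding=pad)
    out = num / (den + eps)
    if bias is not None:
        out = out + bias.view(1, -1, 1, 1)
    # mask branch: a pixel counts as observed if any input pixel under the kernel was observed
    new_mask = F.max_pool2d(mask, kernel_size=k, stride=1, padding=pad)
    return out, new_mask

# Example (illustrative shapes): a single-channel sparse depth map with ~10% observed pixels
# x = torch.rand(1, 1, 64, 64); m = (torch.rand(1, 1, 64, 64) > 0.9).float()
# y, m2 = sparse_conv(x * m, m, torch.randn(16, 1, 11, 11))
```

In the full network, several such layers with decreasing kernel sizes (11×11 down to 3×3) would be stacked, passing both the features and the updated mask forward.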
Sparsity Invariant CNNs
Figure: results on the KITTI 2015 dataset (raw LiDAR, accumulated LiDAR, SG), with RGB images and error maps wrt. KITTI 2015.
Sparsity Invariant CNNs
Figure: (a) Input (enhanced), (b) ConvNet, (c) ConvNet + mask, (d) Sparse ConvNet, (e) Ground truth.
Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
• Dense depth prediction from a sparse set of depth measurements and a single RGB image.
• Introduce additional sparse depth samples, either acquired with a low-resolution depth
sensor or computed via visual SLAM algorithms.
• Use a single deep regression network to learn directly from the raw RGB-D data, and
explore the impact of the number of depth samples on prediction accuracy.
• Two applications: a plug-in module in SLAM to convert sparse maps to dense maps, and
super-resolution for LiDARs.
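As a rough illustration of the network input used here, a NumPy sketch that samples a fixed number of valid depth pixels and concatenates them with the RGB image as a fourth channel; the function name and the zero-fill convention for unobserved pixels are assumptions.

```python
import numpy as np

def make_rgbd_input(rgb, depth_gt, num_samples=200, rng=np.random.default_rng(0)):
    """rgb: (H, W, 3) image; depth_gt: (H, W) depth with 0 marking invalid pixels."""
    sparse = np.zeros_like(depth_gt, dtype=float)
    valid_r, valid_c = np.nonzero(depth_gt > 0)
    # draw up to num_samples random valid depth measurements
    idx = rng.choice(len(valid_r), size=min(num_samples, len(valid_r)), replace=False)
    sparse[valid_r[idx], valid_c[idx]] = depth_gt[valid_r[idx], valid_c[idx]]
    # 4-channel RGBd input: unobserved depth pixels remain zero
    return np.concatenate([rgb.astype(float), sparse[..., None]], axis=-1)
```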
Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
CNN architecture for NYU-Depth-v2 and KITTI datasets, respectively.
Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
Figure: predictions on KITTI: RGB images; RGB-based prediction; sparse-depth (sd) prediction with 200 samples and no RGB; RGBd prediction with 200 sparse depth samples and RGB; ground-truth depth.
Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
Figure: RGB, raw depth, and predicted depth.
Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
• Depth completion faces three main challenges: (1) the irregularly spaced pattern of the sparse
depth input, (2) the difficulty of handling multiple sensor modalities, and (3) the lack of dense,
pixel-level ground-truth depth labels.
• A deep regression model learns the mapping from sparse depth (plus RGB images) to dense depth.
• A self-supervised training framework requires only sequences of RGB and sparse depth
images, without the need for dense depth labels.
Given (a) sparse LiDAR scans and (b) a color image, estimate a dense depth image. Semi-dense depth
labels, (d) and (e), enable a highly scalable, self-supervised framework for training such networks.
Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
The encoder consists of a sequence of convolutions with increasing filter banks to down-sample the
feature spatial resolutions. The decoder, on the other hand, has a reversed structure with transposed
convolutions to up-sample the spatial resolutions. The input sparse depth and the color image are
separately processed by their initial convolutions. The convolved outputs are concatenated into a single
tensor, which acts as input to the residual blocks of ResNet-34. Output from each of the encoding layers
is passed, via skip connections, to the corresponding decoding layers. A final 1×1 convolution filter
produces a single prediction image with the same resolution as the network input. All convolutions are
followed by batch normalization and ReLU, except at the last layer.
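A heavily simplified PyTorch sketch of this two-branch encoder-decoder; a couple of plain conv blocks stand in for the ResNet-34 residual blocks, and the channel widths and layer counts are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

def conv_bn_relu(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def deconv_bn_relu(c_in, c_out):
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class SparseToDenseNet(nn.Module):
    def __init__(self):
        super().__init__()
        # separate initial convolutions for sparse depth and RGB, then concatenation
        self.conv_d = conv_bn_relu(1, 16)
        self.conv_rgb = conv_bn_relu(3, 48)
        # encoder: downsampling conv blocks (stand-ins for the ResNet-34 residual blocks)
        self.enc1 = conv_bn_relu(64, 128, stride=2)
        self.enc2 = conv_bn_relu(128, 256, stride=2)
        # decoder: transposed convolutions with skip connections from the encoder
        self.dec2 = deconv_bn_relu(256, 128)
        self.dec1 = deconv_bn_relu(128 + 128, 64)
        # final 1x1 convolution producing a single-channel depth prediction (no ReLU)
        self.out = nn.Conv2d(64 + 64, 1, kernel_size=1)

    def forward(self, rgb, sparse_depth):
        x0 = torch.cat([self.conv_rgb(rgb), self.conv_d(sparse_depth)], dim=1)  # full res, 64 ch
        x1 = self.enc1(x0)                             # 1/2 resolution
        x2 = self.enc2(x1)                             # 1/4 resolution
        y1 = self.dec2(x2)                             # back to 1/2 resolution
        y0 = self.dec1(torch.cat([y1, x1], dim=1))     # back to full resolution
        return self.out(torch.cat([y0, x0], dim=1))

# pred = SparseToDenseNet()(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))  # -> (1, 1, 64, 64)
```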
Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
A model-based self-supervised training framework for depth completion. This framework requires only a
synchronized sequence of color/intensity images from a monocular camera and sparse depth images from
LiDAR. White rectangles are variables, red is the depth network to be trained, blue are deterministic
computational blocks (without learnable parameters), and green are loss functions. During training, the
current data frame RGBd1 and a nearby data frame RGB2 are both used to provide supervision signals. At
inference time, only the current frame RGBd1 is needed to produce a depth prediction pred1.
Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
• The depth loss penalizes the prediction error on the pixels that have sparse depth supervision.
• Given the camera intrinsic matrix K, any pixel p1 in the current frame 1 has a corresponding
projection in frame 2, computed from the predicted depth and the relative camera pose.
• The synthetic (warped) color image is formed using bilinear interpolation.
• The final photometric loss compares the current image with this warped image.
• The final loss function for the entire self-supervised framework combines the depth loss, the
photometric loss, and a smoothness loss (a sketch of these terms follows below).
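The loss equations appear only as images in the original slides; the LaTeX below is a hedged sketch of the terms as commonly formulated for this framework. The weights β1, β2 and the single-scale photometric / second-order smoothness forms are assumptions.

```latex
% depth loss on pixels with sparse LiDAR supervision (the indicator masks the observed pixels)
\mathcal{L}_{\mathrm{depth}} = \big\| \mathbb{1}_{\{d_1 > 0\}} \odot (\mathrm{pred}_1 - d_1) \big\|_2^2
% projection of pixel p_1 of frame 1 into frame 2, using intrinsics K, relative pose T_{1\to 2},
% and the predicted depth pred_1(p_1)
p_{1 \to 2} \sim K \, T_{1 \to 2} \, \mathrm{pred}_1(p_1) \, K^{-1} p_1
% photometric loss between the current image and the warped (bilinearly sampled) nearby image
\mathcal{L}_{\mathrm{photo}} = \frac{1}{|\Omega|} \sum_{p \in \Omega}
  \big| \mathrm{RGB}_1(p) - \widehat{\mathrm{RGB}}_{2 \to 1}(p) \big|
% total loss: depth + photometric + (second-order) smoothness, with scalar weights
\mathcal{L} = \mathcal{L}_{\mathrm{depth}} + \beta_1 \mathcal{L}_{\mathrm{photo}}
  + \beta_2 \big\| \nabla^2 \mathrm{pred}_1 \big\|_1
```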
Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
• Self-supervised learning in which stereo vision
depth estimates serve as targets for a CNN that
transforms a single image to a depth map.
• After training, the stereo and mono estimates are
fused with a method that preserves high
confidence stereo estimates, while leveraging
CNN estimates in the low-confidence regions.
• Even rather limited CNNs can help provide stereo
vision equipped robots with more reliable depth
maps for autonomous navigation.
Self-supervised learning (SSL)
Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
The regions where stereo vision is 'blind' can be filled in by the monocular estimator, since in those
areas a still-mono estimator has, a priori, no constraint preventing a valid depth prediction. Note that
the scene and obstacle are quite close to the camera. In large outdoor scenes with obstacles
further away, the proportion of occluded areas will be much smaller.
Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
• The monocular depth estimation is performed with the Fully Convolutional Network (FCN).
• The basis is the well known VGG network, which is pruned of its fully connected layers.
• There are 5 main principles behind the fusion operation:
• (i) as the CNN is better at estimating relative depths, its output should be scaled to the stereo range;
• (ii) when a pixel is occluded, only the monocular estimate is preserved;
• (iii) when stereo is considered reliable, its estimates are preserved;
• (iv)/(v) when in a region of low stereo confidence, if the relative depth estimates are
dissimilar/similar, then the CNN is trusted more/the stereo is trusted more.
• Since stereo vision involves finding correspondences along the same row, it relies on vertical contrasts.
• Convolve with a vertical Sobel filter and apply a threshold to obtain a binary map. This map is
subsequently convolved with a relatively large Gaussian blur filter and renormalized (see the sketch below);
• After the merging operation, a median filter with a 5 × 5 kernel is used to smooth the final
depth map and further reduce overall noise.
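A hedged NumPy/SciPy sketch of this fusion step, following principles (i)-(v) in simplified form. The mean-ratio scaling, the Sobel axis, the threshold, and the blur size are assumptions, not the paper's exact choices.

```python
import numpy as np
from scipy import ndimage

def fuse_stereo_mono(stereo, mono, gray, occluded, sobel_thresh=30.0, blur_sigma=15.0):
    """stereo, mono: (H, W) depth maps; gray: (H, W) intensity image;
    occluded: (H, W) boolean mask where stereo has no valid estimate."""
    valid = ~occluded
    # (i) scale the CNN (relative-depth) output to the stereo range via a mean ratio
    mono_scaled = mono * (stereo[valid].mean() / (mono[valid].mean() + 1e-6))
    # stereo confidence from vertical contrast: Sobel -> threshold -> large Gaussian blur -> renormalize
    contrast = np.abs(ndimage.sobel(gray.astype(float), axis=0))
    conf = ndimage.gaussian_filter((contrast > sobel_thresh).astype(float), blur_sigma)
    conf /= conf.max() + 1e-6
    # (iii)-(v) blend by stereo confidence; (ii) occluded pixels take the monocular estimate only
    fused = conf * stereo + (1.0 - conf) * mono_scaled
    fused[occluded] = mono_scaled[occluded]
    # final 5x5 median filter to smooth the merged map and reduce remaining noise
    return ndimage.median_filter(fused, size=5)
```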
Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
1) The RGB image.
2) Stereo depth map.
3) Still-mono depth map.
4) The merged depth map.
5) Confidence map (red: high stereo confidence; blue: mono).
6) Difference in error against ground truth between mono and stereo (red: high mono errors; blue: high stereo errors).
7) Velodyne depth map.
Deep Depth Completion of a Single RGB-D Image
• The goal is to complete the depth channel of an RGB-D image.
• To train a deep network that takes an RGB image as input and predicts dense surface
normals and occlusion boundaries.
• Those predictions are then combined with raw depth observations provided by the RGB-D
camera to solve for depths for all pixels, including those missing in the original observation.
• A depth completion benchmark dataset, where holes are filled in training data through the
rendering of surface reconstructions created from multi-view RGB-D scans.
Deep Depth Completion of a Single RGB-D Image
The method has two stages: 1) prediction of surface normals and occlusion boundaries only from color, and 2) optimization of
global surface structure from those predictions, with soft constraints provided by observed depths.
Deep Depth Completion of a Single RGB-D Image
Depth Completion Dataset. Depth
completions are computed from multi-
view surface reconstructions of large
indoor environments. Bottom: the raw
color and depth channels with the
rendered depth for the viewpoint marked
as the red dot. The rendered mesh
(colored by vertex in large image) is
created by combining RGB-D images
from a variety of other views spread
throughout the scene (yellow dots),
which collaborate to fill holes when
rendered to the red dot view.
Deep Depth Completion of a Single RGB-D Image
Using surface normals to solve for depth completion. (a) An example of where depth cannot
be solved from surface normals alone. (b) The area missing depth is marked in red. The red arrow
shows paths on which depth cannot be integrated from surface normals. However, in real-world
images, there are usually many paths through connected neighboring pixels (along floors,
ceilings, etc.) over which depths can be integrated (green arrows).
Deep Depth Completion of a Single RGB-D Image
• The model is an FCN built on the backbone of VGG-16 with a symmetric encoder and decoder.
• It is also equipped with short-cut connections and shared pooling masks for corresponding
max pooling and unpooling layers, which are critical for learning local image features.
• Train the network with “ground truth” surface normals and silhouette boundaries computed
from the reconstructed mesh.
• Define the observed pixels as the ones with depth data from both the raw sensor and the
rendered mesh, and the unobserved pixels as the ones with depth from the rendered mesh
but not the raw sensor.
• For any given set of pixels (observed, unobserved, or both), train models with a loss over only
those pixels by masking out the gradients on the other pixels during backpropagation (see the sketch below).
• The network learns to predict normals better from color than depth, even if the network is
given an extra channel containing a binary mask indicating which pixels observe depth.
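A minimal PyTorch sketch of such pixel-masked supervision; the loss form and tensor shapes are illustrative. The mask zeroes the per-pixel loss, so no gradient flows from the excluded pixels during backpropagation.

```python
import torch

def masked_l2_loss(pred, target, pixel_mask):
    """pred, target: (B, C, H, W), e.g. predicted and ground-truth surface normals;
    pixel_mask: (B, 1, H, W) with 1 on the supervised pixels (observed, unobserved, or both)."""
    per_pixel = (pred - target) ** 2 * pixel_mask      # excluded pixels contribute zero loss
    return per_pixel.sum() / pixel_mask.sum().clamp(min=1)
```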
Deep Depth Completion of a Single RGB-D Image
• After predicting the surface normal image N and
occlusion boundary image B, solve a system of
equations to complete the depth image D.
• The objective function is defined as the
weighted sum of squared errors with four terms:
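The equation is rendered as an image in the original slide; the LaTeX below is a hedged sketch of the objective, with a data term on observed depths, a smoothness term between neighbors, and a normal-consistency term down-weighted near predicted occlusion boundaries. Exact weighting details may differ from the paper.

```latex
% weighted sum of squared-error terms (lambda_D, lambda_S, lambda_N are scalar weights)
E = \lambda_D E_D + \lambda_S E_S + \lambda_N E_N
% data term: stay close to the observed raw depth D_0 on observed pixels
E_D = \sum_{p \in T_{\mathrm{obs}}} \big( D(p) - D_0(p) \big)^2
% smoothness term: neighboring pixels should have similar depths
E_S = \sum_{(p,q) \in \mathcal{N}} \big( D(p) - D(q) \big)^2
% normal term: the tangent v(p,q) between the 3D points of neighbors p, q should be
% perpendicular to the predicted normal N(p); w_B(p) down-weights the term near
% predicted occlusion boundaries B
E_N = \sum_{(p,q) \in \mathcal{N}} w_B(p) \, \big\langle v(p,q),\, N(p) \big\rangle^2
```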
Figure: qualitative comparison: Input & GT, Zhang et al., Laina et al., Chakrabarti et al.
Estimating Depth from RGB and Sparse Sensing
• A deep model that can produce dense depth maps given an RGB image with known depth
at a very sparse set of pixels.
• The objective is to densify a sparse depth map (with additional cues from an RGB
image); hence the model is called Deep Depth Densification, or D3.
Estimating Depth from RGB and Sparse Sensing
• A parametrization of the sparse depth input that accommodates sparse input patterns.
• It allows for varying such patterns not only across different deep models but even within the
same model during training and testing.
• Inputs to parametrization:
• I(x, y) and D(x, y): RGB vector-valued image I and ground truth depth D
• Both maps have dimensions H×W. Invalid values in D are encoded as zero.
• M(x,y): Binary pattern mask of dimensions H×W, where M(x,y) = 1 defines (x,y) locations of our
desired depth samples.
• All points where M(x,y) = 1 must correspond to valid depth points (D(x, y) > 0).
• From I, D and M, form 2 maps for the sparse depth input, S1(x,y) and S2(x,y).
• Both maps have dimension H×W;
• S1(x,y) is a NN (nearest neighbor) fill of the sparse depth M(x,y)∗D(x,y).
• S2(x, y) is the Euclidean Distance Transform of M(x, y), i.e. the L2 distance between (x,y) and the closest
point (x′,y′) where M(x′,y′) = 1.
• The final parametrization of the sparse depth input is the concatenation of S1(x,y) and S2(x,y) (see the sketch below).
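A short SciPy sketch of this parametrization: the Euclidean distance transform gives S2 directly and, via the returned nearest-sample indices, the nearest-neighbor fill S1 as well. The function name is illustrative.

```python
import numpy as np
from scipy import ndimage

def sparse_depth_parametrization(depth, mask):
    """depth D: (H, W) ground-truth depth; mask M: (H, W) binary, 1 at sampled pixels
    (all of which must have valid depth D > 0). Returns S1, S2, each of shape (H, W)."""
    # S2: L2 distance from every pixel to the closest sampled pixel (where M == 1)
    s2, (rows, cols) = ndimage.distance_transform_edt(mask == 0, return_indices=True)
    # S1: nearest-neighbor fill of the sparse depth M * D, using the index of the closest sample
    s1 = depth[rows, cols]
    return s1, s2
```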
Estimating Depth from RGB and Sparse Sensing
Estimating Depth from RGB and Sparse Sensing
Both regular and irregular sparse patterns in S1 (top) and S2 (bottom). Dark points
in S2 correspond to the pixels where there is access to depth information.
Estimating Depth from RGB and Sparse Sensing
• For regular grid patterns, minimal spatial bias is ensured when choosing the mask M(x,y) by
enforcing equal spacing between subsequent pattern points in both the x and y directions.
• Such a strategy is convenient when one model must accommodate images of different resolutions.
• For ease of interpretation, use sparse patterns close to an integer level of downsampling;
• It is beneficial to vary the sparse pattern M(x,y) during training.
• Such a schedule begins training at 6 times the desired sparse pattern density and smoothly
decays towards the final density as training progresses (a sketch follows below).
• Also train with randomly varying sampling densities at each training step.
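A hedged sketch of such a density schedule; only the 6x starting multiplier comes from the text, while the linear decay shape and the function signature are assumptions.

```python
def sampling_density(step, total_steps, final_density, start_multiplier=6.0):
    """Start at start_multiplier times the target density and decay (here linearly) to it."""
    progress = min(step / float(total_steps), 1.0)
    multiplier = start_multiplier + (1.0 - start_multiplier) * progress
    return final_density * multiplier
```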
Estimating Depth from RGB and Sparse Sensing
Figure: D3 network architecture (DenseNet).
Estimating Depth from RGB and Sparse Sensing
Estimating Depth from RGB and Sparse Sensing
Appendix
InterpoNet, a brain inspired NN for optic flow
dense interpolation
• Sparse-to-dense interpolation for optical flow is a fundamental phase in the pipeline of
most of the leading optical flow estimation algorithms.
• The current SoA method for interpolation, EpicFlow, is a local average method based on an
edge-aware geodesic distance.
• This is a data-driven sparse-to-dense interpolation algorithm based on an FCN.
• Taking inspiration from the filling-in process in the visual cortex, it introduces lateral dependencies
between neurons and multi-layer supervision into the learning process.
• The main branch of the network consists of ten layers, each applying a 7x7 convolution filter
followed by an ELU (exponential linear unit) non-linearity.
• The input to the entire algorithm is a set of sparse and noisy matches.
• FlowFields (FF), CPM-Flow (CPM), DiscreteFlow (DF), DeepMatching (DM);
• From the matches, produce a sparse flow map of size h×w×2 for the image pair.
InterpoNet, a brain inspired NN for optic flow
dense interpolation
InterpoNet
InterpoNet, a brain inspired NN for optic flow
dense interpolation
• Inspired by the fact that neuronal filling-in takes place in many layers of the visual system hierarchy,
detour networks connect each and every layer directly to the loss function.
• During training, the loss function served as top-down information, pushing each layer to
perform interpolation in the best possible manner.
• The detour networks were kept simple: aside from the main branch of the network, each
layer's activations were transformed into a two-channel flow map using a single conv.
layer with linear activations.
• Each of the flow maps produced by the detour networks was compared to the ground truth
flow map using the EPE and LD losses.
• The final network loss was the weighted sum of all the losses.
• For inference, use only the last detour layer output - the one connected to the last layer of
the network’s main branch.
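A hedged PyTorch sketch of the main branch with detour supervision: ten 7×7 conv + ELU layers, a linear detour head after each layer producing a two-channel flow map, and a weighted sum of per-detour EPE losses. The LD loss is omitted; the input channel count, feature width, 1×1 detour kernels, and loss weights are assumptions.

```python
import torch
import torch.nn as nn

class InterpoNetSketch(nn.Module):
    def __init__(self, in_ch=2, width=32, depth=10):
        super().__init__()
        self.layers = nn.ModuleList()
        self.detours = nn.ModuleList()
        c = in_ch
        for _ in range(depth):
            # main branch: 7x7 convolution followed by an ELU non-linearity
            self.layers.append(nn.Sequential(nn.Conv2d(c, width, 7, padding=3), nn.ELU()))
            # detour head: a single linear conv mapping the activations to a 2-channel flow map
            self.detours.append(nn.Conv2d(width, 2, kernel_size=1))
            c = width

    def forward(self, x):
        flows = []
        for layer, detour in zip(self.layers, self.detours):
            x = layer(x)
            flows.append(detour(x))
        return flows  # at inference time, only flows[-1] (the last detour output) is used

def epe(flow, gt):
    # average endpoint error: mean L2 norm of the per-pixel flow difference
    return torch.norm(flow - gt, dim=1).mean()

def deep_supervision_loss(flows, gt, weights=None):
    weights = weights or [1.0] * len(flows)
    return sum(w * epe(f, gt) for w, f in zip(weights, flows))

# flows = InterpoNetSketch()(torch.rand(1, 2, 64, 64))
# loss = deep_supervision_loss(flows, torch.rand(1, 2, 64, 64))
```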
InterpoNet, a brain inspired NN for optic flow
dense interpolation
InterpoNet, a brain inspired NN for optic flow
dense interpolation
InterpoNet
Thanks

More Related Content

What's hot

Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
Sushant Shrivastava
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
Hichem Felouat
 
cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
rohithprabhas1
 
Enhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-ResolutionEnhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-Resolution
NAVER Engineering
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)nikhilus85
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
Ferdous ahmed
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
Gaurav Mittal
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
Yu Huang
 
Biometric recognition using deep learning
Biometric recognition using deep learningBiometric recognition using deep learning
Biometric recognition using deep learning
SwatiNarkhede1
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
Dongmin Choi
 
Emotion detection using cnn.pptx
Emotion detection using cnn.pptxEmotion detection using cnn.pptx
Emotion detection using cnn.pptx
RADO7900
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
Ryo Takahashi
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
Usman Qayyum
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
남주 김
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
Brodmann17
 
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIESYOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES
IRJET Journal
 
Computer vision introduction
Computer vision  introduction Computer vision  introduction
Computer vision introduction
Wael Badawy
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
Data Science Thailand
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
Sungjoon Choi
 

What's hot (20)

Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
 
cnn ppt.pptx
cnn ppt.pptxcnn ppt.pptx
cnn ppt.pptx
 
Enhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-ResolutionEnhanced Deep Residual Networks for Single Image Super-Resolution
Enhanced Deep Residual Networks for Single Image Super-Resolution
 
Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)Action Recognition (Thesis presentation)
Action Recognition (Thesis presentation)
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Camera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning IICamera-Based Road Lane Detection by Deep Learning II
Camera-Based Road Lane Detection by Deep Learning II
 
Biometric recognition using deep learning
Biometric recognition using deep learningBiometric recognition using deep learning
Biometric recognition using deep learning
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
Emotion detection using cnn.pptx
Emotion detection using cnn.pptxEmotion detection using cnn.pptx
Emotion detection using cnn.pptx
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIESYOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES
YOGA POSE DETECTION USING MACHINE LEARNING LIBRARIES
 
Computer vision introduction
Computer vision  introduction Computer vision  introduction
Computer vision introduction
 
Machine learning in image processing
Machine learning in image processingMachine learning in image processing
Machine learning in image processing
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 

Similar to Depth Fusion from RGB and Depth Sensors by Deep Learning

Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors III
Yu Huang
 
Depth Fusion from RGB and Depth Sensors IV
Depth Fusion from RGB and Depth Sensors  IVDepth Fusion from RGB and Depth Sensors  IV
Depth Fusion from RGB and Depth Sensors IV
Yu Huang
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
Yu Huang
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
Yu Huang
 
Single Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learningSingle Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learning
Ahan M R
 
WT in IP.ppt
WT in IP.pptWT in IP.ppt
WT in IP.ppt
viveksingh19210115
 
The single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimationThe single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimation
AVVENIRE TECHNOLOGIES
 
Learning RGB-D Salient Object Detection using background enclosure, depth con...
Learning RGB-D Salient Object Detection using background enclosure, depth con...Learning RGB-D Salient Object Detection using background enclosure, depth con...
Learning RGB-D Salient Object Detection using background enclosure, depth con...
Benyamin Moadab
 
notes_Image Compression.ppt
notes_Image Compression.pptnotes_Image Compression.ppt
notes_Image Compression.ppt
HarisMasood20
 
notes_Image Compression.ppt
notes_Image Compression.pptnotes_Image Compression.ppt
notes_Image Compression.ppt
HarisMasood20
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
Yu Huang
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
Yu Huang
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Universitat Politècnica de Catalunya
 
mvitelli_ee367_final_report
mvitelli_ee367_final_reportmvitelli_ee367_final_report
mvitelli_ee367_final_reportMatt Vitelli
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
NUPUR YADAV
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
Yu Huang
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
Shunta Saito
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
Rishabh Indoria
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
fahmi324663
 

Similar to Depth Fusion from RGB and Depth Sensors by Deep Learning (20)

Depth Fusion from RGB and Depth Sensors III
Depth Fusion from RGB and Depth Sensors  IIIDepth Fusion from RGB and Depth Sensors  III
Depth Fusion from RGB and Depth Sensors III
 
Depth Fusion from RGB and Depth Sensors IV
Depth Fusion from RGB and Depth Sensors  IVDepth Fusion from RGB and Depth Sensors  IV
Depth Fusion from RGB and Depth Sensors IV
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
Single Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learningSingle Image Depth Estimation using frequency domain analysis and Deep learning
Single Image Depth Estimation using frequency domain analysis and Deep learning
 
WT in IP.ppt
WT in IP.pptWT in IP.ppt
WT in IP.ppt
 
The single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimationThe single image dehazing based on efficient transmission estimation
The single image dehazing based on efficient transmission estimation
 
Learning RGB-D Salient Object Detection using background enclosure, depth con...
Learning RGB-D Salient Object Detection using background enclosure, depth con...Learning RGB-D Salient Object Detection using background enclosure, depth con...
Learning RGB-D Salient Object Detection using background enclosure, depth con...
 
notes_Image Compression.ppt
notes_Image Compression.pptnotes_Image Compression.ppt
notes_Image Compression.ppt
 
notes_Image Compression.ppt
notes_Image Compression.pptnotes_Image Compression.ppt
notes_Image Compression.ppt
 
3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II3-d interpretation from single 2-d image for autonomous driving II
3-d interpretation from single 2-d image for autonomous driving II
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
mvitelli_ee367_final_report
mvitelli_ee367_final_reportmvitelli_ee367_final_report
mvitelli_ee367_final_report
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Week5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptxWeek5-Faster R-CNN.pptx
Week5-Faster R-CNN.pptx
 

More from Yu Huang

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
Yu Huang
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
Yu Huang
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
Yu Huang
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
Yu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
Yu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
Yu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
Yu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
Yu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
Yu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
Yu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
Yu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
Yu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
Yu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
Yu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
Yu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
Yu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
Yu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
Yu Huang
 

More from Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 

Recently uploaded

一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 

Recently uploaded (20)

一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang,  ICLR 2024, MLILAB, KAIST AI.pdfJ.Yang,  ICLR 2024, MLILAB, KAIST AI.pdf
J.Yang, ICLR 2024, MLILAB, KAIST AI.pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 

Depth Fusion from RGB and Depth Sensors by Deep Learning

  • 1. Depth Fusion from RGB and Depth Sensors Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • 1. Sparsity Invariant CNNs • 2. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image • 3. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera • 4. Fusion of stereo and still monocular depth estimates in a self-supervised learning context • 5. Deep Depth Completion of a Single RGB-D Image • 6. Estimating Depth from RGB and Sparse Sensing • Appendix: InterpoNet, a brain inspired NN for optic flow dense interpolation
  • 3. Sparsity Invariant CNNs • CNNs operating on sparse inputs for depth completion from sparse laser scan data. • Traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. • This is simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. • The network architecture in synthetic and real experiments wrt various baseline approaches. • Compared to dense baselines, the sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. • A dataset from the KITTI benchmark, comprising over 94k depth annotated RGB images. • The dataset allows for training and evaluating depth completion and depth prediction techniques in challenging real-world settings.
  • 4. Sparsity Invariant CNNs (a) as inputs leads to noisy results when processed with standard CNNs (c). In contrast, sparse conv. network (d) predicts smooth and accurate depth maps by explicitly considering sparsity during convolution. (a) Input (visually enhanced) (b) Ground truth (c) Standard ConvNet (d) Sparse conv. network
  • 5. Sparsity Invariant CNNs Sparse Convolutional Network. (a) The input to the network is a sparse depth map (yellow) and a binary observation mask (red). It passes through several sparse convolution layers (dashed) with decreasing kernel sizes from 11×11 to 3 × 3. (b) Schematic of our sparse convolution operation. Here, ⊙ denotes elementwise multiplication, ∗ convolution, 1/x inversion and “max pool” the max pooling operation. The input feature can be single channel or multi-channel.
  • 6. Sparsity Invariant CNNs KITTI 2015 Dataset Raw LiDaR Acc. LiDaR SG RGB Image Error Maps wrt. KITTI 2015
  • 7. Sparsity Invariant CNNs (a) Input (enhanced) (b) ConvNet (c) ConvNet + mask (d) Sparse ConvNet (e) Ground truth
  • 8. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image • Dense depth prediction from a sparse set of depth measurements and a single RGB image. • Introduce additional sparse depth samples, either acquired with a low-resolution depth sensor or computed via visual SLAM algorithms. • Use a single deep regression network to learn directly from the RGB-D raw data, and explore the impact of number of depth samples on prediction accuracy. • Two applications: a plug-in module in SLAM to convert sparse maps to dense maps, and super-resolution for LiDARs.
  • 9. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image CNN architecture for NYU-Depth-v2 and KITTI datasets, respectively.
  • 10. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image prediction on KITTI RGB images RGB-based prediction sd prediction with 200 and no RGB RGB-d prediction with 200 sparse depth and rgb ground truth depth
  • 11. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image RGB raw depth predicted depth
  • 12. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera • Depth completion faces 3 main challenges: 1). the irregularly spaced pattern in the sparse depth input, 2). the difficulty in handling multiple sensor modalities, 3). the lack of dense, pixel-level ground truth depth labels. • A deep regression model to learn mapping from sparse depth (+rgb images) to dense depth. • A self-supervised training framework that requires only sequences of rgb and sparse depth images, without the need for dense depth labels. Given (a) sparse LiDAR scans, (b) a color image, estimate (d) a dense depth image. Semi-dense depth labels, (d) and (e), apply a highly-scalable, self-supervised framework for training such networks.
  • 13. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera The encoder consists of a sequence of convolutions with increasing filter banks to down-sample the feature spatial resolutions. The decoder, on the other hand, has a reversed structure with transposed convolutions to up-sample the spatial resolutions. The input sparse depth and the color image are separately processed by their initial convolutions. The convolved outputs are concatenated into a single tensor, which acts as input to the residual blocks of ResNet-34. Output from each of the encoding layers is passed to, via skip connections, the corresponding decoding layers. A final 1x1 convolution filter produces a single prediction image with the same resolution as network input. All convolutions are followed by batch normalization and ReLU, with the exception at the last layer.
  • 14. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera A model-based self-supervised training framework for depth completion. This framework requires only a synchronized sequence of color/intensity images from a monocular camera and sparse depth images from LiDAR. White rectangles are variables, red is the depth network to be trained, blue are deterministic computational blocks (without learnable parameters), and green are loss functions. During training, the current data frame RGBd1 and a nearby data frame RGB2 are both used to provide supervision signals. At inference time, only the current frame RGBd1 is needed to produce a depth prediction pred1.
  • 15. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera • The depth loss is defined as • Given the camera intrinsic matrix K, any pixel p1 in the current frame 1 has the corresponding projection in frame 2 as • Synthetic color image using bilinear interpolation • The final photometric loss is • The final loss function for the entire self-supervised framework is Smoothness loss
  • 16. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera
  • 17. Fusion of stereo and still monocular depth estimates in a self-supervised learning context • Self-supervised learning in which stereo vision depth estimates serve as targets for a CNN that transforms a single image to a depth map. • After training, the stereo and mono estimates are fused with a method that preserves high confidence stereo estimates, while leveraging CNN estimates in the low-confidence regions. • Even rather limited CNNs can help provide stereo vision equipped robots with more reliable depth maps for autonomous navigation. Self-supervised learning (SSL)
  • 18. Fusion of stereo and still monocular depth estimates in a self-supervised learning context The regions where stereo vision is ’blind’ can be unveiled by the monocular estimator, as in those areas a still mono estimator has a priori no constraints to make a valid depth prediction. Note that the scene and obstacle are quite close to the camera. In large outdoor scenes with obstacles further away, the proportion of occluded areas will be much smaller.
  • 19. Fusion of stereo and still monocular depth estimates in a self-supervised learning context • The monocular depth estimation is performed with the Fully Convolutional Network (FCN). • The basis is the well known VGG network, which is pruned of its fully connected layers. • There are 5 main principles behind the fusion operation: • (i) as CNN is better at estimating relative depths, its output should be scaled to the stereo range; • (ii) when a pixel is occluded only monocular estimates are preserved; • (iii) when stereo is considered reliable, its estimates are preserved; • (iv)/(v) when in a region of low stereo confidence, if the relative depth estimates are dissimilar/similar, then the CNN is trusted more/the stereo is trusted more. • Since stereo vision involves finding correspond. in the same row, it relies on vertical contrasts. • Convolve with a vertical Sobel filter and apply a threshold to obtain a binary map. This map is subsequently convolved with a Gaussian blur filter of a relatively large size and renormalized; • After the merging operation a median filter with a 5 × 5 kernel is used to smooth the final depth map and reduce even more overall noise.
  • 20. Fusion of stereo and still monocular depth estimates in a self-supervised learning context 1) the rgb image. 2) Stereo depth map. 3) Still-mono depth map. 4) The merged depth map. 5) Confidence map (red high stereo confidence, blue mono). 6) Diff in error against GT btw mono and stereo (red high mono errors, blue high stereo errors). 7) Velodyne depth map
• 21. Deep Depth Completion of a Single RGB-D Image
• The goal is to complete the depth channel of an RGB-D image.
• Train a deep network that takes an RGB image as input and predicts dense surface normals and occlusion boundaries.
• Those predictions are then combined with the raw depth observations provided by the RGB-D camera to solve for the depths of all pixels, including those missing in the original observation.
• A depth completion benchmark dataset, in which holes in the training data are filled by rendering surface reconstructions created from multi-view RGB-D scans.
  • 22. Deep Depth Completion of a Single RGB-D Image 1) prediction of surface normals and occlusion boundaries only from color, and 2) optimization of global surface structure from those predictions with soft constraints provided by observed depths.
• 23. Deep Depth Completion of a Single RGB-D Image
Depth Completion Dataset. Depth completions are computed from multi-view surface reconstructions of large indoor environments. Bottom: the raw color and depth channels with the rendered depth for the viewpoint marked as the red dot. The rendered mesh (colored by vertex in the large image) is created by combining RGB-D images from a variety of other views spread throughout the scene (yellow dots), which collaborate to fill holes when rendered to the red-dot view.
• 24. Deep Depth Completion of a Single RGB-D Image
Using surface normals to solve for depth completion. (a) An example where depth cannot be solved from surface normals alone. (b) The area missing depth is marked in red. The red arrow shows paths along which depth cannot be integrated from surface normals. However, in real-world images there are usually many paths through connected neighboring pixels (along floors, ceilings, etc.) over which depths can be integrated (green arrows).
• 25. Deep Depth Completion of a Single RGB-D Image
• The model is an FCN built on the backbone of VGG-16 with a symmetric encoder and decoder.
• It is also equipped with short-cut connections and shared pooling masks for corresponding max-pooling and unpooling layers, which are critical for learning local image features.
• Train the network with "ground truth" surface normals and silhouette boundaries computed from the reconstructed mesh.
• Define the observed pixels as those with depth data from both the raw sensor and the rendered mesh, and the unobserved pixels as those with depth from the rendered mesh but not the raw sensor.
• For any given set of pixels (observed, unobserved, or both), train models with a loss over only those pixels by masking out the gradients on the other pixels during backpropagation (see the sketch below).
• The network learns to predict normals better from color than from depth, even if the network is given an extra channel containing a binary mask indicating which pixels have observed depth.
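A minimal PyTorch sketch of training on a chosen pixel subset by masking the per-pixel loss (which zeroes gradients for the other pixels); the cosine-style normal loss and the names are assumptions rather than the paper's training code.

```python
# Hedged sketch of a masked surface-normal loss; the loss form and names are assumptions.
import torch
import torch.nn.functional as F

def masked_normal_loss(pred_normals, gt_normals, pixel_mask):
    """pred_normals, gt_normals: B x 3 x H x W; pixel_mask: B x 1 x H x W in {0, 1}.
    The mask can select observed pixels, unobserved pixels, or their union."""
    pred = F.normalize(pred_normals, dim=1)
    gt = F.normalize(gt_normals, dim=1)
    per_pixel = 1.0 - (pred * gt).sum(dim=1, keepdim=True)   # 1 - cosine similarity per pixel
    masked = per_pixel * pixel_mask                           # no gradient flows outside the mask
    return masked.sum() / pixel_mask.sum().clamp(min=1.0)
```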
• 26. Deep Depth Completion of a Single RGB-D Image
• After predicting the surface normal image N and the occlusion boundary image B, solve a system of equations to complete the depth image D.
• The objective function is defined as the weighted sum of squared errors with four terms (a hedged sketch follows below).
Qualitative comparison panels: Input & GT, Zhang et al., Laina et al., Chakrabarti et al.
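The equation itself is not reproduced in this deck; below is a hedged LaTeX sketch of the global objective as commonly stated for this work, with a data term fitting the observed raw depths, a smoothness term on neighboring depths, and a normal-consistency term down-weighted by the predicted occlusion boundaries B (treat the exact weighting, especially B, as an assumption).

```latex
E \;=\; \lambda_D E_D + \lambda_S E_S + \lambda_N E_N B, \qquad
E_D = \sum_{p \in T_{\mathrm{obs}}} \big( D(p) - D_0(p) \big)^2, \qquad
E_S = \sum_{p} \sum_{q \in N(p)} \big( D(p) - D(q) \big)^2, \qquad
E_N = \sum_{p} \sum_{q \in N(p)} \big\langle v(p, q),\, N(p) \big\rangle^2
```

Here D is the depth being solved for, D_0 the observed raw depth, N(p) the predicted surface normal at pixel p, and v(p, q) the 3D vector between neighboring pixels p and q; since the objective is quadratic in D, it can be solved as a sparse linear system.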
• 27. Estimating Depth from RGB and Sparse Sensing
• A deep model that produces dense depth maps given an RGB image with known depth at a very sparse set of pixels.
• Since the objective is to densify a sparse depth map (with additional cues from an RGB image), the model is called Deep Depth Densification, or D3.
• 28. Estimating Depth from RGB and Sparse Sensing
• A parametrization of the sparse depth input that accommodates varying sparse input patterns.
• It allows such patterns to vary not only across different deep models but even within the same model during training and testing.
• Inputs to the parametrization:
• I(x, y) and D(x, y): the RGB vector-valued image I and the ground-truth depth D.
• Both maps have dimensions H×W; invalid values in D are encoded as zero.
• M(x, y): a binary pattern mask of dimensions H×W, where M(x, y) = 1 defines the (x, y) locations of the desired depth samples.
• All points where M(x, y) = 1 must correspond to valid depth points (D(x, y) > 0).
• From I, D and M, form two maps for the sparse depth input, S1(x, y) and S2(x, y), both of dimension H×W:
• S1(x, y) is a nearest-neighbor (NN) fill of the sparse depth M(x, y)∗D(x, y).
• S2(x, y) is the Euclidean distance transform of M(x, y), i.e. the L2 distance between (x, y) and the closest point (x′, y′) where M(x′, y′) = 1.
• The final parametrization of the sparse depth input is the concatenation of S1(x, y) and S2(x, y) (a code sketch follows below).
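A minimal NumPy/SciPy sketch of building S1 and S2 from D and M; the function name is made up, but the two maps follow the definitions above (nearest-neighbor fill and Euclidean distance transform).

```python
# Hedged sketch of the (S1, S2) sparse-depth parametrization; the function name is an assumption.
import numpy as np
from scipy.ndimage import distance_transform_edt

def densification_inputs(D, M):
    """D: H x W ground-truth depth (0 = invalid); M: H x W binary sampling mask."""
    # S2: Euclidean distance from every pixel to its nearest sampled pixel (M == 1),
    # together with the indices of that nearest sample.
    S2, nearest = distance_transform_edt(M == 0, return_indices=True)
    # S1: nearest-neighbor fill of the sparse depth M * D.
    S1 = D[nearest[0], nearest[1]]
    return np.stack([S1, S2], axis=0)   # 2 x H x W, concatenated along the channel axis
```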
  • 29. Estimating Depth from RGB and Sparse Sensing
  • 30. Estimating Depth from RGB and Sparse Sensing Both regular and irregular sparse patterns in S1 (top) and S2 (bottom). Dark points in S2 correspond to the pixels where there is access to depth information.
• 31. Estimating Depth from RGB and Sparse Sensing
• For regular grid patterns, minimal spatial bias is ensured when choosing the mask M(x, y) by enforcing equal spacing between subsequent pattern points in both the x and y directions.
• Such a strategy is convenient when one model must accommodate images of different resolutions.
• For ease of interpretation, use sparse patterns close to an integer level of downsampling.
• It is beneficial to vary the sparse pattern M(x, y) during training: such a schedule begins training at 6 times the desired sparse pattern density and smoothly decays towards the final density as training progresses (a sketch follows below).
• Also train with randomly varying sampling densities at each training step.
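As an illustration of such a schedule, here is a tiny sketch that decays the sampling density from 6x the target down to the target over training and draws a random mask at each step; the exponential decay form, the random (non-grid) sampling, and all names are assumptions.

```python
# Hedged sketch of a sparse-pattern density schedule; the decay form and names are assumptions.
import numpy as np

def num_samples_at_step(step, total_steps, target_density, hw, start_factor=6.0):
    frac = min(step / max(total_steps, 1), 1.0)
    density = target_density * start_factor ** (1.0 - frac)   # 6x target -> 1x target
    return int(round(density * hw[0] * hw[1]))

def random_mask(num_samples, hw, rng=None):
    # Uniformly random sampling locations; a regular-grid variant would enforce equal spacing instead.
    rng = rng or np.random.default_rng()
    M = np.zeros(hw, dtype=np.uint8)
    idx = rng.choice(hw[0] * hw[1], size=num_samples, replace=False)
    M.flat[idx] = 1
    return M
```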
• 32. Estimating Depth from RGB and Sparse Sensing D3 Network Architecture; DenseNet
  • 33. Estimating Depth from RGB and Sparse Sensing
  • 34. Estimating Depth from RGB and Sparse Sensing
• 36. InterpoNet, a brain inspired NN for optic flow dense interpolation
• Sparse-to-dense interpolation for optical flow is a fundamental phase in the pipeline of most of the leading optical flow estimation algorithms.
• The current state-of-the-art (SoA) method for interpolation, EpicFlow, is a local average method based on an edge-aware geodesic distance.
• This is a data-driven sparse-to-dense interpolation algorithm based on an FCN.
• Taking inspiration from the filling-in process in the visual cortex, it introduces lateral dependencies between neurons and multi-layer supervision into the learning process.
• The main branch of the network consists of ten layers, each applying a 7×7 convolution filter followed by an ELU (exponential linear unit) non-linearity.
• The input to the entire algorithm is a set of sparse and noisy matches, e.g. from FlowFields (FF), CPM-Flow (CPM), DiscreteFlow (DF), or DeepMatching (DM).
• From the matches, produce a sparse flow map of size h×w×2 for the image pair (a sketch follows below).
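A minimal sketch of turning the input matches into the h x w x 2 sparse flow map plus a validity mask; the (x1, y1, x2, y2) match layout and the rounding are assumptions.

```python
# Hedged sketch of rasterizing sparse matches into a flow map; the match layout is an assumption.
import numpy as np

def matches_to_sparse_flow(matches, h, w):
    flow = np.zeros((h, w, 2), dtype=np.float32)
    mask = np.zeros((h, w), dtype=np.uint8)
    for x1, y1, x2, y2 in matches:
        xi, yi = int(round(x1)), int(round(y1))
        if 0 <= xi < w and 0 <= yi < h:
            flow[yi, xi] = (x2 - x1, y2 - y1)   # flow vector at the match location
            mask[yi, xi] = 1                    # mark the pixel as having a (noisy) match
    return flow, mask
```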
  • 37. InterpoNet, a brain inspired NN for optic flow dense interpolation InterpoNet
• 38. InterpoNet, a brain inspired NN for optic flow dense interpolation
• Inspired by the fact that neuronal filling-in takes place in many layers of the visual system hierarchy, detour networks connect each and every layer directly to the loss function.
• During training, the loss function serves as top-down information pushing each layer to perform the interpolation in the best possible manner.
• The detour networks are kept simple: aside from the main branch of the network, each layer's activations are transformed into a two-channel flow map using a single convolutional layer with linear activation.
• Each of the flow maps produced by the detour networks is compared to the ground-truth flow map using the EPE and LD losses.
• The final network loss is the weighted sum of all these losses (a sketch follows below).
• For inference, only the output of the last detour layer is used, i.e. the one connected to the last layer of the network's main branch.
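A minimal PyTorch sketch of the main branch plus per-layer detour heads with multi-layer supervision; the channel width, the 1x1 detour kernel, the input channels, and the loss weights are assumptions, and only the EPE part of the loss is sketched (the LD term is omitted).

```python
# Hedged sketch of a main branch with detour heads; widths, kernels, and weights are assumptions.
import torch
import torch.nn as nn

class InterpoNetSketch(nn.Module):
    def __init__(self, in_ch=3, width=32, depth=10):
        super().__init__()
        # in_ch: e.g. sparse flow (2 channels) + validity mask (1); the paper's exact inputs may differ.
        self.blocks = nn.ModuleList()
        self.detours = nn.ModuleList()
        ch = in_ch
        for _ in range(depth):
            self.blocks.append(nn.Sequential(
                nn.Conv2d(ch, width, kernel_size=7, padding=3), nn.ELU()))
            # Detour head: a single linear-activation conv producing a 2-channel flow map.
            self.detours.append(nn.Conv2d(width, 2, kernel_size=1))
            ch = width

    def forward(self, x):
        flows = []
        for block, detour in zip(self.blocks, self.detours):
            x = block(x)
            flows.append(detour(x))
        return flows        # at inference, only flows[-1] is used

def multilayer_epe_loss(flows, gt_flow, weights=None):
    # Weighted sum of per-detour endpoint errors (the paper adds an LD term as well).
    weights = weights or [1.0] * len(flows)
    epe = lambda f: torch.norm(f - gt_flow, dim=1).mean()
    return sum(w * epe(f) for w, f in zip(weights, flows))
```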
  • 39. InterpoNet, a brain inspired NN for optic flow dense interpolation
• 40. InterpoNet, a brain inspired NN for optic flow dense interpolation InterpoNet