Depth Fusion from RGB and Depth Sensors by Deep Learning
1. Depth Fusion from RGB
and Depth Sensors
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
2. Outline
• 1. Sparsity Invariant CNNs
• 2. Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image
• 3. Self-Supervised Sparse-to-Dense: Depth Completion from LiDAR and Monocular Camera
• 4. Fusion of stereo and still monocular depth estimates in a self-supervised learning context
• 5. Deep Depth Completion of a Single RGB-D Image
• 6. Estimating Depth from RGB and Sparse Sensing
• Appendix: InterpoNet, a brain inspired NN for optic flow dense interpolation
3. Sparsity Invariant CNNs
• CNNs operating on sparse inputs for depth completion from sparse laser scan data.
• Traditional convolutional networks perform poorly when applied to sparse data even when
the location of missing data is provided to the network.
• This is a simple yet effective sparse convolution layer that explicitly considers the location of
missing data during the convolution operation.
• The network architecture is evaluated in synthetic and real experiments against various baseline approaches.
• Compared to dense baselines, the sparse convolution network generalizes well to novel
datasets and is invariant to the level of sparsity in the data.
• A dataset derived from the KITTI benchmark, comprising over 94k depth-annotated RGB images.
• The dataset allows for training and evaluating depth completion and depth prediction
techniques in challenging real-world settings.
4. Sparsity Invariant CNNs
Figure: sparse inputs (a, shown visually enhanced) lead to noisy results when processed with a standard ConvNet (c); (b) shows the ground truth. In contrast, the sparse convolution network (d) predicts smooth and accurate depth maps by explicitly considering sparsity during convolution.
5. Sparsity Invariant CNNs
Sparse Convolutional Network. (a) The input to the network is a sparse depth map (yellow) and a binary
observation mask (red). It passes through several sparse convolution layers (dashed) with decreasing kernel
sizes from 11×11 to 3 × 3. (b) Schematic of our sparse convolution operation. Here, ⊙ denotes elementwise
multiplication, ∗ convolution, 1/x inversion and “max pool” the max pooling operation. The input feature can be
single channel or multi-channel.
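To make the operation concrete, here is a minimal PyTorch sketch of a sparsity-invariant convolution in the spirit of this slide. The module name, channel handling, and the epsilon guard are my own choices; the bias is added after normalization, following the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv(nn.Module):
    """Sparsity-invariant convolution: convolve only observed pixels,
    renormalize by the observed-pixel count, and propagate the mask."""

    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # Max pooling propagates validity: an output pixel is valid if any
        # input pixel under the kernel window was valid.
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W) float observation map.
        x = self.conv(x * mask)                          # conv of x ⊙ mask
        ones = torch.ones(1, 1, *self.conv.kernel_size, device=x.device)
        norm = F.conv2d(mask, ones, padding=self.conv.padding[0])
        x = x / norm.clamp(min=1e-8)                     # the "1/x" normalization
        x = x + self.bias.view(1, -1, 1, 1)
        return x, self.pool(mask)
```

Per the slide, several such layers would be stacked with kernel sizes decreasing from 11×11 down to 3×3, each passing the propagated mask to the next.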
8. Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
• Dense depth prediction from a sparse set of depth measurements and a single RGB image.
• Introduce additional sparse depth samples, either acquired with a low-resolution depth
sensor or computed via visual SLAM algorithms.
• Use a single deep regression network to learn directly from the raw RGB-D data, and explore the impact of the number of depth samples on prediction accuracy.
• Two applications: a plug-in module in SLAM to convert sparse maps to dense maps, and
super-resolution for LiDARs.
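Training pairs are synthesized by sampling a sparse subset of the dense ground-truth depth. A minimal sketch of that sampling step, assuming uniform random sampling (the function name is illustrative):

```python
import torch

def sample_sparse_depth(depth, num_samples):
    """Keep `num_samples` random valid pixels of a dense (H, W) depth map,
    zeroing everything else; used to synthesize the sparse training input."""
    sparse = torch.zeros_like(depth)
    valid = torch.nonzero(depth > 0)                 # (N, 2) pixel coordinates
    idx = torch.randperm(valid.size(0))[:num_samples]
    ys, xs = valid[idx, 0], valid[idx, 1]
    sparse[ys, xs] = depth[ys, xs]
    return sparse
```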
9. Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
CNN architecture for NYU-Depth-v2 and KITTI datasets, respectively.
10. Sparse-to-Dense: Depth Prediction from
Sparse Depth Samples and a Single Image
Figure: predictions on KITTI. Rows: RGB input images; RGB-based prediction; sparse-depth-only prediction (200 samples, no RGB); RGB-d prediction (200 sparse depth samples plus RGB); ground-truth depth.
12. Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
• Depth completion faces 3 main challenges: 1) the irregularly spaced pattern in the sparse depth input, 2) the difficulty in handling multiple sensor modalities, and 3) the lack of dense, pixel-level ground truth depth labels.
• A deep regression model learns a mapping from sparse depth (plus RGB images) to dense depth.
• A self-supervised training framework requires only sequences of RGB and sparse depth images, without the need for dense depth labels.
Given (a) sparse LiDAR scans and (b) a color image, the network estimates a dense depth image. Semi-dense depth labels, (d) and (e), are used by a highly scalable, self-supervised framework for training such networks.
13. Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
The encoder consists of a sequence of convolutions with increasing filter banks to down-sample the
feature spatial resolutions. The decoder, on the other hand, has a reversed structure with transposed
convolutions to up-sample the spatial resolutions. The input sparse depth and the color image are
separately processed by their initial convolutions. The convolved outputs are concatenated into a single
tensor, which acts as input to the residual blocks of ResNet-34. Output from each of the encoding layers
is passed to, via skip connections, the corresponding decoding layers. A final 1x1 convolution filter
produces a single prediction image with the same resolution as the network input. All convolutions are followed by batch normalization and ReLU, except the last layer.
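A compact sketch of the late-fusion input stage described above; the layer widths and module names are illustrative, and the ResNet-34 encoder and transposed-convolution decoder are abridged to one stage each.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class RGBdNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Separate initial convolutions per modality, then concatenation.
        self.conv_rgb = conv_bn_relu(3, 48)
        self.conv_d = conv_bn_relu(1, 16)
        # Encoder (ResNet-34 residual blocks in the paper, abridged here)
        # and decoder with transposed convolutions to restore resolution.
        self.encoder = conv_bn_relu(64, 128, stride=2)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True))
        # Final 1x1 convolution without BN/ReLU gives the depth prediction.
        self.out = nn.Conv2d(64, 1, 1)

    def forward(self, rgb, sparse_d):
        x = torch.cat([self.conv_rgb(rgb), self.conv_d(sparse_d)], dim=1)
        return self.out(self.decoder(self.encoder(x)))
```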
14. Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
A model-based self-supervised training framework for depth completion. This framework requires only a
synchronized sequence of color/intensity images from a monocular camera and sparse depth images from
LiDAR. White rectangles are variables, red is the depth network to be trained, blue are deterministic
computational blocks (without learnable parameters), and green are loss functions. During training, the
current data frame RGBd1 and a nearby data frame RGB2 are both used to provide supervision signals. At
inference time, only the current frame RGBd1 is needed to produce a depth prediction pred1.
15. Self-Supervised Sparse-to-Dense: Depth
Completion from LiDAR and Monocular Camera
• The depth loss penalizes the prediction only at pixels where sparse depth is observed.
• Given the camera intrinsic matrix K, any pixel p1 in the current frame 1 has a corresponding projection in frame 2.
• A synthetic color image is produced by warping the nearby frame with bilinear interpolation.
• The final photometric loss compares this warped image against the current frame.
• A smoothness loss regularizes the prediction, and the final loss function for the entire self-supervised framework is a weighted sum of the three terms (see the reconstruction below).
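The loss formulas on this slide were images and did not survive extraction. Below is a hedged LaTeX reconstruction based on the paper's published formulation; the weights β1, β2 and the exact norms are from memory, not verbatim from the slide.

```latex
% Hedged reconstruction of the self-supervised training losses
\begin{align*}
\mathcal{L}_{\text{depth}} &= \big\| \mathbb{1}_{\{d > 0\}} \odot (\text{pred}_1 - d) \big\|_2^2
  && \text{(penalize only observed sparse depths)} \\
p_2 &\sim K \, T_{1\to 2} \, \text{pred}_1(p_1) \, K^{-1} p_1
  && \text{(projection of } p_1 \text{ into frame 2)} \\
\mathcal{L}_{\text{photo}} &= \big\| \text{warp}(\text{RGB}_2) - \text{RGB}_1 \big\|_1
  && \text{(bilinear warp of the nearby frame)} \\
\mathcal{L}_{\text{smooth}} &= \big\| \nabla^2 \text{pred}_1 \big\|_1
  && \text{(second-order smoothness)} \\
\mathcal{L} &= \mathcal{L}_{\text{depth}} + \beta_1 \mathcal{L}_{\text{photo}} + \beta_2 \mathcal{L}_{\text{smooth}}
\end{align*}
```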
17. Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
• Self-supervised learning in which stereo vision
depth estimates serve as targets for a CNN that
transforms a single image to a depth map.
• After training, the stereo and mono estimates are
fused with a method that preserves high
confidence stereo estimates, while leveraging
CNN estimates in the low-confidence regions.
• Even rather limited CNNs can help provide stereo
vision equipped robots with more reliable depth
maps for autonomous navigation.
Self-supervised learning (SSL)
18. Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
The regions where stereo vision is ’blind’ can be unveiled by the monocular estimator, as in those
areas a still mono estimator has a priori no constraints to make a valid depth prediction. Note that
the scene and obstacle are quite close to the camera. In large outdoor scenes with obstacles
further away, the proportion of occluded areas will be much smaller.
19. Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
• The monocular depth estimation is performed with the Fully Convolutional Network (FCN).
• The basis is the well known VGG network, which is pruned of its fully connected layers.
• There are 5 main principles behind the fusion operation:
• (i) as CNN is better at estimating relative depths, its output should be scaled to the stereo range;
• (ii) when a pixel is occluded only monocular estimates are preserved;
• (iii) when stereo is considered reliable, its estimates are preserved;
• (iv)/(v) when in a region of low stereo confidence, if the relative depth estimates are
dissimilar/similar, then the CNN is trusted more/the stereo is trusted more.
• Since stereo vision involves finding correspondences along the same row, it relies on vertical contrasts.
• Convolve with a vertical Sobel filter and apply a threshold to obtain a binary map. This map is
subsequently convolved with a Gaussian blur filter of a relatively large size and renormalized;
• After the merging operation a median filter with a 5 × 5 kernel is used to smooth the final
depth map and reduce even more overall noise.
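A minimal sketch of the confidence map and merging steps described above, using OpenCV and NumPy. The threshold, blur size, scaling rule, and soft-blend formula are assumptions beyond what the slide states; only the vertical Sobel filter, Gaussian blur, renormalization, occlusion handling, and final 5×5 median filter come from the text.

```python
import cv2
import numpy as np

def stereo_confidence(gray, thresh=30, blur_size=31):
    """Vertical-contrast confidence map: stereo matching along rows relies
    on vertical gradients, so a strong vertical Sobel response suggests
    reliable stereo depth."""
    sobel_v = np.abs(cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3))
    binary = (sobel_v > thresh).astype(np.float32)
    conf = cv2.GaussianBlur(binary, (blur_size, blur_size), 0)
    return conf / max(conf.max(), 1e-6)                 # renormalize to [0, 1]

def fuse(stereo_d, mono_d, conf, occluded):
    """Keep stereo where confident, mono where occluded, blend elsewhere."""
    # Principle (i): scale the CNN output to the stereo depth range.
    valid = (~occluded) & (stereo_d > 0)
    mono_d = mono_d * (stereo_d[valid].mean() / max(mono_d[valid].mean(), 1e-6))
    fused = conf * stereo_d + (1.0 - conf) * mono_d     # soft blend (assumed)
    fused[occluded] = mono_d[occluded]                  # principle (ii)
    # Final 5x5 median filter to smooth the merged map and reduce noise.
    return cv2.medianBlur(fused.astype(np.float32), 5)
```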
20. Fusion of stereo and still monocular depth
estimates in a self-supervised learning context
1) The RGB image.
2) Stereo depth map.
3) Still-mono depth map.
4) The merged depth map.
5) Confidence map (red: high stereo confidence; blue: mono).
6) Difference in error against ground truth between mono and stereo (red: high mono errors; blue: high stereo errors).
7) Velodyne depth map.
21. Deep Depth Completion of a Single RGB-D Image
• The goal is to complete the depth channel of an RGB-D image.
• To train a deep network that takes an RGB image as input and predicts dense surface
normals and occlusion boundaries.
• Those predictions are then combined with raw depth observations provided by the RGB-D
camera to solve for depths for all pixels, including those missing in the original observation.
• A depth completion benchmark dataset, where holes are filled in training data through the
rendering of surface reconstructions created from multi-view RGB-D scans.
22. Deep Depth Completion of a Single RGB-D Image
1) prediction of surface normals and occlusion boundaries only from color, and 2) optimization of
global surface structure from those predictions with soft constraints provided by observed depths.
23. Deep Depth Completion of a Single RGB-D Image
Depth Completion Dataset. Depth
completions are computed from multi-
view surface reconstructions of large
indoor environments. Bottom: the raw
color and depth channels with the
rendered depth for the viewpoint marked
as the red dot. The rendered mesh
(colored by vertex in large image) is
created by combining RGB-D images
from a variety of other views spread
throughout the scene (yellow dots),
which collaborate to fill holes when
rendered to the red dot view.
24. Deep Depth Completion of a Single RGB-D Image
Using surface normals to solve for depth completion. (a) An example of where depth cannot
be solved from surface normal. (b) The area missing depth is marked in red. The red arrow
shows paths on which depth cannot be integrated from surface normals. However, in real-world
images, there are usually many paths through connected neighboring pixels (along floors,
ceilings, etc.) over which depths can be integrated (green arrows).
25. Deep Depth Completion of a Single RGB-D Image
• The model is an FCN built on a VGG-16 backbone with a symmetric encoder and decoder.
• It is also equipped with short-cut connections and shared pooling masks for corresponding
max pooling and unpooling layers, which are critical for learning local image features.
• Train the network with “ground truth” surface normals and silhouette boundaries computed
from the reconstructed mesh.
• Define the observed pixels as the ones with depth data from both the raw sensor and the
rendered mesh, and the unobserved pixels as the ones with depth from the rendered mesh
but not the raw sensor.
• For any given set of pixels (observed, unobserved, or both), train models with a loss for only
those pixels by masking out the gradients on other pixels during BP.
• The network learns to predict normals better from color than depth, even if the network is
given an extra channel containing a binary mask indicating which pixels observe depth.
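The "shared pooling masks" can be expressed in PyTorch with max-pool indices reused by the corresponding unpooling layer. A minimal sketch, abridged to a single encoder/decoder stage (the real model stacks several VGG-16 stages; names and widths here are illustrative):

```python
import torch.nn as nn

class NormalNet(nn.Module):
    """One encoder/decoder stage showing shared pooling masks:
    MaxPool2d returns indices that the matching MaxUnpool2d reuses."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(True))
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(True))
        self.head = nn.Conv2d(64, 3, 3, padding=1)   # 3-channel surface normals

    def forward(self, x):
        f = self.enc(x)
        p, idx = self.pool(f)                        # keep the pooling mask
        u = self.unpool(p, idx, output_size=f.shape) # reuse it when unpooling
        u = self.dec(u) + f                          # short-cut connection
        return self.head(u)
```

Masking the loss to a chosen pixel set (observed, unobserved, or both) amounts to zeroing the per-pixel loss outside that set before backpropagation, as the slide describes.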
26. Deep Depth Completion of a Single RGB-D Image
• After predicting the surface normal image N and
occlusion boundary image B, solve a system of
equations to complete the depth image D.
• The objective function is defined as the weighted sum of squared errors with four terms (see the hedged reconstruction below):
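The equation itself was an image; the following is a hedged LaTeX reconstruction from the paper (Zhang et al.), written from memory. B down-weights the normal term near predicted occlusion boundaries, and is presumably what the slide counts as the fourth term.

```latex
% Hedged reconstruction of the global depth optimization
\begin{align*}
E &= \lambda_D E_D + \lambda_S E_S + \lambda_N E_N \\
E_D &= \sum_{p \in T_{\text{obs}}} \| D(p) - D_0(p) \|^2
  && \text{(stay close to observed raw depths)} \\
E_S &= \sum_{p} \sum_{q \in N(p)} \| D(p) - D(q) \|^2
  && \text{(neighboring depths vary smoothly)} \\
E_N &= \sum_{p} \sum_{q \in N(p)} B(p)\, \big| \langle v(p,q),\, N(p) \rangle \big|^2
  && \text{(tangents orthogonal to predicted normals)}
\end{align*}
```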
Input & GT Zhang et al. Laina et al. Chakrabarti et al.
27. Estimating Depth from RGB and Sparse Sensing
• A deep model that can produce dense depth maps given an RGB image with known depth
at a very sparse set of pixels.
The objective is to densify a sparse depth map (with additional cues from an RGB image), hence the model is called Deep Depth Densification, or D3.
28. Estimating Depth from RGB and Sparse Sensing
• A parametrization of the sparse depth input that accommodates sparse input patterns.
• It allows for varying such patterns not only across different deep models but even within the
same model during training and testing.
• Inputs to parametrization:
• I(x, y) and D(x, y): RGB vector-valued image I and ground truth depth D
• Both maps have dimensions H×W. Invalid values in D are encoded as zero.
• M(x,y): Binary pattern mask of dimensions H×W, where M(x,y) = 1 defines (x,y) locations of our
desired depth samples.
• All points where M(x,y) = 1 must correspond to valid depth points (D(x, y) > 0).
• From I, D and M, form 2 maps for the sparse depth input, S1(x,y) and S2(x,y).
• Both maps have dimension H×W;
• S1(x,y) is a NN (nearest neighbor) fill of the sparse depth M(x,y)∗D(x,y).
• S2(x, y) is the Euclidean distance transform of M(x, y), i.e. the L2 distance between (x, y) and the closest point (x′, y′) where M(x′, y′) = 1.
• The final parametrization of the sparse depth input is the concatenation of S1(x,y) and S2(x,y).
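A minimal sketch of the (S1, S2) parametrization with SciPy; `distance_transform_edt` conveniently returns both the distance map and the coordinates of the nearest sampled pixel, which gives the NN fill directly. The function name is illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def d3_parametrization(depth, mask):
    """Build the two-channel sparse-depth input (S1, S2).

    depth: (H, W) ground-truth depth; mask: (H, W) binary sample pattern,
    nonzero only where depth is valid (D > 0)."""
    # EDT over the complement of the mask: for every pixel, the distance
    # to the nearest sample (S2), plus that sample's coordinates.
    s2, nearest = distance_transform_edt(mask == 0, return_indices=True)
    s1 = depth[nearest[0], nearest[1]]      # NN fill of M(x,y) * D(x,y)
    return np.stack([s1, s2], axis=0)       # (2, H, W) network input
```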
30. Estimating Depth from RGB and Sparse Sensing
Both regular and irregular sparse patterns in S1 (top) and S2 (bottom). Dark points
in S2 correspond to the pixels where there is access to depth information.
31. Estimating Depth from RGB and Sparse Sensing
• For regular grid patterns, minimal spatial bias is ensured when choosing the mask M(x,y) by enforcing equal spacing between subsequent pattern points in both the x and y directions.
• Such a strategy is convenient when one model must accommodate images of different resolutions.
• For ease of interpretation, use sparse patterns close to an integer level of downsampling;
• It is beneficial to vary the sparse pattern M(x,y) during training.
• Such a schedule begins training at 6 times the desired sparse pattern density and smoothly
decays towards the final density as training progresses.
• Also train with randomly varying sampling densities at each training step.
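A sketch of such a density schedule. The exponential form is an assumption; the slide only states a smooth decay from 6x the desired density down to the final density.

```python
def pattern_density(step, total_steps, final_density, start_factor=6.0):
    """Smoothly decay the sampling density from start_factor * final_density
    at the start of training to final_density at the end (exponential
    interpolation assumed)."""
    t = min(step / total_steps, 1.0)
    return final_density * start_factor ** (1.0 - t)
```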
36. InterpoNet, a brain inspired NN for optic flow
dense interpolation
• Sparse-to-dense interpolation for optical flow is a fundamental phase in the pipeline of
most of the leading optical flow estimation algorithms.
• The current SoA method for interpolation, EpicFlow, is a local averaging method based on an edge-aware geodesic distance.
• This is a data-driven sparse-to-dense interpolation algorithm based on FCN.
• Taking inspiration from the filling-in process in the visual cortex, it introduces lateral dependencies between neurons and multi-layer supervision into the learning process.
• The main branch of the network consists of ten layers, each applying a 7x7 convolution filter
followed by an ELU (exponential linear unit) non-linearity.
• The input to the entire algorithm is a set of sparse and noisy matches, e.g. from FlowFields (FF), CPM-Flow (CPM), DiscreteFlow (DF), or DeepMatching (DM).
• From the matches, produce a sparse flow map of size h×w×2 for the image pair.
37. InterpoNet, a brain inspired NN for optic flow
dense interpolation
InterpoNet
38. InterpoNet, a brain inspired NN for optic flow
dense interpolation
• Inspired by the observation that neuronal filling-in takes place at many layers of the visual system hierarchy, detour networks connect each and every layer directly to the loss function.
• During training, the loss function served as top down information pushing each layer to
perform interpolation in the best possible manner.
• The detour networks were kept simple: aside from the main branch of the network, each layer's activations were transformed into a two-channel flow map using a single convolutional layer with linear activations.
• Each of the flow maps produced by the detour networks was compared to the ground truth
flow map using the EPE and LD losses.
• The final network loss was the weighted sum of all the losses.
• For inference, use only the last detour layer output - the one connected to the last layer of
the network’s main branch.
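A sketch of the main branch with detour heads and the multi-layer supervision. The EPE (average endpoint error) loss is standard; the LD loss is omitted here, the detour weights are placeholders, and the input is simplified to the 2-channel sparse flow map (the paper also feeds a validity mask).

```python
import torch
import torch.nn as nn

class InterpoNetSketch(nn.Module):
    """Ten 7x7 conv + ELU layers; each layer also feeds a 'detour'
    1x1 conv with linear activation producing a 2-channel flow map."""

    def __init__(self, ch=32, depth=10):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Conv2d(2 if i == 0 else ch, ch, 7, padding=3) for i in range(depth)])
        self.detours = nn.ModuleList(
            [nn.Conv2d(ch, 2, 1) for _ in range(depth)])
        self.act = nn.ELU()

    def forward(self, sparse_flow):
        flows = []
        x = sparse_flow                        # (B, 2, H, W) sparse flow map
        for layer, detour in zip(self.layers, self.detours):
            x = self.act(layer(x))
            flows.append(detour(x))            # one flow map per layer
        return flows                           # inference uses flows[-1]

def epe(flow, gt):
    """Average endpoint error between predicted and ground-truth flow."""
    return torch.norm(flow - gt, dim=1).mean()

def detour_loss(flows, gt, weights):
    """Final loss: weighted sum of the per-layer detour losses."""
    return sum(w * epe(f, gt) for f, w in zip(flows, weights))
```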