Depth estimation using deep learning

Depth Image Prediction from a Single RGB Image
Using Deep Learning
Deep Learning
May 2017
Soubhi Hadri

Table of Contents :
Introduction
Existing Solutions
Dataset and Model
Project Code and Results

Introduction
- In 3D computer graphics, a depth map is an image or image channel
that contains information relating to the distance of the surfaces of
scene objects from a viewpoint.
- RGB-D image : an RGB image and its corresponding depth image.
- A depth image is an image channel in which each pixel relates to the
distance between the image plane and the corresponding object in the
RGB image.
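
As a rough illustration of the definitions above, an RGB-D image can be held as a single array with three color channels plus one depth channel. The image size, the depth value and the choice of metres as the unit are assumptions made only for this sketch.

```python
import numpy as np

# Arbitrary small size for illustration; real sensor frames are much larger.
height, width = 4, 6

# Color image: one (R, G, B) triple per pixel.
rgb = np.zeros((height, width, 3), dtype=np.uint8)

# Depth image: one distance value per pixel (unit assumed to be metres).
depth = np.full((height, width), 2.5, dtype=np.float32)

# An RGB-D image pairs every color pixel with its depth value,
# giving an array of shape (height, width, 4).
rgbd = np.dstack([rgb.astype(np.float32), depth])
```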

Introduction
To approximate the depth of objects :
• Stereo camera : a camera with two or more lenses to simulate human vision.
• RealSense or Kinect : depth sensors that capture RGB-D images directly.
• Deep learning.

Deep Learning for depth estimation :
Recently, many works have estimated the depth map of a single RGB image, for example:
"Learning Fine-Scaled Depth Maps from Single RGB Images", 7 Feb 2017.

Dataset : NYU Depth V2
The NYU-Depth V2 data set is comprised of video sequences from a variety of
indoor scenes as recorded by both the RGB and Depth cameras from the
Microsoft Kinect.

The dataset consists of :
• 1449 labeled pairs of aligned RGB and depth images (2.8 GB).
• 407,024 new unlabeled frames: raw RGB and depth (428 GB).
• Toolbox: useful functions for manipulating the data and labels.
Different parts of the dataset can be downloaded individually.
Authors : Nathan Silberman, Derek Hoiem, Pushmeet Kohli and Rob Fergus (2012)

For this project:
• Office 1-2 dataset (part of the whole dataset).
• 15 GB after processing RAW data.
• 3522 RGB-D images.
Split the data:
• 3522 images in total, split 80% / 20%.
• 80% (2817 images): 2414 for training and 403 for validation.
• 20% (705 images): test.
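
The split above can be sketched as follows, reading the numbers as 2414 training, 403 validation and 705 test images. Whether the project shuffled the images before splitting is not stated in the slides, so the random permutation, the seed and the function name `split_indices` are assumptions of this sketch.

```python
import numpy as np

def split_indices(n_total, n_val, n_test, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/val/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]  # everything that is left
    return train_idx, val_idx, test_idx

# Counts taken from the slide: 3522 images, 403 validation, 705 test.
train_idx, val_idx, test_idx = split_indices(3522, n_val=403, n_test=705)
```

The three index arrays are disjoint, so no image ends up in more than one subset.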

Samples of the data:

The Model for Depth Estimation:
Model proposed by Jan Ivanecky in his master's thesis (2016).
He derived his model from Eigen et al., "Predicting Depth, Surface
Normals and Semantic Labels with a Common Multi-Scale
Convolutional Architecture", 17 Dec 2015.

Global context network: estimates the rough depth map of the whole
scene from the input RGB image.

Gradient network: estimates horizontal and vertical gradients of the
depth map globally, for the whole RGB image.

Refining network: improves the rough estimate from the global
context network, utilizing gradients estimated by the gradient
network and the input RGB image.
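
To make concrete what "horizontal and vertical gradients of the depth map" means, the sketch below computes them from a known depth map with simple forward differences. This only illustrates the quantity the gradient network is asked to predict. The finite-difference scheme and the function name `depth_gradients` are assumptions and are not taken from the thesis.

```python
import numpy as np

def depth_gradients(depth):
    """Return (horizontal, vertical) forward differences of a 2-D depth map."""
    depth = np.asarray(depth, dtype=np.float64)
    grad_x = np.zeros_like(depth)
    grad_y = np.zeros_like(depth)
    # Horizontal gradient: difference between neighboring columns.
    grad_x[:, :-1] = depth[:, 1:] - depth[:, :-1]
    # Vertical gradient: difference between neighboring rows.
    grad_y[:-1, :] = depth[1:, :] - depth[:-1, :]
    return grad_x, grad_y

# A depth map that grows by 1 per column: a plane tilted left to right.
ramp = np.tile(np.arange(4, dtype=np.float64), (3, 1))
gx, gy = depth_gradients(ramp)
```

For the tilted plane, the horizontal gradient is constant and the vertical gradient is zero. The last column and row are left at zero because they have no forward neighbor.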

Global context network
Architecture of the global context network: the model is derived from AlexNet.

Loss Function:
Root mean squared error in log space (RMS-log).
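
The slides do not spell out the formula. A common definition in the depth estimation literature is the square root of the mean squared difference between the logarithms of predicted and true depth. The NumPy sketch below follows that definition. The small `eps` added before the logarithm is an assumption, included because labels scaled to [0, 1] can contain zeros.

```python
import numpy as np

def rms_log_error(pred, target, eps=1e-6):
    """Root mean squared error between log(pred) and log(target).

    eps is an assumed guard against log(0); it is not from the slides.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    diff = np.log(pred + eps) - np.log(target + eps)
    return float(np.sqrt(np.mean(diff ** 2)))

# Identical prediction and target give zero error.
perfect = rms_log_error([1.0, 2.0], [1.0, 2.0])
```

Because the error is measured in log space, a prediction that is off by a constant factor is penalized equally at every depth.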

Training The Network:
1- Scale the label (depth) images to [0, 1].
2- Subtract 127 from the input images to center the data (a kind of normalization).
3- Initialize the convolution layers from an AlexNet pre-trained CNN (transfer
learning).
4- Train the network in batches (batch size = 32) for 35 epochs.
5- Save the session and model at the end of each epoch.
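
Steps 1 and 2 can be sketched in NumPy as below. The slides do not say how the labels are scaled to [0, 1], so the min-max scaling over the whole batch and the function name `preprocess` are assumptions of this sketch.

```python
import numpy as np

def preprocess(rgb_batch, depth_batch):
    """Center RGB inputs around zero and scale depth labels to [0, 1]."""
    # Step 2: subtract 127 so 8-bit pixel values are roughly zero-centered.
    inputs = rgb_batch.astype(np.float32) - 127.0

    # Step 1: min-max scaling of the labels (assumed scaling scheme).
    depth = depth_batch.astype(np.float32)
    d_min, d_max = depth.min(), depth.max()
    if d_max > d_min:
        labels = (depth - d_min) / (d_max - d_min)
    else:
        labels = np.zeros_like(depth)  # constant depth map: avoid dividing by zero
    return inputs, labels

rgb = np.array([[[0, 127, 255]]], dtype=np.uint8)
depth = np.array([[1.0, 3.0, 5.0]], dtype=np.float32)
inputs, labels = preprocess(rgb, depth)
```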

Project Functions :
1- split_data : split the data and save it into training/testing/val .npy files.
2- load_data : load the data from the .npy files.
3- plot_imgs : plot a pair of images.
4- get_next_batch : get the next batch from the training data.
5- loss : calculate the loss function.
6- model : create the model (network structure).
7- train : start training.
8- evaluate : evaluate new data after restoring the model.
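
The project code itself is not shown in the slides. As a hedged illustration of what a function such as get_next_batch might do, the sketch below slices consecutive batches out of NumPy arrays. The signature, the `batch_index` argument and the helper `num_batches` are hypothetical and may differ from the actual project.

```python
import numpy as np

def num_batches(n_samples, batch_size=32):
    """Number of batches needed to cover n_samples (the last one may be smaller)."""
    return (n_samples + batch_size - 1) // batch_size

def get_next_batch(images, labels, batch_index, batch_size=32):
    """Return the batch_index-th slice of images and their matching labels."""
    start = batch_index * batch_size
    end = min(start + batch_size, len(images))
    return images[start:end], labels[start:end]

# Stand-in data with the training-set size from the slides (2414 samples).
images = np.arange(2414)
labels = images * 2
last_x, last_y = get_next_batch(images, labels, num_batches(len(images)) - 1)
```

With 2414 training samples and a batch size of 32, one epoch takes 76 batches, and the final batch holds only 14 samples.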

Project Tools and Libraries:
1- TensorFlow.
2- TF-Slim : lightweight library for defining, training and evaluating complex
models in TensorFlow.
3- TensorBoard.
4- NumPy.
5- Matplotlib.

Project Results: 
Training loss:

Samples of new data:

Explanation :
• Training data is not sufficient.
In Jan's experiment:
• The full NYU dataset plus 3 datasets generated from the original one.
• The network was trained for 100,000 iterations.
This experiment:
• It took ~26 hours for 30 epochs.
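
To put the two training budgets side by side, the arithmetic below estimates how many iterations 30 epochs correspond to. It assumes 2414 training images, a batch size of 32, and that one iteration means one batch. The thesis may count iterations or size batches differently.

```python
import math

train_images = 2414   # training-set size from the data split
batch_size = 32       # batch size used in this project
epochs = 30           # epochs reported for the ~26 hour run

iterations_per_epoch = math.ceil(train_images / batch_size)  # 76
total_iterations = iterations_per_epoch * epochs             # 2280

# How many times more iterations the reference experiment ran.
ratio = 100_000 / total_iterations
```

Under these assumptions, this experiment ran roughly 2,280 iterations, more than 40 times fewer than the 100,000 used in the reference experiment.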

Project :
The project code and data will be available on GitHub:
https://github.com/SubhiH/Depth-Estimation-Deep-Learning

Resources :
-https://arxiv.org/pdf/1607.00730.pdf
-http://janivanecky.com/
-http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
