Depth Images Prediction from a Single RGB Image
Using Deep Learning
Deep Learning
May 2017
Soubhi Hadri
Table of Contents:
Introduction
Existing Solutions
Dataset and Model
Project Code and Results
Introduction
- In 3D computer graphics, a depth map is an image or image channel that contains information about the distance of the surfaces of scene objects from a viewpoint.
- An RGB-D image is an RGB image paired with its corresponding depth image.
- In a depth image, each pixel stores the distance between the image plane and the corresponding object in the RGB image.
Ways to estimate the depth of objects:
• Stereo camera: a camera with two or more lenses that simulates human binocular vision.
• Depth sensors such as RealSense or Kinect, which capture RGB-D images directly.
• Deep learning.
Existing Solutions
Deep learning for depth estimation:
Many recent works estimate a depth map from a single RGB image.
Example: Learning Fine-Scaled Depth Maps from Single RGB Images (7 Feb 2017).
Dataset & Model
Dataset: NYU Depth V2
The NYU-Depth V2 dataset comprises video sequences from a variety of indoor scenes, recorded by both the RGB and depth cameras of the Microsoft Kinect.
The dataset consists of:
• 1449 labeled pairs of aligned RGB and depth images (2.8 GB).
• 407,024 new unlabeled frames: raw RGB and depth (428 GB).
• Toolbox: useful functions for manipulating the data and labels.
Different parts of the dataset can be downloaded individually.
Authors: Nathan Silberman, Derek Hoiem, Pushmeet Kohli and Rob Fergus (2012).
For this project:
• The Office 1-2 scenes (a subset of the full dataset).
• 15 GB after processing the raw data.
• 3522 RGB-D images.
Split of the 3522 images:
• 80% (2817 images) for training and validation: 2414 training, 403 validation.
• 20% (705 images) for testing.
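The split above can be reproduced with a few lines of NumPy. This is a minimal sketch, not the project's actual split_data function: the name split_indices, its parameters, the random shuffle and the seed are all assumptions made for illustration.

```python
import numpy as np

def split_indices(n_samples, trainval_frac=0.8, n_val=403, seed=0):
    """Shuffle sample indices and split them into train/val/test.

    Hypothetical helper: the shuffle, the seed and the argument names are
    assumptions, only the resulting sizes follow the slide.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)

    n_trainval = int(n_samples * trainval_frac)  # 80% for training + validation
    n_train = n_trainval - n_val                 # remainder after holding out validation

    train = order[:n_train]
    val = order[n_train:n_trainval]
    test = order[n_trainval:]                    # the remaining 20%
    return train, val, test

train_idx, val_idx, test_idx = split_indices(3522)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index arrays can then be used to slice the image and depth arrays before saving each subset to its own .npy file.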
Samples of the data:
The Model for Depth Estimation:
The model was proposed by Jan Ivanecký in his master's thesis (2016).
He derived his model from Eigen et al., Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture (17 Dec 2015).
• Global context network: estimates the rough depth map of the whole scene from the input RGB image.
• Gradient network: estimates the horizontal and vertical gradients of the depth map globally, for the whole RGB image.
• Refining network: improves the rough estimate from the global context network, using the gradients estimated by the gradient network and the input RGB image.
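To make "horizontal and vertical gradients of the depth map" concrete, the sketch below computes them with simple forward differences in NumPy. It only illustrates the quantity involved. The function name depth_gradients and the choice of forward differences are assumptions, and the thesis may define its gradient targets differently.

```python
import numpy as np

def depth_gradients(depth):
    """Return (horizontal, vertical) forward differences of a 2-D depth map.

    Illustrative only: the last column / last row is left at zero because
    it has no right / lower neighbour.
    """
    depth = np.asarray(depth, dtype=np.float64)
    grad_x = np.zeros_like(depth)
    grad_y = np.zeros_like(depth)
    grad_x[:, :-1] = depth[:, 1:] - depth[:, :-1]  # change along image columns
    grad_y[:-1, :] = depth[1:, :] - depth[:-1, :]  # change along image rows
    return grad_x, grad_y

gx, gy = depth_gradients([[1.0, 2.0, 4.0],
                          [1.0, 3.0, 9.0]])
print(gx)
print(gy)
```

Large gradient values mark depth discontinuities such as object boundaries, which is the kind of detail a refining stage can use to sharpen a coarse depth estimate.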
Architecture of the global context network:
The model is derived from AlexNet.
Loss Function:
Root mean squared error in log space (RMS-log).
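As a reference for the loss named above, here is a minimal NumPy sketch of the commonly used RMS-log definition: the square root of the mean squared difference between the logarithms of predicted and ground-truth depth. The function name rms_log and the eps clamp are assumptions, and the project's own loss function may differ in detail (for example, it would be written with TensorFlow ops for training).

```python
import numpy as np

def rms_log(pred, target, eps=1e-6):
    """Root mean squared error between log(pred) and log(target).

    eps is an assumed safeguard that keeps the logarithm finite when a
    depth value is zero.
    """
    pred = np.maximum(np.asarray(pred, dtype=np.float64), eps)
    target = np.maximum(np.asarray(target, dtype=np.float64), eps)
    diff = np.log(pred) - np.log(target)
    return float(np.sqrt(np.mean(diff ** 2)))

print(rms_log([1.0, 2.0], [1.0, 2.0]))  # identical maps give zero error
```

Because the error is measured on logarithms, it penalizes relative rather than absolute depth differences.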
Training the Network:
1- Scale the label (depth) images to [0, 1].
2- Subtract 127 from the input images to center the data (a simple form of normalization).
3- Initialize the convolution layers from an AlexNet pre-trained CNN (transfer learning).
4- Train the network in batches (batch size = 32) for 35 epochs.
5- Save the session and model at the end of each epoch.
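Steps 1 and 2 above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the project's code: the function name preprocess is hypothetical, and min-max scaling is only one possible way to map the depth labels to [0, 1] (the slides do not say which scaling was used).

```python
import numpy as np

def preprocess(rgb_uint8, depth):
    """Center RGB inputs and scale depth labels to [0, 1].

    Assumptions: inputs are 8-bit RGB values, and labels are scaled with
    min-max normalization over the given array.
    """
    inputs = np.asarray(rgb_uint8).astype(np.float32) - 127.0  # step 2: center the data

    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    labels = (depth - d_min) / max(d_max - d_min, 1e-8)        # step 1: scale to [0, 1]
    return inputs, labels.astype(np.float32)

rgb = np.array([[[0, 127, 255]]], dtype=np.uint8)
depth = np.array([[0.5, 2.5, 4.5]])
inputs, labels = preprocess(rgb, depth)
print(inputs, labels)
```

Whatever scaling is chosen, the same parameters have to be applied in reverse to turn network outputs back into metric depth.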
Project Functions:
1- split_data: split the data and save it into training/testing/val .npy files.
2- load_data: load the data from the .npy files.
3- plot_imgs: plot a pair of images.
4- get_next_batch: get the next batch from the training data.
5- loss: calculate the loss function.
6- model: create the model (network structure).
7- train: start training.
8- evaluate: evaluate new data after restoring the model.
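As an illustration of what a batching helper like get_next_batch might look like, here is a minimal NumPy sketch. Only the function name comes from the list above. The signature, the step argument and the wrap-around behaviour are assumptions, and the project's real implementation may differ.

```python
import numpy as np

def get_next_batch(images, depths, batch_size, step):
    """Return the (images, depths) batch for a given training step.

    Hypothetical signature: batches are taken in order and wrap around
    to the start of the data when the end is reached.
    """
    n = len(images)
    start = (step * batch_size) % n
    idx = np.arange(start, start + batch_size) % n  # wrap indices past the end
    return images[idx], depths[idx]

images = np.arange(10)
depths = images * 2
x, y = get_next_batch(images, depths, batch_size=4, step=2)
print(x, y)
```

Indexing both arrays with the same index vector keeps each RGB image paired with its depth label.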
Project Tools and Libraries:
1- TensorFlow.
2- Slim: a lightweight library for defining, training and evaluating complex models in TensorFlow.
3- TensorBoard.
4- NumPy.
5- Matplotlib.
Project Results:
Training loss:
Samples of new data:
Explanation:
• The training data is insufficient.
In Jan’s experiment:
• The full NYU dataset plus 3 datasets generated from the original one.
• The network was trained for 100,000 iterations.
This experiment:
• Training took ~26 hours for 30 epochs.
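To put the two training budgets side by side, the short calculation below converts epochs into iterations. It is a rough estimate under assumptions not stated on the slides: that training used the 2414 training images with a batch size of 32, and that a final partial batch counts as one iteration.

```python
import math

def iterations_for(n_train, batch_size, epochs):
    """Number of gradient steps, counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs

# Assumed setup: 2414 training images, batch size 32, 30 epochs.
ours = iterations_for(n_train=2414, batch_size=32, epochs=30)
reference = 100_000  # iterations reported for Jan's experiment

print(ours)              # total steps in this experiment
print(reference / ours)  # how many times longer the reference run was
```

Under these assumptions, this experiment ran a few thousand iterations, well over an order of magnitude fewer than the reference run, which is consistent with the explanation above.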
Project :
The project code and data will be available on GitHub:
https://github.com/SubhiH/Depth-Estimation-Deep-Learning
Resources:
- https://arxiv.org/pdf/1607.00730.pdf
- http://janivanecky.com/
- http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
Thank You
