Monocular Depth Cues in
Computer Vision Applications
Diego Cheda
Thesis Advisors:
Dr. Daniel Ponsa
Dr. Antonio López
December 14, 2012
We don’t need two eyes to perceive depth.
[Edgar Muller]
Motivation
Human depth cues
There are different sources of information supporting depth
perception.
Motivation
Depth estimation from a single image
Prior information
Our world is structured; in an abstract world it need not be, as illustrated by René Magritte's paintings: Golconda, The Blank Check, The Listening Room, and Personal Values.
Outline
1 Objectives
2 Coarse depth map estimation
3 Egomotion estimation
4 Background estimation
5 Pedestrian candidate generation
6 Conclusions and future work
Objectives
• Coarse depth map estimation
simple and low-cost
low-level features based on pictorial cues
• Increasing the performance of many applications
Egomotion estimation
Background estimation
Pedestrian candidate generation
Objectives
Segmenting an image into depth categories
• Near
Depth is usually estimated by using a stereo configuration.
• Very-far
The effect of camera translation at faraway distances is imperceptible.
• Medium and Far
Interesting for potential applications.
Outline
1 Objectives
2 Coarse depth map estimation
3 Egomotion estimation
4 Background estimation
5 Pedestrian candidate generation
6 Conclusions and future work
Coarse depth map estimation
Method
Pipeline of our approach
• Multiclass classification problem
• Supervised learning approach
Coarse depth map estimation
Method
Ground truth dataset
• Set of urban outdoor images
Saxena et al.: 400 images for training and 134 for testing.
• Each image has an associated depth map acquired by a laser
scanner.
The depth maps are thresholded to obtain the ground-truth depth categories.
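As an illustration, such a thresholding could look like the sketch below; the 30/50/70 m cut points come from the learning-approach slide, while the function name is ours:

```python
# Hedged sketch: turn a laser depth map (in meters) into the four
# ground-truth categories; thresholds follow the 30/50/70 m scheme.
import numpy as np

THRESHOLDS_M = (30.0, 50.0, 70.0)

def depth_to_category(depth_m):
    """0 = near, 1 = medium, 2 = far, 3 = very-far."""
    return np.digitize(depth_m, THRESHOLDS_M)
```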
Coarse depth map estimation
Method
Regions
Superpixels Regular grid
Superpixels preserve intra-region similarities.
× Time consuming.
× Regular grids merge information from different regions.
Computed only once per camera configuration.
Coarse depth map estimation
Method
Features
• Beyond 30 m, monocular pictorial cues are the predominant source of depth information.
• Low-level visual features represent texture, relative height, and atmospheric scattering.
Coarse depth map estimation
Method
Features - Texture
Paris street, rainy day - Gustave Caillebotte
At greater distances, texture patterns become finer and appear smoother.
To capture textures we use
• Weibull distribution
• Gabor filters
Coarse depth map estimation
Method
Features - Texture: Weibull distribution
• Compact representation
β (scale) parameter | γ (shape) parameter
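A minimal sketch of how such a two-parameter descriptor could be computed, assuming (as is common for this cue) that the Weibull is fitted to gradient magnitudes with SciPy; the thesis does not spell out its exact estimator:

```python
# Hypothetical sketch: Weibull texture descriptor for an image region.
# Fitting a two-parameter Weibull to gradient magnitudes is our assumption.
import numpy as np
from scipy.stats import weibull_min

def weibull_texture(region_gray):
    """Return (beta, gamma): Weibull scale and shape of gradient magnitudes."""
    gy, gx = np.gradient(region_gray.astype(float))
    mag = np.hypot(gx, gy).ravel()
    mag = mag[mag > 0]                             # Weibull support is x > 0
    gamma, _, beta = weibull_min.fit(mag, floc=0)  # (shape, loc=0, scale)
    return beta, gamma
```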
Coarse depth map estimation
Method
Features - Texture: Gabor filter
Images and their Gabor filter responses
• Distinguish smooth from textured regions
Coarse depth map estimation
Method
Features - Relative height
When an object is near the
horizon, it is perceived as distant.
To capture relative height we use
• Location: x and y coordinates
in the image
Coarse depth map estimation
Method
Features - Location
Depth averages over the ground truth for the near, medium, and far categories
Coarse depth map estimation
Method
Features - Atmospheric scattering
The Virgin and Child with St. Anne - Leonardo Da Vinci
The farther away objects are, the less clear and less detailed they appear with respect to closer ones.
To capture atmospheric
scattering we use
• RGB histogram
• HSV histogram
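A sketch of such color features with OpenCV and NumPy; the bin count, the per-channel value range, and the normalization are our assumptions:

```python
# Sketch of the color histogram features (input assumed 8-bit BGR).
import numpy as np
import cv2

def color_histograms(region_bgr, bins=8):
    hsv = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV)
    feats = []
    for img in (region_bgr, hsv):
        for ch in range(3):
            h, _ = np.histogram(img[..., ch], bins=bins, range=(0, 256))
            feats.append(h / max(h.sum(), 1))  # normalized histogram
    return np.concatenate(feats)
```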
Coarse depth map estimation
Method
Learning approach
One-vs-All
• Binary classifiers
• Training one classifier per class (near, medium, far, and very-far)
• Low performance due to the scarce number of positive examples for the medium
and far classes.
Our approach
• Training three classifiers: > 30, > 50, > 70 m.
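A hedged sketch of this scheme, using probabilistic SVMs from scikit-learn as stand-ins for the actual classifiers; mapping the three "farther than t" decisions to the four categories by counting positive votes is our simplification:

```python
# Sketch: three binary "farther than t?" classifiers combined into four
# depth categories. SVC and the vote-counting rule are assumptions.
import numpy as np
from sklearn.svm import SVC

THRESHOLDS = (30.0, 50.0, 70.0)  # meters, as in the slides

def train_depth_classifiers(X, depths_m):
    """One binary classifier per threshold t: is this region farther than t?"""
    return [SVC(probability=True).fit(X, depths_m > t) for t in THRESHOLDS]

def depth_category(classifiers, x):
    """Count 'farther than t' votes and map to near/medium/far/very-far."""
    votes = sum(int(c.predict_proba([x])[0, 1] > 0.5) for c in classifiers)
    return ["near", "medium", "far", "very-far"][votes]
```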
Coarse depth map estimation
Method
Training
Coarse depth map estimation
Method
Testing
Coarse depth map estimation
Method
Inference
• CRF combining the probabilities obtained from the classifiers.
• It encourages neighboring regions to belong to the same depth category.
• Graph cuts are used to obtain a maximum likelihood labeling.
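For intuition only, here is a binary (close vs. distant) version of such a graph-cut smoothing with PyMaxflow; the thesis' four-label CRF would typically be handled with alpha-expansion instead, and the pairwise weight is an assumption:

```python
# Illustrative binary graph-cut smoothing of classifier probabilities.
import numpy as np
import maxflow

def smooth_labels(p_distant, lam=0.5):
    """p_distant: HxW map of P(pixel is distant); lam: Potts pairwise weight."""
    eps = 1e-6
    cost_distant = -np.log(p_distant + eps)        # unary cost of 'distant'
    cost_close = -np.log(1.0 - p_distant + eps)    # unary cost of 'close'
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(p_distant.shape)
    g.add_grid_edges(nodes, lam)                   # 4-neighbor Potts term
    g.add_grid_tedges(nodes, cost_distant, cost_close)
    g.maxflow()
    # Boolean label map; True/False are the two sides of the min-cut
    # (see the PyMaxflow docs for the sign convention).
    return g.get_grid_segments(nodes)
```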
Coarse depth map estimation
Experimental results
Performance measurement
• Measure of performance: the Jaccard index,
J = TP / (TP + FP + FN),
which measures the level of agreement with respect to an ideal classification result.
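In code, the per-class index is simply:

```python
def jaccard_index(tp, fp, fn):
    """Jaccard index: agreement with the ideal classification (1.0 = perfect)."""
    return tp / float(tp + fp + fn)

# e.g., jaccard_index(36, 30, 34) -> 0.36  (illustrative counts)
```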
Coarse depth map estimation
Experimental results
Different region groupings
Performance of our method using different oversegmentation configurations:
regular grids of 10x10, 15x15, and 20x20 cells, and TurboPixels superpixels
with ∼200, ∼400, and ∼800 regions.

Jaccard index by algorithm and number of regions:

Algorithm      20x20     15x15     10x10
Superpixels    0.3623    0.3567    0.3561
Grid           0.3586    0.3602    0.3570

• The best performing configuration uses superpixels.
Coarse depth map estimation
Experimental results
Comparison w.r.t. state-of-the-art
Saxena et al.
• A more challenging goal: a photo-realistic 3D model.
• For each superpixel and its neighbors: features for occlusions, geometric,
statistical, and spatial information, and textures, at multiple spatial scales.
• Inference methods with a high computational cost.
• MRF
Coarse depth map estimation
Experimental results
Comparison w.r.t. state-of-the-art
Our method uses a remarkably smaller number of low-level features (64 vs. 646, respectively).
Coarse depth map estimation
Experimental results
Relevance of visual features
Coarse depth map estimation
Experimental results
Qualitative results (columns): image, laser depth map, Saxena et al., ours
Coarse depth map estimation
Conclusions
We have presented
• A supervised learning approach to segment an image
according to certain depth categories.
• Our algorithm uses a reduced number of low-level visual
features, which are based on monocular pictorial cues.
Our results show
• Monocular cues are useful for depth estimation.
• Close and distant regions are well-segmented by our approach.
• Regions at medium distances are more difficult to segment.
• On average, our method outperforms the method of Saxena et al.
Outline
1 Objectives
2 Coarse depth map estimation
3 Egomotion estimation
4 Background estimation
5 Pedestrian candidate generation
6 Conclusions and future work
Egomotion estimation
Motivation
Egomotion estimation
Estimating the vehicle position is a key component of many ADAS applications:
Autonomous navigation
Adaptive cruise control
Lane change assistance
Egomotion estimation
Problem definition
Egomotion problem
Determining the changes in the camera's 3D position and orientation.
• Camera motion is described as a 3D rigid motion:
p_t = R_t p_0 + t_t
• Six degrees of freedom (DOF).
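A worked NumPy example of p_t = R_t p_0 + t_t, with a small yaw rotation and a forward translation (all values illustrative):

```python
# Applying a 3D rigid motion to a point: rotation about Y (yaw) plus
# a forward translation. Numbers are illustrative only.
import numpy as np

yaw = np.deg2rad(5.0)
R_t = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                [ 0,           1, 0          ],
                [-np.sin(yaw), 0, np.cos(yaw)]])  # yaw rotation matrix
t_t = np.array([0.0, 0.0, 1.5])                   # forward translation (m)
p_0 = np.array([2.0, 0.5, 10.0])                  # 3D point at time 0
p_t = R_t @ p_0 + t_t                             # point after the motion
```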
Egomotion estimation
Goal
Distant regions behave as if lying on the plane at infinity
Properties
• They remain at the same image coordinates under camera translation.
• They are affected only by camera rotation.
Goal
• Identify distant regions in the image to estimate vehicle rotation decoupled
from vehicle translation.
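One way to make the decoupling concrete: for points at (near) infinity, image motion reduces to the infinite homography K R K^{-1}, so the rotation can be fitted on normalized rays from distant matches. The Kabsch-style estimator below is our illustrative choice, not necessarily the thesis' exact method:

```python
# Sketch: estimate camera rotation from matched distant points, assuming
# their motion is induced by rotation alone (plane-at-infinity model).
import numpy as np

def rotation_from_distant_points(x0, x1, K):
    """x0, x1: Nx2 matched pixels in distant regions; K: 3x3 intrinsics."""
    Kinv = np.linalg.inv(K)
    h0 = np.c_[x0, np.ones(len(x0))] @ Kinv.T  # back-project to viewing rays
    h1 = np.c_[x1, np.ones(len(x1))] @ Kinv.T
    h0 /= np.linalg.norm(h0, axis=1, keepdims=True)
    h1 /= np.linalg.norm(h1, axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd(h1.T @ h0)        # Kabsch fit: R maps h0 to h1
    D = np.diag([1, 1, np.linalg.det(U @ Vt)]) # enforce a proper rotation
    return U @ D @ Vt
```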
Egomotion estimation
Algorithm overview
Egomotion estimation based on distant points / regions
× Distant points are hard to track, since they lie in low-textured regions.
The distant-region algorithm makes maximal use of the available distant information.
Egomotion estimation
Experimental results
Datasets
• Karlsruhe dataset: 8 sequences.
• More than 8000 frames (∼3 km).
• Ground truth: INS sensor.
• Stereo depth maps.
Egomotion estimation
Experimental results
Evaluation of our distant regions segmentation
Egomotion estimation
Experimental results
Comparison with other approaches
• The five-point algorithm (5pts) by Nistér.
• The Burschka et al. method (RANSAC).
• The stereo-based algorithm by Kitt et al. (as a baseline).
Egomotion estimation
Experimental results
Rotation estimation performance | Trajectory estimation performance
Egomotion estimation
Experimental results
Yaw angle comparison
GT (INS sensor) | DR (distant regions) | DP (distant points)
Egomotion estimation
Experimental results
Trajectory results
Egomotion estimation
Conclusions
In this section, we have
• Proposed two novel monocular egomotion methods based on
tracking distant points and distant regions.
Our results show
• Rotations are accurately estimated, since distant regions
provide strong indicators of camera rotation.
• Our approach outperforms the other state-of-the-art methods considered.
• Comparable performance with respect to the considered stereo
algorithm.
Outline
1 Objectives
2 Coarse depth map estimation
3 Egomotion estimation
4 Background estimation
5 Pedestrian candidate generation
6 Conclusions and future work
Background estimation
Problem definition
Background estimation
Automatically remove transient and moving objects from a set of
images with the aim of obtaining an occlusion-free background
image of the scene.
Background model
• Represents objects whose distance to the camera is maximal.
• Background objects are stationary.
Goal
• Identify close regions to penalize deviations from our background
model.
Background estimation
Experimental results
Example of labeling
Columns: original images, close/distant region labeling, and our result
Background estimation
Method
Energy function
E(f) = Σ_{p∈P} D_p(f_p) + Σ_{(p,q)∈N} V_{p,q}(f_p, f_q)
       (data term)        (smoothness term)

Data term Penalizes deviations from our background model, taking into
account color, motion, and depth:

D_p(f_p) = α D^S_p(f_p) + β D^M_p(f_p) + γ D^P_p(f_p)

• Color variations between short time intervals
• Moving objects, by using motion boundaries
• Close objects, using our approach

Smoothness term Penalizes the intensity differences between
neighboring regions, giving a higher cost when images do not
match well.
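A minimal sketch of evaluating E(f) for a candidate labeling on a pixel grid; a plain Potts penalty stands in for the intensity-dependent smoothness term, and the α, β, γ weighting of the data costs is assumed precomputed into D:

```python
# Evaluate the energy of a labeling f given per-pixel data costs D.
import numpy as np

def energy(f, D, lam=1.0):
    """f: HxW integer labels; D: HxWxL per-pixel data costs D_p(f_p).
    Pairwise term: Potts on 4-neighborhoods (a simplification)."""
    h, w = f.shape
    data = D[np.arange(h)[:, None], np.arange(w)[None, :], f].sum()
    smooth = (f[:, 1:] != f[:, :-1]).sum() + (f[1:, :] != f[:-1, :]).sum()
    return data + lam * smooth
```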
Background estimation
Experimental results
Datasets
Towers: 11 frames | City: 7 frames | Train: 3 frames | Market: 8 frames
Background estimation
Experimental results
Agarwala et al.
• State-of-the-art method.
• Requires user intervention to refine results.
• Refined results used as ground truth.

Norm of absolute difference in RGB channels per sequence:
Towers    City      Train     Market
0.0551    0.0804    0.0479    0.0603
Background estimation
Experimental results
Independently moving objects
Original images
Our method Agarwala et al.
Background estimation
Conclusions
In this section,
• We have presented a background estimation method for image sets
containing moving/transient objects.
• This method uses depth information for such purpose by
penalizing close regions in a cost function.
Our results show that
• Our method significantly outperforms the median filter.
• Our approach is comparable to the Agarwala et al. method,
without requiring any user intervention.
Outline
1 Objectives
2 Coarse depth map estimation
3 Egomotion estimation
4 Background estimation
5 Pedestrian candidate generation
6 Conclusions and future work
Pedestrian candidate generation
Problem definition
Pedestrian candidate generation Generating hypotheses to be
evaluated by a pedestrian classifier.
[Gerónimo 2010]
Goal
Exploiting the geometric and depth information available in single images
to reduce the number of windows to be further processed.
Pedestrian candidate generation
Method
Overview
a) Original image → b) geometric information + c) depth information → fusion → d) pedestrian candidate windows
Pedestrian candidate generation
Method
Agglomerative clustering schema
• Regions over the ground surface.
• Agglomerating regions while maintaining size coherence w.r.t. depth (see the sketch after this figure).

(a) Geometric, depth, and spatial information computed over superpixels of the original image. (b) Superpixels are merged by hierarchical clustering using gravity, depth, and size. (c) Bounding boxes surround the resulting regions.
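A hypothetical sketch of the size-coherence test implied by (b): a merged region is kept as a candidate only if its pixel height matches the projected height of a pedestrian at the region's depth. The focal length, pedestrian height prior, and tolerance below are all assumptions:

```python
# Size-vs-depth coherence check for a merged region (illustrative values).
def plausible_pedestrian(region_height_px, depth_m,
                         focal_px=800.0, ped_height_m=1.7, tol=0.5):
    """Accept a region whose pixel height matches the projected height
    of a ~1.7 m pedestrian at the region's depth, within tolerance."""
    expected_px = focal_px * ped_height_m / depth_m  # pinhole projection
    return abs(region_height_px - expected_px) <= tol * expected_px
```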
Pedestrian candidate generation
Experimental results
Dataset
• CVC Pedestrian dataset.
• 15 sequences taken from a stereo rig rigidly mounted on a car
while driving in an urban scenario (4364 frames).
• 7983 manually annotated pedestrians visible at less than 50 meters.
Performance measures
• Number of pedestrian candidates generated.
• True positive rate: TPR = TP / (TP + FN).
Pedestrian candidate generation
Experimental results
Pedestrian candidate generation
Experimental results
Lost pedestrians by distance (m): 4% at 0-10, 18% at 10-25, and 78% at more than 25.
Pedestrian candidate generation
Conclusions
In this section, we have presented:
• Novel monocular method for generating pedestrian candidates.
• It is based on geometric relationships and depth.
Our results show that:
• Our method outperforms all the considered methods, since it
significantly reduces the number of candidates.
• It maintains a high TPR.
Outline
1 Objectives
2 Coarse depth map estimation
3 Egomotion estimation
4 Background estimation
5 Pedestrian candidate generation
6 Conclusions and future work
Conclusions and future work
Conclusions
• We have proposed a supervised learning approach to classify the
pixels of outdoor images in just four categories: near,
medium-distance, far and very-far, based on monocular pictorial
cues.
• Compared against the results of a more complex depth map estimation
method, our method achieves better performance while using
computationally inexpensive techniques.
• We have demonstrated the usefulness of our coarse depth maps in
improving the results of egomotion estimation, background
estimation, and pedestrian candidate generation. In each
application, we have contributed novel methods based on the use of
coarse depth.
Conclusions and future work
Future work
• Extend our approach to consider more monocular depth cues, such as
occlusions and relative and familiar size, which could improve our coarse
estimation.
• Explore other possible applications of depth information (tracking,
initializing 3D reconstruction algorithms, learning pedestrian
classifiers according to depth, etc.).
• Integrate our depth estimation method in different ADAS modules.
Conclusions and future work
Publications
This thesis is based on the following publications:
Conference papers
• Camera Egomotion Estimation in the ADAS Context, D. Cheda, D. Ponsa and
A. M. López, IEEE Conf. Intell. Transp. Syst., 2010.
• Monocular Egomotion Estimation based on Image Matching, D. Cheda, D.
Ponsa and A. M. López, Int. Conf. Pattern Recognit. Appl. and Methods, 2012.
• Monocular Depth-based Background Estimation, D. Cheda, D. Ponsa and A. M.
López, Int. Conf. Comput. Vision Theory Appl., 2012.
• Pedestrian Candidates Generation using Monocular Cues, D. Cheda, D. Ponsa
and A. M. López, IEEE Intell. Vehicles Symposium, 2012.
Journal papers under review
• Monocular Multilayer Depth Segmentation and Applications, D. Cheda, D.
Ponsa and A. M. López, submitted to IJCV, Springer.
• Monocular Visual Odometry Boosted by Monocular Depth Cues, D. Cheda, D.
Ponsa and A. M. López, submitted to ITS, IEEE.
Thanks!
