Presentation of the paper "Depth estimation from Multi-View sources based
on full search and Total Variation regularization" at WCVIM09 (PSIVT09), Tokyo, Japan
1. Depth estimation from Multi-View sources based
on full search and Total Variation regularization
Carlos Vázquez
Wa James Tam
Advanced Video Systems
Broadcasting Technologies
Communications Research Centre Canada (CRC)
International Workshop on Computer Vision and
Its Application to Image Media Processing
Tokyo, Japan
2. Outline
Outline
1 Introduction
2 Depth information for 3D-TV
3 Depth from Multi-View sources
Algorithm overview
Error volume generation
First depth approximation
Depth refining
4 Experimental results
Application: Multi-View image coding
5 Conclusions
Vázquez, Tam (CRC)
3D–TV: Depth estimation, WCVIM’09
6. Introduction
3D-TV is on the way!
Next step in television broadcasting
1 More content available in 3D:
◮ 3D cinema (IMAX, RealD)
◮ Live 3D (U2-3D, sport events)
◮ Video games (3D at home)
2 Availability of 3D displays:
◮ Stereoscopic (with glasses)
◮ Auto-stereoscopic (no glasses)
3 Ongoing work to develop coding standards:
◮ Stereo extension to MPEG
◮ Depth coding extension to MPEG
(2D+Depth)
◮ Multi-View coding standard (JMVM)
◮ 3D@Home consortium
10. Depth information for 3D-TV
Depth information in 3D-TV broadcasting
Essential information
Large variety of viewers and viewing devices:
◮ Need to adjust the amount of depth perceived.
◮ Need to adjust the depth to the size of the display.
◮ Coding of multi-view or stereoscopic sources.
How to fulfill these requirements?
◮ Generation of new views from the ones available.
⋆ Depth-Image-Based rendering.
⋆ Intermediate View Reconstruction.
◮ Predictive coding of 3D sources.
⇒ Knowledge of depth becomes essential for 3D-TV.
11. Depth information for 3D-TV
Depth information in 3D-TV broadcasting
Depth is embedded in Multi-View sources
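[Figure: Multi-View source. A scene point P in world coordinates (X, Y, Z) at depth z projects to P1, P2, ..., PN at image positions x1, x2, ..., xN in Camera 1, Camera 2, ..., Camera N. The labels f and BN also appear in the diagram.]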
Problem statement
Recover the depth information from a Multi-View source to be used in the
transmission, processing and coding of the Multi-View video content.
13. Depth from Multi-View sources Algorithm overview
Depth estimation from Multi-View sources
Proposed algorithm overview
Depth estimation from Multi-View sources with TV regularization
Full scan of possible depth values, followed by refinement of the depth with
Total-Variation regularization combined with edge correspondence and
visibility consistency
1 Pre-processing of the Multi-View source
◮ Noise reduction: A general noise-removal step is applied.
◮ Gradient computation: The gradient information $\nabla I_o$ is added as two
new 'color' channels to the color image.
◮ Edge extraction: Image edges are used in the depth estimation
process. Edge map $\epsilon_o = \delta_c(I_o)$.
2 Error volume generation
3 First depth approximation
◮ Median filter
4 Depth refining
◮ TV regularization
◮ Edge correspondence
◮ Visibility consistency
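The gradient step of the pre-processing stage can be sketched as follows. This is only an illustration: the slides do not say which gradient operator is used, so the sketch assumes central differences on the mean of the color channels, and the function name `add_gradient_channels` is hypothetical.

```python
import numpy as np

def add_gradient_channels(image):
    """Append horizontal and vertical gradients as two extra 'color' channels.

    image: float array of shape (H, W, C).
    Returns an array of shape (H, W, C + 2).
    Assumption (not from the slides): the gradient is taken on the channel mean.
    """
    gray = image.mean(axis=2)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    return np.concatenate([image, gx[..., None], gy[..., None]], axis=2)

# Usage on a small horizontal ramp: the x-gradient is constant, the y-gradient is zero.
ramp = np.tile(np.arange(4, dtype=float), (3, 1))
img = np.stack([ramp, ramp, ramp], axis=2)
aug = add_gradient_channels(img)
```

Treating the gradient as additional channels means the later matching error can compare intensity and local structure with the same formula.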
18. Depth from Multi-View sources Error volume generation
Error volume generation
Overview
[Figure: error volume construction. Views v1 to v5 (set V) are sampled at candidate depths d1 to d5 for a pixel x.]
Motivation
For each pixel in the central view and each candidate depth value, a similarity
measure is evaluated over the corresponding pixels in all views. The depth with
the best similarity measure is accepted as the best estimate.
19. Depth from Multi-View sources Error volume generation
Error volume generation
Equations
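Mean square error across 'colors':
$$\bar{E}_v(x, d) = \frac{1}{C} \sum_{c=1}^{C} \left( I_v(T_{o,v}(x, d), c) - I_o(x, c) \right)^2$$
Mean error across 'views':
$$E(x, d) = \frac{1}{N(x, d)} \sum_{v \in R_m(x, d)} \bar{E}_v(x, d)$$
Matched views:
$$R_m = \{ v : \bar{E}_v(x, d) < T_m \}$$
Number of matched views (each satisfied comparison counts as 1):
$$N(x, d) = \sum_{v \in \mathcal{V}(x, d)} \left( \bar{E}_v(x, d) < T_m \right)$$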
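A minimal NumPy sketch of these definitions follows. It assumes the views have already been resampled at the positions $T_{o,v}(x, d)$ (the warping itself is not shown) and that a boolean mask marks which views belong to the candidate set for each pixel and depth. The function name, argument layout and the choice of an infinite error where no view matches are assumptions made for illustration.

```python
import numpy as np

def error_volume(ref, warped, visible, t_m):
    """Compute the matching error E(x, d) and the match count N(x, d).

    ref:     (H, W, C) reference (central) view I_o.
    warped:  (V, D, H, W, C) views I_v already resampled at T_{o,v}(x, d).
    visible: (V, D, H, W) boolean mask of the candidate view set.
    t_m:     matching threshold T_m.
    """
    # Mean square error across 'colors' for every view and depth candidate.
    e_bar = ((warped - ref[None, None]) ** 2).mean(axis=-1)
    # A view is matched when it is a candidate and its error is below T_m.
    matched = visible & (e_bar < t_m)
    n = matched.sum(axis=0)
    # Mean error across matched views; infinite where no view matches (assumption).
    total = np.where(matched, e_bar, 0.0).sum(axis=0)
    e = np.where(n > 0, total / np.maximum(n, 1), np.inf)
    return e, n

# Usage: one pixel, one channel, two views, two depth candidates.
ref = np.zeros((1, 1, 1))
warped = np.array([[0.1, 1.0], [0.2, 0.1]]).reshape(2, 2, 1, 1, 1)
visible = np.ones((2, 2, 1, 1), dtype=bool)
e, n = error_volume(ref, warped, visible, t_m=0.05)
```

In this toy case both views match at the first depth, while only the second view matches at the second depth.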
20. Depth from Multi-View sources Error volume generation
Error volume generation
Error volume and visibility: Example
[Figure: two panels plotted over x (horizontal axis) and depth (vertical axis): the error volume and the number of matching views.]
22. Depth from Multi-View sources First depth approximation
First depth approximation
Direct minimization of error measure
1 Minimize the error while penalizing disparities
with fewer matching views:
$$D^{(0)}(x) = \arg\min_{\tilde{d}} \; E(x, \tilde{d}) \left( \frac{\mathcal{V}(x, \tilde{d})}{N(x, \tilde{d})} \right)^2$$
2 Apply a median filter to remove noise from
the estimated depth map:
$$D^{(1)} = H_M(D^{(0)})$$
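The two steps above can be sketched in NumPy as below. The sketch works on depth indices rather than metric depth, treats the numerator of the penalty as the number of candidate views, and uses a square median window with edge padding; the window size and the function name `first_depth` are assumptions, since the slides do not specify the filter $H_M$.

```python
import numpy as np

def first_depth(e, n, v_count, size=3):
    """Pick the depth index minimising E * (V / N)^2, then median-filter.

    e, n:     (D, H, W) error volume and number of matched views.
    v_count:  (D, H, W) number of candidate views per pixel and depth.
    size:     odd side length of the square median window (an assumption).
    """
    cost = np.full(e.shape, np.inf)
    ok = n > 0                                  # skip depths with no matched view
    cost[ok] = e[ok] * (v_count[ok] / n[ok]) ** 2
    d0 = cost.argmin(axis=0)                    # D^(0): index of the best depth

    # Median filter H_M built from shifted copies of the edge-padded map.
    r = size // 2
    padded = np.pad(d0, r, mode="edge")
    h, w = d0.shape
    shifts = [padded[i:i + h, j:j + w]
              for i in range(size) for j in range(size)]
    d1 = np.median(np.stack(shifts), axis=0)    # D^(1) = H_M(D^(0))
    return d0, d1

# Usage: a 3x3 map where only the centre pixel prefers depth index 1.
e = np.stack([np.full((3, 3), 0.1), np.full((3, 3), 0.2)])
e[1, 1, 1] = 0.01
n = np.full((2, 3, 3), 2)
v_count = np.full((2, 3, 3), 2)
d0, d1 = first_depth(e, n, v_count)
```

The isolated centre value in `d0` is an outlier relative to its neighbours, which is the kind of noise the median filter removes.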
23. Depth from Multi-View sources Depth refining
Depth refining
Total variation regularization
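Depth as a function that minimizes a two-term global energy:
$$D(x) = \arg\min_{\tilde{D}} \left( G_d(\tilde{D}, E) + \lambda G_r(\tilde{D}) \right)$$
Data term:
$$G_d(D, E) = \frac{1}{2} \sum_{x \in \Lambda_o} E(x, D[x])^2$$
Regularization term:
$$G_r(D) = \int_{W_o} \left\| \nabla_x D \right\| \, dW_o$$
Level set minimization:
$$D^{(n+1)} = D^{(n)} + \Delta T \left[ \lambda \kappa \left\| \nabla_x D^{(n)} \right\| - E(D^{(n)}) \frac{\partial E}{\partial d} \right]$$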
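One explicit iteration of this kind of update can be sketched as follows. The sketch assumes the update has the form $D^{(n+1)} = D^{(n)} + \Delta T [\lambda \kappa \|\nabla_x D^{(n)}\| - E \, \partial E / \partial d]$ with $\kappa$ the curvature of the level lines of $D$, works on depth indices, and approximates $\partial E / \partial d$ by central differences along the depth axis of the error volume. The discretization, the parameter values and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def curvature_times_norm(d, eps=1e-8):
    """Return kappa * |grad D|, with kappa = div(grad D / |grad D|)."""
    dy, dx = np.gradient(d)
    norm = np.sqrt(dx ** 2 + dy ** 2 + eps)     # eps avoids division by zero
    kappa = np.gradient(dx / norm, axis=1) + np.gradient(dy / norm, axis=0)
    return kappa * norm

def tv_step(d, e_vol, lam=1.0, dt=0.1):
    """One explicit update of the depth-index map d (H, W) given E (D, H, W)."""
    n_depths = e_vol.shape[0]
    idx = np.clip(np.rint(d).astype(int), 0, n_depths - 1)
    rows, cols = np.indices(d.shape)
    e_at_d = e_vol[idx, rows, cols]
    # Central difference of E along the depth axis, evaluated at the current depth.
    de_dd = np.gradient(e_vol, axis=0)[idx, rows, cols]
    d_new = d + dt * (lam * curvature_times_norm(d) - e_at_d * de_dd)
    return np.clip(d_new, 0, n_depths - 1)

# Usage: every pixel has the same error profile with its minimum at index 2.
profile = np.array([5.0, 2.0, 1.0, 2.0, 5.0])
e_vol = np.tile(profile[:, None, None], (1, 4, 4))
at_minimum = tv_step(np.full((4, 4), 2.0), e_vol)
off_minimum = tv_step(np.full((4, 4), 1.0), e_vol)
```

A flat map sitting at the error minimum is left unchanged, while a flat map one index below it is moved toward the minimum by the data term alone.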
27. Depth from Multi-View sources Depth refining
Depth refining
Edge correspondence
1 Image edges
2 Distance to image edges:
$$F(x) = \max(\mathrm{dist}(x, \epsilon_o), F_M)$$
3 Depth edges:
$$\eta^{(n)} = \delta_c(D^{(n)})$$
4 Edge correction term:
$$\phi(x) = \eta^{(n)}(x) \, F(x) \, \mathrm{sign}\left( \nabla D^{(n)}(x) \cdot \nabla F(x) \right)$$
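The edge correction term can be illustrated with the sketch below. The edge detector $\delta_c$ is not specified on the slides, so both edge maps are passed in as ready-made boolean arrays; the distance is computed by brute force (adequate only for small images), and $F(x)$ is taken exactly as written on the slide. All function names are hypothetical.

```python
import numpy as np

def distance_to_edges(edge_map):
    """Brute-force Euclidean distance from each pixel to the nearest edge pixel.

    Assumes edge_map contains at least one True pixel.
    """
    pts = np.argwhere(edge_map)                          # (K, 2) edge coordinates
    grid = np.indices(edge_map.shape).reshape(2, -1).T   # (H*W, 2) pixel coordinates
    diff = grid[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return dist.reshape(edge_map.shape)

def edge_correction(depth, image_edges, depth_edges, f_m):
    """phi(x) = eta(x) * F(x) * sign(grad D(x) . grad F(x))."""
    f = np.maximum(distance_to_edges(image_edges), f_m)  # F(x) as on the slide
    dfy, dfx = np.gradient(f)
    ddy, ddx = np.gradient(depth)
    return depth_edges * f * np.sign(ddx * dfx + ddy * dfy)

# Usage: image edge at column 2, depth step detected at column 5.
image_edges = np.zeros((3, 7), dtype=bool)
image_edges[:, 2] = True
depth = np.zeros((3, 7))
depth[:, 5:] = 1.0
depth_edges = np.zeros((3, 7), dtype=bool)
depth_edges[:, 5] = True
phi = edge_correction(depth, image_edges, depth_edges, f_m=0.5)
```

The term is non-zero only on depth edges, and its magnitude grows with the distance between the depth edge and the nearest image edge.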
28. Depth from Multi-View sources Depth refining
Depth refining
Visibility consistency
Estimated visibility vs. matching visibility
Compare the visibility resulting from the estimated depth map to the
visibility suggested by the number of matching views.
Estimated visibility:
$$Q(x) = \frac{\mathcal{V}(x, D^{(n)}(x)) - \sum_{v=1}^{L} \left( O_v(x_v) = x_v \right)}{\mathcal{V}(x, D^{(n)}(x))}$$
Matching visibility:
$$S(x) = \frac{N(x)}{\mathcal{V}(x)}$$
Occluded and occluding regions:
$$B_a = \{ x \mid (Q(x) < 1) \wedge (S(x) > Q(x)) \}$$
$$J_a = \{ x = O_v(u) \mid Q(x) = 1 \}$$
Conflict:
$$B = \{ y \in B_a \mid x \in J_a \}$$
$$J = \{ x \in J_a \mid S(x) < 1 \}$$
Correction:
B ⇒ pushed to foreground
J ⇒ pushed to background
31. Depth from Multi-View sources Depth refining
Depth refining
Final evolution equation
Level-set evolution equation
$$D^{(n+1)} = D^{(n)} + \Delta T \left[ \lambda \kappa \left\| \nabla_x D^{(n)} \right\| - E(D^{(n)}) \frac{\partial E}{\partial d} + \mu \Phi + \beta (B - J) \right]$$
1 Total variation regularization
2 Minimization of Multi-View matching error
3 Image and depth edge correspondence
4 Occlusion correction by visibility check
33. Experimental results
Experimental results
Test images and depth maps.
Original color images: View 2
Original depth images: View 2
34. Experimental results
Experimental results
Resulting depth maps and error.
Estimated depth image: View 2
Error with respect to ground-truth: 1 pixel differences
35. Experimental results
Experimental results
Error with respect to ground-truth.
Image        Venus   Teddy   Cones   Art     Bowling2
PSNR (dB)    51.96   44.02   44.76   36.72   36.26
E > 1 (%)     6.93   10.96    8.01   18.99   17.80
E > 2 (%)     2.19    6.49    4.13   11.88   10.46
1 PSNR indicates that the results are close to the ground truth
2 The share of errors larger than 1 pixel is substantial (6.93% to 18.99%)
3 The share of errors larger than 2 pixels drops significantly (2.19% to 11.88%)
4 A 2-pixel error is manageable in the intended application
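The three reported measures can be computed along the following lines. The slides do not state the peak value used for PSNR or whether any pixels are excluded from the statistics, so the sketch assumes a peak of 255, errors measured in the units of the depth map, and all pixels included; `depth_errors` is a hypothetical name.

```python
import numpy as np

def depth_errors(estimate, truth, peak=255.0):
    """Return PSNR in dB and the percentage of pixels with |error| > 1 and > 2.

    Assumption: peak = 255 and every pixel is counted.
    """
    err = np.abs(estimate.astype(float) - truth.astype(float))
    mse = np.mean(err ** 2)
    psnr = np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
    bad1 = 100.0 * np.mean(err > 1)
    bad2 = 100.0 * np.mean(err > 2)
    return psnr, bad1, bad2

# Usage: ten pixels, one off by 1.5 and one off by 3.
truth = np.zeros(10)
estimate = np.array([0, 0, 0, 0, 0, 0, 0, 1.5, 3.0, 0])
psnr, bad1, bad2 = depth_errors(estimate, truth)
```

Here two of the ten pixels exceed the 1-pixel threshold and one exceeds the 2-pixel threshold, which mirrors how the E > 1 and E > 2 rows relate to each other.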
36. Experimental results Application: Multi-View image coding
Experimental results
Application: Multi-View image coding
2D+Depth+Occlusions Multi-View coding system
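[Figure: block diagram of the coding system. Recoverable block labels: View 1, View 2, ..., View N; Depth Estimation; Encode; Decode; Embed; Edges Mask; Disocclu.; Wav. Tran. Recoverable signal labels: 2D, D, 2D+D, E, MN, IN, WC, Tx.]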
39. Conclusions
Conclusions
High quality depth estimation from Multi-View sources.
Occlusion processing by analysis of visibility consistency.
Total-Variation regularization ensures smooth depth with sharp edges.
Application to Multi-View image coding
Outlook
◮ Improve the visibility consistency step.
◮ Speed up the algorithm execution.
◮ Integrate into an MPEG-2 standard stream.