Using geometry for scene understanding

Using geometry and related
things

Region labels + Boundaries Stronger geometric Reasoning on 3D point clouds
and objects constraints from aspects and
domain knowledge poses

More quantitative
Qualitative more precise Explicit

[D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75(1):151–172, 2007]

What level of representation?
Learning from image features to depth + MRF: A. Saxena, et al.. 3-D depth reconstruction from a single still image. IJCV, 76, 2007.

How qualitative?
Make3D: Learning 3D Scene Structure from a Single Still Image: A. Saxena, et al. TPAMI, 2010.

What type of training information is available?
Assumptions (camera geometry, etc…)?

Stage classes: Nedovic, V., Smeulders, A., Redert, A., Geusebroek, J.: Stages as models of scene geometry. In: PAMI (2010)

Next….
Can coarse surface labels be used for improving object
recognition and scene analysis performance through
better geometric reasoning?

Object Detection Surface Estimates Viewpoint Prior

Local Car Detector

From geometry to objects and back?
Distributions versus decisions? D. Hoiem, A. Efros, M. Hebert. Putting objects in perspective. IJCV 2009
Local Ped Detector

S.Y. Bao, M. Sun, S.Savarese. Toward
Coherent Object Detection And Scene Layout
Understanding. CVPR 2010.

B. Leibe, N. Cornelis, K. Cornelis,
and L. Van Gool. Dynamic 3D
Scene Analysis from a Moving
Vehicle. CVPR07

• Is a more precise representation
possible?
• Boundaries, interposition, relative
depth ordering….
• Still low-level: Can we combine
reasoning about semantic labels

Region labels + More geom. Stronger geometric Reasoning on 3D point clouds
relations and constraints from aspects and
semantic labels domain knowledge poses

More quantitative

Iterative refinement

Iter 1

Iter 2

Toward true integration of geometric and
Final D. Hoiem, A. A. Efros, and M. Hebert.
Closing the loop on scene interpretation. In

semantic cues: How to beat the intractable CVPR, 2008

Global optimization
nature of the problem?
F(D|I,L) = p (d ) +
1 p (d ,d ,d ) +
pqr (d ,d )
2 p q r p 3 p g
What (features/cues can be used?
F( |I,L,S) = )+ ( )+ ( , )
p 1 i i 2 i ij 3 i j

Tx =1
Tx =1 j
i

B. Liu, S. Gould, D. Koller. Single image depth estimation from predicted semantic labels. CVPR 2010
S. Gould, R. Fulton, D. Koller. Decomposing a scene into geometric and semantically consistent regions. ICCV 2009

• Still mostly bottom-up
classification approach
• No use of domain
constraints or constraints
governing the physical world

Region labels + Boundaries Stronger geometric Reasoning on 3D point clouds
and objects constraints from aspects and
domain knowledge poses

More quantitative

Score

How to generate and search through hypotheses
D. Lee, T. Kanade, M. Hebert. Geometric Reasoning
for Single Image Structure Recovery. CVPR09.

(in a tractable manner)?
How to evaluate score?
How to avoid early decisions?
Lines Faces
V. Hedau, D. Hoiem, D.Forsyth, “Recovering the
Spatial Layout of Cluttered Rooms,” IEEE
International Conference on Computer Vision (ICCV),
How to represent constraints in a more general
2009.

way? H. Wang, S. Gould, D. Koller. Discriminative learning
with latent variables for cluttered indoor scene
understanding. ECCV 2010.

Classifiers f(x,y,w) = wT (x,y)
Feature vector
measuring agreement
Learned weight vector between lines, faces
and labels

Region labels + Boundaries Stronger geometric + more constraints3D point clouds
and objects constraints from
domain knowledge

More quantitative

• Finite volume

• Spatial exclusion

• Containment

• Stability
• Contact
• Proximity
• ………….

• Search through hypothesis space

Input image Line segments and Room hypotheses
Vanishing points

Reject invalid
configurations

Geometric context Orientation map Object hypotheses

Compatibility of image data with geometric Penalty term for incompatible configurations
configuration

T T
f x, y w x, y w y
Features from image (surface labels,
vanishing points, etc.) Hypothesis: Scene layout+
object hypothesis

D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects
and Surfaces. Advances in Neural Information Processing Systems (NIPS), Vol. 24, 2010.

Surface Layout Density Map

Bag of Segments

Frontal Front-Right Front-Left

Left-Right Left-OccludedRight-Occluded

Porous Solid

Catalogue

sky above
above
above above

Medium
Medium Medium High
Infront
Medium Infront

Point-
supported
Point-
supported
Original Image supported High

Ground supported

3D Parse Graph
A. Gupta, A. Efros, and M. Hebert. Blocks World Revisited: Image Understanding Using Qualitative Geometry and
Mechanics. ECCV 2010.

http://www.cs.cmu.edu/~abhinavg/blocksworld

• Direct search through hypothesis space
• Sampling L. Del Pero, J. Guan, E. Brau, J. Schlecht, K. Barnard. Sampling
Bedrooms. CVPR 2011.

Diffusion moves:
Sample room boundary Sample camera

Change r = (x,y,z,w,h,l, ) Change c = f

Sample object parameters Sample over a block edge only

Change o = (x,y,z,w,h,l)
Jump moves:

• Sampling
• (Constrained) object detection

P(O1,..,ON,L,H|I) = P(H)P(L|H,I) i P(Oi|L,H,I)

V. Hedau, D. Hoiem, D. Forsyth, “Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry,”
European Conference on Computer Vision (ECCV), 2010.

• Sampling
• Object detection Truth Max Window Wall Balcony Door Roof

• Grammars

=

L. Simon, O. Teboul, P. Koutsourakis and N. Paragios. Random
Exploration of the Procedural Space for Single-View 3D Modeling of
Buildings. International Journal of Computer Vision (IJCV), 2010.
O. Teboul, I. Kokkinos, P. Koutsourakis, L. Simon and N. Paragios.
Shape Grammar Parsing via Reinforcement Learning. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).
2011.

• Sampling
• Object detection
• Grammars

What constraints?
How to combine low level classifiers with “higher-level” reasoning?
What are the right tools to search through and score hypotheses?
How to represent partial interpretations without early
commitment?

G. Tsai et al. Real-
time Indoor Scene
Understanding using
Bayesian Filtering
with Motion Cues.
ICCV 2011.
+ sparse/partial
3D data
Region labels + Boundaries Stronger geometric + more constraints3D point clouds
+ other
and objects constraints from constraints
domain knowledge + (large) prior
data

More quantitative

• What level of representation? Do we need explicit
parse of the input or how far can we go with
associations?
– Hierarchical, region labels, associations,...
• How to incorporate knowledge/context external to
the input image?
– Task, geometry, contextual info (scene type,
location), text, ...
• Should we use global models vs. sequences of
simpler models? Is the problem too hard as posed,
i.e., intractable?
• What are the right definitions of actions, activities,
behaviors?
• How to combine temporal (actions) and spatial
(scenes, objects) information effectively?
• What should be evaluated and what stage?
– bounding boxes, pixelwise labels, 3D models,
actions/behaviors, predictions?

Using geometry for scene understanding

Recommended

Recommended

More Related Content

What's hot

What's hot (16)

Similar to Using geometry for scene understanding

Similar to Using geometry for scene understanding (20)

More from zukun

More from zukun (20)

Recently uploaded

Recently uploaded (20)

Using geometry for scene understanding