2. Region labels + Boundaries Stronger geometric Reasoning on 3D point clouds
and objects constraints from aspects and
domain knowledge poses
More quantitative
Qualitative more precise Explicit
3. [D. Hoiem, A. A. Efros, and M. Hebert. Recovering surface layout from an image. IJCV, 75(1):151–172, 2007]
What level of representation?
Learning from image features to depth + MRF: A. Saxena, et al.. 3-D depth reconstruction from a single still image. IJCV, 76, 2007.
How qualitative?
Make3D: Learning 3D Scene Structure from a Single Still Image: A. Saxena, et al. TPAMI, 2010.
What type of training information is available?
Assumptions (camera geometry, etc…)?
Stage classes: Nedovic, V., Smeulders, A., Redert, A., Geusebroek, J.: Stages as models of scene geometry. In: PAMI (2010)
4. Next….
Can coarse surface labels be used for improving object
recognition and scene analysis performance through
better geometric reasoning?
5. Object Detection Surface Estimates Viewpoint Prior
Local Car Detector
From geometry to objects and back?
Distributions versus decisions? D. Hoiem, A. Efros, M. Hebert. Putting objects in perspective. IJCV 2009
Local Ped Detector
S.Y. Bao, M. Sun, S.Savarese. Toward
Coherent Object Detection And Scene Layout
Understanding. CVPR 2010.
B. Leibe, N. Cornelis, K. Cornelis,
and L. Van Gool. Dynamic 3D
Scene Analysis from a Moving
Vehicle. CVPR07
6. • Is a more precise representation
possible?
• Boundaries, interposition, relative
depth ordering….
• Still low-level: Can we combine
reasoning about semantic labels
Region labels + More geom. Stronger geometric Reasoning on 3D point clouds
relations and constraints from aspects and
semantic labels domain knowledge poses
More quantitative
Qualitative more precise Explicit
7. Iterative refinement
Iter 1
Iter 2
Toward true integration of geometric and
Final D. Hoiem, A. A. Efros, and M. Hebert.
Closing the loop on scene interpretation. In
semantic cues: How to beat the intractable CVPR, 2008
Global optimization
nature of the problem?
F(D|I,L) = p (d ) +
1 p (d ,d ,d ) +
pqr (d ,d )
2 p q r p 3 p g
What (features/cues can be used?
F( |I,L,S) = )+ ( )+ ( , )
p 1 i i 2 i ij 3 i j
Tx =1
Tx =1 j
i
B. Liu, S. Gould, D. Koller. Single image depth estimation from predicted semantic labels. CVPR 2010
S. Gould, R. Fulton, D. Koller. Decomposing a scene into geometric and semantically consistent regions. ICCV 2009
8. • Still mostly bottom-up
classification approach
• No use of domain
constraints or constraints
governing the physical world
Region labels + Boundaries Stronger geometric Reasoning on 3D point clouds
and objects constraints from aspects and
domain knowledge poses
More quantitative
Qualitative more precise Explicit
9. Score
How to generate and search through hypotheses
D. Lee, T. Kanade, M. Hebert. Geometric Reasoning
for Single Image Structure Recovery. CVPR09.
(in a tractable manner)?
How to evaluate score?
How to avoid early decisions?
Lines Faces
V. Hedau, D. Hoiem, D.Forsyth, “Recovering the
Spatial Layout of Cluttered Rooms,” IEEE
International Conference on Computer Vision (ICCV),
How to represent constraints in a more general
2009.
way? H. Wang, S. Gould, D. Koller. Discriminative learning
with latent variables for cluttered indoor scene
understanding. ECCV 2010.
Classifiers f(x,y,w) = wT (x,y)
Feature vector
measuring agreement
Learned weight vector between lines, faces
and labels
10. Region labels + Boundaries Stronger geometric + more constraints3D point clouds
and objects constraints from
domain knowledge
More quantitative
Qualitative more precise Explicit
12. • Search through hypothesis space
Input image Line segments and Room hypotheses
Vanishing points
Reject invalid
configurations
Geometric context Orientation map Object hypotheses
Compatibility of image data with geometric Penalty term for incompatible configurations
configuration
T T
f x, y w x, y w y
Features from image (surface labels,
vanishing points, etc.) Hypothesis: Scene layout+
object hypothesis
D. Lee, A. Gupta, M. Hebert, and T. Kanade. Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects
and Surfaces. Advances in Neural Information Processing Systems (NIPS), Vol. 24, 2010.
13. Surface Layout Density Map
Bag of Segments
Frontal Front-Right Front-Left
Left-Right Left-OccludedRight-Occluded
Porous Solid
Catalogue
14. sky above
above
above above
Medium
Medium Medium High
Infront
Medium Infront
Point-
supported
Point-
supported
Original Image supported High
Ground supported
3D Parse Graph
A. Gupta, A. Efros, and M. Hebert. Blocks World Revisited: Image Understanding Using Qualitative Geometry and
Mechanics. ECCV 2010.
http://www.cs.cmu.edu/~abhinavg/blocksworld
15. • Direct search through hypothesis space
• Sampling L. Del Pero, J. Guan, E. Brau, J. Schlecht, K. Barnard. Sampling
Bedrooms. CVPR 2011.
Diffusion moves:
Sample room boundary Sample camera
Change r = (x,y,z,w,h,l, ) Change c = f
Sample object parameters Sample over a block edge only
Change o = (x,y,z,w,h,l)
Jump moves:
16. • Direct search through hypothesis space
• Sampling
• (Constrained) object detection
P(O1,..,ON,L,H|I) = P(H)P(L|H,I) i P(Oi|L,H,I)
V. Hedau, D. Hoiem, D. Forsyth, “Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry,”
European Conference on Computer Vision (ECCV), 2010.
17. • Direct search through hypothesis space
• Sampling
• Object detection Truth Max Window Wall Balcony Door Roof
• Grammars
=
L. Simon, O. Teboul, P. Koutsourakis and N. Paragios. Random
Exploration of the Procedural Space for Single-View 3D Modeling of
Buildings. International Journal of Computer Vision (IJCV), 2010.
O. Teboul, I. Kokkinos, P. Koutsourakis, L. Simon and N. Paragios.
Shape Grammar Parsing via Reinforcement Learning. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).
2011.
18. • Direct search through hypothesis space
• Sampling
• Object detection
• Grammars
What constraints?
How to combine low level classifiers with “higher-level” reasoning?
What are the right tools to search through and score hypotheses?
How to represent partial interpretations without early
commitment?
19. G. Tsai et al. Real-
time Indoor Scene
Understanding using
Bayesian Filtering
with Motion Cues.
ICCV 2011.
+ sparse/partial
3D data
Region labels + Boundaries Stronger geometric + more constraints3D point clouds
+ other
and objects constraints from constraints
domain knowledge + (large) prior
data
More quantitative
Qualitative more precise Explicit
20. • What level of representation? Do we need explicit
parse of the input or how far can we go with
associations?
– Hierarchical, region labels, associations,...
• How to incorporate knowledge/context external to
the input image?
– Task, geometry, contextual info (scene type,
location), text, ...
• Should we use global models vs. sequences of
simpler models? Is the problem too hard as posed,
i.e., intractable?
• What are the right definitions of actions, activities,
behaviors?
• How to combine temporal (actions) and spatial
(scenes, objects) information effectively?
• What should be evaluated and what stage?
– bounding boxes, pixelwise labels, 3D models,
actions/behaviors, predictions?