Fcv rep tenenbaum

How should we represent visual scenes?
Common-Sense Core,
Probabilistic Programs

Josh Tenenbaum
MIT Brain and Cognitive Sciences
CSAIL

Joint work with Noah Goodman, Chris Baker, Rebecca Saxe,
Tomer Ullman, Peter Battaglia, Jess Hamrick and others.

Core of common-sense reasoning
Human thought is structured around a basic
understanding of physical objects, intentional
agents, and their relations.
“Core knowledge” (Spelke, Carey, Leslie, Baillargeon, Gergely…)
Intuitive theories (Carey, Gopnik, Wellman, Gelman, Gentner, Forbus, McCloskey…)
Primitives of lexical semantics (Pinker, Jackendoff, Talmy, Pustejovsky)
Visual scene understanding (Everyone here…)
From scenes to stories…
The key questions:
(1) What is the form and content of human common-sense
theories of the physical world, intentional agents, and their
interaction?
(2) How are these theories used to parse visual experience
into representations that support reasoning, planning,
communication?

A developmental perspective
A 3-year-old and her dad:

Dad: “What's this a picture of?”
Sarah: “A bear hugging a panda bear.”
...
Dad: “What is the second panda bear
doing?”
Sarah: “It's trying to hug the bear.”
Dad: “What about the third bear?”
Sarah: “It’s walking away.”

But this feels too hard to approach now, so what about
looking at younger children (e.g., 12 months or younger)?

Intuitive physics and psychology

Southgate and Csibra, 2009
(13-month-olds)

Heider and Simmel, 1944

Intuitive physics
(Gupta, Efros, Hebert)

(Whiting et al.)

Probabilistic generative models
• Early 1990s to early 2000s
– Bayesian networks: model the causal processes that
give rise to observations; perform reasoning, prediction,
planning via probabilistic inference.

– The problem: not sufficiently flexible or expressive.
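As a point of comparison, here is a minimal sketch of a classic finite Bayesian network. It uses the textbook rain/sprinkler/wet-grass example rather than anything from these slides, and all probabilities are made-up illustrative numbers. Inference is done by exhaustive enumeration over the joint distribution.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative numbers only).
P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}             # P(sprinkler | rain)
P_WET = {(True, True): 0.99, (True, False): 0.8,   # P(wet | rain, sprinkler)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Probability of one full assignment, factored along the network's arrows."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_SPRINKLER[rain] if sprinkler else 1 - P_SPRINKLER[rain]
    p_wet = P_WET[(rain, sprinkler)]
    p *= p_wet if wet else 1 - p_wet
    return p

def posterior_rain_given_wet() -> float:
    """P(rain | wet grass), summing the joint over the hidden sprinkler variable."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den
```

The model has exactly three variables, fixed in advance. It has no way to say "some unknown number of objects, each with its own properties," which is the expressiveness gap the slide points to.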

Scene understanding as an
inverse problem
The “inverse Pixar” problem:

World state (t) → graphics → Image (t)

Scene understanding as an
inverse problem
The “inverse Pixar” problem:

physics: … World state (t-1) → World state (t) → World state (t+1) …
graphics: each World state (t) → Image (t), giving Image (t-1), Image (t), Image (t+1)
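The "inverse Pixar" idea can be sketched in a toy form under several assumptions:

- The world state is a single object's 1-D position and velocity.
- The "physics" step is deterministic constant-velocity motion.
- The "graphics" step renders the position with Gaussian noise.

Scene understanding is then posterior inference over world states given images. Here that inference uses likelihood weighting (importance sampling from the prior), a technique chosen only for brevity. All function names are hypothetical.

```python
import math
import random

def physics(state):
    """Deterministic world dynamics: constant-velocity motion."""
    x, v = state
    return (x + v, v)

def graphics_loglik(state, image, noise=0.5):
    """Log-likelihood of an 'image' (a noisy 1-D position reading) given the world state."""
    x, _ = state
    return -0.5 * ((image - x) / noise) ** 2 - math.log(noise * math.sqrt(2 * math.pi))

def infer_velocity(images, n_samples=20000, seed=0):
    """Invert graphics + physics: posterior mean velocity by likelihood weighting."""
    rng = random.Random(seed)
    log_weights, velocities = [], []
    for _ in range(n_samples):
        state = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0))  # prior over the world state at t = 0
        v0 = state[1]
        logw = 0.0
        for image in images:             # score the image at t, then step physics to t + 1
            logw += graphics_loglik(state, image)
            state = physics(state)
        log_weights.append(logw)
        velocities.append(v0)
    m = max(log_weights)                 # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(wi * vi for wi, vi in zip(w, velocities)) / sum(w)
```

For example, `infer_velocity([0.0, 1.0, 2.0, 3.0])` should come out close to 1.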

Probabilistic programs
• Probabilistic models à la Laplace.
– The world is fundamentally deterministic (described by a program),
and perfectly predictable if we could observe all relevant variables.
– Observations are always incomplete or indirect, so we put probability
distributions on what we can’t observe.
• Compare with Bayesian networks.
– Thick nodes. Programs defined over unbounded sets of objects, their
properties, states and relations, rather than traditional finite-
dimensional random variables.
– Thick arrows. Programs capture fine-grained causal processes
unfolding over space and time, not simply directed statistical
dependencies.
– Recursive. Probabilistic programs can be arbitrarily manipulated
inside other programs (e.g., perceptual inferences about entities that themselves make
perceptual inferences, or entities with goals and plans about other agents’ goals and plans).

• Compare with grammars or logic programs.
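To illustrate "thick nodes," here is a minimal sketch of a generative program in which the number of objects is itself random and unbounded. The number follows a geometric distribution, each object is independently occluded, and rejection sampling infers how many objects are really there. The continuation probability 0.7, occlusion probability 0.3 and the function names are hypothetical choices for illustration.

```python
import random
from collections import Counter

def sample_scene(rng):
    """Thick node: a scene with an unbounded (geometrically distributed) number of objects."""
    n = 0
    while rng.random() < 0.7:   # add another object with probability 0.7; no fixed upper bound
        n += 1
    # Each object is independently occluded with probability 0.3 (hypothetical value).
    visible = sum(1 for _ in range(n) if rng.random() > 0.3)
    return n, visible

def posterior_num_objects(observed_visible, n_samples=50000, seed=1):
    """Rejection sampling: keep simulated scenes whose visible count matches the observation."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        n, visible = sample_scene(rng)
        if visible == observed_visible:
            counts[n] += 1
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}
```

Seeing two objects leaves substantial posterior mass on three or more, because some objects may be hidden. A fixed-size Bayesian network cannot state this directly.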

Probabilistic programs for “inverse
Pixar” scene understanding
• World state: CAD++
• Graphics
– Approximate Rendering
• Simple surface primitives
• Rasterization rather than ray tracing (for each primitive, which
pixels does it affect?)
• Image features rather than pixels
– Probabilities:
• Image noise, image features
• Unseen objects (e.g., due to occlusion)
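The approximate-rendering bullets can be sketched under a few assumptions:

- Primitives are axis-aligned boxes on a small binary pixel grid.
- Rendering is rasterization: for each box, mark the pixels it covers.
- Image noise is modeled as independent per-pixel flips.

The box encoding, flip probability and function names are hypothetical, and a binary occupancy mask stands in for richer image features.

```python
import math

def rasterize(boxes, width, height):
    """Approximate rendering: for each box primitive, mark the pixels it covers (no ray tracing)."""
    img = [[0] * width for _ in range(height)]
    for (x0, y0, x1, y1) in boxes:            # axis-aligned box with half-open pixel ranges
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                img[y][x] = 1
    return img

def log_likelihood(observed, boxes, flip_prob=0.1):
    """Image noise model: each binary pixel is flipped independently with probability flip_prob."""
    rendered = rasterize(boxes, len(observed[0]), len(observed))
    ll = 0.0
    for row_obs, row_ren in zip(observed, rendered):
        for o, r in zip(row_obs, row_ren):
            ll += math.log(1 - flip_prob) if o == r else math.log(flip_prob)
    return ll
```

A box lying entirely inside another leaves the rendered image unchanged. That is exactly why the model needs probabilities over unseen objects: the image alone cannot rule them in or out.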

Probabilistic programs for “inverse
Pixar” scene understanding
• World state: CAD++
• Graphics
• Physics
– Approximate Newton (a physical simulation toolkit, e.g., the Open Dynamics Engine, ODE)
• Collision detection: zone of interaction
• Collision response: transient springs
• Dynamics simulation: only for objects in motion
– Probabilities:
• Latent properties (e.g., mass, friction)
• Latent forces
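Here is a toy stand-in for "approximate Newton" with a latent physical property. It is not ODE; it assumes a single block sliding on a flat surface with an unknown kinetic friction coefficient.

- The dynamics are Euler-integrated and stop as soon as the block is at rest.
- Uncertainty about friction is handled by averaging over sampled values.

The uniform prior on friction and the function names are hypothetical.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2

def slide_distance(v0, mu, dt=0.001):
    """Euler-integrate a block sliding with kinetic friction mu; simulation ends once it stops moving."""
    x, v = 0.0, v0
    while v > 0.0:
        v = max(0.0, v - mu * G * dt)   # friction decelerates the block at mu * g
        x += v * dt
    return x

def predicted_distance(v0, n_samples=500, seed=2):
    """Probabilities over latent properties: average the outcome over sampled friction values."""
    rng = random.Random(seed)
    samples = [slide_distance(v0, rng.uniform(0.2, 0.6)) for _ in range(n_samples)]
    return sum(samples) / len(samples)
```

For a fixed mu, the result should approximate the closed form v0² / (2·mu·g).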

Modeling stability judgments

[Figure: generative model combining physics (prob. approx. Newton) and graphics (prob. approx. rendering); shading marks perceptual uncertainty]

(Hamrick, Battaglia, Tenenbaum, CogSci 2011)

Perception: Approximate posterior with block positions normally distributed
around ground truth, subject to global stability.

Reasoning: Draw multiple samples from perception.
Simulate each forward with deterministic approx. Newton (ODE).

Decision: Expectations of various functions evaluated on simulation outputs.
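The perception / reasoning / decision pipeline can be sketched as follows, with two loud simplifications.

- A 2-D single-column tower of equal-mass blocks replaces the real 3-D scene.
- A crude static check replaces the ODE simulation: find the lowest block whose load (the center of mass of all blocks above it) overhangs its top face, and count every block above it as falling.

The "subject to global stability" constraint on perceptual samples is omitted. The noise level and function names are hypothetical.

```python
import random

def fraction_fallen(xs, width=1.0):
    """Crude static stand-in for simulation. xs are block centers from bottom to top.
    Returns the proportion of the tower above the lowest overloaded block."""
    n = len(xs)
    for i in range(n - 1):                   # i = supporting block, checked bottom to top
        above = xs[i + 1:]
        com = sum(above) / len(above)        # center of mass of the load on block i
        if abs(com - xs[i]) > width / 2:     # load overhangs block i's top face
            return len(above) / n
    return 0.0

def expected_fall(xs, sigma=0.2, n_samples=1000, seed=3):
    """Perception: jitter block positions with Gaussian noise.
    Reasoning: 'simulate' each sample. Decision: average proportion of the tower that falls."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in xs]
        total += fraction_fallen(noisy)
    return total / n_samples
```

A straight tower gets a low expected fall, because only perceptual noise can make it look unstable. A staircase-like tower gets a high one.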

Results
[Figure: mean human stability judgment vs. model prediction (expected proportion of tower that will fall)]

The flexibility of common sense
(“infinite use of finite means”, “visual Turing test”)

• Which way will the blocks fall?
• How far will the blocks fall?
• If this tower falls, will it knock that one over?
• If you bump the table, will more red blocks or
yellow blocks fall over?
• If this block had (not) been present, would the
tower (still) have fallen over?
• Which of these blocks is heavier or lighter than
the others?
• …
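The "infinite use of finite means" point can be sketched as one simulator answering many different questions. Each question is just a different function whose expectation is taken over the same noisy simulations.

- The crude static stability check below stands in for a physics engine.
- The tower, the block colors and the query functions are all hypothetical.

```python
import random

def simulate(xs, width=1.0):
    """Crude static stand-in for a physics engine. xs are block centers from bottom to top.
    Returns (index of the lowest falling block or None, direction: -1 left, +1 right, 0 none)."""
    for i in range(len(xs) - 1):
        above = xs[i + 1:]
        com = sum(above) / len(above)
        if abs(com - xs[i]) > width / 2:
            return i + 1, (1 if com > xs[i] else -1)
    return None, 0

def query(xs, f, sigma=0.2, n_samples=2000, seed=4):
    """Answer any question as the expectation of f over simulated outcomes under perceptual noise."""
    rng = random.Random(seed)
    vals = []
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in xs]
        vals.append(f(simulate(noisy)))
    return sum(vals) / n_samples

tower = [0.0, 0.3, 0.6, 0.9]                  # hypothetical leaning tower
colors = ["yellow", "yellow", "red", "red"]   # block colors, bottom to top

def red_minus_yellow(outcome):
    """How many more red than yellow blocks fall in this outcome."""
    first, _ = outcome
    if first is None:
        return 0
    fallen = colors[first:]
    return fallen.count("red") - fallen.count("yellow")

p_fall = query(tower, lambda o: o[0] is not None)   # Will it fall?
p_right = query(tower, lambda o: o[1] == 1)         # Which way will it fall?
p_left = query(tower, lambda o: o[1] == -1)
red_vs_yellow = query(tower, red_minus_yellow)      # More red or yellow blocks?
```

Because every query reuses the same seed, all questions are answered from the same set of simulated worlds. Only the function being averaged changes.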

Direction and distance of fall

If you bump the table…
(Battaglia & Tenenbaum, in prep)

[Figure: mean human judgment vs. model prediction (expected proportion of red vs. yellow blocks that fall)]

Experiment 1: Cause/Prevention Judgments

(Gerstenberg, Tenenbaum,
Goodman, et al., in prep)

Modeling people’s cause/prevention judgments

• Physics Simulation Model: predicted judgment = $p(B \mid A) - p(B \mid \neg A)$
– $p(B \mid A)$: 0 if the ball misses, 1 if the ball goes in
– $p(B \mid \neg A)$: assume sparse latent Gaussian perturbations on B’s velocity.
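The judgment $p(B \mid A) - p(B \mid \neg A)$ can be sketched as follows.

- $p(B \mid A)$ is the actual outcome (1 or 0).
- $p(B \mid \neg A)$ comes from re-running ball B's pre-collision trajectory without A, with sparse Gaussian perturbations on its velocity.

The geometry and parameters are hypothetical: straight-line motion, a goal at x = 10 spanning y ∈ [-1, 1], a perturbation probability of 0.2 and a noise scale of 0.3. In this sketch a score near +1 reads as "A caused B to go in" and a score near -1 as "A prevented it."

```python
import random

GOAL_X, GOAL_HALF_WIDTH = 10.0, 1.0   # hypothetical goal at x = 10, spanning y in [-1, 1]

def goes_in(y0, vx, vy):
    """Straight-line motion: does a ball starting at (0, y0) cross x = GOAL_X inside the goal?"""
    if vx <= 0:
        return False
    y_at_goal = y0 + vy * (GOAL_X / vx)
    return abs(y_at_goal) <= GOAL_HALF_WIDTH

def p_b_without_a(y0, vx, vy, p_perturb=0.2, sigma=0.3, n=5000, seed=5):
    """Counterfactual p(B | not A): rerun B's original path with sparse Gaussian velocity noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        pvx, pvy = vx, vy
        if rng.random() < p_perturb:          # sparse: most samples are left unperturbed
            pvx += rng.gauss(0.0, sigma)
            pvy += rng.gauss(0.0, sigma)
        hits += goes_in(y0, pvx, pvy)
    return hits / n

def cause_score(b_went_in, y0, vx, vy):
    """p(B|A) - p(B|not A), where p(B|A) is 1 if B actually went in and 0 if it missed."""
    return (1.0 if b_went_in else 0.0) - p_b_without_a(y0, vx, vy)
```

Suppose B was headed wide and A knocked it in. Then `cause_score(True, 0.0, 1.0, 0.5)` is close to +1. Suppose instead B was headed in and A knocked it out. Then `cause_score(False, 0.0, 1.0, 0.0)` is strongly negative.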

Intuitive psychology

Beliefs (B), Desires (D) → Actions (A)

Pr(A | B, D): probabilistic approximate planning, expressed as a probabilistic program.

Rational planning: choose action $i^* = \arg\max_i \sum_j p_{ij} u_j$, where $i$ indexes actions, $j$ indexes states, $p_{ij}$ is the probability that action $i$ leads to state $j$, and $u_j$ is the utility of state $j$.

Inverting the planner: “Inverse economics”, “Inverse optimal control”, “Inverse reinforcement learning”, “Inverse Bayesian decision theory”
(Lucas & Griffiths; Jern & Kemp; Tauber & Steyvers; Rafferty & Griffiths;
Goodman & Baker; Goodman & Stuhlmuller; Bergen, Evans & Tenenbaum …;
Ng & Russell; Todorov; Rao; Ziebart, Dey & Bagnell…)
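Here is a minimal sketch of the expected-utility rule $i^* = \arg\max_i \sum_j p_{ij} u_j$ and its inversion.

- The "probabilistic approximate" part is assumed to be a softmax (Boltzmann) choice rule with rationality parameter beta.
- Inverse planning applies Bayes' rule over a small set of candidate utility vectors, with a uniform prior.

The transition matrix, candidate desires and function names are hypothetical.

```python
import math

def expected_utilities(p, u):
    """EU(i) = sum_j p[i][j] * u[j], where p[i][j] = P(state j | action i) and u[j] = utility of j."""
    return [sum(pij * uj for pij, uj in zip(row, u)) for row in p]

def rational_choice(p, u):
    """i* = argmax_i sum_j p_ij u_j."""
    eu = expected_utilities(p, u)
    return max(range(len(eu)), key=eu.__getitem__)

def softmax_policy(p, u, beta=2.0):
    """Noisy planning (assumed Boltzmann form): P(i) proportional to exp(beta * EU(i))."""
    eu = expected_utilities(p, u)
    m = max(eu)
    w = [math.exp(beta * (e - m)) for e in eu]
    z = sum(w)
    return [wi / z for wi in w]

def infer_desires(p, candidate_utilities, observed_action, beta=2.0):
    """Inverse planning: P(u | action) proportional to P(action | u), uniform prior over candidates."""
    lik = [softmax_policy(p, u, beta)[observed_action] for u in candidate_utilities]
    z = sum(lik)
    return [l / z for l in lik]
```

With `p = [[0.9, 0.1], [0.1, 0.9]]`, seeing action 1 shifts belief toward the desire that values state 1. The softmax keeps the inversion graded: one observed action raises confidence without making it certain.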

Goal inference as inverse probabilistic planning
(Baker, Tenenbaum & Saxe, Cognition, 2009)

[Diagram: constraints and goals → rational planning (MDP) → agent’s actions]

[Figure: people’s goal judgments vs. model predictions, both on a 0 to 1 scale; r = 0.98]
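Goal inference as inverse planning can be sketched on a small open grid with no obstacles. This is a toy, not the published model.

- At each step the agent chooses among four moves by a softmax over negative Manhattan distance to its goal. With unit step cost and no walls, this matches optimal MDP action values up to a constant.
- The posterior over goals multiplies step likelihoods along the observed path, under a uniform prior.

The grid, goals and beta are hypothetical.

```python
import math

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step_likelihood(pos, nxt, goal, beta=1.5):
    """P(next | pos, goal): softmax over the four moves, scored by -(Manhattan distance to goal)."""
    def score(p):
        return -beta * (abs(p[0] - goal[0]) + abs(p[1] - goal[1]))
    options = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES]
    z = sum(math.exp(score(o)) for o in options)
    return math.exp(score(nxt)) / z

def goal_posterior(path, goals):
    """Bayesian inverse planning: P(goal | path) proportional to prod_t P(step_t | goal)."""
    post = []
    for g in goals:
        lik = 1.0
        for a, b in zip(path, path[1:]):
            lik *= step_likelihood(a, b, g)
        post.append(lik)
    z = sum(post)
    return [p / z for p in post]
```

With goals at (5, 5) and (5, -5), moving right from (0, 0) is equally consistent with both, so the posterior stays at 0.5. Once the agent turns upward, it shifts sharply toward (5, 5).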

Theory of mind: Joint inferences about beliefs and preferences
(Baker, Saxe & Tenenbaum, CogSci 2011)

Food truck scenarios:
[Diagram: environment state → rational perception → agent’s beliefs; beliefs and preferences → rational planning → actions]
[Figure panels: Preferences; Initial Beliefs]
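Joint belief and preference inference can be sketched with a drastically simplified food-truck story. The agent can see the Mexican truck, walks past it to check a hidden spot, finds the Lebanese truck there, and goes back to Mexican. The sketch enumerates hypotheses that pair a preference ranking with a prior belief that the hidden spot holds the Korean truck, and scores each by the likelihood of that story.

- The utilities (1, 0.5, 0), the cost of checking, the belief grid and the softmax beta are all hypothetical.
- The cost of walking back is ignored.
- This is not the published model.

```python
import math
from itertools import permutations

TRUCKS = ("Korean", "Lebanese", "Mexican")

def sigmoid_choice(value_chosen, value_other, beta):
    """Two-option softmax: probability of picking the chosen option over the alternative."""
    return 1.0 / (1.0 + math.exp(-beta * (value_chosen - value_other)))

def infer_favorite(check_cost=0.3, beta=5.0):
    """Score (preference ranking, belief) hypotheses against the observed story:
    walk past Mexican, check the hidden spot, see Lebanese, return to Mexican."""
    beliefs = (0.2, 0.5, 0.8)             # hypothetical grid over P(hidden spot = Korean)
    scores = {t: 0.0 for t in TRUCKS}
    for ranking in permutations(TRUCKS):  # uniform prior over the 6 preference rankings
        u = dict(zip(ranking, (1.0, 0.5, 0.0)))
        for b in beliefs:
            # Expected utility of checking: afterwards, eat at the better of what is found and Mexican.
            eu_check = (b * max(u["Korean"], u["Mexican"])
                        + (1 - b) * max(u["Lebanese"], u["Mexican"]) - check_cost)
            p_check = sigmoid_choice(eu_check, u["Mexican"], beta)
            p_return = sigmoid_choice(u["Mexican"], u["Lebanese"], beta)
            scores[ranking[0]] += p_check * p_return   # marginalize over the belief grid
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}
```

Under these assumptions, Korean comes out as the most likely favorite, even though the Korean truck was never seen. Lebanese comes out least likely. Only a joint explanation of beliefs and preferences makes the detour sensible.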

Goal inference with multiple agents
(Baker, Goodman & Tenenbaum, CogSci 2008, in prep)

[Diagram: for each agent, constraints and goals → rational planning (MDP) → actions]

Southgate & Csibra: [Figure panels: People; Model]

Inferring social goals
(Baker, Goodman & Tenenbaum, CogSci 2008; Ullman, Baker, Evans, Macindoe & Tenenbaum, NIPS 2009)

[Diagram: for each agent, constraints and goals → rational planning (MDP) → actions]

Hamlin, Kuhlmeier, Wynn & Bloom: [Figure panels (two): subject ratings vs. model prediction]
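"Goals about goals" can be sketched with a toy social-goal inference, loosely in the spirit of helping/hindering but not the published model.

- Agent A wants to reach a goal on a 1-D track, and agent B pushes A one step left or right.
- A helper's utility is A's closeness to A's own goal. A hinderer's utility is the negative of that.
- B chooses pushes by softmax.

The track, beta, the 50/50 prior and the function name are hypothetical.

```python
import math

def social_goal_posterior(a_pos, a_goal, pushes, beta=2.0):
    """Infer whether B is a helper or a hinderer from B's pushes (+1 or -1) on A.
    B's utility is defined in terms of A's goal: that is the recursive part."""
    hyps = {"helper": 1.0, "hinderer": -1.0}    # sign applied to A's closeness to its goal
    post = {}
    for name, sign in hyps.items():
        pos, lik = a_pos, 1.0
        for push in pushes:
            # Value to B of each possible push, under this hypothesis about B's social goal.
            vals = {d: sign * -abs(pos + d - a_goal) for d in (1, -1)}
            z = sum(math.exp(beta * v) for v in vals.values())
            lik *= math.exp(beta * vals[push]) / z
            pos += push
        post[name] = 0.5 * lik                  # uniform prior over the two social goals
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}
```

If A starts at 0 with its goal at 5, two pushes toward the goal make "helper" nearly certain. Two pushes away make "hinderer" nearly certain.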

Conclusions
From scenes to stories… What contents of stories are
routinely accessed through visual scenes? How can we
represent that content for reasoning, communication,
prediction and planning?

Focus on core knowledge present in preverbal infants:
intuitive physics, intuitive psychology.

Representations using probabilistic programs: thick nodes
(e.g., CAD++), thick arrows (physics, graphics, planning),
recursive (inference about inference, goals about goals).

Challenges for future work: (1) Integrating physics and
psychology. (2) Efficient inference. (3) Learning.
