How should we represent visual scenes?
               Common-Sense Core,
              Probabilistic Programs


               Josh Tenenbaum
        MIT Brain and Cognitive Sciences
                    CSAIL

  Joint work with Noah Goodman, Chris Baker, Rebecca Saxe,
    Tomer Ullman, Peter Battaglia, Jess Hamrick and others.
Core of common-sense reasoning
Human thought is structured around a basic
 understanding of physical objects, intentional
 agents, and their relations.
“Core knowledge” (Spelke, Carey, Leslie, Baillargeon, Gergely…)
Intuitive theories (Carey, Gopnik, Wellman, Gelman, Gentner, Forbus, McCloskey…)
Primitives of lexical semantics (Pinker, Jackendoff, Talmy, Pustejovsky)
Visual scene understanding (Everyone here…)
                                                   From scenes to stories…
The key questions:
  (1) What is the form and content of human common-sense
  theories of the physical world, intentional agents, and their
  interaction?
  (2) How are these theories used to parse visual experience
  into representations that support reasoning, planning,
  communication?
A developmental perspective
A 3-year-old and her dad:

Dad: “What's this a picture of?”
Sarah: “A bear hugging a panda bear.”
 ...
Dad: “What is the second panda bear
  doing?”
Sarah: “It's trying to hug the bear.”
Dad: “What about the third bear?”
Sarah: “It’s walking away.”



       But this feels too hard to approach now, so what about
       looking at younger children (e.g. 12 months or younger)?
Intuitive physics and psychology




Southgate and Csibra, 2009
(13-month-olds)



                             Heider and Simmel, 1944
Intuitive physics
(Gupta, Efros, Hebert)




                           (Whiting et al)
Intuitive psychology
Probabilistic generative models
• Early 1990s to early 2000s
   – Bayesian networks: model the causal processes that
     give rise to observations; perform reasoning, prediction,
     planning via probabilistic inference.




   – The problem: not sufficiently flexible or expressive.
Scene understanding as an inverse problem
The “inverse Pixar” problem:

   World state (t)
        ↓ graphics
   Image (t)
Scene understanding as an inverse problem
The “inverse Pixar” problem:

… World state (t-1)  →  World state (t)  →  World state (t+1) …    (physics)
        ↓                     ↓                     ↓              (graphics)
   Image (t-1)            Image (t)            Image (t+1)
Probabilistic programs
• Probabilistic models a la Laplace.
   – The world is fundamentally deterministic (described by a program),
     and perfectly predictable if we could observe all relevant variables.
   – Observations are always incomplete or indirect, so we put probability
     distributions on what we can’t observe.
• Compare with Bayesian networks.
   – Thick nodes. Programs defined over unbounded sets of objects, their
     properties, states and relations, rather than traditional finite-
     dimensional random variables.
   – Thick arrows. Programs capture fine-grained causal processes
     unfolding over space and time, not simply directed statistical
     dependencies.
   – Recursive. Probabilistic programs can be arbitrarily manipulated
     inside other programs. (e.g. perceptual inferences about entities that make
      perceptual inferences, entities with goals and plans re: other agents’ goals and plans.)

• Compare with grammars or logic programs.
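
A minimal sketch of the "a la Laplace" idea above, under stated assumptions: the world is a deterministic program (here a toy free-fall rule), and probability enters only through a prior on what is not observed and noise on what is. The scenario, the names fall_time and posterior_mean_height, the uniform prior range, and the noise level are all illustrative choices, not taken from the talk. Inference is done by simple likelihood weighting.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def fall_time(height_m):
    """Deterministic 'world program': time for an object to fall height_m metres."""
    return math.sqrt(2.0 * height_m / G)


def posterior_mean_height(observed_time, noise_sd=0.05, n_samples=20000, seed=0):
    """Likelihood-weighted estimate of the unobserved drop height.

    Hypothetical setup: the height is never seen directly, only a noisy
    measurement of the fall time.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        height = rng.uniform(0.1, 10.0)   # prior over the unobserved variable
        predicted = fall_time(height)     # run the deterministic program
        # Gaussian likelihood of the noisy observation given the prediction
        weight = math.exp(-0.5 * ((observed_time - predicted) / noise_sd) ** 2)
        weighted_sum += weight * height
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A fall time of about one second points to a height of roughly 5 m.
    print(posterior_mean_height(observed_time=1.0))
```

The same pattern (sample the latents, run the program, weight by agreement with the observation) applies when the program is a renderer or a physics engine, although likelihood weighting this naive would likely be too slow there.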
Probabilistic programs for “inverse Pixar” scene understanding
• World state: CAD++
• Graphics
  – Approximate Rendering
     • Simple surface primitives
     • Rasterization rather than ray tracing (for each primitive, which
       pixels does it affect?)
     • Image features rather than pixels
  – Probabilities:
     • Image noise, image features
     • Unseen objects (e.g., due to occlusion)
Probabilistic programs for “inverse Pixar” scene understanding
• World state: CAD++
• Graphics
• Physics
  – Approximate Newton (physical simulation toolkit, e.g. ODE, the Open Dynamics Engine)
     • Collision detection: zone of interaction
     • Collision response: transient springs
     • Dynamics simulation: only for objects in motion
  – Probabilities:
     • Latent properties (e.g., mass, friction)
     • Latent forces
Modeling stability judgments
Modeling stability judgments


                      physics
… World state (t-1)             World state (t)   World state (t+1) …

                        graphics

      Image (t-1)                 Image (t)         Image (t+1)
Modeling stability judgments


                      physics
… World state (t-1)             World state (t)   World state (t+1) …

                        Prob. approx. rendering

      Image (t-1)                 Image (t)         Image (t+1)
Modeling stability judgments

                    Prob.
                    approx.
                    Newton
… World state (t-1)           World state (t)    World state (t+1) …

                       Prob. approx. rendering

      Image (t-1)               Image (t)          Image (t+1)
                               = perceptual uncertainty
Modeling stability judgments
(Hamrick, Battaglia & Tenenbaum, CogSci 2011)

Perception: Approximate posterior with block positions normally distributed
     around ground truth, subject to global stability.

Reasoning: Draw multiple samples from perception.
           Simulate forward with deterministic approx. Newton (ODE).

Decision: Expectations of various functions evaluated on simulation outputs.
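
The perception / reasoning / decision pipeline above can be sketched roughly as follows. This is not the authors' implementation: the forward simulation with ODE is swapped for a crude static centre-of-mass test on a one-dimensional stack of unit-width blocks, and the "subject to global stability" constraint on the perceptual posterior is omitted. The names fraction_fallen and expected_fraction_fallen and the value of perceptual_sd are assumptions made for illustration.

```python
import random

BLOCK_WIDTH = 1.0  # all blocks share one width in this toy tower


def fraction_fallen(xs):
    """Stand-in for physics: fraction of blocks that fall under a static test.

    xs[i] is the horizontal centre of block i (index 0 is the bottom block).
    The sub-stack from block i upward is treated as one rigid body; it falls
    if its centre of mass lies outside the footprint of block i - 1.
    """
    n = len(xs)
    for i in range(1, n):
        above = xs[i:]
        centre_of_mass = sum(above) / len(above)
        if abs(centre_of_mass - xs[i - 1]) > BLOCK_WIDTH / 2.0:
            return (n - i) / n
    return 0.0


def expected_fraction_fallen(true_xs, perceptual_sd=0.1, n_samples=2000, seed=0):
    """Perception: Gaussian noise around ground truth. Decision: a sample mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perceived = [x + rng.gauss(0.0, perceptual_sd) for x in true_xs]
        total += fraction_fallen(perceived)   # reasoning: "simulate" each sample
    return total / n_samples


if __name__ == "__main__":
    print(expected_fraction_fallen([0.0, 0.0, 0.0]))   # well-aligned tower
    print(expected_fraction_fallen([0.0, 0.3, 0.6]))   # leaning tower
```

Because the judgment is an average over noisy percepts, a tower that is stable at ground truth but close to tipping still gets a graded, non-zero predicted proportion of fallen blocks.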
Results
[Plot: mean human stability judgment against model prediction
 (expected proportion of tower that will fall)]
Simpler alternatives?
The flexibility of common sense
(“infinite use of finite means”, “visual Turing test”)

• Which way will the blocks fall?
• How far will the blocks fall?
• If this tower falls, will it knock that one over?
• If you bump the table, will more red blocks or
  yellow blocks fall over?
• If this block had (not) been present, would the
  tower (still) have fallen over?
• Which of these blocks is heavier or lighter than
  the others?
• …
Direction of fall
Direction and distance of fall
If you bump the table…
If you bump the table…
(Battaglia & Tenenbaum, in prep)

[Plot: mean human judgment against model prediction
 (expected proportion of red vs. yellow blocks that fall)]
Experiment 1: Cause/Prevention Judgments
(Gerstenberg, Tenenbaum, Goodman, et al., in prep)
Modeling people’s cause/prevention judgments

• Physics Simulation Model: p(B|A) – p(B|not A)
   – p(B|A): 0 if ball misses, 1 if ball goes in.
   – p(B|not A): assume sparse latent Gaussian perturbations on B’s
     velocity.
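
A rough sketch of the difference p(B|A) – p(B|not A) described above, under simplifying assumptions that are not from the talk: ball B moves in a straight line toward a wall with a gate, the counterfactual world without A is simulated by adding one Gaussian perturbation to B's velocity (rather than sparse perturbations over time), and every name and number here (wall_x, gate_half_width, velocity_sd) is hypothetical.

```python
import random


def goes_through_gate(y0, vx, vy, wall_x=10.0, gate_half_width=1.0):
    """Straight-line motion from (0, y0); True if B crosses the wall inside the gate."""
    if vx <= 0:
        return False  # B never reaches the wall
    y_at_wall = y0 + vy * (wall_x / vx)
    return abs(y_at_wall) <= gate_half_width


def p_b_without_a(y0, vx, vy, velocity_sd=0.05, n_samples=5000, seed=0):
    """Monte Carlo estimate of p(B | not A): B's path if A had not been there."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        noisy_vx = vx + rng.gauss(0.0, velocity_sd)
        noisy_vy = vy + rng.gauss(0.0, velocity_sd)
        hits += goes_through_gate(y0, noisy_vx, noisy_vy)
    return hits / n_samples


def causal_strength(b_went_in, y0, vx, vy, **kwargs):
    """p(B|A) is 0 or 1 from the observed outcome; subtract the counterfactual."""
    p_b_given_a = 1.0 if b_went_in else 0.0
    return p_b_given_a - p_b_without_a(y0, vx, vy, **kwargs)


if __name__ == "__main__":
    # B was heading well wide of the gate but went in: A looks like a cause.
    print(causal_strength(True, y0=5.0, vx=1.0, vy=0.0))
    # B was heading straight in but missed: A looks like a preventer.
    print(causal_strength(False, y0=0.0, vx=1.0, vy=0.0))
```

Values near +1 read as "A caused B to go in", values near -1 as "A prevented it", and values near 0 as "A made little difference".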
Simulation Model
Intuitive psychology



Beliefs (B)    Desires (D)


       Actions (A)




                             Heider and Simmel, 1944
Intuitive psychology

      Beliefs (B)          Desires (D)

                   Actions (A)

Pr(A|B,D), over Beliefs (B)… and Desires (D)…

                                         Heider and Simmel, 1944
Intuitive psychology

Beliefs (B)    Desires (D)

      Probabilistic
      approximate
        planning


        Actions (A)


Probabilistic program

                             Heider and Simmel, 1944
Intuitive psychology

Beliefs (B)    Desires (D)
      Probabilistic approximate planning
        Actions (A)
Probabilistic program

Actions i, States j. In state j, choose action
  $i^* = \arg\max_i \sum_{j'} p_{ij,j'} \, u_{j'}$

“Inverse economics”
“Inverse optimal control”
“Inverse reinforcement learning”
“Inverse Bayesian decision theory”

(Lucas & Griffiths; Jern & Kemp; Tauber & Steyvers; Rafferty & Griffiths;
Goodman & Baker; Goodman & Stuhlmuller; Bergen, Evans & Tenenbaum …
Ng & Russell; Todorov; Rao; Ziebart, Dey & Bagnell…)
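
A small sketch of the choice rule on this slide, read here as "in the current state, pick the action whose outcomes have the highest expected utility". The softmax variant is one common way to make such a planner probabilistic and is an assumption, not something the slide specifies; the toy transition table, the utilities, and the parameter beta are likewise invented for illustration.

```python
import math


def expected_utilities(transition, utility, state):
    """transition[state][action] maps next_state -> probability."""
    return {
        action: sum(p * utility[nxt] for nxt, p in outcomes.items())
        for action, outcomes in transition[state].items()
    }


def best_action(transition, utility, state):
    """Deterministic rule: the action with the highest expected utility."""
    eu = expected_utilities(transition, utility, state)
    return max(eu, key=eu.get)


def softmax_policy(transition, utility, state, beta=2.0):
    """Noisy rule: probability of an action grows with its expected utility."""
    eu = expected_utilities(transition, utility, state)
    top = max(eu.values())  # subtract the max for numerical stability
    weights = {a: math.exp(beta * (v - top)) for a, v in eu.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical one-step world: going left usually leads to food.
    T = {"start": {"left": {"food": 0.8, "nothing": 0.2},
                   "right": {"food": 0.3, "nothing": 0.7}}}
    U = {"food": 1.0, "nothing": 0.0}
    print(best_action(T, U, "start"))
    print(softmax_policy(T, U, "start"))
```

The "inverse" problems listed on the slide run such a forward model backwards: given observed actions, infer the utilities (desires) or transition beliefs that would make those actions likely.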
Goal inference as inverse probabilistic planning
(Baker, Tenenbaum & Saxe, Cognition, 2009)

Agent model: constraints, goals → rational planning (MDP) → actions

[Scatter plot: People vs. Model, both axes from 0 to 1; r = 0.98]
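
Goal inference as inverse planning can be illustrated with a toy Bayesian observer. This sketch is a simplification and not the model from the cited paper: instead of solving an MDP, the agent is assumed to follow a softmax policy that prefers moves ending closer (in Manhattan distance) to its goal on an obstacle-free grid. The names action_probs and goal_posterior, the grid, and beta are assumptions.

```python
import math

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def action_probs(pos, goal, beta=2.0):
    """Assumed forward model: moves that end nearer the goal are more probable."""
    scores = {}
    for name, (dx, dy) in MOVES.items():
        nxt = (pos[0] + dx, pos[1] + dy)
        dist = abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])
        scores[name] = -beta * dist
    top = max(scores.values())
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


def goal_posterior(start, actions, goals, prior=None):
    """P(goal | actions) is proportional to P(goal) times the product of P(action | state, goal)."""
    prior = prior or {g: 1.0 / len(goals) for g in goals}
    log_post = {}
    for name, goal in goals.items():
        pos = start
        logp = math.log(prior[name])
        for a in actions:
            logp += math.log(action_probs(pos, goal)[a])
            dx, dy = MOVES[a]
            pos = (pos[0] + dx, pos[1] + dy)
        log_post[name] = logp
    top = max(log_post.values())
    unnorm = {g: math.exp(lp - top) for g, lp in log_post.items()}
    z = sum(unnorm.values())
    return {g: u / z for g, u in unnorm.items()}


if __name__ == "__main__":
    goals = {"A": (5, 0), "B": (0, 5)}
    # Two steps to the right are much better explained by goal A than by goal B.
    print(goal_posterior((0, 0), ["right", "right"], goals))
```

Replacing action_probs with the policy of a planner that handles obstacles and costs would give a sketch closer in spirit to the MDP-based agent model named on the slide.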
Theory of mind: Joint inferences about beliefs and preferences
(Baker, Saxe & Tenenbaum, CogSci 2011)

Agent model: environment state → rational perception → beliefs;
             beliefs, preferences → rational planning → actions

Food truck scenarios:
[Figure panels: Preferences, Initial Beliefs]
Goal inference with multiple agents
(Baker, Goodman & Tenenbaum, CogSci 2008, in prep)

Two agents, each modeled as: constraints, goals → rational planning (MDP) → actions

Southgate & Csibra:
[Figure panels: People, Model]
Inferring social goals
(Baker, Goodman & Tenenbaum, CogSci 2008; Ullman, Baker, Evans,
Macindoe & Tenenbaum, NIPS 2009)

Two agents, each modeled as: constraints, goals → rational planning (MDP) → actions

Hamlin, Kuhlmeier, Wynn & Bloom:
[Two figure panels, each comparing subject ratings with model prediction]
Conclusions
From scenes to stories… What contents of stories are
  routinely accessed through visual scenes? How can we
  represent that content for reasoning, communication,
  prediction and planning?

Focus on core knowledge present in preverbal infants:
  intuitive physics, intuitive psychology.

Representations using probabilistic programs: thick nodes
  (e.g. CAD++), thick arrows (physics, graphics, planning),
  recursive (inference about inference, goals about goals).

Challenges for future work: (1) Integrating physics and
  psychology. (2) Efficient inference. (3) Learning.
