How should we represent visual scenes?
               Common-Sense Core,
              Probabilistic Programs


               Josh Tenenbaum
        MIT Brain and Cognitive Sciences
                    CSAIL

  Joint work with Noah Goodman, Chris Baker, Rebecca Saxe,
    Tomer Ullman, Peter Battaglia, Jess Hamrick and others.
Core of common-sense reasoning
Human thought is structured around a basic
 understanding of physical objects, intentional
 agents, and their relations.
“Core knowledge” (Spelke, Carey, Leslie, Baillargeon, Gergely…)
Intuitive theories (Carey, Gopnik, Wellman, Gelman, Gentner, Forbus, McCloskey…)
Primitives of lexical semantics (Pinker, Jackendoff, Talmy, Pustejovsky)
Visual scene understanding (Everyone here…)
                                                   From scenes to stories…
The key questions:
  (1) What is the form and content of human common-sense
  theories of the physical world, intentional agents, and their
  interaction?
  (2) How are these theories used to parse visual experience
  into representations that support reasoning, planning,
  communication?
A developmental perspective
A 3-year-old and her dad:

Dad: “What's this a picture of?”
Sarah: “A bear hugging a panda bear.”
 ...
Dad: “What is the second panda bear
  doing?”
Sarah: “It's trying to hug the bear.”
Dad: “What about the third bear?”
Sarah: “It’s walking away.”



       But this feels too hard to approach now, so what about
       looking at younger children (e.g., 12 months or younger)?
Intuitive physics and psychology




Southgate and Csibra, 2009
(13-month-olds)



                             Heider and Simmel, 1944
Intuitive physics
(Gupta, Efros, Hebert)




                           (Whiting et al)
Intuitive psychology
Probabilistic generative models
• Early 1990s to early 2000s
   – Bayesian networks: model the causal processes that
     give rise to observations; perform reasoning, prediction,
     planning via probabilistic inference.




   – The problem: not sufficiently flexible, expressive.
Scene understanding as an
         inverse problem
The “inverse Pixar” problem:



                 World state (t)

             graphics

                   Image (t)
Scene understanding as an
               inverse problem
   The “inverse Pixar” problem:

                      physics
… World state (t-1)         World state (t)   World state (t+1) …

                       graphics

      Image (t-1)               Image (t)       Image (t+1)
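
The diagram above can be read as a generative model: physics advances the world state, graphics renders each state to an image, and scene understanding runs this in reverse. The following is a minimal sketch of that idea, not the model from the talk. It assumes a toy 1D world (a bright block moving at constant velocity), a hypothetical Gaussian pixel-noise score, and exhaustive search over candidate states in place of any real inference algorithm; all function names are illustrative.

```python
import math

WIDTH = 20  # number of pixels in the toy 1D image (assumed)

def physics(state):
    """Deterministic dynamics: the block moves at constant velocity."""
    pos, vel = state
    return (pos + vel, vel)

def graphics(state):
    """Render a 3-pixel-wide bright block at the state's position."""
    pos, _ = state
    return [1.0 if pos <= x < pos + 3 else 0.0 for x in range(WIDTH)]

def log_likelihood(image, state, noise=0.3):
    """Gaussian pixel-noise score of an observed image under a candidate state."""
    rendered = graphics(state)
    return -sum((a - b) ** 2 for a, b in zip(image, rendered)) / (2 * noise ** 2)

def infer_state(images):
    """Inverse problem: find the initial (position, velocity) whose simulated
    and rendered frames best match the observed frame sequence."""
    best, best_score = None, -math.inf
    for pos in range(WIDTH - 2):
        for vel in (-2, -1, 0, 1, 2):
            state, score = (pos, vel), 0.0
            for image in images:
                score += log_likelihood(image, state)
                state = physics(state)
            if score > best_score:
                best, best_score = (pos, vel), score
    return best

# Forward direction: simulate and render three frames from a known state.
true_state = (4, 2)
frames = []
s = true_state
for _ in range(3):
    frames.append(graphics(s))
    s = physics(s)

# Inverse direction: recover the state from the frames alone.
print(infer_state(frames))
```

A single frame would leave the velocity undetermined; it is the physics link between frames that lets the sketch recover it.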
Probabilistic programs
• Probabilistic models à la Laplace.
   – The world is fundamentally deterministic (described by a program),
     and perfectly predictable if we could observe all relevant variables.
   – Observations are always incomplete or indirect, so we put probability
     distributions on what we can’t observe.
• Compare with Bayesian networks.
   – Thick nodes. Programs defined over unbounded sets of objects, their
     properties, states and relations, rather than traditional finite-
     dimensional random variables.
   – Thick arrows. Programs capture fine-grained causal processes
     unfolding over space and time, not simply directed statistical
     dependencies.
   – Recursive. Probabilistic programs can be arbitrarily manipulated
     inside other programs. (e.g. perceptual inferences about entities that make
      perceptual inferences, entities with goals and plans re: other agents’ goals and plans.)

• Compare with grammars or logic programs.
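
One way to make the Laplace-style view concrete: write the world as a deterministic program, then put a probability distribution only on the variable that cannot be observed. The sketch below is an illustration under stated assumptions, not code from the talk. The projectile example, the uniform prior over launch speed, the 1 m observation noise, and the use of likelihood weighting are all choices made here for brevity.

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2

def landing_distance(speed, angle_deg):
    """Deterministic world program: ideal projectile range on flat ground."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / G

def posterior_mean_speed(observed_distance, angle_deg, n=20000, seed=0):
    """Likelihood weighting: sample the unobserved speed from a prior, run the
    deterministic program, and weight each sample by how well it explains
    the observed landing distance."""
    rng = random.Random(seed)
    total_w = 0.0
    total_wv = 0.0
    for _ in range(n):
        speed = rng.uniform(0.0, 30.0)  # assumed prior over the latent speed
        predicted = landing_distance(speed, angle_deg)
        # Assumed Gaussian observation noise with standard deviation 1 m.
        w = math.exp(-0.5 * (observed_distance - predicted) ** 2)
        total_w += w
        total_wv += w * speed
    return total_wv / total_w

true_speed = 15.0
obs = landing_distance(true_speed, 45.0)
estimate = posterior_mean_speed(obs, 45.0)
print(round(estimate, 1))
```

All of the randomness lives in the prior and the observation model; `landing_distance` itself is an ordinary deterministic function.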
Probabilistic programs for “inverse
     pixar” scene understanding
• World state: CAD++
• Graphics
  – Approximate Rendering
     • Simple surface primitives
     • Rasterization rather than ray tracing (for each primitive, which
       pixels does it affect?)
     • Image features rather than pixels
  – Probabilities:
     • Image noise, image features
     • Unseen objects (e.g., due to occlusion)
• Physics
  – Approximate Newton (physical simulation toolkit, e.g. ODE)
     • Collision detection: zone of interaction
     • Collision response: transient springs
     • Dynamics simulation: only for objects in motion
  – Probabilities:
     • Latent properties (e.g., mass, friction)
     • Latent forces
Modeling stability judgments

              Prob. approx. Newton (physics)
… World state (t-1)       World state (t)       World state (t+1) …

              Prob. approx. rendering (graphics)

      Image (t-1)            Image (t)            Image (t+1)

[Figure legend: perceptual uncertainty]
Modeling stability judgments
   (Hamrick, Battaglia & Tenenbaum, CogSci 2011)

Perception: Approximate posterior with block positions normally distributed
     around ground truth, subject to global stability.

Reasoning: Draw multiple samples from perception.
           Simulate forward with deterministic approx. Newton (ODE).

Decision: Expectations of various functions evaluated on simulation outputs.
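
The perception, reasoning, and decision steps above can be sketched as a small Monte Carlo loop. This is a simplified illustration, not the published model: it assumes a 2D tower of equal-mass, unit-width blocks, replaces the ODE simulation with a static centre-of-mass check, and omits the global stability constraint on perceived samples. The function names and the noise level `sigma` are hypothetical.

```python
import random

BLOCK_WIDTH = 1.0  # all blocks share this width (assumed)

def fallen_fraction(xs):
    """Static stand-in for physical simulation. xs[0] is the centre of the
    bottom block. If the centre of mass of everything above block k lies
    outside block k's extent, all blocks above k are counted as fallen."""
    n = len(xs)
    for k in range(n - 1):
        above = xs[k + 1:]
        com = sum(above) / len(above)  # equal masses assumed
        if abs(com - xs[k]) > BLOCK_WIDTH / 2:
            return (n - 1 - k) / n
    return 0.0

def predicted_instability(true_xs, sigma=0.1, n_samples=2000, seed=0):
    """Perception: sample block positions around ground truth.
    Reasoning: run the deterministic check on each sample.
    Decision: average the proportion of the tower that falls."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perceived = [x + rng.gauss(0.0, sigma) for x in true_xs]
        total += fallen_fraction(perceived)
    return total / n_samples

stable_tower = [0.0, 0.05, 0.0, -0.05]
precarious_tower = [0.0, 0.4, 0.8, 1.2]
print(predicted_instability(stable_tower))
print(predicted_instability(precarious_tower))
```

Because the physics step is deterministic, graded judgments in this sketch come entirely from uncertainty in perception, which is the structure the slide describes.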
Results

[Figure: mean human stability judgment plotted against model prediction (expected proportion of tower that will fall)]
Simpler alternatives?
The flexibility of common sense
(“infinite use of finite means”, “visual Turing test”)

• Which way will the blocks fall?
• How far will the blocks fall?
• If this tower falls, will it knock that one over?
• If you bump the table, will more red blocks or
  yellow blocks fall over?
• If this block had (not) been present, would the
  tower (still) have fallen over?
• Which of these blocks is heavier or lighter than
  the others?
• …
Direction of fall
Direction and distance of fall
If you bump the table…
              (Battaglia & Tenenbaum, in prep)

[Figure: mean human judgment plotted against model prediction (expected proportion of red vs. yellow blocks that fall)]
Experiment 1: Cause/Prevention Judgments
   (Gerstenberg, Tenenbaum, Goodman, et al., in prep)

Modeling people’s cause/prevention judgments

• Physics Simulation Model: p(B|A) – p(B|not A)
   – p(B|A): 0 if ball misses, 1 if ball goes in.
   – p(B|not A): assume sparse latent Gaussian perturbations on B’s
     velocity.
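
The difference p(B|A) – p(B|not A) can be estimated by simulating the counterfactual world in which A is absent. The sketch below is a toy version under assumptions made here, not the authors' simulation: B moves in a straight line toward a gate, a single Gaussian perturbation is applied to its velocity at the start (rather than sparse perturbations over time), and the gate geometry and noise level are invented for illustration.

```python
import random

def goes_through_gate(y0, vx, vy, gate_x=10.0, gate_half_width=1.0):
    """Deterministic toy physics: straight-line motion from (0, y0).
    B 'goes in' if it crosses x = gate_x inside the gate opening."""
    if vx <= 0:
        return False
    t = gate_x / vx
    return abs(y0 + vy * t) <= gate_half_width

def p_b_without_a(y0, vx, vy, sigma=0.1, n=5000, seed=0):
    """Monte Carlo estimate of p(B | not A): B keeps its pre-collision
    velocity, perturbed by Gaussian noise (assumed standard deviation sigma)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        noisy_vx = vx + rng.gauss(0.0, sigma)
        noisy_vy = vy + rng.gauss(0.0, sigma)
        hits += goes_through_gate(y0, noisy_vx, noisy_vy)
    return hits / n

def causal_judgment(b_went_in, y0, vx, vy):
    """p(B|A) is 1 if the ball went in and 0 if it missed; subtract the
    simulated probability that it would have gone in without A."""
    p_b_given_a = 1.0 if b_went_in else 0.0
    return p_b_given_a - p_b_without_a(y0, vx, vy)

# B was heading wide of the gate, A knocked it in: strong causation.
print(causal_judgment(True, 0.0, 1.0, 0.5))
# B was already heading for the gate and went in: weak causation.
print(causal_judgment(True, 0.0, 1.0, 0.0))
# B was heading for the gate, A knocked it away: prevention (negative).
print(causal_judgment(False, 0.0, 1.0, 0.0))
```

Positive values read as "A caused B to go in", negative values as "A prevented it", and values near zero as "A made little difference".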
Simulation Model
Intuitive psychology



Beliefs (B)    Desires (D)


       Actions (A)




                             Heider and Simmel, 1944
Intuitive psychology

      Beliefs (B)          Desires (D)


                   Actions (A)

Pr(A|B,D), over beliefs (B) … and desires (D) …

                             Heider and Simmel, 1944
Intuitive psychology

Beliefs (B)    Desires (D)

      Probabilistic
      approximate
        planning


        Actions (A)


Probabilistic program

                             Heider and Simmel, 1944
Intuitive psychology

Beliefs (B)    Desires (D)

      Probabilistic
      approximate
        planning

        Actions (A)

Probabilistic program

Actions i, states j. In state j, choose action
   $i^* = \arg\max_i \sum_j p_{ij,j}\, u_j$

“Inverse economics”
“Inverse optimal control”
“Inverse reinforcement learning”
“Inverse Bayesian decision theory”

(Lucas & Griffiths; Jern & Kemp; Tauber & Steyvers; Rafferty & Griffiths;
Goodman & Baker; Goodman & Stuhlmuller; Bergen, Evans & Tenenbaum …

Ng & Russell; Todorov; Rao; Ziebart, Dey & Bagnell…)
Goal inference as inverse probabilistic planning
  (Baker, Tenenbaum & Saxe, Cognition, 2009)

Agent: constraints, goals → rational planning (MDP) → actions

[Figure: People plotted against Model, both axes from 0 to 1; r = 0.98]
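
Inverting a planner to infer goals can be illustrated with Bayes' rule over a small set of candidate goals. The following is a minimal sketch, not the model from the cited paper. It assumes an obstacle-free grid with unit step cost, where the optimal value of a state is minus its Manhattan distance to the goal, so no explicit MDP solver is needed. The softmax rationality parameter `beta`, the uniform prior, and the goal locations are hypothetical.

```python
import math

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    dx, dy = ACTIONS[action]
    return (state[0] + dx, state[1] + dy)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def action_probs(state, goal, beta=2.0):
    """Softmax-rational policy: actions leading closer to the goal are
    exponentially more likely (beta is an assumed rationality parameter)."""
    scores = {a: -beta * manhattan(step(state, a), goal) for a in ACTIONS}
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}

def goal_posterior(start, actions, goals):
    """Bayes' rule with a uniform prior over goals: multiply the policy's
    probability of each observed action under each candidate goal."""
    log_post = {name: 0.0 for name in goals}
    state = start
    for a in actions:
        for name, loc in goals.items():
            log_post[name] += math.log(action_probs(state, loc)[a])
        state = step(state, a)
    z = sum(math.exp(v) for v in log_post.values())
    return {name: math.exp(v) / z for name, v in log_post.items()}

goals = {"A": (5, 0), "B": (0, 5), "C": (-5, 0)}
posterior = goal_posterior((0, 0), ["right", "right", "right"], goals)
print(posterior)
```

With obstacles or other constraints, the distance heuristic would no longer give the optimal values, and the policy would have to come from solving the MDP, as in the rational planning box on the slide.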
Theory of mind: Joint inferences about beliefs and preferences
  (Baker, Saxe & Tenenbaum, CogSci 2011)

Agent: agent state, environment → rational perception → beliefs
       beliefs, preferences → rational planning → actions

Food truck scenarios:

[Figure panels: Preferences, Initial Beliefs]
Goal inference with multiple agents
  (Baker, Goodman & Tenenbaum, CogSci 2008, in prep)

Two agents, each modeled as: constraints, goals → rational planning (MDP) → actions

Southgate & Csibra:

[Figure: People compared with Model]
Inferring social goals
  (Baker, Goodman & Tenenbaum, CogSci 2008; Ullman, Baker, Evans,
  Macindoe & Tenenbaum, NIPS 2009)

Two agents, each modeled as: constraints, goals → rational planning (MDP) → actions

Hamlin, Kuhlmeier, Wynn & Bloom:

[Figure: two panels, each comparing subject ratings with model prediction]
Conclusions
From scenes to stories… What contents of stories are
  routinely accessed through visual scenes? How can we
  represent that content for reasoning, communication,
  prediction and planning?

Focus on core knowledge present in preverbal infants:
  intuitive physics, intuitive psychology.

Representations using probabilistic programs: thick nodes
  (e.g. CAD++), thick arrows (physics, graphics, planning),
  recursive (inference about inference, goals about goals).

Challenges for future work: (1) Integrating physics and
  psychology. (2) Efficient inference. (3) Learning.

More Related Content

Similar to Fcv rep tenenbaum

GAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesGAN for Bayesian Inference objectives
GAN for Bayesian Inference objectives
Natan Katz
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
Christian Robert
 
Statistics in Astronomy
Statistics in AstronomyStatistics in Astronomy
Statistics in Astronomy
Peter Coles
 
Fuzzy logic
Fuzzy logicFuzzy logic
Fuzzy logic
manojsonkar
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
Jan Aerts
 
Fcv hist zhu
Fcv hist zhuFcv hist zhu
Fcv hist zhuzukun
 
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Numenta
 
Cosmology: A Bayesian Perspective
Cosmology: A Bayesian PerspectiveCosmology: A Bayesian Perspective
Cosmology: A Bayesian Perspective
Peter Coles
 

Similar to Fcv rep tenenbaum (10)

GAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesGAN for Bayesian Inference objectives
GAN for Bayesian Inference objectives
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Desy
DesyDesy
Desy
 
Statistics in Astronomy
Statistics in AstronomyStatistics in Astronomy
Statistics in Astronomy
 
Fuzzy logic
Fuzzy logicFuzzy logic
Fuzzy logic
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
 
2주차
2주차2주차
2주차
 
Fcv hist zhu
Fcv hist zhuFcv hist zhu
Fcv hist zhu
 
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
 
Cosmology: A Bayesian Perspective
Cosmology: A Bayesian PerspectiveCosmology: A Bayesian Perspective
Cosmology: A Bayesian Perspective
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 

Recently uploaded (20)

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 

Fcv rep tenenbaum

  • 1. How should we represent visual scenes? Common-Sense Core, Probabilistic Programs Josh Tenenbaum MIT Brain and Cognitive Sciences CSAIL Joint work with Noah Goodman, Chris Baker, Rebecca Saxe, Tomer Ullman, Peter Battaglia, Jess Hamrick and others.
  • 2. Core of common-sense reasoning Human thought is structured around a basic understanding of physical objects, intentional agents, and their relations. “Core knowledge” (Spelke, Carey, Leslie, Baillargeon, Gergely…) Intuitive theories (Carey, Gopnik, Wellman, Gelman, Gentner, Forbus, McCloskey…) Primitives of lexical semantics (Pinker, Jackendoff, Talmy, Pustejovsky) Visual scene understanding (Everyone here…) From scenes to stories… The key questions: (1) What is the form and content of human common-sense theories of the physical world, intentional agents, and their interaction? (2) How are these theories used to parse visual experience into representations that support reasoning, planning, communication?
  • 3. A developmental perspective A 3 year old and her dad: Dad: “What’s this a picture of?” Sarah: “A bear hugging a panda bear.” ... Dad: “What is the second panda bear doing?” Sarah: “It’s trying to hug the bear.” Dad: “What about the third bear?” Sarah: “It’s walking away.” But this feels too hard to approach now, so what about looking at younger children (e.g. 12 months or younger)?
  • 4. Intuitive physics and psychology Southgate and Csibra, 2009 (13 month olds) Heider and Simmel, 1944
  • 5. Intuitive physics (Gupta, Efros, Hebert) (Whiting et al)
  • 7. Probabilistic generative models • early 1990’s-early 2000’s – Bayesian networks: model the causal processes that give rise to observations; perform reasoning, prediction, planning via probabilistic inference. – The problem: not sufficiently flexible, expressive.
  • 8. Scene understanding as an inverse problem The “inverse Pixar” problem: World state (t) graphics Image (t)
  • 9. Scene understanding as an inverse problem The “inverse Pixar” problem: physics … World state (t-1) World state (t) World state (t+1) … graphics Image (t-1) Image (t) Image (t+1)
  • 10. Probabilistic programs • Probabilistic models à la Laplace. – The world is fundamentally deterministic (described by a program), and perfectly predictable if we could observe all relevant variables. – Observations are always incomplete or indirect, so we put probability distributions on what we can’t observe. • Compare with Bayesian networks. – Thick nodes. Programs defined over unbounded sets of objects, their properties, states and relations, rather than traditional finite-dimensional random variables. – Thick arrows. Programs capture fine-grained causal processes unfolding over space and time, not simply directed statistical dependencies. – Recursive. Probabilistic programs can be arbitrarily manipulated inside other programs. (e.g. perceptual inferences about entities that make perceptual inferences, entities with goals and plans re: other agents’ goals and plans.) • Compare with grammars or logic programs.
  • 11. Probabilistic programs for “inverse Pixar” scene understanding • World state: CAD++ • Graphics – Approximate Rendering • Simple surface primitives • Rasterization rather than ray tracing (for each primitive, which pixels does it affect?) • Image features rather than pixels – Probabilities: • Image noise, image features • Unseen objects (e.g., due to occlusion)
  • 12. Probabilistic programs for “inverse Pixar” scene understanding • World state: CAD++ • Graphics • Physics – Approximate Newton (physical simulation toolkit, e.g. ODE) • Collision detection: zone of interaction • Collision response: transient springs • Dynamics simulation: only for objects in motion – Probabilities: • Latent properties (e.g., mass, friction) • Latent forces
  • 14. Modeling stability judgments physics … World state (t-1) World state (t) World state (t+1) … graphics Image (t-1) Image (t) Image (t+1)
  • 15. Modeling stability judgments physics … World state (t-1) World state (t) World state (t+1) … Prob. approx. rendering Image (t-1) Image (t) Image (t+1)
  • 16. Modeling stability judgments physics … World state (t-1) World state (t) World state (t+1) … Prob. approx. rendering Image (t-1) Image (t) Image (t+1)
  • 17. Modeling stability judgments Prob. approx. Newton … World state (t-1) World state (t) World state (t+1) … Prob. approx. rendering Image (t-1) Image (t) Image (t+1)
  • 18. Modeling stability judgments Prob. approx. Newton … World state (t-1) World state (t) World state (t+1) … Prob. approx. rendering Image (t-1) Image (t) Image (t+1) = perceptual uncertainty
  • 19. Modeling stability judgments (Hamrick, Battaglia, Tenenbaum, CogSci 2011) Perception: Approximate posterior with block positions normally distributed around ground truth, subject to global stability. Reasoning: Draw multiple samples from perception. Simulate forward with deterministic approx. Newton (ODE). Decision: Expectations of various functions evaluated on simulation outputs.
  • 26. Results Mean human stability judgment Model prediction (expected proportion of tower that will fall)
  • 28. The flexibility of common sense (“infinite use of finite means”, “visual Turing test”) • Which way will the blocks fall? • How far will the blocks fall? • If this tower falls, will it knock that one over? • If you bump the table, will more red blocks or yellow blocks fall over? • If this block had (not) been present, would the tower (still) have fallen over? • Which of these blocks is heavier or lighter than the others? • …
  • 31. If you bump the table…
  • 39. If you bump the table… (Battaglia & Tenenbaum, in prep) Mean human judgment Model prediction (expected proportion of red vs. yellow blocks that fall)
  • 40. Experiment 1: Cause/Prevention Judgments (Gerstenberg, Tenenbaum, Goodman, et al., in prep)
  • 41. Modeling people’s cause/prevention judgments • Physics Simulation Model: p(B|A) – p(B|not A) – p(B|A): 0 if ball misses, 1 if ball goes in. – p(B|not A): assume sparse latent Gaussian perturbations on B’s velocity.
  • 43. Intuitive psychology Beliefs (B) Desires (D) Actions (A) Heider and Simmel, 1944
  • 44. Intuitive psychology Beliefs (B) Desires (D) Actions (A) Pr(A|B,D) Beliefs (B)… Heider and Simmel, 1944 Desires (D) …
  • 45. Intuitive psychology Beliefs (B) Desires (D) Probabilistic approximate planning Actions (A) Probabilistic program Heider and Simmel, 1944
  • 46. Intuitive psychology Beliefs (B) Desires (D) Probabilistic approximate planning Actions (A) Probabilistic program In state j, choose action $i^* = \arg\max_i \sum_{j'} p_{ij,j'}\, u_{j'}$ “Inverse economics” “Inverse optimal control” “Inverse reinforcement learning” “Inverse Bayesian decision theory” (Lucas & Griffiths; Jern & Kemp; Tauber & Steyvers; Rafferty & Griffiths; Goodman & Baker; Goodman & Stuhlmuller; Bergen, Evans & Tenenbaum … Ng & Russell; Todorov; Rao; Ziebart, Dey & Bagnell…)
  • 47. Goal inference as inverse probabilistic planning (Baker, Tenenbaum & Saxe, Cognition, 2009) [Diagram labels: constraints, goals, rational planning (MDP), actions, Agent] [Plot: People vs. Model, r = 0.98]
  • 48. Theory of mind: Joint inferences about beliefs and preferences (Baker, Saxe & Tenenbaum, CogSci 2011) [Diagram labels: Environment state, rational perception, Beliefs, Preferences, rational planning, Actions, Agent] Food truck scenarios: Preferences, Initial Beliefs
  • 49. Goal inference with multiple agents (Baker, Goodman & Tenenbaum, CogSci 2008, in prep) [Diagram labels, one set per agent: constraints, goals, rational planning (MDP), actions, Agent] Southgate & Csibra: [Plot: People, Model]
  • 50. Inferring social goals (Baker, Goodman & Tenenbaum, CogSci 2008; Ullman, Baker, Evans, Macindoe & Tenenbaum, NIPS 2009) [Diagram labels, one set per agent: constraints, goals, rational planning (MDP), actions, Agent] Hamlin, Kuhlmeier, Wynn & Bloom: [Plots: Subject ratings, Model prediction]
  • 51. Conclusions From scenes to stories… What contents of stories are routinely accessed through visual scenes? How can we represent that content for reasoning, communication, prediction and planning? Focus on core knowledge present in preverbal infants: intuitive physics, intuitive psychology. Representations using probabilistic programs: thick nodes (e.g. CAD++), thick arrows (physics, graphics, planning), recursive (inference about inference, goals about goals). Challenges for future work: (1) Integrating physics and psychology. (2) Efficient inference. (3) Learning.
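
The perception, reasoning and decision steps listed on the stability-judgment slide can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation. It assumes a 1-D tower of equal-width blocks, replaces the ODE physics engine with a hypothetical static centre-of-mass check (`fraction_fallen`), and omits the "subject to global stability" constraint on the perceptual posterior. The names `BLOCK_WIDTH`, `fraction_fallen`, `expected_fraction_fallen` and the parameter `sigma` are all illustrative.

```python
import random

BLOCK_WIDTH = 1.0  # assumption: every block has the same width


def fraction_fallen(xs):
    """Deterministic stand-in for the physics step (NOT a real simulator).

    xs[i] is the horizontal centre of block i, with block 0 on the ground.
    The blocks above block i topple if their combined centre of mass lies
    outside the footprint of block i. Returns the fraction of blocks that fall.
    """
    n = len(xs)
    for i in range(n - 1):  # scan upward; the lowest failing interface decides
        above = xs[i + 1:]
        centre_of_mass = sum(above) / len(above)
        if abs(centre_of_mass - xs[i]) > BLOCK_WIDTH / 2:
            return len(above) / n
    return 0.0


def expected_fraction_fallen(true_xs, sigma, n_samples=2000, seed=0):
    """Perception -> reasoning -> decision, as a Monte Carlo estimate.

    Perception: sample block positions from Gaussians around ground truth.
    Reasoning:  run the deterministic stand-in "physics" on each sample.
    Decision:   average the simulated outcome over samples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perceived = [rng.gauss(x, sigma) for x in true_xs]
        total += fraction_fallen(perceived)
    return total / n_samples


if __name__ == "__main__":
    print(expected_fraction_fallen([0.0, 0.0, 0.0], sigma=0.05))
    print(expected_fraction_fallen([0.0, 0.4, 0.8], sigma=0.05))
```

Under these assumptions, a well-aligned tower should receive an estimate near zero, while a tower whose upper blocks overhang the base should receive a clearly higher one. Increasing `sigma` models greater perceptual uncertainty.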
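
Goal inference as inverse planning, as named in the intuitive-psychology slides, can be sketched as Bayesian inference over candidate goals given observed actions. The cited work uses MDP planning. The version below is a simplified stand-in that assumes an agent moving on a line, a softmax ("noisily rational") action rule based on distance to the goal, and a uniform prior over goals. The names `ACTIONS`, `action_likelihood`, `goal_posterior` and the rationality parameter `beta` are illustrative, not taken from the slides.

```python
import math

ACTIONS = (-1, +1)  # assumption: the agent steps left or right on a line


def action_likelihood(action, state, goal, beta):
    """P(action | state, goal) under a softmax over negative distance-to-goal.

    Larger beta means a more nearly optimal agent; beta = 0 means random steps.
    """
    scores = {a: math.exp(-beta * abs(state + a - goal)) for a in ACTIONS}
    return scores[action] / sum(scores.values())


def goal_posterior(start, actions, goals, beta=2.0):
    """Posterior over candidate goals given an action sequence (uniform prior)."""
    log_post = {g: 0.0 for g in goals}
    state = start
    for a in actions:
        for g in goals:
            log_post[g] += math.log(action_likelihood(a, state, g, beta))
        state += a  # deterministic transition: the step always succeeds
    # Normalise in log space for numerical stability.
    m = max(log_post.values())
    unnorm = {g: math.exp(lp - m) for g, lp in log_post.items()}
    z = sum(unnorm.values())
    return {g: w / z for g, w in unnorm.items()}


if __name__ == "__main__":
    # Three steps to the right, with candidate goals at -5 and +5.
    print(goal_posterior(start=0, actions=[+1, +1, +1], goals=[-5, 5]))
```

With no observed actions the posterior equals the prior. Each step toward one goal shifts probability mass to it, at a rate controlled by `beta`.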