Visual Representation

 Song-Chun Zhu




The Frontiers of Vision Workshop, August 20-23, 2011
Marr’s observation: studying vision at 3 levels




      Visual Representations          Algorithms          Implementation




Representing Two Types of Visual Knowledge

Axiomatic visual knowledge:                ---- for parsing
    e.g.    A face has two eyes.
            Vehicle has four subtypes: sedan, hatchback, van, and SUV.

Domain visual knowledge:                   ---- for reasoning
    e.g.    Room G403 has three chairs, a table, ….
            Sarah ate cornflakes with milk for breakfast, …
            A Volvo x90 was parked in lot 3 during 1:30-2:30pm, …
Issues

1, General vs. task-specific representations
   ---- Do we need levels of abstraction between features and categories?
        An observation: popular research in the past decade was mostly task-specific,
                        which was a big setback for general vision.

2, What are the math principles/requirements for a general representation?
   ---- Why are grammar and logic coming back to vision?

3, Challenge: unsupervised learning of hierarchical representations
   ---- How do we evaluate a representation, especially for unsupervised learning?
Deciphering Marr’s message
The backbone for general vision
[Diagram of the backbone, with the labels below]
    Texture, Texton (primitives), Primal Sketch, Scaling
    "where" pathway:  2.1D Sketch, 2.5D Sketch, 3D Sketch
    "what" pathway:   HiS, Parts, Objects, Scenes


In video, one augments it with event and causality.
Textures and textons in images

[Figure: cluster centers and instances in each cluster. Texture clusters are shown in blue, primitive clusters in pink.]

     1  sky, wall, floor
     2  dry wall, ceiling
     3  carpet, ceiling, thick clouds
     4  step edge
     5  concrete floor, wood, wall
     6  L-junction
     7  ridge/bar
     8  carpet, wall
     9  L-junction centered at 165°
    10  water
    11  lawn grass
    12  terminator
    13  wild grass, roof
    14  L-junction at 130°
    15  plants from far distance
    16  sand
    17  close-up of concrete
    18  wood grain
    19  L-junction at 90°
    20  Y-junction

Zhu, Shi and Si, 2009
Primal sketch: a token representation conjectured by Marr




[Figure, top row: org image | sketching pursuit process | sketches]
[Figure, bottom row: syn image = synthesized textures + sketch image]
Mathematics branches for visual representation

Regimes of representations / models:

Reasoning               Logics (common sense, domain knowledge)
Cognition, Recognition  Stochastic grammar (partonomy, taxonomy, relations)
Coding/Processing       Sparse coding (low-D manifolds, textons)
                        Markov, Gibbs fields (hi-D manifolds, textures)
Schedule

Erik Learned-Miller     Low-level vision
Benjamin Kimia          Middle-level vision
Alex Berg               Middle-level vision
Derek Hoiem             High-level vision
Song-Chun Zhu           Conceptualization by stochastic sets
Pedro Felzenszwalb      Grammar for objects
Sinisa Todorovic        Probabilistic first-order logic
Trevor Darrell
Josh Tenenbaum          Probabilistic programs
Discussion              25 minutes
Visual Conceptualization with Stochastic Sets


 Song-Chun Zhu




Cognition: how do we represent a concept?

In mathematics and logic, concepts are equated with deterministic sets (e.g.
Cantor, Boole) or spaces in continuous domains, and with their compositions
through the “and”, “or”, and “negation” operators.

But the world to us is fundamentally stochastic!

Especially in the image domain!

Ref. [1] D. Mumford. The Dawning of the Age of Stochasticity. 2000.
     [2] E. Jaynes. Probability Theory: the Logic of Science. Cambridge University Press, 2003.
Stochastic sets in the image space
How do we define concepts as sets of images/videos?
e.g. noun concepts: human face, willow tree, vehicle?
     verb concepts: opening a door, making coffee?

[Figure: the image space, in which a point is an image or a video clip.]

What are the characteristics of such sets?

Observation: This is the symbol grounding problem in AI.
1, Stochastic set in statistical physics
Statistical physics studies macroscopic properties of systems
that consist of a massive number of elements with microscopic interactions,
 e.g. a tank of insulated gas or ferromagnetic material with $N = 10^{23}$ elements.

A state of the system is specified by the positions of the
N elements $x^N$ and their momenta $p^N$:

    $s = (x^N, p^N)$

But we only care about some global properties:
    energy E, volume V, pressure, ….

Micro-canonical ensemble:

    $\Omega(N, E, V) = \{ s : h(s) = (N, E, V) \}$
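
A toy sketch of the ensemble idea, not taken from the slides: the set of all microscopic states that share the same global statistics. The 4-spin ring and the nearest-neighbour `energy` function are assumptions chosen only so that the set can be enumerated by brute force.

```python
from itertools import product

def energy(spins):
    """Nearest-neighbour energy on a ring: E = -sum_i s_i * s_{i+1} (toy choice)."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def microcanonical_ensemble(n, target_energy):
    """All states s in {-1, +1}^n whose global statistic h(s) equals (n, target_energy)."""
    return [s for s in product((-1, 1), repeat=n) if energy(s) == target_energy]

# Every state in the returned set is treated as equally likely.
ensemble = microcanonical_ensemble(4, 0)
print(len(ensemble), "states share E = 0")
```

The set is defined purely by a constraint on statistics; no probability model is written down, yet a uniform distribution over the set is implied.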
It took us 30 years to transfer this theory to vision

    a texture $= \Omega(h_c) = \{ I : h_i(I) = h_{c,i},\ i = 1, 2, \ldots, K \}$

    $h_c$ are histograms of Gabor filter responses

[Figure: observed image $I_{obs}$ and synthesized images $I_{syn} \sim \Omega(h)$ for k = 0, 1, 3, 4, 7.]

                                                                    (Zhu, Wu, Mumford 97, 99, 00)
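
A minimal sketch of testing membership in such a texture set: an image belongs to the set when the histograms of its filter responses match the target histograms. Everything concrete here is an assumption for illustration: the two circular difference filters stand in for Gabor filters, the images are binary 4x4 arrays, and `in_ensemble` and `stats` are hypothetical helper names.

```python
def filter_responses(image, dy, dx):
    """Circular first difference I[y+dy][x+dx] - I[y][x] at every pixel."""
    h, w = len(image), len(image[0])
    return [image[(y + dy) % h][(x + dx) % w] - image[y][x]
            for y in range(h) for x in range(w)]

def histogram(values, bins=(-1, 0, 1)):
    """Normalized histogram of integer responses over the given bin values."""
    return [sum(1 for v in values if v == b) / len(values) for b in bins]

def stats(image):
    """h(I): one histogram per filter (horizontal and vertical differences)."""
    return [histogram(filter_responses(image, 0, 1)),
            histogram(filter_responses(image, 1, 0))]

def in_ensemble(image, h_c, eps=1e-9):
    """True if every histogram bin of h(I) matches h_c within eps."""
    return all(abs(a - b) <= eps
               for hi, hci in zip(stats(image), h_c)
               for a, b in zip(hi, hci))

stripes = [[x % 2 for x in range(4)] for _ in range(4)]        # vertical stripes
shifted = [[(x + 1) % 2 for x in range(4)] for _ in range(4)]  # same texture, shifted
flat = [[0] * 4 for _ in range(4)]                             # constant image
h_c = stats(stripes)
print(in_ensemble(shifted, h_c), in_ensemble(flat, h_c))
```

The shifted stripes differ from the original pixel by pixel but share its statistics, so both fall in the same set, while the constant image does not.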
Equivalence of deterministic set and probabilistic models

                                                             (Gibbs 1902; Wu and Zhu, 2000)
Theorem 1
  For a very large image from the texture ensemble $I \sim f(I; h_c)$, any
  local patch $I_\Lambda$ of the image given its neighborhood $I_{\partial\Lambda}$ follows a conditional
  distribution specified by a FRAME/MRF model $p(I_\Lambda \mid I_{\partial\Lambda}; \beta)$.
Theorem 2
  As the image lattice goes to infinity ($\Lambda \to \mathbb{Z}^2$), $f(I; h_c)$ is the limit of the
  FRAME model $p(I_\Lambda \mid I_{\partial\Lambda}; \beta)$, in the absence of phase transition.

  $$p(I_\Lambda \mid I_{\partial\Lambda}; \beta) = \frac{1}{z(\beta)} \exp\Big\{ -\sum_{j=1}^{k} \beta_j h_j(I_\Lambda \mid I_{\partial\Lambda}) \Big\}$$

    Ref. Y. N. Wu, S. C. Zhu, “Equivalence of Julesz Ensemble and FRAME models,” Int’l J. Computer Vision, 38(3), 247-265, July, 2000.
2, Stochastic set from sparse coding (origin: harmonic analysis)
Learning an over-complete image basis from natural images
    $I = \sum_i \alpha_i \psi_i + n$                          (Olshausen and Field, 1995-97)

[Figure: textons.]

B. Olshausen and D. Field, “Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?” Vision Research, 37: 3311-25, 1997.
S.C. Zhu, C. E. Guo, Y.Z. Wang, and Z.J. Xu, “What are Textons?” Int'l J. of Computer Vision, vol.62(1/2), 121-143, 2005.
Lower dimensional sets or subspaces
    a texton $= \Omega(h_c) = \{ I : I = \sum_i \alpha_i \psi_i,\ \|\alpha\|_0 \le k \}$

    $k$ is far smaller than the dimension of the image space.
    $\psi_i$ is a basis function from a dictionary.
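
A minimal sketch of coding an image with at most k basis functions, using greedy matching pursuit. This is a swapped-in textbook technique, not the learning method of the cited papers. The 2-D "images", the 3-atom dictionary, and the name `matching_pursuit` are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, k):
    """Greedy sparse code with at most k non-zero coefficients (unit-norm atoms)."""
    residual = list(signal)
    alpha = [0.0] * len(dictionary)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        if abs(scores[best]) < 1e-12:
            break
        alpha[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return alpha, residual

s = 1 / math.sqrt(2)
dictionary = [[1.0, 0.0], [0.0, 1.0], [s, s]]   # 3 atoms in 2-D: over-complete
image = [3 * s, 3 * s]                            # lies along the third atom
alpha, residual = matching_pursuit(image, dictionary, k=1)
l0 = sum(1 for a in alpha if abs(a) > 1e-9)       # number of atoms actually used
print(alpha, l0)
```

With an orthogonal 2-atom basis the same image would need two coefficients; the over-complete dictionary lets one atom explain it, which is the sense in which such images lie on a low-dimensional set.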
A second look at the space of images


[Figure: the image space, containing implicit manifolds and explicit manifolds.]

Two regimes of stochastic sets

I call them the implicit vs. explicit manifolds
Supplementary: continuous spectrum of entropy pattern




Scaling (zoom-out) increases the image entropy (dimensions)

Where do HoG, SIFT, and LBP work well?

Ref: Y.N. Wu, C.E. Guo, and S.C. Zhu, “From Information Scaling of Natural Images to Regimes of Statistical Models,”
     Quarterly of Applied Mathematics, 2007.
3, Stochastic sets by And-Or composition (Grammar)

A production rule

    A ::= aB | a | aBc

can be represented by an And-Or tree:

    Or-node          A
    And-nodes        A1, A2, A3
    Or-nodes         B1, B2
    terminal nodes   a1, a2, a3, c

The language is the set of all valid configurations derived from a node A.

    $L(A) = \{ (\omega, p(\omega)) : A \stackrel{R^*}{\Longrightarrow} \omega \}$
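
A small sketch of enumerating such a language with probabilities: Or-nodes take a union over branches, And-nodes take a product over children. Only the rule A ::= aB | a | aBc comes from the slide; the rule for B, the terminals `b` and `d`, and all branch probabilities are hypothetical.

```python
from itertools import product

# Or-node -> list of (branch probability, And-node children).
# The rule for B and every probability below are hypothetical.
RULES = {
    "A": [(0.5, ["a", "B"]), (0.3, ["a"]), (0.2, ["a", "B", "c"])],
    "B": [(0.6, ["b"]), (0.4, ["d"])],
}

def language(symbol):
    """Return {configuration: probability} for everything derivable from symbol."""
    if symbol not in RULES:                      # terminal node
        return {(symbol,): 1.0}
    out = {}
    for p_branch, children in RULES[symbol]:     # Or: union over branches
        parts = [language(c).items() for c in children]
        for combo in product(*parts):            # And: product over children
            config = tuple(t for cfg, _ in combo for t in cfg)
            prob = p_branch
            for _, p in combo:
                prob *= p
            out[config] = out.get(config, 0.0) + prob
    return out

L_A = language("A")
for config, prob in sorted(L_A.items()):
    print(" ".join(config), round(prob, 3))
```

This brute-force enumeration only terminates for non-recursive grammars; a recursive grammar would need sampling or a depth bound instead.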
And-Or graph, parse graphs, and configurations




   Zhu and Mumford, 2006
Each category is conceptualized as a grammar whose language
defines a set or “equivalence class” of all valid configurations

What does the space of a compositional set look like?

[Figure: union space (OR) and product space (AND). By Zhangzhang Si, 2011.]
Spatial-AoG for objects:   Example on human figures




                              Rothrock and Zhu, 2011
Appearance model for terminals,   learned from images



Grounding the symbols
Synthesis (Computer Dream) by sampling the S-AoG




                                          Rothrock and Zhu, 2011
Spatial-AoG for scene:   Example on indoor scene configurations

                                                   3D reconstruction




                           Results on the UCLA dataset




                                              Zhao and Zhu, 2011
Temporal AoG for action / events




Ref. M. Pei and S.C. Zhu, “Parsing Video Events with Goal inference and Intent Prediction,” ICCV, 2011.
Causal-AOG: learned from video events


Causality between actions and fluent changes.



[Figure: Causal-AOG over three fluents. Legend: fluent, fluent transit, action.
 Door fluent: close, open. Light fluent: on, off. Screen fluent: off, on.
 Action nodes A0 to A6 and a0 to a10.]
Summary: Visual representations
Axiomatic knowledge: textures, textons + S/T/C And-Or graphs
Domain knowledge: parse graphs

[Figure: parse graphs placed on Spatial and Temporal axes.]
Capacity and learnability of the stochastic sets
 1, Structures of the image space
 2, Structures and capacity of the model (hypothesis) space
 3, Learnability of the concepts
[Figure: image space (f, p) and representation space (H, H(smp(m)), f, p).]
Learning = Pursuing stochastic sets in the image universe

 f: target distribution;   p: our model;   q: initial model

[Figure: the image universe, in which every point is an image.]

1, q = unif()
2, q =

model ~ image set ~ manifold ~ cluster
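
A toy sketch of the pursuit idea starting from q = unif(): the model p is moved toward the target f by matching one statistic at a time. Measuring progress with the KL divergence KL(f || p) is an assumption here, as are the 4-state target, the choice of marginals as statistics, and the helper names.

```python
import math

def kl(f, p):
    """KL(f || p) for distributions stored as {state: probability}."""
    return sum(f[x] * math.log(f[x] / p[x]) for x in f if f[x] > 0)

def marginal(dist, axis):
    out = {}
    for x, pr in dist.items():
        out[x[axis]] = out.get(x[axis], 0.0) + pr
    return out

def match_marginal(p, f, axis):
    """Rescale p so its marginal on `axis` equals that of f (one fitting step)."""
    fm, pm = marginal(f, axis), marginal(p, axis)
    return {x: pr * fm[x[axis]] / pm[x[axis]] for x, pr in p.items()}

f = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}   # toy target distribution
p = {x: 0.25 for x in f}                                     # initial model q = unif()
kls = [kl(f, p)]
for axis in (0, 1):                                          # add one statistic per step
    p = match_marginal(p, f, axis)
    kls.append(kl(f, p))
print([round(v, 4) for v in kls])
```

Each matched statistic shrinks the set of distributions consistent with the model, so the divergence from the target drops at every step in this example.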
A unified foundation for visual knowledge representation

Regimes of representations / models:

Reasoning               Logics (common sense, domain knowledge)
Cognition, Recognition  Stochastic grammar (partonomy, taxonomy, relations)
Coding                  Sparse coding (low-D manifolds, textons)
                        Markov, Gibbs fields (hi-D manifolds, textures)

More Related Content

More from zukun

Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featureszukun
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...zukun
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learningzukun
 
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...zukun
 
Lecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningLecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningzukun
 
Lecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognitionLecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognitionzukun
 

More from zukun (20)

Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 
Icml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant featuresIcml2012 learning hierarchies of invariant features
Icml2012 learning hierarchies of invariant features
 
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Act...
 
Quoc le tera-scale deep learning
Quoc le   tera-scale deep learningQuoc le   tera-scale deep learning
Quoc le tera-scale deep learning
 
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
Deep Learning workshop 2010: Deep Learning of Invariant Spatiotemporal Featur...
 
Lecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learningLecun 20060816-ciar-01-energy based learning
Lecun 20060816-ciar-01-energy based learning
 
Lecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognitionLecun 20060816-ciar-02-deep learning for generic object recognition
Lecun 20060816-ciar-02-deep learning for generic object recognition
 

Recently uploaded

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
GenCyber Cyber Security Day Presentation

Fcv rep zhu

  • 1. Visual Representation Song-Chun Zhu The Frontiers of Vision Workshop, August 20-23, 2011
  • 2. Marr’s observation: studying vision at 3 levels: Visual Representations, Algorithms, Implementation
  • 3. Representing Two Types of Visual Knowledge. Axiomatic visual knowledge (for parsing), e.g. A face has two eyes; Vehicle has four subtypes: Sedan, Hatchback, Van, and SUV. Domain visual knowledge (for reasoning), e.g. Room G403 has three chairs, a table, …; Sarah ate cornflakes with milk for breakfast, …; A Volvo x90 parked in lot 3 during 1:30-2:30pm, …
  • 4. Issues. 1, General vs. task-specific representations: Do we need levels of abstraction between features and categories? An observation: popular research in the past decade was mostly task-specific, which was a big setback for general vision. 2, What are the math principles/requirements for a general representation? Why are grammar and logic back in vision? 3, Challenge: unsupervised learning of hierarchical representations. How do we evaluate a representation, especially for unsupervised learning?
  • 5. Deciphering Marr’s message: the backbone for general vision. [Diagram labels: Texture, Texton (primitives), Scaling, Primal Sketch; "where" axis: 2.1D Sketch, 2.5D Sketch, 3D Sketch; "what" axis: HiS, Parts, Objects, Scenes.] In video, one augments it with event and causality.
  • 6. Textures and textons in images. [Figure: 20 clusters shown as cluster centers with instances in each cluster; texture clusters (blue), primitive clusters (pink). Cluster labels include: sky, wall, floor; dry wall, ceiling; carpet; thick clouds; step edge; concrete floor; wood; L-junction (at 90°, at 130°, centered at 165°); ridge/bar; water; lawn grass; terminator; wild grass; roof; plants from far distance; sand; close-up of concrete; wood grain; Y-junction.] Zhu, Shi and Si, 2009
  • 7. Primal sketch: a token representation conjectured by Marr. [Figure: original image, sketching pursuit process, sketches; synthesized image = sketch image + synthesized textures.]
  • 8. Mathematics branches for visual representation. [Diagram: regimes of representations / models. Levels: Reasoning, Cognition, Recognition, Coding/Processing. Branches: Logics (common sense, domain knowledge); Stochastic grammar (partonomy, taxonomy, relations); Sparse coding (low-D manifolds, textons); Markov, Gibbs Fields (hi-D manifolds, textures).]
  • 9. Schedule. Erik Learned-Miller: Low level vision; Benjamin Kimia: Middle level vision; Alex Berg: Middle level vision; Derek Hoiem: High level vision; Song-Chun Zhu: Conceptualization by stochastic sets; Pedro Felzenszwalb: Grammar for objects; Sinisa Todorovic: Probabilistic 1st order logic; Trevor Darrell; Josh Tenenbaum: Probabilistic programs; Discussion: 25 minutes
  • 10. Visual Conceptualization with Stochastic Sets Song-Chun Zhu The Frontiers of Vision Workshop, August 20-23, 2011.
  • 11. Cognition: how do we represent a concept? In mathematics and logic, concepts are equated with deterministic sets (e.g. Cantor, Boole) or spaces in the continuous domain, and their compositions through the “and”, “or”, and “negation” operators. But the world to us is fundamentally stochastic, especially in the image domain! Ref. [1] D. Mumford. The Dawning of the Age of Stochasticity. 2000. [2] E. Jaynes. Probability Theory: the Logic of Science. Cambridge University Press, 2003.
  • 12. Stochastic sets in the image space. How do we define concepts as sets of images/videos? E.g. noun concepts: human face, willow tree, vehicle; verbal concepts: opening a door, making coffee. What are the characteristics of such sets? [Figure: the image space, in which a point is an image or a video clip.] Observation: This is the symbol grounding problem in AI.
  • 13. 1, Stochastic set in statistical physics. Statistical physics studies macroscopic properties of systems that consist of massive elements with microscopic interactions, e.g. a tank of insulated gas or ferro-magnetic material, with $N = 10^{23}$. A state of the system is specified by the positions $x^N$ of the $N$ elements and their momenta $p^N$: $s = (x^N, p^N)$. But we only care about some global properties: energy $E$, volume $V$, pressure, …. Micro-canonical ensemble: $\Omega(N, E, V) = \{ s : h(s) = (N, E, V) \}$
  • 14. It took us 30 years to transfer this theory to vision. A texture $= \Omega(h_c) = \{ I : h_i(I) = h_{c,i},\ i = 1, 2, \ldots, K \}$, where $h_c$ are histograms of Gabor filters. [Figure: observed image $I_{obs}$ and synthesized images $I_{syn} \sim h$ for k = 0, 1, 3, 4, 7.] (Zhu, Wu, Mumford 97, 99, 00)
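A minimal Python sketch of the set definition on slide 14: an image belongs to the ensemble when each of its filter histograms matches the target histograms h_c. The two gradient filters, the bin settings, the tolerance, and all function names here are illustrative assumptions standing in for the Gabor filter histograms named on the slide; this is not the authors' implementation.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def histogram(values: List[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalized histogram of `values` over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp to valid bins
        counts[idx] += 1
    return [c / len(values) for c in counts]


def horizontal_gradient(img: Image) -> List[float]:
    # Assumed stand-in filter (not a Gabor filter): difference of horizontal neighbors.
    return [row[x + 1] - row[x] for row in img for x in range(len(row) - 1)]


def vertical_gradient(img: Image) -> List[float]:
    # Assumed stand-in filter: difference of vertical neighbors.
    return [img[y + 1][x] - img[y][x]
            for y in range(len(img) - 1) for x in range(len(img[0]))]


FILTERS: List[Callable[[Image], List[float]]] = [horizontal_gradient, vertical_gradient]


def statistics(img: Image, bins: int = 4, lo: float = -1.0, hi: float = 1.0) -> List[List[float]]:
    """h(I): one normalized histogram per filter response."""
    return [histogram(f(img), bins, lo, hi) for f in FILTERS]


def in_ensemble(img: Image, h_c: List[List[float]], tol: float = 1e-9) -> bool:
    """True when h_i(I) equals h_{c,i} (up to `tol`) for every filter i."""
    h = statistics(img)
    return all(
        abs(a - b) <= tol
        for h_i, h_ci in zip(h, h_c)
        for a, b in zip(h_i, h_ci)
    )


stripes = [[0, 1, 0, 1, 0]] * 4   # vertical stripes
shifted = [[1, 0, 1, 0, 1]] * 4   # same stripes, shifted by one pixel
flat = [[0.5] * 5] * 4            # constant image

h_c = statistics(stripes)
print(in_ensemble(shifted, h_c))  # different pixels, same statistics
print(in_ensemble(flat, h_c))     # different statistics
```

The shifted stripes differ pixel by pixel from the original yet share its histograms, so both fall in the same set; the constant image does not. In this toy setting, that is the sense in which a texture is an equivalence class of images rather than a single image.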
  • 15. Equivalence of deterministic set and probabilistic models (Gibbs 1902; Wu and Zhu, 2000). Theorem 1: For a very large image from the texture ensemble, $I \sim f(I; h_c)$, any local patch $I_\Lambda$ of the image $I$ given its neighborhood $I_{\partial\Lambda}$ follows a conditional distribution specified by a FRAME/MRF model $p(I_\Lambda \mid I_{\partial\Lambda}; \beta)$. Theorem 2: As the image lattice goes to infinity ($\Lambda \to \mathbb{Z}^2$), $f(I; h_c)$ is the limit of the FRAME model $p(I_\Lambda \mid I_{\partial\Lambda}; \beta)$, in the absence of phase transition. $$p(I_\Lambda \mid I_{\partial\Lambda}; \beta) = \frac{1}{z(\beta)} \exp\Big\{ -\sum_{j=1}^{k} \beta_j h_j(I_\Lambda \mid I_{\partial\Lambda}) \Big\}$$ Ref. Y. N. Wu, S. C. Zhu, “Equivalence of Julesz Ensemble and FRAME models,” Int’l J. Computer Vision, 38(3), 247-265, July, 2000
  • 16. 2, Stochastic set from sparse coding (origin: harmonic analysis). Learning an over-complete image basis from natural images: $I = \sum_i \alpha_i \psi_i + n$ (Olshausen and Field, 1995-97). Textons. Refs: B. Olshausen and D. Field, “Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?” Vision Research, 37: 3311-25, 1997. S.C. Zhu, C. E. Guo, Y.Z. Wang, and Z.J. Xu, “What are Textons?” Int'l J. of Computer Vision, vol.62(1/2), 121-143, 2005.
  • 17. Lower dimensional sets or subspaces. A texton $= \Omega(h_c) = \{ I : I = \sum_i \alpha_i \psi_i,\ \|\alpha\|_0 \le k \}$. Here $k$ is far smaller than the dimension of the image space, and $\psi_i$ is a basis function from a dictionary.
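The sparse-coding set on slides 16 and 17 can be illustrated with a small Python sketch: an image is synthesized as a weighted sum of dictionary elements, and the code counts as sparse when at most k coefficients are nonzero. The 2-D toy dictionary, the noise-free synthesis, the "at most k" reading of the constraint, and the function names are assumptions for illustration only.

```python
from typing import List, Sequence


def synthesize(alpha: Sequence[float], dictionary: Sequence[Sequence[float]]) -> List[float]:
    """I = sum_i alpha_i * psi_i (noise term omitted in this sketch)."""
    dim = len(dictionary[0])
    image = [0.0] * dim
    for a, psi in zip(alpha, dictionary):
        for d in range(dim):
            image[d] += a * psi[d]
    return image


def l0_norm(alpha: Sequence[float], eps: float = 1e-12) -> int:
    """Number of coefficients that are (numerically) nonzero."""
    return sum(1 for a in alpha if abs(a) > eps)


def is_k_sparse(alpha: Sequence[float], k: int) -> bool:
    """True when the code uses at most k dictionary elements."""
    return l0_norm(alpha) <= k


# Assumed toy over-complete dictionary: three basis vectors in a 2-D "image space".
DICTIONARY = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

sparse_code = [0.0, 0.0, 2.0]  # one active element
dense_code = [2.0, 2.0, 0.0]   # two active elements, same image

print(synthesize(sparse_code, DICTIONARY), is_k_sparse(sparse_code, k=1))
print(synthesize(dense_code, DICTIONARY), is_k_sparse(dense_code, k=1))
```

Both codes synthesize the same image, but only the first satisfies the constraint with k = 1. Note that this checks a given code; deciding whether an arbitrary image admits any k-sparse code over a dictionary is a separate search problem that the sketch does not attempt.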
  • 18. A second look at the space of images. [Figure: the image space, containing implicit manifolds and explicit manifolds.]
  • 19. Two regimes of stochastic sets I call them the implicit vs. explicit manifolds
  • 20. Supplementary: a continuous spectrum of entropy patterns. Scaling (zoom-out) increases the image entropy (dimensions). Where on this spectrum do HoG, SIFT, and LBP work well? Ref: Y.N. Wu, C.E. Guo, and S.C. Zhu, “From Information Scaling of Natural Images to Regimes of Statistical Models,” Quarterly of Applied Mathematics, 2007.
  • 21. 3, Stochastic sets by And-Or composition (Grammar). A production rule $A ::= aB \mid a \mid aBc$ can be represented by an And-Or tree. [Figure: Or-node A; And-nodes A1, A2, A3; Or-nodes B1, B2; terminal nodes a1, a2, a3, c.] The language is the set of all valid configurations derived from a node A: $L(A) = \{ (\omega, p(\omega)) : A \overset{R^*}{\Longrightarrow} \omega \}$
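As a hedged illustration of slide 21, the Python sketch below enumerates the language of a small And-Or tree for the rule A ::= aB | a | aBc, pairing each configuration with its probability. The slide does not define B or any branching probabilities, so the rule B ::= b | d, all probability values, and the data layout are assumptions made only to keep the example runnable.

```python
from itertools import product
from typing import List, Tuple

# Each non-terminal maps to ("or", [(prob, child), ...]) or ("and", [child, ...]).
# Symbols that are not keys are treated as terminal nodes.
GRAMMAR = {
    "A": ("or", [(0.5, "A1"), (0.3, "A2"), (0.2, "A3")]),  # assumed probabilities
    "A1": ("and", ["a", "B"]),
    "A2": ("and", ["a"]),
    "A3": ("and", ["a", "B", "c"]),
    "B": ("or", [(0.5, "b"), (0.5, "d")]),                 # hypothetical rule for B
}

Config = Tuple[str, ...]


def language(node: str) -> List[Tuple[Config, float]]:
    """All (configuration, probability) pairs derivable from `node`."""
    if node not in GRAMMAR:
        return [((node,), 1.0)]  # terminal node
    kind, children = GRAMMAR[node]
    if kind == "or":
        # Or-node: union of the alternatives, weighted by branch probability.
        return [(cfg, p * q) for p, child in children for cfg, q in language(child)]
    # And-node: Cartesian product of the children's languages.
    results = []
    for combo in product(*(language(child) for child in children)):
        cfg = tuple(sym for part, _ in combo for sym in part)
        prob = 1.0
        for _, q in combo:
            prob *= q
        results.append((cfg, prob))
    return results


for cfg, prob in language("A"):
    print(" ".join(cfg), prob)
```

Or-nodes take a union of alternatives and And-nodes take a product of parts, so the enumerated set grows combinatorially with depth. The recursion terminates only for non-recursive grammars like this toy one.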
  • 22. And-Or graph, parse graphs, and configurations Zhu and Mumford, 2006
  • 23. Each category is conceptualized as a grammar whose language defines a set or “equivalence class” of all valid configurations. What does the space of a compositional set look like? [Figure: union space (OR) and product space (AND). By Zhangzhang Si, 2011]
  • 24. Spatial-AoG for objects: Example on human figures Rothrock and Zhu, 2011
  • 25. Appearance model for terminals, learned from images Grounding the symbols
  • 26. Synthesis (Computer Dream) by sampling the S-AoG Rothrock and Zhu, 2011
  • 27. Spatial-AoG for scene: Example on indoor scene configurations 3D reconstruction Results on the UCLA dataset Zhao and Zhu, 2011
  • 28. Temporal AoG for action / events Ref. M. Pei and S.C. Zhu, “Parsing Video Events with Goal inference and Intent Prediction,” ICCV, 2011.
  • 29. Causal-AoG: learned from video events. Causality between actions and fluent changes. [Figure: fluent transitions and their causing actions for a door fluent (close/open), a light fluent (on/off), and a screen fluent (off/on).]
  • 30. Summary: Visual representations. Axiomatic knowledge: textures, textons + S/T/C And-Or graphs. Domain knowledge: parse graphs. [Figure: parse graphs along spatial and temporal axes.]
  • 31. Capacity and learnability of the stochastic sets. 1, Structures of the image space. 2, Structures and capacity of the model (hypothesis) space. 3, Learnability of the concepts. [Figure: image space and representation space, with labels H, H(smp(m)), f, p.]
  • 32. Learning = Pursuing stochastic sets in the image universe. f: target distribution; p: our model; q: initial model. Image universe: every point is an image. 1, q = unif() 2, q = ; model ~ image set ~ manifold ~ cluster
  • 33. A unified foundation for visual knowledge representation. [Diagram: regimes of representations / models. Levels: Reasoning, Cognition, Recognition, Coding. Branches: Logics (common sense, domain knowledge); Stochastic grammar (partonomy, taxonomy, relations); Sparse coding (low-D manifolds, textons); Markov, Gibbs Fields (hi-D manifolds, textures).]