Mit6870 orsu lecture11

  •  A cognitive system has to categorize/recognize a large number of categories/objects
  • Second: our model is inspired by generative topic models, which typically use a “bag of words” approximation that ignores sentence structure. Topic models are relevant to our problem because they allow information to be transferred within a corpus of related documents, while the mixing proportions capture the distinctive features of particular documents. Previous work on “bag of features” image models: object recognition (Sivic et al., ICCV 2005) and scene recognition (Fei-Fei et al., CVPR 2005). In this graphical model, the filled circles are the observations, the empty circles are random variables, and the rounded squares are fixed model parameters.
  • The global density defines all possible parts. Each object category is defined by sampling the same set of shared parts with different weights. Every part defines a distribution over appearances and locations (relative to the object center). Each object instance is then obtained by sampling the feature appearances from each part. There is no context: the generative process is happy to produce impossible configurations of parts.
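The shared-parts generative process described in the note above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model from the lecture: each part is assumed to have a categorical distribution over a small vocabulary of appearance "words" and an isotropic Gaussian over location relative to the object center, and the names `make_parts` and `sample_instance` are hypothetical. Because every feature is sampled independently given its part, nothing in the process rules out impossible configurations, which is the lack of context the note points out.

```python
import random

def make_parts(num_parts, vocab_size, rng):
    """Create hypothetical shared parts: an appearance distribution plus a location Gaussian."""
    parts = []
    for _ in range(num_parts):
        raw = [rng.random() + 1e-6 for _ in range(vocab_size)]  # strictly positive weights
        total = sum(raw)
        parts.append({
            "appearance": [r / total for r in raw],                    # Pr(word | part)
            "mean": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)),  # offset from object center
            "std": 0.2,                                                # isotropic spread (assumed)
        })
    return parts

def sample_instance(parts, category_weights, num_features, rng):
    """Sample one object instance: each feature independently picks a part, then an appearance and a location."""
    features = []
    for _ in range(num_features):
        k = rng.choices(range(len(parts)), weights=category_weights)[0]
        part = parts[k]
        word = rng.choices(range(len(part["appearance"])), weights=part["appearance"])[0]
        x = rng.gauss(part["mean"][0], part["std"])
        y = rng.gauss(part["mean"][1], part["std"])
        features.append((k, word, (x, y)))
    return features

rng = random.Random(0)
parts = make_parts(num_parts=5, vocab_size=10, rng=rng)  # global set of shared parts
car_weights = [0.5, 0.3, 0.2, 0.0, 0.0]                   # hypothetical category using parts 0-2 only
instance = sample_instance(parts, car_weights, num_features=20, rng=rng)
```

Every category reuses the same `parts` list and differs only in its weight vector, which is what lets information be shared across categories.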

    1. Lecture 11: Hierarchies. 6.870 Object Recognition and Scene Understanding. http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm
    2. Next week: Alec Rivers, “Scene Understanding Based on Object Relationships”; Gokberk Cinbis, “Category Level 3D Object Detection Using View-Invariant Representations”; Hueihan Jhuang and Sharat Chikkerur, “Video shot boundary detection using GIST representation”; Jenny Yuen, “Semiautomatic alignment of text and images”; Nathaniel R Twarog, “A Filtering Approach to Image Segmentation: Perceptual Grouping in Feature Space”; Nicolas Pinto, “Evaluating dense feature descriptor and multi-kernel learning for face detection/recognition”; Tilke Judd and Vladimir Bychkovsky, “Identify the same people in different photographs from the same event”; Tom Kollar, “Context-based object priors for scene understanding”; Tom Ouyang, “Hand-Drawn Sketch Recognition, A Vision-Based Approach”. Papers due this Friday (5pm): send PDF by email.
    3. Hierarchies vs. holistic features: although we have seen some “successful” holistic methods.
    4. Hierarchies, compositionality and reusable parts. Compositionality refers to our evident ability to construct hierarchical representations, whereby constituents are used and reused in an essentially infinite variety of relational compositions. Assumption (Bienenstock, Geman): what is learnable is what is representable as a hierarchy of more-or-less simple composition rules. Bienenstock, Geman. Compositionality in neural systems.
    5. Hierarchies vs. holistic features. Feature hierarchies are often inspired by the structure of the primate visual system, which has been shown to use a hierarchy of features of increasing complexity, from simple local features in the primary visual cortex to complex shapes and object views in higher cortical areas. S. Ullman et al.
    6. Diagram of the visual system (Felleman and Van Essen, 1991).
    7–11. Modified by T. Serre from Ungerleider and Haxby, and then shamelessly copied by me.
    12. IT readout (slide by Serre).
    13. Identifying natural images from human brain activity? Kay, K.N., Naselaris, T., Prenger, R.J., & Gallant, J.L. (2008). Identifying natural images from human brain activity. Nature, 452, 352–355.
    15. Voxel activity model. Goal: to predict which image the observer saw out of a large collection of possible images, and to do this for new images, which requires predicting fMRI activity for unseen images (Kay et al., 2008).
    16. (Kay et al., 2008)
    17. Performance (Kay et al., 2008).
    18. D. Marr
    19. Neocognitron (Fukushima, 1980): a hierarchical multilayered neural network. S-cells work as feature-extracting cells; their responses resemble those of simple cells in the primary visual cortex. C-cells, which resemble complex cells in the visual cortex, are inserted into the network to tolerate positional errors in the features of the stimulus. The input connections of C-cells, which come from S-cells of the preceding layer, are fixed and invariable. Each C-cell receives excitatory input connections from a group of S-cells that extract the same feature, but at slightly different positions. The C-cell responds if at least one of these S-cells yields an output.
    20. Neocognitron: learning is done greedily, one layer at a time.
    21. Convolutional Neural Network: the output neurons share all the intermediate levels (LeCun et al., 1998).
    22. Hierarchical models of object recognition in cortex: a hierarchical extension of the classical paradigm of building complex cells from simple cells. Uses the same notation as Fukushima: “S” units perform template matching (solid lines) and “C” units perform a non-linear “MAX” operation (dashed lines). Riesenhuber, M. and Poggio, T., 1999.
    23. Slide by T. Serre.
    24. Slide by T. Serre.
    31. Learning a Compositional Hierarchy of Object Structure (Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008). Figure panels: the architecture; parts model; learned parts.
    32. Learning a Compositional Hierarchy of Object Structure (Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008).
    33. Learning a Compositional Hierarchy of Object Structure (Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008). Figure panels: Layers 1–4; LEARN hierarchical library (car, motorcycle, dog, person); learned L1–L3; learned hierarchical vocabulary; detections. Hierarchical compositional architecture. Features are shared at each layer. Learning is done on natural images. Indexing and matching detection scheme.
    34. Learning a Compositional Hierarchy of Object Structure (Fidler & Leonardis, CVPR’07; Fidler, Boben & Leonardis, CVPR 2008). Figure panels as on slide 33. Hierarchical compositional architecture. Features are shared at each layer. Learning is done on natural images. Biologically plausible? Learns T- and L-junctions, different curvatures, and features that gradually increase in complexity.
    35. Hierarchical Topic Models. [Figure: LDA graphical model with variables z and x and plates J, N, K.] Latent Dirichlet Allocation (LDA), Blei, Ng & Jordan, JMLR 2003: Pr(topic | doc), Pr(word | topic). “Bag of features” models: object recognition (Sivic et al., ICCV 2005); scene recognition (Fei-Fei et al., CVPR 2005).
    36. HDP Object Model (Sudderth et al., IJCV 2008). Parts are distributions over appearances and locations. We learn the number of parts. Each object uses a different number of parts. The model assumes a known number of object categories.
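The identification task on the Voxel Activity Model slide (pick which image the observer saw, using predicted fMRI activity for new images) can be illustrated with a small sketch. It assumes a linear encoding model per voxel (predicted response = weighted sum of image features) and a correlation-based rule that selects the candidate whose predicted voxel pattern best matches the measured one. The feature vectors, weights, and the functions `predict_voxels` and `identify` are hypothetical toy stand-ins, not Kay et al.'s fitted model or code.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (assumes both have nonzero variance)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def predict_voxels(weights, features):
    """Hypothetical linear encoding model: one weight vector per voxel."""
    return [sum(w * f for w, f in zip(voxel_w, features)) for voxel_w in weights]

def identify(measured, candidate_features, weights):
    """Return the index of the candidate whose predicted activity correlates best with the measurement."""
    scores = [pearson(measured, predict_voxels(weights, feats))
              for feats in candidate_features]
    return max(range(len(scores)), key=scores.__getitem__)

weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]     # 3 voxels x 2 image features (toy values)
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]  # features of 3 candidate images
measured = [0.9, 0.1, 1.1]                          # noisy response resembling candidate 0
best = identify(measured, candidates, weights)
```

Because the encoding model predicts activity from image features alone, any new image can be scored this way without having been shown during model fitting.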
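The S/C alternation on the Neocognitron and Riesenhuber–Poggio slides can be sketched as two small functions: an S stage that matches one template at every position, and a C stage that takes the MAX over a neighborhood of S units tuned to the same feature. The plain dot product used as the template-matching score, the non-overlapping 2×2 pooling window, and the function names are assumptions made for illustration, not the exact operations of either model.

```python
def s_layer(image, template):
    """S stage: template-matching score (dot product) at every valid position."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + a][j + b] * template[a][b]
                 for a in range(th) for b in range(tw))
             for j in range(w - tw + 1)]
            for i in range(h - th + 1)]

def c_layer(s_map, pool):
    """C stage: MAX over non-overlapping pool x pool neighborhoods of one S map."""
    h, w = len(s_map), len(s_map[0])
    return [[max(s_map[i + a][j + b]
                 for a in range(pool) for b in range(pool)
                 if i + a < h and j + b < w)
             for j in range(0, w, pool)]
            for i in range(0, h, pool)]

def bar_image(column, size=4):
    """A size x size binary image with a vertical bar in the given column."""
    return [[1 if j == column else 0 for j in range(size)] for _ in range(size)]

vertical = [[1], [1]]  # hypothetical 2x1 vertical-bar template
c_at_0 = c_layer(s_layer(bar_image(0), vertical), pool=2)
c_at_1 = c_layer(s_layer(bar_image(1), vertical), pool=2)
```

Shifting the bar from column 0 to column 1 changes the S map but leaves the C output identical (`[[2, 0], [2, 0]]` in both cases), which is the tolerance to small positional errors that C-cells are meant to provide.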
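The two factors on the Hierarchical Topic Models slide, Pr(topic | doc) and Pr(word | topic), combine into each document's word distribution as Pr(word | doc) = Σ_k Pr(topic k | doc) Pr(word | topic k). The sketch below samples a toy corpus from LDA's generative process under simplifying assumptions: the topic-word distributions are treated as fixed, toy parameters, each document's topic proportions are drawn from a Dirichlet with parameter vector `alpha`, and the function names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, doc_length, topics, alpha, rng):
    """Sample documents from LDA's generative process; topics[k] is Pr(word | topic k)."""
    corpus = []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, rng)  # Pr(topic | doc) for this document
        doc = []
        for _ in range(doc_length):
            z = rng.choices(range(len(topics)), weights=theta)[0]           # topic assignment
            word = rng.choices(range(len(topics[z])), weights=topics[z])[0]
            doc.append(word)
        corpus.append(doc)  # word order carries no information: a "bag of words"
    return corpus

def word_prob(word, theta, topics):
    """Pr(word | doc) = sum over k of Pr(topic k | doc) * Pr(word | topic k)."""
    return sum(t * topic[word] for t, topic in zip(theta, topics))

topics = [[0.5, 0.5, 0.0, 0.0],   # toy topic 0 uses words 0 and 1
          [0.0, 0.0, 0.5, 0.5]]   # toy topic 1 uses words 2 and 3
corpus = generate_corpus(num_docs=3, doc_length=10, topics=topics,
                         alpha=[1.0, 1.0], rng=random.Random(1))
```

All documents share the same `topics`, while each gets its own mixing proportions `theta`, mirroring the note's point that topics transfer information across the corpus and the proportions capture what is distinctive about each document.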
