Fcv bio cv_poggio


  1. The computational magic of the ventral stream
     Tomaso Poggio, McGovern Institute, I2, CBCL, BCS, CSAIL, MIT (with Jim Mutch, Joel Leibo, Lorenzo Rosasco)
     A dream: I have a theory of what the ventral stream does and how; it can explain why evolution chose a hierarchical architecture and how computations determine the properties of cells specific to each visual area.
     Very preliminary, provocative, probably completely wrong theory, but if true... what else do we want from computational neuroscience?
     Monday, August 29, 2011
  2. Machine Learning + Vision @CBCL
     Areas: learning theory + mathematics; algorithms; engineering applications; computational neuroscience (models + experiments).
     • Poggio, T. and F. Girosi. Networks for Approximation and Learning. Proceedings of the IEEE, 1990; also Science, 1990.
     • Poggio, T. and S. Smale. Notices of the American Mathematical Society (AMS), 2003.
     • Poggio, T., R. Rifkin, S. Mukherjee and P. Niyogi. General Conditions for Predictivity in Learning Theory. Nature, 2004.
     • Beymer, D. and T. Poggio. Science, 272, 1905-1909, 1996.
     • Brunelli, R. and T. Poggio. Face Recognition: Features Versus Templates. IEEE PAMI, 1993.
     • Ezzat, T., G. Geiger and T. Poggio. Trainable Videorealistic Speech Animation. ACM SIGGRAPH, 2002.
     • Freedman, D.J., M. Riesenhuber, T. Poggio and E.K. Miller. Science, 291, 312-316, 2001.
     • Riesenhuber, M. and T. Poggio. Nature Neuroscience, 2, 1019-1025, 1999.
     • Serre, T., A. Oliva and T. Poggio. PNAS, Vol. 104, No. 15, 6424-6429, 2007.
     • Poggio, T. and E. Bizzi. Nature, Vol. 431, 768-774, 2004.
  3. Vision: what is where
     Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. David Marr. Foreword by Shimon Ullman. Afterword by Tomaso Poggio.
     David Marr's posthumously published Vision (1982) influenced a generation of brain and cognitive scientists, inspiring many to enter the field. In Vision, Marr describes a general framework for understanding visual perception and touches on broader questions about how the brain and its functions can be studied and understood. Researchers from a range of brain and cognitive sciences have long valued Marr's creativity, intellectual power, and ability to integrate insights and data from neuroscience, psychology, and computation. This MIT Press edition makes Marr's influential work available to a new generation of students and scientists.
     In Marr's framework, the process of vision constructs a set of representations, starting from a description of the input image and culminating with a description of three-dimensional objects in the surrounding environment. A central theme, and one that has had far-reaching influence in both neuroscience and cognitive science, is the notion of different levels of analysis: in Marr's framework, the computational level, the algorithmic level, and the hardware implementation level.
     Now, thirty years later, the main problems that occupied Marr remain fundamental open problems in the study of perception. Vision provides inspiration for the continui…
  4. The ventral stream
     [Figures: Movshon et al.; ventral stream, Desimone & Ungerleider 1989]
  5. Recognition in the ventral stream: the classical, "standard" model (of immediate recognition)
     *Modified from Gross, 1998. [Software available online: Riesenhuber & Poggio 1999, 2000; Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; Serre, Oliva & Poggio 2007; with CNS (for GPUs)]
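As a rough illustration of the first two stages of such a hierarchy, the sketch below implements an S1 layer (Gabor filters at four orientations, normalized dot product with each image patch) followed by a C1 layer (MAX pooling over a local neighborhood of positions). Filter size, wavelength, pooling size, stride, and all function names are illustrative assumptions, not the parameters of the published model or of its CNS implementation; published versions also pool over neighboring scales.

```python
import numpy as np

def gabor(size=11, wavelength=5.6, theta=0.0, sigma=4.5, gamma=0.3):
    """Zero-mean, unit-norm Gabor filter; parameter values are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # coordinate across the preferred bars
    yr = -x * np.sin(theta) + y * np.cos(theta)     # coordinate along the preferred bars
    g = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)

def s1(image, filters):
    """S1 ('simple') units: |<filter, patch>| / ||patch|| at every valid position."""
    k = filters[0].shape[0]
    H, W = image.shape
    out = np.zeros((len(filters), H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = image[i:i + k, j:j + k]
            norm = np.linalg.norm(patch) + 1e-9     # avoid division by zero on blank patches
            for f, g in enumerate(filters):
                out[f, i, j] = abs(np.sum(g * patch)) / norm
    return out

def c1(s1_maps, pool=8, stride=4):
    """C1 ('complex') units: MAX over a pool x pool neighborhood, separately per orientation."""
    n, H, W = s1_maps.shape
    rows = range(0, H - pool + 1, stride)
    cols = range(0, W - pool + 1, stride)
    out = np.zeros((n, len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            out[:, a, b] = s1_maps[:, i:i + pool, j:j + pool].max(axis=(1, 2))
    return out

image = np.zeros((40, 40))
image[:, 19:21] = 1.0                                # a vertical bar
thetas = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
filters = [gabor(theta=t) for t in thetas]
s1_maps = s1(image, filters)                         # shape (4, 30, 30)
c1_maps = c1(s1_maps)                                # shape (4, 6, 6)
print(c1_maps.max(axis=(1, 2)))                      # per-orientation peak C1 response
```

The C1 maps are much coarser than the S1 maps (6 x 6 vs 30 x 30 positions here), so a small shift of the bar inside a pooling window leaves the C1 output largely unchanged; this is the kind of tolerance the MAX operation is meant to provide.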
  6. Model "works": it accounts for bits of physiology + psychophysics
     Hierarchical feedforward models are consistent with or predict neural data:
     • V1: simple and complex cell tuning (Schiller et al. 1976; Hubel & Wiesel 1965; De Valois et al. 1982); MAX-like operation in a subset of complex cells (Lampl et al. 2004)
     • V2: subunits and their tuning (Anzai, Peng & Van Essen 2007)
     • V4: tuning for two-bar stimuli (Reynolds, Chelazzi & Desimone 1999); MAX-like operation (Gawne et al. 2002); two-spot interaction (Freiwald et al. 2005); tuning for boundary conformation (Pasupathy & Connor 2001; Cadieu, Kouh, Connor et al. 2007); tuning for Cartesian and non-Cartesian gratings (Gallant et al. 1996)
     • IT: tuning and invariance properties (Logothetis et al. 1995, paperclip objects); differential role of IT and PFC in categorization (Freedman et al. 2001, 2002, 2003); readout results (Hung, Kreiman, Poggio & DiCarlo 2005); pseudo-average effect in IT (Zoccolan, Cox & DiCarlo 2005; Zoccolan, Kouh, Poggio & DiCarlo 2007)
     • Human: rapid categorization (Serre, Oliva & Poggio 2007); face processing, fMRI + psychophysics (Riesenhuber et al. 2004; Jiang et al. 2006)
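The MAX-like operation listed for V1 complex cells and V4 can be pictured as a pooling rule that sits between averaging and taking the maximum. The normalized soft-max below is one generic way to interpolate between the two: with exponent q = 0 it returns the mean of its non-negative inputs, and as q grows it approaches their maximum. The function name, the exponent q, and the response values are illustrative assumptions, not the specific circuit fitted in the cited studies.

```python
import numpy as np

def soft_max_pool(x, q):
    """Normalized soft-max: sum(x**(q + 1)) / sum(x**q), for non-negative inputs.

    q = 0 returns the arithmetic mean; as q grows the result approaches max(x).
    """
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("inputs are assumed to be non-negative (e.g., firing rates)")
    num = np.sum(x ** (q + 1))
    den = np.sum(x ** q)
    return float(num / den) if den > 0 else 0.0

# Responses of three afferent units to one stimulus (made-up values).
responses = [0.2, 0.5, 0.9]
for q in (0, 1, 4, 16, 64):
    print(q, soft_max_pool(responses, q))
```

The output is a weighted mean in which larger inputs receive weights proportional to x**q, so it never decreases as q increases; a cell with moderate q would look "pseudo-average" for some stimulus pairs and MAX-like for others.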
  7-9. Model "works": it performs well at the computational level
     Models of the ventral stream in cortex perform well compared with engineered computer vision systems (as of 2006) on several databases (Bileschi, Wolf, Serre & Poggio, 2007).
  10. Recognition in visual cortex: computation and mathematical theory
     For 10+ years I did not manage to understand how the model works... we need a theory, not only a model!
  11. Why do hierarchical architectures work? If features do not matter... what does?
  13. What is the main difficulty of object recognition: geometric transformations or intraclass variability?
     • We have "factored out" viewpoint, scale, and translation: is it still "difficult" for a classifier to categorize dogs vs. horses? (Leibo, 2011)
  14. Are transformations the main difficulty for biological object recognition? (Leibo, 2011)
  15. Six main ideas (steps) in the theory
     1. The computational goal of the ventral stream is to learn transformations (affine in R^2) during development and to be invariant to them (McCulloch & Pitts 1945; Hoffman 1966; Jack Gallant 1993; ...).
     2. A memory-based module, similar to simple-complex cells, can learn from a video of an image transforming (e.g., translating) to provide a signature for any single image of any object that is invariant under the transformation (Invariance Lemma).
     3. Since the group Aff(2,R) can be factorized as a semidirect product of translations and GL(2,R), i.e., Aff(2,R) = R^2 ⋊ GL(2,R), the storage requirements of the memory-based module can be reduced by orders of magnitude (Factorization Theorem). I argue that this is the evolutionary reason for the hierarchical architecture of the ventral stream.
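The Invariance Lemma in idea 2 can be illustrated with a one-dimensional toy: the "video" of each stored template is the set of all its cyclic shifts, a simple-cell-like unit computes the dot product of the input with one stored frame, and a complex-cell-like unit pools (here, takes the max) over all frames of the same template. Translating the input only permutes the set of dot products, so the pooled signature does not change. The cyclic-shift group, max pooling, and all names are simplifying assumptions for illustration, not the construction used in the talk.

```python
import numpy as np

def templatebook(templates):
    """Store every cyclic shift of each template, as if recorded from a video of it translating."""
    n = templates.shape[1]
    return [np.stack([np.roll(t, s) for s in range(n)]) for t in templates]

def signature(image, book):
    """One pooled value per template: max over its stored frames of <image, frame>."""
    return np.array([np.max(frames @ image) for frames in book])

rng = np.random.default_rng(0)
templates = rng.standard_normal((5, 64))    # 5 arbitrary templates, 64 "pixels" each
book = templatebook(templates)

image = rng.standard_normal(64)             # a new, never-seen "object"
shifted = np.roll(image, 17)                # the same object, translated

sig, sig_shifted = signature(image, book), signature(shifted, book)
print(np.allclose(sig, sig_shifted))        # True: the signature is translation invariant
```

Note that the templates need not resemble the object: invariance comes from pooling over the stored transformation, which is why a single image of a new object suffices.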
  16. Six main ideas (steps) in the theory
     4. Evolution selects translations by using small apertures in the first layer (Stratification Theorem). The size of the receptive fields in the sequence of ventral stream areas automatically selects specific transformations (translations, scaling, shear and rotations).
     5. If a Hebb-like rule is active at synapses, the storage requirements can be reduced further: the tuning of cells in each layer, during development, will mimic the spectrum (SVD) of the stored templates (Linking Theorem).
     6. At the top layers, invariances are learned for "places" and for class-specific transformations, such as changes of expression of a face, rotation in depth of a face, or different poses of a body; this yields class-specific patches.
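A classic, well-known instance of the link in idea 5 between Hebbian learning and the spectrum of the inputs is Oja's rule: a single linear unit with a normalized Hebbian update converges to the leading principal component (the top singular vector, for zero-mean data) of what it sees. The sketch trains such a unit on a toy templatebook of cyclically translated 1D patterns. Because translation makes the top eigenvalue come as a degenerate pair (a sine and a cosine at the same frequency), it checks the Rayleigh quotient rather than comparing vectors directly. Oja's rule, the learning-rate schedule, and the toy data are assumptions chosen for illustration; the talk only says "Hebb-like".

```python
import numpy as np

rng = np.random.default_rng(1)
N = 32

# Toy templatebook: every cyclic translation of one zero-mean, unit-norm 1D pattern.
base = rng.standard_normal(N)
base -= base.mean()
base /= np.linalg.norm(base)
X = np.stack([np.roll(base, s) for s in range(N)])   # shape (N, N), one frame per row

def oja(X, eta0=0.1, epochs=300, seed=2):
    """One linear unit trained online with Oja's Hebbian rule: w += eta * y * (x - y * w)."""
    r = np.random.default_rng(seed)
    w = r.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for epoch in range(epochs):
        eta = eta0 / (1 + epoch / 10)                  # decaying rate damps online jitter
        for x in X[r.permutation(len(X))]:
            y = w @ x                                  # postsynaptic activity
            w += eta * y * (x - y * w)                 # Hebbian term minus a decay that bounds |w|
    return w

w = oja(X)
C = X.T @ X / len(X)                                   # input covariance (columns of X have zero mean)
lam_max = np.linalg.eigvalsh(C)[-1]                    # equals (largest singular value of X)**2 / N
rayleigh = (w @ C @ w) / (w @ w)

print(np.linalg.norm(w))                               # expected near 1: the rule self-normalizes
print(rayleigh / lam_max)                              # expected near 1: w lies in the top eigenspace
```

The unit never computes an SVD explicitly; the top component emerges from local, synapse-by-synapse updates, which is the point of the analogy.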
  17. Stratification
  18. SVD (of the templatebook) for (x) translations from natural images (Jim Mutch)
  19. SVD (of the templatebook) for (y) translations; panel: noise (Jim Mutch)
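A toy 1D version suggests why singular vectors of a translation templatebook tend to look like sinusoids: if the templatebook holds every cyclic translation of a pattern, the matrix is circulant, and circulant matrices are diagonalized by the discrete Fourier transform, so each singular vector lives at a single spatial frequency. Natural-image patches under non-cyclic translation through a finite 2D aperture, as on the slides, only approximate this; the cyclic boundary and the random pattern here are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64

# A random zero-mean 1D "patch" and its templatebook of all cyclic x-translations.
pattern = rng.standard_normal(N)
pattern -= pattern.mean()
T = np.stack([np.roll(pattern, s) for s in range(N)])   # circulant matrix, one frame per row

# Top right singular vector of the templatebook.
Vt = np.linalg.svd(T)[2]
v = Vt[0]

# Fraction of v's energy in its strongest frequency k (plus the mirror bin N - k).
power = np.abs(np.fft.fft(v)) ** 2
k = int(np.argmax(power[: N // 2 + 1]))
concentrated = (power[k] + (power[N - k] if 0 < k < N // 2 else 0.0)) / power.sum()
print(k, concentrated)
```

Which frequency wins depends on the power spectrum of the patterns being translated; with natural images, whose power falls off with frequency, low frequencies would tend to dominate.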
  20. These are the predicted features, orthogonal to the trajectories of the Lie group: for translation (x, y), expansion, and rotation. [Panels: V1; V2/V4?] This is from Hoffman, 1966, as cited by Jack Gallant.
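One way to make slide 20 concrete, under our own reading of it: each one-parameter transformation has a generator, the velocity field describing how points move (x-translation: (1, 0); expansion: (x, y); rotation: (-y, x)). A pattern whose level lines cross every trajectory at a right angle has its gradient parallel to that field. This gives vertical stripes for x-translation, concentric rings for expansion, and radial (pinwheel) stripes for rotation. The check below verifies this numerically; the specific patterns and frequencies are illustrative choices, and any function of r alone would serve for expansion.

```python
import numpy as np

# Even number of samples so the grid avoids the origin, where r = 0.
xs = np.linspace(-1.0, 1.0, 40)
X, Y = np.meshgrid(xs, xs)
R, TH = np.hypot(X, Y), np.arctan2(Y, X)

# Generators: the velocity field of each one-parameter transformation group.
generators = {
    "translation_x": (np.ones_like(X), np.zeros_like(Y)),
    "expansion": (X, Y),
    "rotation": (-Y, X),
}

# Analytic gradients of candidate patterns (omega, n are illustrative frequencies).
omega, n = 8.0, 6
gradients = {
    # f = cos(omega * x): vertical stripes
    "translation_x": (-omega * np.sin(omega * X), np.zeros_like(Y)),
    # f = cos(omega * r): concentric rings; grad f = -omega * sin(omega * r) * (x, y) / r
    "expansion": (-omega * np.sin(omega * R) * X / R, -omega * np.sin(omega * R) * Y / R),
    # f = cos(n * theta): radial stripes; grad f = -n * sin(n * theta) * (-y, x) / r**2
    "rotation": (n * np.sin(n * TH) * Y / R**2, -n * np.sin(n * TH) * X / R**2),
}

def max_cross(field, grad):
    """Largest |u * gy - v * gx| on the grid: 0 means grad f is parallel to the flow,
    i.e. the pattern's level lines cross every trajectory at a right angle."""
    (u, v), (gx, gy) = field, grad
    return float(np.max(np.abs(u * gy - v * gx)))

for name in generators:
    print(name, max_cross(generators[name], gradients[name]))
# Mismatched pair for contrast: vertical stripes under rotation.
print("stripes vs rotation", max_cross(generators["rotation"], gradients["translation_x"]))
```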
  21. Implications
     • Image priors, shape features, and natural-image priors are "irrelevant".
     • The types of transformations that are learned from visual experience depend on the aperture size (measured in terms of wavelength) and thus on the area (layer in the models), assuming that the aperture size increases with layers.
     • The mix of transformations learned determines the properties of the receptive fields: oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g., face-tuned cells).
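The dependence on aperture size can be checked with simple geometry: seen through a small aperture, a rotation of the whole image moves every point inside the aperture in nearly the same direction, so locally it is indistinguishable from a translation; only a larger aperture sees the difference. The sketch measures, for a rotation about the origin, how far the motion inside a circular aperture deviates from a pure translation, as a function of aperture radius. The aperture position, radii, and sampling grid are illustrative assumptions.

```python
import numpy as np

def rotation_field(x, y):
    """Velocity of points under an infinitesimal rotation about the origin."""
    return -y, x

def deviation_from_translation(radius, center=(1.0, 0.0)):
    """Max |v(p) - v(center)| / |v(center)| over a circular aperture of the given radius.

    Small values mean the motion seen through the aperture is nearly a pure translation.
    """
    cx, cy = center
    rho = np.linspace(0.0, 1.0, 11)[:, None] * radius           # radial samples, incl. the rim
    phi = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)[None, :]
    px, py = cx + rho * np.cos(phi), cy + rho * np.sin(phi)
    vx, vy = rotation_field(px, py)
    v0x, v0y = rotation_field(cx, cy)
    dev = np.hypot(vx - v0x, vy - v0y).max()
    return float(dev / np.hypot(v0x, v0y))

for a in (0.02, 0.1, 0.5):
    print(a, deviation_from_translation(a))
```

Because the rotation field is linear, the relative deviation equals the aperture radius divided by the aperture's distance from the rotation center, so it shrinks in proportion to the aperture: small first-layer apertures effectively see only translations.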
  22. Some predictions
     • Invariance to small transformations in early areas (e.g., translations in V1) may underlie the stability of visual perception (suggested by Stu Geman).
     • Each cell's tuning properties are shaped by visual experience of image transformations during developmental and adult plasticity.
     • Simple cells are likely to be the same population as complex cells, arising from different convergence of the Hebbian learning rule. The inputs to "complex" complex cells are dendritic branches with simple-cell properties.
     • Class-specific transformations are learned and represented at the top of the ventral stream hierarchy; thus class-specific modules, such as faces, places and possibly body areas, should exist in IT.
     • The types of transformations that are learned from visual experience depend on the size of the receptive fields and thus on the area (layer in the models), assuming that the size increases with layers.
     • The mix of transformations learned in each area influences the tuning properties of the cells: oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g., face-tuned cells).
  23. Collaborators in recent work
     J. Leibo, J. Mutch, L. Isik, S. Ullman, S. Smale, L. Rosasco, H. Jhuang, C. Tan, N. Edelman, E. Meyers, B. Desimone, O. Lewis, S. Voinea
     Also: M. Riesenhuber, T. Serre, S. Chikkerur, A. Wibisono, J. Bouvrie, M. Kouh, J. DiCarlo, E. Miller, C. Cadieu, A. Oliva, C. Koch, A. Caponnetto, D. Walther, U. Knoblich, T. Masquelier, S. Bileschi, L. Wolf, E. Connor, D. Ferster, I. Lampl, G. Kreiman, N. Logothetis
     (Jim Mutch)