
Mechanisms of bottom-up and top-down processing in visual perception

This is a talk given in April 2009 at the Redwood Center at UC Berkeley.



  1. 1. Mechanisms of bottom-up and top-down processing in visual perception. Thomas Serre, McGovern Institute for Brain Research, Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology
  2. 2. The problem: recognition in natural scenes
  3. 3. Rapid recognition: human behavior Potter 1971, 1975 see also Biederman 1972; Thorpe 1996 movie courtesy of Jim DiCarlo
  4. 4. Rapid recognition: human behavior Potter 1971, 1975 see also Biederman 1972; Thorpe 1996 movie courtesy of Jim DiCarlo
  5. 5. Rapid recognition: human behavior Gist of the scene at 7 images/s from unpredictable random sequence of images No time for eye movements No top-down / expectations Potter 1971, 1975 see also Biederman 1972; Thorpe 1996 movie courtesy of Jim DiCarlo
  6. 6. Rapid recognition: human behavior Gist of the scene at 7 images/s from unpredictable random sequence of images No time for eye movements No top-down / expectations Feedforward processing: Coarse / base image representation Potter 1971, 1975 see also Biederman 1972; Thorpe 1996 movie courtesy of Jim DiCarlo
  7. 7. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem”
  8. 8. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem”
  9. 9. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem”
  10. 10. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem” 2.Beyond feedforward processing: Top-down cortical feedback and attention to solve the “clutter problem” Predicting human eye movements
  11. 11. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem” 2.Beyond feedforward processing: Top-down cortical feedback and attention to solve the “clutter problem” Predicting human eye movements
  12. 12. Object recognition in the visual cortex source: Jim DiCarlo
  13. 13. Object recognition in the visual cortex Ventral visual stream source: Jim DiCarlo
  14. 14. Object recognition in the visual cortex Hierarchical architecture: Ventral visual stream source: Jim DiCarlo
  15. 15. Object recognition in the visual cortex Hierarchical architecture: Latencies Ventral visual stream source: Jim DiCarlo
  16. 16. Object recognition in the visual cortex Hierarchical architecture: Latencies Ventral visual stream Anatomy source: Jim DiCarlo
  17. 17. Object recognition in the visual cortex Hierarchical architecture: Latencies Ventral visual stream Anatomy Function source: Jim DiCarlo
  18. 18. Object recognition in the visual cortex Nobel prize 1981 Hubel & Wiesel 1959, 1962, 1965, 1968
  19. 19. Object recognition in the visual cortex gradual increase in complexity of preferred stimulus Kobatake & Tanaka 1994 see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
  20. 20. Object recognition in the visual cortex Parallel increase in invariance properties (position and scale) of neurons Kobatake & Tanaka 1994 see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999
  21. 21. Model [figure: hierarchical model architecture mapped onto the ventral ('what') and dorsal ('where') streams, from V1 (S1/C1 layers) through V2/V4 (S2/C2), PIT/AIT (S2b/C2b, S3, S4, C3) up to prefrontal cortex; receptive-field sizes, numbers of units, complexity (number of subunits) and invariance increase along the hierarchy; learning is unsupervised and task-independent up to IT, and supervised, task-dependent in PFC for the animal vs. non-animal classification; simple cells implement tuning, complex cells a MAX operation; main and bypass routes are shown] Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  22. 22. Model [same architecture figure as slide 21] Large-scale (~10^8 units), spans several areas of the visual cortex. Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  23. 23. Model [same architecture figure as slide 21] Large-scale (~10^8 units), spans several areas of the visual cortex. Combination of forward and reverse engineering. Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  24. 24. Model [same architecture figure as slide 21] Large-scale (~10^8 units), spans several areas of the visual cortex. Combination of forward and reverse engineering. Shown to be consistent with many experimental data across areas of visual cortex (V1, V2, V4, MT and IT). Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  25. 25. Two functional classes of cells: simple cells perform template matching (Gaussian-like tuning, ~“AND”); complex cells build invariance (max-like operation, ~“OR”). Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
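
A minimal numerical sketch of the two operations above, assuming a stored template w and a handful of afferent inputs (the function names, toy template and patches are illustrative, not taken from the original model code):

```python
import numpy as np

def simple_cell(x, w, sigma=1.0):
    # Template matching: Gaussian-like tuning around a stored template w (~"AND")
    return float(np.exp(-np.sum((x - w) ** 2) / (2 * sigma ** 2)))

def complex_cell(afferent_responses):
    # Invariance: max-like pooling over simple cells that share a template
    # but differ in position/scale (~"OR")
    return max(afferent_responses)

template = np.array([1.0, 0.0, 1.0, 0.0])
patches = [np.array([1.0, 0.1, 0.9, 0.0]),   # slightly perturbed version of the template
           np.array([0.0, 1.0, 0.0, 1.0]),   # non-preferred pattern
           np.array([1.0, 0.0, 1.0, 0.0])]   # exact match
responses = [simple_cell(p, template) for p in patches]
print(responses, complex_cell(responses))    # the pooled response stays high: invariant detection
```
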
  26. 26. Model [same architecture figure as slide 21, highlighting the alternation of simple-cell (tuning) and complex-cell (MAX) layers] Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005
  27. 27. Hierarchy of image fragments see also Ullman et al 2002
  28. 28. Hierarchy of image fragments Unsupervised learning of frequent image fragments during development see also Ullman et al 2002
  29. 29. Hierarchy of image fragments Unsupervised learning of frequent image fragments during development Reusable fragments shared across categories see also Ullman et al 2002
  30. 30. Hierarchy of image fragments Unsupervised learning of frequent image fragments during development Reusable fragments shared across categories Large redundant vocabulary for implicit geometry see also Ullman et al 2002
  31. 31. Hierarchy of image fragments [figure: fragment hierarchy spanning V1 to IT] Unsupervised learning of frequent image fragments during development. Reusable fragments shared across categories. Large redundant vocabulary for implicit geometry. see also Ullman et al 2002
  32. 32. Hierarchy of image fragments [figure: fragment hierarchy spanning V1 to IT] Unsupervised learning of frequent image fragments during development. Reusable fragments shared across categories. Large redundant vocabulary for implicit geometry. see also Ullman et al 2002
  33. 33. Hierarchy of image fragments [figure: fragment hierarchy spanning V1 to IT] Unsupervised learning of frequent image fragments during development. Reusable fragments shared across categories. Large redundant vocabulary for implicit geometry. see also Ullman et al 2002
  34. 34. Hierarchy of image fragments [figure: fragment hierarchy spanning V1 to IT, read out by category-selective units via a linear perceptron] Unsupervised learning of frequent image fragments during development. Reusable fragments shared across categories. Large redundant vocabulary for implicit geometry. see also Ullman et al 2002
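
One way to sketch the unsupervised, developmental-style learning of fragments described on the slides above is to imprint patches sampled at random positions from unlabeled images; this is a simplification, and the random textures below merely stand in for natural images:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_fragments(images, n_fragments=100, patch_size=8):
    # Store patches cut from random positions of random images as a large,
    # redundant dictionary of reusable fragments (unsupervised, task-independent)
    fragments = []
    for _ in range(n_fragments):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch_size)
        x = rng.integers(img.shape[1] - patch_size)
        fragments.append(img[y:y + patch_size, x:x + patch_size].copy())
    return fragments

images = [rng.random((64, 64)) for _ in range(10)]   # placeholders for natural images
dictionary = sample_fragments(images)
print(len(dictionary), dictionary[0].shape)          # 100 fragments of 8x8 pixels
```
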
  35. 35. Model vs. IT [figure: classification performance (0–1) of a linear readout from IT neurons and from model units; readout trained at one object size and position (3.4°, center) and tested at the trained condition, at new sizes (1.7°, 6.8°) and at new positions (2° and 4° horizontal shifts)] Model data: Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Experimental data: Hung* Kreiman* Poggio & DiCarlo 2005
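
The comparison above relies on a linear readout of population activity (a linear classifier trained on IT responses or on model units). A sketch with synthetic responses standing in for the recordings; the dimensions, category signal and train/test split are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)

# Synthetic population responses (trials x units); real data would be IT firing
# rates or model C2-level features.
n_trials, n_units = 200, 256
labels = rng.integers(0, 2, n_trials)          # two object categories
responses = rng.normal(size=(n_trials, n_units))
responses[labels == 1, :32] += 0.8             # category signal carried by a subset of units

# Linear readout: a perceptron trained on part of the trials, tested on the rest
clf = Perceptron(max_iter=1000).fit(responses[:150], labels[:150])
print("held-out accuracy:", clf.score(responses[150:], labels[150:]))
```
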
  36. 36. Is this model sufficient to explain performance in rapid categorization tasks? [backward-masking paradigm: image 20 ms, blank interstimulus interval (ISI) 30 ms, 1/f-noise mask 80 ms; task: animal present or not?] Thorpe et al 1996; Van Rullen & Koch 2003; Bacon-Mace et al 2005
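
For reference, the timing of the backward-masking paradigm on the slide above, written out as a simple schedule; only the SOA computation is added here:

```python
# Times from the slide, in milliseconds
image_ms, isi_ms, mask_ms = 20, 30, 80
soa_ms = image_ms + isi_ms   # stimulus onset asynchrony between image and mask
for event, duration in [("image", image_ms), ("blank ISI", isi_ms), ("1/f noise mask", mask_ms)]:
    print(f"{event}: {duration} ms")
print(f"SOA: {soa_ms} ms; observer reports: animal present or not?")
```
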
  37. 37. Rapid categorization Serre Oliva & Poggio 2007
  38. 38. Rapid categorization Head Close-body Medium-body Far-body Animals Natural distractors Artificial distractors Serre Oliva & Poggio 2007
  39. 39. Rapid categorization Serre Oliva & Poggio 2007
  40. 40. Rapid categorization Head Close-body Medium-body Far-body Animals Natural distractors Serre Oliva & Poggio 2007
  41. 41. Rapid categorization [figure: performance (d′, roughly 1.0–2.6) for the model (82% correct) and human observers (80% correct), broken down by animal category: head, close-body, medium-body, far-body; animals vs. natural distractors] Serre Oliva & Poggio 2007
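
Performance above is reported as d′ (sensitivity). A minimal helper showing how d′ is computed from hit and false-alarm rates; the example rates are illustrative and are not the values behind the 82%/80% correct figures:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    # Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

print(round(d_prime(0.85, 0.21), 2))   # illustrative hit/false-alarm rates only
```
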
  42. 42. “Clutter effect” Limitation of feedforward model compatible with reduced selectivity in V4 (Reynolds et al 1999) and IT in the presence of clutter (Zoccolan et al 2005, 2007; Rolls et al 2003) Meyers Freiwald Embark Kreiman Serre Poggio in prep
  43. 43. “Clutter effect” [figure: recording site in monkey’s IT; panels comparing model, IT neurons and fMRI] Limitation of feedforward model compatible with reduced selectivity in V4 (Reynolds et al 1999) and IT in the presence of clutter (Zoccolan et al 2005, 2007; Rolls et al 2003) Meyers Freiwald Embark Kreiman Serre Poggio in prep
  44. 44. Summary I Rapid categorization seems compatible with a model based on a feedforward hierarchy of image fragments. Consistent with psychophysics, a key limitation of the architecture is recognition in clutter. How does the visual system overcome this limitation?
  45. 45. Outline 1.Rapid recognition and feedforward processing: Loose hierarchy of image fragments “Clutter problem” 2.Beyond feedforward processing: Top-down cortical feedback and attention to solve the “clutter problem” Predicting human eye movements
  46. 46. Spatial attention solves the “clutter problem” see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others
  47. 47. Spatial attention solves the “clutter problem” see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others [figure: example scene, foreground labeled]
  48. 48. Spatial attention solves the “clutter problem” see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others [figure: example scene, background and foreground labeled]
  49. 49. Spatial attention solves the “clutter problem” see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others [figure: example scene, background and foreground labeled; visual search display]
  50. 50. Spatial attention solves the “clutter problem” see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others [figure: example scene, background and foreground labeled; visual search display] Problem: How to know where to attend?
  51. 51. Spatial attention solves the “clutter problem” [figure: visual search display] see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others. Science, 22 April 2005, Vol. 308, no. 5721, pp. 529–534: Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4. Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone
  52. 52. Spatial attention solves the “clutter problem” [figure: visual search display] see also Broadbent 1952, 1954; Treisman 1960; Treisman & Gelade 1980; Duncan & Desimone 1995; Wolfe 1997; and many others. Science, 22 April 2005, Vol. 308, no. 5721, pp. 529–534: Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4. Narcisse P. Bichot, Andrew F. Rossi, Robert Desimone. Answer: Parallel feature-based attention
  53. 53. Parallel feature-based attention modulation [figure: two panels of normalized spike activity (0–2) vs. time from fixation (0–200 ms)]
  54. 54. Serial spatial attention modulation Test for serial (spatial) selection [figure: normalized spike activity (0–2) vs. time from fixation (0–200 ms), attend within RF vs. attend away from RF; trials where the RF stimulus is the target of the saccade vs. is not the target of the saccade. Original Fig. 4 caption: illustration of the saccade enhancement analysis, comparing neuronal measures when the monkey made a saccade to an RF stimulus versus a saccade away from the RF]
  55. 55. Attention as Bayesian inference PFC IT V4/PIT V2 Chikkerur Serre & Poggio in prep see also Rao 2005; Lee & Mumford 2003
  56. 56. Attention as Bayesian inference PFC feature-based attention IT V4/PIT V2 Chikkerur Serre & Poggio in prep see also Rao 2005; Lee & Mumford 2003
  57. 57. Attention as Bayesian inference PFC feature-based attention IT FEF/LIP V4/PIT spatial attention V2 Chikkerur Serre & Poggio in prep see also Rao 2005; Lee & Mumford 2003
  58. 58. Attention as Bayesian inference [graphical model: object O with object priors from PFC (feature-based attention); features F^i encoded in IT; location L with location priors from FEF/LIP (spatial attention); location-specific features F_l^i (N feature maps) encoded in V4/PIT; image I entering at the level of V2] Chikkerur Serre & Poggio in prep see also Rao 2005; Lee & Mumford 2003
  59. 59. Attention as Bayesian inference [graphical model as on the previous slide: O (PFC), F^i (IT), L (LIP), F_l^i (V4), I (V2)] Chikkerur Serre & Poggio in prep
  60. 60. Attention as Bayesian inference: feature-based attention. Belief propagation messages: m_{LIP→V4} = P(L); m_{IT→V4} = P(F^i | O); m_{V4→IT} = Σ_{L, F_l^i} P(F_l^i | F^i, L) P(L) P(I | F_l^i); m_{V4→LIP} = Σ_{F^i, F_l^i} P(F_l^i | F^i, L) P(F^i | O) P(I | F_l^i). Question: Where is object O? Chikkerur Serre & Poggio in prep see also Rao 2005; Lee & Mumford 2003
  61. 61. Attention as Bayesian inference: spatial attention. Belief propagation messages: m_{LIP→V4} = P(L); m_{IT→V4} = P(F^i | O); m_{V4→IT} = Σ_{L, F_l^i} P(F_l^i | F^i, L) P(L) P(I | F_l^i); m_{V4→LIP} = Σ_{F^i, F_l^i} P(F_l^i | F^i, L) P(F^i | O) P(I | F_l^i). Question: What is at location L? Chikkerur Serre & Poggio in prep see also Rao 2005; Lee & Mumford 2003
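
A toy, discrete version of the belief-propagation messages on the two slides above. It assumes P(F_l^i | F^i, L) is an indicator ("feature i is present at location l"), which collapses each double sum to a single sum; the priors and likelihoods are random placeholders rather than quantities from the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete setup: n_f feature types (F^i), n_l locations (L)
n_f, n_l = 4, 5
P_L = np.full(n_l, 1.0 / n_l)              # location prior, carried by FEF/LIP
P_F_given_O = rng.dirichlet(np.ones(n_f))  # feature prior for the cued object, from PFC/IT
P_I_given_Fl = rng.random((n_f, n_l))      # likelihood of the image given feature i at location l

# Messages from the slides (double sums collapse under the indicator assumption)
m_LIP_to_V4 = P_L                          # m_{LIP->V4} = P(L)
m_IT_to_V4 = P_F_given_O                   # m_{IT->V4}  = P(F^i | O)
m_V4_to_IT = P_I_given_Fl @ P_L            # m_{V4->IT}  = sum_L P(L) P(I | F_l^i)
m_V4_to_LIP = P_F_given_O @ P_I_given_Fl   # m_{V4->LIP} = sum_{F^i} P(F^i | O) P(I | F_l^i)

# Posterior over locations answers "where is object O?" (feature-based attention);
# posterior over features answers "what is at location L?" (spatial attention).
post_L = m_LIP_to_V4 * m_V4_to_LIP
post_L = post_L / post_L.sum()
post_F = m_IT_to_V4 * m_V4_to_IT
post_F = post_F / post_F.sum()
print("P(L | I, O):", np.round(post_L, 3))
print("P(F | I):   ", np.round(post_F, 3))
```
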
  62. 62. Model performance improves with attention [figure: performance (d′) with no attention vs. one shift of attention, model vs. humans] Chikkerur Serre & Poggio in prep
  63. 63. Model performance improves with attention [figure: performance (d′, 0–3) with no attention vs. one shift of attention, model vs. humans] Chikkerur Serre & Poggio in prep
  64. 64. Model performance improves with attention [figure: performance (d′, 0–3) with no attention vs. one shift of attention, model vs. humans] Chikkerur Serre & Poggio in prep
  65. 65. Model performance improves with attention [figure: performance (d′, 0–3) with no attention vs. one shift of attention, model vs. humans] Chikkerur Serre & Poggio in prep
  66. 66. Model performance improves with attention [figure: performance (d′, 0–3) with no attention vs. one shift of attention, model vs. humans; mask and no-mask conditions] Chikkerur Serre & Poggio in prep
  67. 67. Agreement with neurophysiology data Feature-based attention: differential modulation for preferred vs. non-preferred stimulus (Bichot et al 2005). Spatial attention: gain modulation of neurons’ tuning curves (McAdams & Maunsell 1999). Competitive mechanisms in V2 and V4 (Reynolds et al 1999). Improved readout in clutter (being tested in collaboration with the Desimone lab)
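
To make the "gain modulation" point concrete, here is a sketch of a purely multiplicative-gain account of spatial attention acting on an orientation tuning curve; the curve parameters and the 1.3 gain factor are illustrative values, not numbers taken from McAdams & Maunsell 1999:

```python
import numpy as np

def tuning_curve(orientation_deg, preferred=90.0, width=30.0, baseline=2.0, amplitude=20.0):
    # Descriptive Gaussian orientation tuning curve (spikes/s)
    return baseline + amplitude * np.exp(-((orientation_deg - preferred) ** 2) / (2 * width ** 2))

orientations = np.arange(0, 181, 10)
unattended = tuning_curve(orientations)
attended = 1.3 * unattended                 # multiplicative gain: scaling without sharpening
print(np.round(attended / unattended, 2))   # constant ratio across orientations
```
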
  68. 68. IT readout improves with attention: train readout classifier on isolated object [stimulus display with central fixation cross] Zhang Meyers Serre Bichot Desimone Poggio in prep
  69. 69. IT readout improves with attention [stimulus display with central fixation cross] Zhang Meyers Serre Bichot Desimone Poggio in prep
  70. 70. IT readout improves with attention [stimulus display with central fixation cross] Zhang Meyers Serre Bichot Desimone Poggio in prep
  71. 71. IT readout improves with attention [stimulus display with central fixation cross] Zhang Meyers Serre Bichot Desimone Poggio in prep
  72. 72. IT readout improves with attention [figure: trial with cue and transient change; average decoding rank (7–9) over time (0–2000 ms) for attention on object, attention away from object, and object not shown; n = 34] Zhang Meyers Serre Bichot Desimone Poggio in prep
  73. 73. IT readout improves with attention [figure: trial with cue and transient change; average decoding rank (7–9) over time (0–2000 ms) for attention on object, attention away from object, and object not shown; n = 34] Zhang Meyers Serre Bichot Desimone Poggio in prep
  74. 74. Could these attentional mechanisms also explain search strategies in complex natural images?
  75. 75. Matching human eye movements. Dataset: 100 street-scene images with cars & pedestrians and 20 without. Experiment: 8 participants asked to count the number of cars/pedestrians; blocks/randomized presentations; each image presented twice; eye movements recorded using an infra-red eye tracker; eye movements used as a proxy for attention. Chikkerur Tan Serre & Poggio in sub
  76. 76. Matching human eye movements Car search Pedestrian search Chikkerur Tan Serre & Poggio in sub
  77. 77. Matching human eye movements Car search Pedestrian search Chikkerur Tan Serre & Poggio in sub
  78. 78. Attention as Bayesian inference [graphical model as before: O (PFC), F^i (IT), L (FEF/LIP), F_l^i (V4), I (V2)] Chikkerur Serre & Poggio in prep
  79. 79. Matching human eye movements
  80. 80. Matching human eye movements [figure: fraction of fixations (25–100%) vs. % of the image covered by the thresholded saliency maps (10–30%)]
  81. 81. Matching human eye movements [same figure as slide 80; performance summarized as the area under the ROC curve]
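
The evaluation on the two slides above (fraction of human fixations captured as the saliency map is thresholded to cover a growing fraction of the image, summarized as an ROC-like area) can be sketched as follows; the Gaussian saliency bump and simulated fixations are placeholders for the model's maps and the recorded eye movements:

```python
import numpy as np

def fixation_auc(saliency, fixations, fractions=np.linspace(0.0, 1.0, 21)):
    # Area under the curve of "fraction of fixations captured" vs. "fraction of
    # image covered" as the most salient pixels are progressively included
    sal = (saliency - saliency.min()) / (np.ptp(saliency) + 1e-12)
    covered, captured = [], []
    for frac in fractions:
        if frac == 0.0:
            mask = np.zeros_like(sal, dtype=bool)
        else:
            cutoff = np.quantile(sal, 1.0 - frac)      # keep the top `frac` of pixels
            mask = sal >= cutoff
        covered.append(mask.mean())
        captured.append(np.mean([mask[y, x] for (y, x) in fixations]))
    xs, ys = np.array(covered), np.array(captured)
    order = np.argsort(xs)
    xs, ys = xs[order], ys[order]
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))   # trapezoidal area

# Toy example: a central saliency bump and fixations clustered around it
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:100, 0:100]
saliency = np.exp(-((yy - 50) ** 2 + (xx - 50) ** 2) / (2 * 15.0 ** 2))
fixations = [(int(np.clip(rng.normal(50, 10), 0, 99)),
              int(np.clip(rng.normal(50, 10), 0, 99))) for _ in range(30)]
print(round(fixation_auc(saliency, fixations), 3))   # ~0.5 would be chance for a random map
```
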
  82. 82. Results [bar chart: ROC area for humans, the bottom-up model, and the top-down (feature-based) model] Chikkerur Tan Serre & Poggio in sub
  83. 83. Results [bar chart: ROC area (0–1) for car and pedestrian search; humans, bottom-up, and top-down (feature-based)] Chikkerur Tan Serre & Poggio in sub
  84. 84. Results [bar chart as on slide 83] Chikkerur Tan Serre & Poggio in sub
  85. 85. Results [bar chart as on slide 83] Chikkerur Tan Serre & Poggio in sub
  86. 86. Results [bar chart as on slide 83] Chikkerur Tan Serre & Poggio in sub
