
A neuromorphic approach to computer vision



  1. A Neuromorphic Approach to Computer Vision. Thomas Serre & Tomaso Poggio, Center for Biological and Computational Learning, Computer Science and Artificial Intelligence Laboratory, McGovern Institute for Brain Research, Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology
  2. Past Neo2 team: CalTech, Bremen & MIT. Tomaso Poggio, MIT; Bob Desimone, MIT; Christof Koch, CalTech; Winrich Freiwald, Bremen. Expertise: computational neuroscience; animal behavior; neuronal recording in IT and V4 + fMRI in monkeys; data processing; access to human recordings; multi-electrodes
  3-6. The problem: invariant recognition in natural scenes
     - Object recognition is hard!
     - Our visual capabilities are computationally amazing
     - Long-term goal: reverse-engineer the visual system and build machines that see and interpret the visual world as well as we do
  7-10. Neurally plausible quantitative model of visual perception
     [Figure: hierarchical model of the ventral stream; layers S1, C1, S2, C2, S2b, C2b, S3, S4, C3 mapped onto cortical areas (V1, V2, V3, V4, MT/MST, PIT, AIT, STP, PG/prefrontal cortex), with receptive-field sizes (roughly 0.2-7 deg) and numbers of units (10^4-10^7) per layer; simple cells perform tuning, complex cells a MAX operation; main and bypass routes shown; dorsal "where" and ventral "what" pathways indicated]
     - Animal vs. non-animal classification units; supervised, task-dependent learning at the top (prefrontal cortex); unsupervised, task-independent learning in lower stages
     - Increase in complexity (number of subunits), receptive-field size and invariance along the hierarchy
     - Large-scale (~10^8 units), spans several areas of the visual cortex
     - Combination of forward and reverse engineering
     - Shown to be consistent with many experimental data across areas of visual cortex
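The alternation of tuning ("S", simple-cell-like) and MAX-pooling ("C", complex-cell-like) stages described on this slide can be illustrated with a minimal sketch in plain Python. The template, stimulus, and toy responses below are invented for illustration; this is not the actual CBCL implementation. The point is only that pooling S-unit responses with a MAX over positions yields a C-unit response that is unchanged when the stimulus shifts:

```python
# Toy illustration of one S -> C stage of the model (not the real
# CBCL code): an S unit applies a template at each position, and a
# C unit takes the MAX over positions, giving shift invariance.

def s_layer(signal, template):
    """Template match (dot product) at every valid offset."""
    n, m = len(signal), len(template)
    return [sum(signal[i + j] * template[j] for j in range(m))
            for i in range(n - m + 1)]

def c_layer(s_responses):
    """Complex unit: MAX pooling over positions."""
    return max(s_responses)

template = [1.0, 2.0, 1.0]
stimulus = [0, 0, 1, 2, 1, 0, 0, 0]   # "bar" at position 2
shifted  = [0, 0, 0, 0, 1, 2, 1, 0]   # same bar, shifted by 2

c_original = c_layer(s_layer(stimulus, template))
c_shifted  = c_layer(s_layer(shifted, template))
# The C-unit response is identical despite the shift.
```

Stacking such S/C pairs, as the model does, grows both selectivity (via the templates) and invariance (via the pooling) with each stage.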
  11-14. Feedforward processing and rapid recognition
  15. Feedforward processing and rapid recognition: category-selective units, linear perceptron
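The "linear perceptron" readout over category-selective units can be sketched as follows. The 2-D "unit responses" and labels are hypothetical stand-ins (the actual readout in the model operates on high-dimensional feature vectors); this only illustrates the learning rule:

```python
# Minimal perceptron readout (illustrative; not the CBCL classifier).
# Each "unit response" vector gets a label (+1 animal, -1 non-animal)
# and the perceptron rule finds a separating hyperplane.

def train_perceptron(data, labels, epochs=20, lr=0.1):
    w = [0.0] * len(data[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                 # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical unit responses: class +1 has a stronger first feature.
data   = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]
labels = [1, 1, -1, -1]
w, b = train_perceptron(data, labels)
preds = [predict(w, b, x) for x in data]
```

A linear readout is the weakest plausible decoder, which is exactly why it is used: if a perceptron can decode category from the units, the category information is explicitly (linearly) available at that stage.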
  16. Model validation against electrophysiology data
  17. Model validation against electrophysiology data
     [Figure: classification performance (0-1) of IT neurons vs. model across test conditions. Size: 3.4°, 3.4°, 1.7°, 6.8°, 3.4°, 3.4°; position: center, center, center, center, 2° horz., 4° horz.; TRAIN label on one condition]
     Model data: Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005; experimental data: Hung*, Kreiman*, Poggio & DiCarlo 2005
  18-19. Explaining human performance in rapid categorization tasks. Serre, Oliva & Poggio 2007
  20. Explaining human performance in rapid categorization tasks. Stimulus conditions: head, close-body, medium-body, far-body; animal vs. natural (distractor) images. Serre, Oliva & Poggio 2007
  21. Explaining human performance in rapid categorization tasks
     [Figure: performance (d', roughly 1.0-2.6) for model (82% correct) and human observers (80% correct) across head, close-body, medium-body and far-body conditions]
     Serre, Oliva & Poggio 2007
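Performance here is reported as d' (sensitivity) alongside percent correct. As a reminder of how d' is computed, a minimal sketch using only Python's standard library; the hit and false-alarm rates below are hypothetical, not values from the study:

```python
# d' (d-prime) from hit rate H and false-alarm rate FA:
#   d' = z(H) - z(FA),
# where z is the inverse of the standard normal CDF.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical observer: 90% hits, 20% false alarms.
sensitivity = d_prime(0.9, 0.2)   # ~2.12
```

Unlike percent correct, d' separates sensitivity from response bias, which is why it is the preferred measure when comparing model and human observers.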
  22-23. Decoding animal category from IT cortex. Recording site in monkey's IT (fMRI); model IT neurons. Meyers, Freiwald, Embark, Kreiman, Serre, Poggio, in prep
  24-28. Decoding animal category from IT cortex in humans. ~145 ms: animal vs. non-animal
  29-31. Bio-motivated computer vision: scene parsing and object recognition. Computer vision system based on the response properties of neurons in the ventral stream of the visual cortex (computational cost: Gflops). Serre, Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al. 2007
  32. Bio-motivated computer vision: scene parsing and object recognition. Speed improvement since 2006:
     image size    multi-thread    GPU (CUDA)
     64x64         4.5x            14x
     128x128       3.5x            14x
     256x256       1.5x            17x
     512x512       2.5x            25x
     From ~1 min down to ~1 sec! Serre, Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al. 2007
  33. Bio-motivated computer vision: action recognition in video sequences using motion-sensitive, MT-like units. Actions: wave 2, bend, jump 2, side, jack, wave 1, walk, jump, run. Jhuang, Serre, Wolf & Poggio 2007
  34. Recognition accuracy
     dataset        Dollar et al. '05    model    chance
     KTH (human)    81.3%                91.6%    16.7%
     Weiz. (human)  86.7%                96.3%    11.1%
     UCSD (mice)    75.6%                79.0%    20.0%
     * Cross-validation: 2/3 training, 1/3 testing, 10 repeats. Jhuang, Serre, Wolf & Poggio, ICCV'07
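The cross-validation protocol used for these numbers (2/3 training, 1/3 testing, 10 random repeats) can be sketched as follows. The nearest-mean classifier and the 1-D toy data are stand-ins for illustration, not the actual system or datasets:

```python
# Sketch of the evaluation protocol: random 2/3-1/3 splits repeated
# 10 times, accuracy averaged over repeats. Classifier and data are
# illustrative stand-ins.
import random

def nearest_mean_classify(train, train_labels, x):
    means = {}
    for label in set(train_labels):
        pts = [p for p, l in zip(train, train_labels) if l == label]
        means[label] = sum(pts) / len(pts)
    return min(means, key=lambda l: abs(x - means[l]))

def cross_validate(data, labels, repeats=10, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(data)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = (2 * len(idx)) // 3          # 2/3 train, 1/3 test
        train_idx, test_idx = idx[:cut], idx[cut:]
        train = [data[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct = sum(
            nearest_mean_classify(train, train_labels, data[i]) == labels[i]
            for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Toy 1-D data: class "a" near 0, class "b" near 10.
data   = [0.1, 0.2, 0.3, 0.4, 0.5, 9.5, 9.6, 9.7, 9.8, 9.9]
labels = ["a"] * 5 + ["b"] * 5
mean_acc = cross_validate(data, labels)
```

Averaging over repeated random splits, rather than using a single split, reduces the variance of the accuracy estimate on small datasets like these.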
  35. Automatic recognition of rodent behavior. Serre, Jhuang, Garrote, Poggio, Steele, in prep
  36. Automatic recognition of rodent behavior. Performance (agreement): human 72%; proposed system 71%; commercial system 56%; chance 12%. Serre, Jhuang, Garrote, Poggio, Steele, in prep
  37-39. Neuroscience of attention and Bayesian inference
  40-45. Neuroscience of attention and Bayesian inference. Integrated model of attention and recognition, in collaboration with the Desimone lab (monkey electrophysiology).
     [Diagram: Bayesian network spanning V2 through PFC, with feature-based attention (PFC acting on IT and V4/PIT), spatial attention (LIP/FEF), object priors (O), location priors (L), feature variables (Fi, Fli) and image evidence (I, N)]
     See also Rao 2005; Lee & Mumford 2003. Chikkerur, Serre & Poggio, in prep
  46. Model predicts human eye movements well. Integrating (local) feature-based + (global) context-based cues accounts for 92% of inter-subject agreement! Chikkerur, Tan, Serre & Poggio, in sub.
  47-51. Model performance improves with attention
     [Figure: performance (d', 0-3) for no attention vs. one shift of attention, model vs. humans, with and without mask]
     Chikkerur, Serre & Poggio, in prep
  52-55. Main Achievements in Neo2
     - Extended + extensively tested feedforward model on real-world recognition tasks [Poggio]: matches neural data; mimics human performance in rapid categorization; performs at the level of state-of-the-art computer vision systems; C++ software + interface available / 100x speed-up; combined with saliency algorithm + tested on real-time street surveillance (video)
     - Demonstrated read-out of cluttered natural images from monkey fMRI and physiology recordings in inferotemporal cortex [Freiwald and Poggio]: first decoding of cluttered complex images; agreement with original feedforward model
     - Characterized neural encoding in V4, IT and FEF under passive and task-dependent viewing conditions [Desimone and Poggio]: characterized the dynamics of bottom-up vs. top-down visual information processing (characteristic timing signature of activity in V4 and IT vs. FEF); top-down, task-dependent attention modulates features in V4 and IT
  56-60. Main Achievements in Neo2
     - Implemented new extended model, suggested by these neuroscience data from the Desimone lab, to include attention via feedback loops from higher areas [Poggio]: predicts human gaze in natural images well; significantly improves recognition performance of the original model in clutter
     - Extended model for classification of video sequences (i.e., action recognition) [Poggio]: tested on several video databases and shown to outperform previous algorithms
     - Demonstrated read-out from human medial temporal lobe (MTL) [Koch]: decoding of natural scenes from single neurons in human MTL; improved ability of saliency model to mimic human gaze patterns
     - Model used to transfer neuroscience data to biologically inspired vision systems
  61-64. Future Directions (1 of 2). MIT team: Poggio, Desimone, Serre, IT physiologist, + (Koch + Itti). Develop new technologies to decode computations and representations in the visual cortex:
     - Optical silencing and circuit-stimulation technology based on X-rhodopsin
     - Multi-electrode network technology
     - Simultaneous recording systems across areas
  65. From the neuroscience data towards a system-level model of natural vision. MIT team: Poggio, Desimone, Serre, XXX. 1. Clutter and image ambiguities: attention and cortical feedback. 2. Learning and recognition of objects in video sequences
  66-68. Clutter and image ambiguities: attention and cortical feedback. Circuitry of attention and role of synchronization in top-down and bottom-up search tasks: monkey electrophysiology in V4, IT and FEF
  69-70. Learning and recognition of objects in video sequences: how current computer vision systems learn vs. how brains learn
  71. Thank you!
  72. Past Neo2 team: CalTech, Bremen & MIT. Tomaso Poggio, MIT; Bob Desimone, MIT; Christof Koch, CalTech; Winrich Freiwald, Bremen
  73-76. IT readout improves with attention (MIT team: Poggio, Desimone, Serre, XXX). Conditions: isolated object; attention on object; attention away from object; object not shown (stimulus, cue, transient change; n = 67). Zhang, Meyers, Serre, Bichot, Desimone, Poggio, in prep
  77. Two functional classes of cells to explain invariant object recognition in the visual cortex. Simple cells: template matching, Gaussian-like tuning (~ "AND"). Complex cells: invariance, max-like operation (~ "OR"). Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
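The two operations attributed to the two cell classes on this slide are commonly written as follows (the symbols are generic, with x the afferent inputs and w a stored template; this is the standard formulation in the Riesenhuber & Poggio line of work, not a quote from the slide):

```latex
% Simple cell: Gaussian-like template matching ("AND"-like)
y_{\text{simple}} = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{w} \rVert^2}{2\sigma^2}\right)

% Complex cell: MAX pooling over afferent subunits ("OR"-like)
y_{\text{complex}} = \max_{j} x_j
```

The Gaussian response peaks only when all inputs jointly match the template (hence "AND"-like selectivity), while the MAX fires if any one subunit is strongly active (hence "OR"-like invariance).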
