A biologically-motivated approach to computer vision

Presentation given at Yale University in August 2009.



  1. 1. A biologically-motivated approach to computer vision. Thomas Serre, McGovern Institute for Brain Research, Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology
  2. 2. The problem: invariant recognition in natural scenes • Object recognition is hard! • Our visual capabilities are computationally amazing • Reverse-engineer the visual system and build machines that see and interpret the visual world as well as we do
  3. 3. Computer vision Face detection successes
  4. 4. Computer vision Face detection successes
  5. 5. The recipe: lots of simple features + lots of training examples + a fancy classifier. [Slide reproduces excerpts from Viola & Jones ’01: the AdaBoost learning procedure (normalize the weights, train one weak classifier per feature, keep the lowest-error classifier, re-weight the examples), the first two selected rectangle features (eye region vs. upper cheeks, eye regions vs. bridge of the nose), the attentional cascade, and detection results; Schneiderman & Kanade ’99 is cited alongside as another face detection success.]
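To make the "recipe" concrete, here is a minimal sketch in the spirit of the Viola & Jones detector cited on this slide: rectangle contrast features computed from an integral image, combined by a toy AdaBoost loop into a strong classifier. This is an illustration only, not the authors' implementation; the function names, the crude threshold choice, and the two-rectangle feature layout are my own simplifications.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so that any rectangle sum costs four array lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] recovered from the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    """A simple two-rectangle (Haar-like) feature: top band minus bottom band."""
    top = rect_sum(ii, r, c, r + h // 2, c + w)
    bottom = rect_sum(ii, r + h // 2, c, r + h, c + w)
    return top - bottom

def adaboost(feature_values, labels, n_rounds=10):
    """Toy AdaBoost over precomputed feature responses.

    feature_values: (n_features, n_examples) feature outputs per training window
    labels:         (n_examples,) entries in {0, 1} (non-face / face)
    Returns a list of weak classifiers (feature index, threshold, polarity, alpha).
    """
    n_feat, n_ex = feature_values.shape
    w = np.full(n_ex, 1.0 / n_ex)
    strong = []
    for _ in range(n_rounds):
        w = w / w.sum()                          # 1. normalize the weights
        best = None
        for j in range(n_feat):                  # 2. one decision stump per feature
            thr = feature_values[j].mean()       #    (crude threshold choice, for brevity)
            for pol in (+1, -1):
                pred = (pol * feature_values[j] < pol * thr).astype(int)
                err = np.sum(w * (pred != labels))
                if best is None or err < best[0]:
                    best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best            # 3. keep the lowest weighted error
        beta = err / (1.0 - err + 1e-12)
        w = w * beta ** (pred == labels)         # 4. down-weight correctly classified examples
        strong.append((j, thr, pol, np.log(1.0 / (beta + 1e-12))))
    return strong
```

A real detector would additionally arrange such boosted classifiers into the attentional cascade the pasted excerpt mentions, so that most sub-windows are rejected cheaply by the first few stages.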
  6. 6. Face detection: 10K-1M training examples (Schneiderman & Kanade ’99; Viola & Jones ’01)
  7. 7. Car detection: over 100K training examples (Schneiderman & Kanade ’99)
  8. 8. Pedestrian detection: over 1K training examples (Dalal & Triggs ’05)
  9. 9. What’s wrong with this picture?
  10. 10. What’s wrong with this picture? • Tens of thousands of manually annotated training examples per object category • ~30,000 object categories (Biederman, 1987) • Approach unlikely to scale up ...
  11. 11. One-shot learning in humans: by age 6, a child knows 10-30K categories
  12. 12. One-shot learning in humans: by age 6, a child knows 10-30K categories
  13. 13. One-shot learning in humans: by age 6, a child knows 10-30K categories
  14. 14. What are the computational mechanisms underlying this amazing feat? source: cerebral cortex
  15. 15. What are the computational mechanisms underlying this amazing feat? source: cerebral cortex
  16. 16. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system source: cerebral cortex
  17. 17. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex source: cerebral cortex
  18. 18. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex 3. Application to computer vision source: cerebral cortex
  19. 19. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex 3. Application to computer vision source: cerebral cortex
  20. 20. Hierarchical architecture: Anatomy (Rockland & Pandya ’79; Maunsell & Van Essen ’83; Felleman & Van Essen ’91)
  21. 21. Hierarchical architecture: Anatomy (Rockland & Pandya ’79; Maunsell & Van Essen ’83; Felleman & Van Essen ’91)
  22. 22. Hierarchical architecture: Latencies (Nowak & Bullier ’97; Schmolesky et al ’98; figure source: Thorpe & Fabre-Thorpe ’01)
  23. 23. Hierarchical architecture: Function
  24. 24. Hierarchical architecture: Function (the ventral visual stream)
  25. 25. Hierarchical architecture: Function
  26. 26. Hierarchical architecture: Function
  27. 27. Hierarchical architecture: Function (Hubel & Wiesel 1959, 1962, 1965, 1968)
  28. 28. Simple cells and complex cells (Nobel Prize 1981). Hierarchical architecture: Function (Hubel & Wiesel 1959, 1962, 1965, 1968)
  29. 29. Gradual increase in complexity of the preferred stimulus. Hierarchical architecture: Function (Kobatake & Tanaka 1994; see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999)
  30. 30. Parallel increase in invariance properties (position and scale) of neurons. Hierarchical architecture: Function (Kobatake & Tanaka 1994; see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999)
  31. 31. Hierarchical architecture: Function
  32. 32. Hierarchical architecture: Function
  33. 33. Hierarchical architecture: Function (Hung*, Kreiman*, Poggio & DiCarlo 2005)
  34. 34. Hierarchical architecture: Function (Hung*, Kreiman*, Poggio & DiCarlo 2005) • Invariant object recognition in IT: • Robust, invariant readout of category information from a small population of neurons • Single spikes after response onset carry most of the information
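The readout result above comes from a decoding analysis: a linear classifier is trained on the responses of a recorded neural population (e.g., spike counts in a short window after response onset) and tested on held-out trials. The sketch below is a toy illustration of that kind of analysis on synthetic data; the population size, firing rates, and least-squares readout are placeholders, not the actual data or analysis of Hung et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "population responses": spike counts from n_neurons in a short
# window after response onset, for two object categories (made-up data only).
n_neurons, n_trials = 64, 200
rates_a = rng.uniform(1.0, 8.0, n_neurons)
rates_b = rng.uniform(1.0, 8.0, n_neurons)
X = np.vstack([rng.poisson(rates_a, size=(n_trials, n_neurons)),
               rng.poisson(rates_b, size=(n_trials, n_neurons))]).astype(float)
y = np.array([0] * n_trials + [1] * n_trials)

# Hold out half the trials, fit a linear readout by least squares, and test it.
idx = rng.permutation(len(y))
train, test = idx[: len(y) // 2], idx[len(y) // 2:]
Xtr = np.hstack([X[train], np.ones((len(train), 1))])   # append a bias column
w = np.linalg.lstsq(Xtr, y[train], rcond=None)[0]
Xte = np.hstack([X[test], np.ones((len(test), 1))])
accuracy = np.mean((Xte @ w > 0.5) == y[test])
print(f"held-out decoding accuracy: {accuracy:.2f}")
```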
  35. 35. Hierarchical architecture: Feedforward processing (Thorpe, Fize & Marlot ’96)
  36. 36. Hierarchical architecture: Feedforward processing (Thorpe, Fize & Marlot ’96)
  37. 37. Hierarchical architecture: Feedforward processing
  38. 38. Hierarchical architecture: Feedforward processing
  39. 39. What are the computational mechanisms used by brains to achieve this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex 3. Application to computer vision source: cerebral cortex
  40. 40. Feedforward hierarchical model of object recognition • Qualitative neurobiological models (Hubel & Wiesel ‘58; Perrett & Oram ‘93) • Biologically-inspired models (Fukushima ‘80; Mel ‘97; LeCun et al ‘98; Thorpe ‘02; Ullman et al ‘02; Wersing & Koerner ‘03) • Quantitative neurobiological models (Wallis & Rolls ‘97; Riesenhuber & Poggio ‘99; Amit & Mascaro ‘03; Deco & Rolls ‘06)
  41. 41. Feedforward hierarchical model • Large-scale (10^8 units), spans several areas of the visual cortex • Combination of forward and reverse engineering • Shown to be consistent with many experimental data across areas of visual cortex [Figure: model layers S1, C1, S2, C2, S2b, S3, C2b, C3, S4 mapped onto ventral-stream areas (V1, V2, V3, V4, PIT, AIT, prefrontal cortex), with receptive-field sizes and numbers of units per layer; complexity (number of subunits), RF size and invariance increase along the hierarchy; lower stages use unsupervised, task-independent learning, while the top animal vs. non-animal classification units use supervised, task-dependent learning; simple cells implement tuning, complex cells a MAX operation; main and bypass routes shown.]
  42. 42. Selective pooling mechanisms: simple units and complex units (Riesenhuber & Poggio 1999, building on Fukushima ‘80 and Hubel & Wiesel ‘62)
  43. 43. Selective pooling mechanisms: simple units perform template matching with Gaussian-like tuning (~ “AND”); complex units provide invariance with a max-like operation (~ “OR”) (Riesenhuber & Poggio 1999, building on Fukushima ‘80 and Hubel & Wiesel ‘62)
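A minimal sketch of these two operations and of how one simple-to-complex stage composes them, in the spirit of the model (my own simplification, not the authors' code; the template dimensionality, sigma, and pooling range are illustrative):

```python
import numpy as np

def gaussian_tuning(patch, template, sigma=1.0):
    """Simple unit: bell-shaped, "AND"-like response, maximal when the
    input patch matches the stored template."""
    d2 = np.sum((patch - template) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def max_pooling(responses):
    """Complex unit: "OR"-like MAX over afferent simple units tuned to the
    same template at different positions and scales."""
    return np.max(responses)

def s_c_stage(patches, templates, sigma=1.0):
    """One S -> C stage: tune every local patch to every template, then pool.

    patches:   (n_patches, d) local patches taken at different positions/scales
    templates: (n_templates, d) stored prototypes
    Returns one position/scale-tolerant response per template.
    """
    s_responses = np.array([[gaussian_tuning(p, t, sigma) for p in patches]
                            for t in templates])      # (n_templates, n_patches)
    return np.array([max_pooling(row) for row in s_responses])
```

Stacking such stages, with larger templates and larger pooling ranges at each level, produces the gradual increase in selectivity and invariance described on the earlier slides.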
  44. 44. Feedforward hierarchical model (same overview as slide 41): large-scale (10^8 units) spanning several areas of the visual cortex; combination of forward and reverse engineering; consistent with many experimental data across areas of visual cortex
  45. 45. Both operations can be approximated by gain-control circuits using shunting inhibition; basic circuit for the two operations (Kouh & Poggio 2007; Knoblich, Bouvrie & Poggio 2007)
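A common way to write down such a gain-control (divisive-normalization) circuit is the form below, following Kouh & Poggio's general formulation; the particular exponent settings in the comments are illustrative choices, not values quoted on the slide.

```latex
% Canonical gain-control circuit: weighted inputs divided by a pooled, shunting term.
y \;=\; \frac{\sum_j w_j \, x_j^{\,p}}{\,k + \Big(\sum_j x_j^{\,q}\Big)^{r}}
% Tuning-like regime (illustrative): p = 1,\ q = 2,\ r = 1/2 gives a normalized dot
% product, which approximates the Gaussian-like tuning of simple units.
% Max-like regime (illustrative): w_j = 1,\ p = q + 1,\ r = 1 gives a softmax that
% approaches \max_j x_j as q grows, approximating the pooling of complex units.
```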
  46. 46. Learning and plasticity [Figure: model layers alongside ventral-stream areas V1, V2, V4, PIT, AIT and PFC, with receptive-field sizes.]
  47. 47. Learning and plasticity: evidence for adult plasticity (PFC, IT: very likely; V4: likely; V1/V2: limited evidence)
  48. 48. Learning and plasticity: unsupervised, developmental-like learning stage (frequent image features)
  49. 49. Learning and plasticity: unsupervised, developmental-like learning stage (frequent image features)
  50. 50. Learning and plasticity: unsupervised, developmental-like learning stage (frequent image features)
  51. 51. Learning and plasticity: learned V2/V4 units show stronger facilitation and stronger suppression; unsupervised, developmental-like learning stage (frequent image features)
  52. 52. Learning and plasticity: beyond V4, combinations of those features; unsupervised, developmental-like learning stage (frequent image features)
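The unsupervised, developmental-like stage on these slides amounts to imprinting intermediate units with patches of lower-level activity sampled from natural images, so that frequently occurring image features become the model's dictionary. A hedged sketch of that idea follows; the patch size, number of templates, and uniform random-sampling rule are illustrative choices, not the talk's exact procedure.

```python
import numpy as np

def imprint_templates(c1_maps, n_templates=100, patch_size=4, rng=None):
    """Sample patches of C1-like activity from natural images and store them
    as S2 templates ("frequent image features").

    c1_maps: list of 2-D arrays, one per image (responses of a lower layer)
    Returns an array of shape (n_templates, patch_size * patch_size).
    """
    rng = np.random.default_rng() if rng is None else rng
    templates = []
    for _ in range(n_templates):
        m = c1_maps[rng.integers(len(c1_maps))]          # pick a random image
        r = rng.integers(m.shape[0] - patch_size + 1)    # random location in it
        c = rng.integers(m.shape[1] - patch_size + 1)
        templates.append(m[r:r + patch_size, c:c + patch_size].ravel())
    return np.stack(templates)
```

Once imprinted, these templates play the role of the learned V2/V4-like units, and higher stages combine their pooled outputs, as the "beyond V4" slide indicates.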
  53. 53. Learning and plasticity: supervised learning from a handful of training examples (~ a linear perceptron); unsupervised, developmental-like learning stage (frequent image features)
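The supervised stage is then just a linear classifier trained on the model's top-level features from a handful of labeled examples. The sketch below uses a perceptron-style rule as a stand-in for whatever linear classifier one prefers; the feature matrix, learning rate, and epoch count are assumptions for illustration.

```python
import numpy as np

def train_linear_readout(features, labels, n_epochs=50, lr=0.1):
    """Perceptron-style linear readout on top-level model features.

    features: (n_examples, d) model responses for a handful of labeled images
    labels:   (n_examples,) entries in {0, 1}, e.g. non-animal / animal
    Returns (weights, bias).
    """
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        for x, t in zip(features, labels):
            pred = 1 if x @ w + b > 0 else 0
            w += lr * (t - pred) * x          # updates only on mistakes
            b += lr * (t - pred)
    return w, b

def classify(features, w, b):
    """Apply the learned linear readout to new feature vectors."""
    return (features @ w + b > 0).astype(int)
```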
  54. 54. Learning and sample complexity
  55. 55. Feedforward hierarchical model [Figure: model layers S1-S4 and C1-C3 mapped onto ventral-stream areas, with receptive-field sizes and numbers of units; increase in complexity (number of subunits), RF size and invariance along the hierarchy; unsupervised, task-independent learning in intermediate stages and supervised, task-dependent learning for the animal vs. non-animal classification units; simple cells: tuning, complex cells: MAX; main and bypass routes shown.]
