A biologically-motivated approach to computer vision

    A biologically-motivated approach to computer vision - Presentation Transcript

    1. A biologically-motivated approach to computer vision. Thomas Serre, McGovern Institute for Brain Research, Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology
    2. The problem: invariant recognition in natural scenes • Object recognition is hard! • Our visual capabilities are computationally amazing • Reverse-engineer the visual system and build machines that see and interpret the visual world as well as we do
    3. Computer vision Face detection successes
    5. The recipe: lots of training examples, lots of simple features, fancy classifier. [Slide background: pages from the Viola & Jones ’01 face detection paper, including Table 1 (the AdaBoost algorithm for classifier learning), Figure 3 (the first and second features selected by AdaBoost), Figure 5 (example frontal upright face images used for training) and the attentional cascade section.] Face detection: Schneiderman & Kanade ’99; Viola & Jones ’01
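The recipe on slide 5 (many labeled examples, many simple features, a boosted classifier) can be sketched as follows. This is a minimal illustration of discrete AdaBoost over one-feature threshold stumps, not the detector from the cited paper: it uses the common exponential reweighting form rather than the exact update in the paper's table, and the feature values and the names `train_adaboost` and `predict` are hypothetical.

```python
import math

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with one-feature threshold stumps.

    X: list of feature vectors (hypothetical precomputed feature values).
    y: list of labels in {-1, +1}.
    Returns a list of (alpha, feature_index, threshold, polarity) tuples.
    """
    n = len(X)
    w = [1.0 / n] * n  # start from uniform example weights
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[j] >= thr else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, preds)
        err, j, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all selected stumps."""
    score = sum(a * (pol if x[j] >= thr else -pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Toy data (made up): two features per example, three negatives, three positives.
X = [[1, 5], [2, 4], [3, 7], [6, 1], [7, 3], [8, 2]]
y = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(X, y, rounds=3)
train_predictions = [predict(model, x) for x in X]
```

Each boosting round selects a single feature, which is why the approach needs a large pool of cheap features and many labeled examples to choose among them.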
    6. Face detection: 10K-1M training examples (Schneiderman & Kanade ’99; Viola & Jones ’01)
    7. Car detection: over 100K training examples (Schneiderman & Kanade ’99)
    8. Pedestrian detection: over 1K training examples (Dalal & Triggs ’05)
    9. What’s wrong with this picture?
    10. What’s wrong with this picture? • Tens of thousands of manually annotated training examples • ~30,000 object categories (Biederman, 1987) • Approach unlikely to scale up ...
    11. One-shot learning in humans: by age 6, a child knows 10-30K categories
    14. What are the computational mechanisms underlying this amazing feat? source: cerebral cortex
    16. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system source: cerebral cortex
    17. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex source: cerebral cortex
    18. What are the computational mechanisms underlying this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex 3. Application to computer vision source: cerebral cortex
    20. Hierarchical architecture: Anatomy (Rockland & Pandya ’79; Maunsell & Van Essen ’83; Felleman & Van Essen ’91)
    22. Hierarchical architecture: Latencies (Nowak & Bullier ’97; Schmolesky et al ’98). Source: Thorpe & Fabre-Thorpe ’01
    23. Hierarchical architecture: Function
    24. Hierarchical architecture: Function. Ventral visual stream
    27. Hierarchical architecture: Function (Hubel & Wiesel 1959, 1962, 1965, 1968)
    28. Hierarchical architecture: Function (Hubel & Wiesel 1959, 1962, 1965, 1968). Simple cells, complex cells. Nobel prize 1981
    29. Hierarchical architecture: Function. Gradual increase in complexity of preferred stimulus (Kobatake & Tanaka 1994; see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999)
    30. Hierarchical architecture: Function. Parallel increase in invariance properties (position and scale) of neurons (Kobatake & Tanaka 1994; see also Oram & Perrett 1993; Sheinberg & Logothetis 1996; Gallant et al 1996; Riesenhuber & Poggio 1999)
    33. Hierarchical architecture: Function (Hung* Kreiman* Poggio & DiCarlo 2005)
    34. Hierarchical architecture: Function (Hung* Kreiman* Poggio & DiCarlo 2005) • Invariant object recognition in IT: • Robust invariant readout of category information from small population of neurons • Single spikes after response onset carry most of the information
    35. Hierarchical architecture: Feedforward processing (Thorpe Fize & Marlot ’96)
    37. Hierarchical architecture: Feedforward processing
    39. What are the computational mechanisms used by brains to achieve this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex 3. Application to computer vision source: cerebral cortex
    40. Feedforward hierarchical model of object recognition • Qualitative neurobiological models (Hubel & Wiesel ’58; Perrett & Oram ’93) • Biologically-inspired (Fukushima ’80; Mel ’97; LeCun et al ’98; Thorpe ’02; Ullman et al ’02; Wersing & Koerner ’03) • Quantitative neurobiological models (Wallis & Rolls ’97; Riesenhuber & Poggio ’99; Amit & Mascaro ’03; Deco & Rolls ’06)
    41. Feedforward hierarchical model • Large-scale (10^8 units), spans several areas of the visual cortex • Combination of forward and reverse engineering • Shown to be consistent with many experimental data across areas of visual cortex [Figure: model layers (S1, C1, S2, C2, S2b, C2b, S3, C3, S4, animal vs. non-animal classification units) with their RF sizes and number of units, aligned with areas of the ventral stream ('what' pathway) and dorsal stream ('where' pathway) from V1 to prefrontal cortex. Complexity (number of subunits), RF size and invariance increase along the hierarchy. Lower layers: unsupervised, task-independent learning. Classification units: supervised, task-dependent learning. Legend: simple cells (tuning), complex cells (MAX), main routes, bypass routes.]
    42. Selective pooling mechanisms: simple units, complex units. Riesenhuber & Poggio 1999 (building on Fukushima ’80 and Hubel & Wiesel ’62)
    43. Selective pooling mechanisms • Simple units: template matching, Gaussian-like tuning, ~ “AND” • Complex units: invariance, max-like operation, ~ “OR”. Riesenhuber & Poggio 1999 (building on Fukushima ’80 and Hubel & Wiesel ’62)
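The two operations named on slide 43 can be illustrated with a small sketch. It assumes a 1-D "image" and a 3-element template, both made up for illustration; `simple_unit`, `complex_unit` and `c_response` are hypothetical names, not functions from the model's software.

```python
import math

def simple_unit(x, w, sigma=1.0):
    """Gaussian-like tuning: the response is 1.0 when input x equals template w
    and falls off as x moves away from w (an AND-like conjunction of inputs)."""
    d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-d2 / (2 * sigma ** 2))

def complex_unit(responses):
    """Max-like pooling over afferent simple units (an OR-like operation)."""
    return max(responses)

template = [1.0, 0.0, 1.0]
# The same pattern placed at two different positions in a 1-D "image".
image_a = [1.0, 0.0, 1.0, 0.0, 0.0]
image_b = [0.0, 0.0, 1.0, 0.0, 1.0]

def c_response(image):
    """Apply the same simple-unit template at every position, then pool with max."""
    windows = [image[i:i + 3] for i in range(len(image) - 2)]
    return complex_unit(simple_unit(win, template) for win in windows)

response_a = c_response(image_a)
response_b = c_response(image_b)
```

Because the complex unit takes the maximum over positions, its response is the same wherever the preferred pattern appears inside its pooling range, which is the position invariance the slide refers to.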
    45. Basic circuit for the two operations: both operations can be approximated by gain control circuits using shunting inhibition (Kouh & Poggio 2007; Knoblich Bouvrie Poggio 2007)
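One way to see how a single gain-control (divisive normalization) circuit could yield both operations is the sketch below. The functional form, a weighted sum of powered inputs divided by a pooled normalization term, and the exponent settings are assumptions chosen for illustration; the slide itself gives no formula, and `gain_control` is a hypothetical name.

```python
def gain_control(x, w, p, q, r, k=1e-9):
    """Divisive normalization: sum_j w_j * x_j**p / (k + (sum_j x_j**q) ** r).

    k is a small constant that keeps the denominator away from zero.
    """
    num = sum(wi * xi ** p for wi, xi in zip(w, x))
    den = k + sum(xi ** q for xi in x) ** r
    return num / den

# Max-like regime: equal weights, p = q + 1, r = 1.
# As q grows, the ratio approaches the largest input.
inputs = [0.2, 0.9, 0.5]
max_like = gain_control(inputs, [1.0, 1.0, 1.0], p=21, q=20, r=1)

# Tuning-like regime: p = 1, q = 2, r = 0.5 gives a normalized dot product,
# which is largest when the input points in the direction of the weights.
weights = [0.6, 0.8]
tuned_match = gain_control([0.6, 0.8], weights, p=1, q=2, r=0.5)
tuned_other = gain_control([0.8, 0.6], weights, p=1, q=2, r=0.5)
```

Under these settings `max_like` is close to 0.9, the largest input, and `tuned_match` exceeds `tuned_other`, so the same circuit shape covers both the max-like and the tuning-like behavior.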
    46. Learning and plasticity [Figure: model layers (S1, C1, S2, C2, S2b, C2b, S3, C3, S4, animal vs. non-animal classification units) aligned with areas V1, V2, V4, PIT, AIT and PFC; simple cells: tuning; complex cells: MAX]
    47. Learning and plasticity • Evidence for adult plasticity: PFC, IT very likely; V4 likely; V1/V2 limited evidence
    48. Learning and plasticity • Unsupervised developmental-like learning stage: frequent image features
    51. Learning and plasticity • Learned V2/V4 units: stronger facilitation, stronger suppression • Unsupervised developmental-like learning stage: frequent image features
    52. Learning and plasticity • Beyond V4: combinations of those... • Unsupervised developmental-like learning stage: frequent image features
    53. Learning and plasticity • Supervised learning from a handful of training examples: ~ linear perceptron • Unsupervised developmental-like learning stage: frequent image features
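The supervised stage on slide 53 is described as roughly a linear perceptron trained on a handful of examples. A minimal sketch of that idea follows; the two-dimensional feature vectors stand in for the outputs of the model's top layers and are invented for illustration, as is the name `train_perceptron`.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron rule: update the weights only on misclassified examples."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical feature vectors: +1 = "animal", -1 = "non-animal".
features = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7]]
targets = [1, 1, -1, -1]
w, b = train_perceptron(features, targets)
predictions = [classify(w, b, x) for x in features]
```

The point of the design is that the expensive part (the feature hierarchy) is learned without labels, so only this small linear readout has to be fit to each new task.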
    54. Learning and sample complexity
    55. Feedforward hierarchical model [Figure: model layers with RF sizes and number of units, aligned with ventral-stream and dorsal-stream areas from V1 to prefrontal cortex; same diagram as slide 41]
    56. Feedforward hierarchical model • V1 | Simple and complex cells tuning properties (Schiller et al 1976; Hubel & Wiesel 1965; De Valois et al 1982) • IT | Tuning and invariance properties (Logothetis et al 1995)
    57. Feedforward hierarchical model • V1 | Simple and complex cells tuning properties (Schiller et al 1976; Hubel & Wiesel 1965; De Valois et al 1982) • IT | Tuning and invariance properties (Logothetis et al 1995) • V4 | Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999) • V4 | MAX operation (Gawne et al 2002) • V4 | Two-spot interaction (Freiwald et al 2005) • V4 | Tuning for boundary conformation (Pasupathy & Connor 2001) • V4 | Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)
    58. Feedforward hierarchical model • V1 | Simple and complex cells tuning properties (Schiller et al 1976; Hubel & Wiesel 1965; De Valois et al 1982) • IT | Tuning and invariance properties (Logothetis et al 1995) • V4 | Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999) • V4 | MAX operation (Gawne et al 2002) • V4 | Two-spot interaction (Freiwald et al 2005) • V4 | Tuning for boundary conformation (Pasupathy & Connor 2001) • V4 | Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996) • V1 | MAX operation in subset of complex cells (Lampl et al 2004) • IT | Differential role of IT and PFC in categorization (Freedman et al 2001 2002 2003) • IT | Read out data (Hung Kreiman Poggio & DiCarlo 2005) • IT | Average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo in press) • Human psychophysics | Rapid animal categorization (Serre Oliva Poggio 2007)
    59. Invariance in IT
    60. Invariance in IT [Figure: classification performance (0 to 1) for IT and Model, with TRAIN and TEST conditions at size/position 3.4°/center, 3.4°/center, 1.7°/center, 6.8°/center, 3.4°/2° horz. and 3.4°/4° horz.] Model data: Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005. Experimental data: Hung* Kreiman* Poggio & DiCarlo 2005
    61. Explaining human performance in rapid categorization tasks (Serre Oliva & Poggio 2007)
    63. Explaining human performance in rapid categorization tasks (Serre Oliva & Poggio 2007). Stimuli: animals (head, close-body, medium-body, far-body), natural distractors, artificial distractors
    64. Explaining human performance in rapid categorization tasks (Serre Oliva & Poggio 2007) [Figure: performance (d'), axis from 1.0 to 2.6, for the head, close-body, medium-body and far-body conditions] Model (82% correct); human observers (80% correct)
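The performance axis on slide 64 is d', the standard signal-detection sensitivity index. As a reminder of how it is computed, here is a short sketch; the hit and false-alarm rates in the example are made-up values, not numbers from the study.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false-alarm rate), where z is the inverse of the
    standard normal cumulative distribution function.

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical observer: 80% hits on animal images, 20% false alarms on distractors.
example_sensitivity = d_prime(0.8, 0.2)
# An observer who responds identically to targets and distractors has d' = 0.
no_sensitivity = d_prime(0.5, 0.5)
```

Unlike percent correct, d' separates sensitivity from response bias, which makes it a natural scale for comparing a model with human observers.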
    65. What are the computational mechanisms used by brains to achieve this amazing feat? source: cerebral cortex
    66. What are the computational mechanisms used by brains to achieve this amazing feat? 1. Organization of the visual system source: cerebral cortex
    67. What are the computational mechanisms used by brains to achieve this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex source: cerebral cortex
    68. What are the computational mechanisms used by brains to achieve this amazing feat? 1. Organization of the visual system 2. Computational model of the visual cortex 3. Application to computer vision source: cerebral cortex
    69. Bio-motivated computer vision: scene parsing and object recognition. Computer vision system based on the response properties of neurons in the ventral stream of the visual cortex (Serre Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al 2007)
    70. Bio-motivated computer vision: GPU acceleration (Mutch, 2009) [Figure: Gflops]
    71. Bio-motivated computer vision: GPU acceleration (Mutch, 2009) • GPU can run certain classes of algorithms 50-100x faster than a CPU [Figure: Gflops]
    72. Bio-motivated computer vision: GPU acceleration (Mutch, 2009) • GPU can run certain classes of algorithms 50-100x faster than a CPU • Designed to run same program (“kernel”) for each element of a large 2D grid [Figure: Gflops]
    73. Bio-motivated computer vision: GPU acceleration (Mutch, 2009) • GPU can run certain classes of algorithms 50-100x faster than a CPU • Designed to run same program (“kernel”) for each element of a large 2D grid • 240 parallel processors! (512 by 2010 Q1) [Figure: Gflops]
    74. Bio-motivated computer vision: GPU acceleration (Mutch, 2009) • GPU can run certain classes of algorithms 50-100x faster than a CPU • Designed to run same program (“kernel”) for each element of a large 2D grid • 240 parallel processors! (512 by 2010 Q1) • 97 times speedup over our best CPU implementation • 0.291 sec/image for a 256x256 pixel image • Currently downloading and processing about 300K images from the internet per day
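The figures on slide 74 can be cross-checked with a little arithmetic. The sketch below assumes that 0.291 sec/image is the GPU timing, that the 97x speedup refers to the same image size, and that a single GPU runs continuously with download time ignored; none of this is stated explicitly on the slide.

```python
SECONDS_PER_IMAGE_GPU = 0.291   # from the slide, 256x256 pixel image
SPEEDUP_OVER_CPU = 97           # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Images one GPU could process in a day at this rate (about 297,000).
images_per_day = SECONDS_PER_DAY / SECONDS_PER_IMAGE_GPU

# Implied CPU time per image under the stated speedup (about 28.2 seconds).
cpu_seconds_per_image = SECONDS_PER_IMAGE_GPU * SPEEDUP_OVER_CPU

# Images per day at the implied CPU rate (about 3,060).
cpu_images_per_day = SECONDS_PER_DAY / cpu_seconds_per_image
```

Under these assumptions the per-image timing is consistent with the roughly 300K images per day quoted on the slide.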
    75. Recognition in videos
    76. Ungerleider & Mishkin ’84. Source: Wikipedia, “ventral stream”
    77. Ungerleider & Mishkin ’84 • ventral stream: “shape pathway”. Source: Wikipedia, “ventral stream”
    78. Ungerleider & Mishkin ’84 • dorsal stream: “motion pathway” • ventral stream: “shape pathway”. Source: Wikipedia, “ventral stream”
    80. Bio-motivated computer vision: action recognition in video sequences (Jhuang Serre Wolf & Poggio 2007). Motion-sensitive MT-like units. [Figure: example actions: bend, jump 2, wave 2, side, jack, wave 1, walk, jump, run]
    81. Bio-motivated computer vision: action recognition in video sequences (Jhuang Serre Wolf & Poggio 2007) • KTH Human: Dollar et al ’05 81.3%, model 91.6%, chance 16.7% • Weiz. Human: Dollar et al ’05 86.7%, model 96.3%, chance 11.1% • UCSD Mice: Dollar et al ’05 75.6%, model 79.0%, chance 20.0% ★ Cross-validation: 2/3 training, 1/3 testing, 10 repeats
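The evaluation protocol on slide 81 (2/3 training, 1/3 testing, 10 repeats) can be sketched as a repeated random hold-out. How the clips were actually partitioned is not stated on the slide, so the uniform random split here is an assumption, and `repeated_holdout` and `majority_baseline` are hypothetical names. The chance levels on the slide match 1/N for 6, 9 and 5 classes respectively, which is an inference from the numbers rather than something the slide states.

```python
import random
from collections import Counter

def repeated_holdout(items, labels, train_and_score,
                     repeats=10, train_frac=2 / 3, seed=0):
    """Average test score over several random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(items)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train, test = idx[:cut], idx[cut:]
        scores.append(train_and_score(
            [items[i] for i in train], [labels[i] for i in train],
            [items[i] for i in test], [labels[i] for i in test]))
    return sum(scores) / len(scores)

def majority_baseline(train_x, train_y, test_x, test_y):
    """Placeholder 'classifier': always predicts the most common training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return sum(t == most_common for t in test_y) / len(test_y)

# Toy data (made up): nine clips, two action labels.
clips = list(range(9))
actions = ["walk"] * 6 + ["run"] * 3
mean_accuracy = repeated_holdout(clips, actions, majority_baseline)

# Chance level for N equally likely classes, in percent.
chance_percent = {n: round(100 / n, 1) for n in (6, 9, 5)}
```

Any real classifier can be substituted for `majority_baseline` as long as it takes the same four arguments and returns an accuracy between 0 and 1.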
    82. Automatic recognition of rodent behavior • Limit subjectivity of human intervention and stress on the animal (compared to standardized tests) • 24 hr surveillance towards assessing well-being of animals • Help validate models of mental and neurodegenerative diseases (Huntington, schizophrenia, autism, etc) • Help assess efficacy of drugs
    83. Behaviors of interest
    84. Data Set • Manually annotated two sets: • Frame-accurate action clips: • ~50 man-hr/hr of video • 4000 clips • Fine-tuning system parameters (speed tuning, spatial resolution, feature learning, etc) • Fully (continuously) annotated videos (less accurate): • ~20 man-hr/hr of video • Learning temporal statistics
    85. Automatic recognition of rodent behavior (Serre* Jhuang* Garrote Poggio Steele in prep) • Proof of concept with 8 primitive behaviors (groom, eat, hang, drink, walk, jump, micro-move, rest) • System is trainable and could be trained for additional behaviors • Demo available at http://techtv.mit.edu/videos/1838
    86. Automatic recognition of rodent behavior (Serre* Jhuang* Garrote Poggio Steele in prep) • Proof of concept with 8 primitive behaviors (groom, eat, hang, drink, walk, jump, micro-move, rest) • System is trainable and could be trained for additional behaviors • Demo available at http://techtv.mit.edu/videos/1838 • Performance: human agreement 72%; proposed system 71%; commercial system 56%; chance 12%
    87. Computer system vs. human
    88. Behavioral comparison between 4 strains (Serre* Jhuang* Garrote Poggio Steele in prep) • 24 hour monitoring of 4 different strains (n=8): • CAST/EiJ (wild-like strain) • C57Bl/6J (popular inbred mouse strains) • DBA/2J (popular inbred mouse strains) • BTBR2 (potential model of autism) • Corresponds to about 7 yr of work for manual scoring
    89. Behavioral comparison between 4 strains
    91. Predicting strain based on behavior
    92. Summary • Feedforward hierarchical model of visual perception seems consistent with “immediate recognition”, i.e. during passive viewing or when the visual system is forced to operate without top-down cortical feedback • Application to automated behavior recognition in home-cage mice • Beyond feedforward processing: • Cortical feedback • Shifts of attention
    93. Neuroscience of attention and Bayesian inference: integrated model of attention and recognition, in collaboration with Desimone lab (monkey electrophysiology) [Figure: PFC, IT, V4/PIT, V2]
    94. Neuroscience of attention and Bayesian inference: integrated model of attention and recognition, in collaboration with Desimone lab (monkey electrophysiology) [Figure: PFC, IT, V4/PIT, V2; feature-based attention]
    95. Neuroscience of attention and Bayesian inference: integrated model of attention and recognition, in collaboration with Desimone lab (monkey electrophysiology) [Figure: PFC, IT, V4/PIT, V2, LIP/FEF; feature-based attention; spatial attention]
    96. Model performance improves with attention (Chikkerur Serre & Poggio in prep) [Figure: performance (d’) for Model and Humans, no attention vs. one shift of attention]
    97. Model performance improves with attention (Chikkerur Serre & Poggio in prep) [Figure: performance (d’), axis from 0 to 3, for Model and Humans, no attention vs. one shift of attention]
    100. Model performance improves with attention (Chikkerur Serre & Poggio in prep) [Figure: performance (d’), axis from 0 to 3, for Model and Humans, no attention vs. one shift of attention, mask vs. no mask]
    101. Acknowledgments: Tomaso Poggio, Hueihan Jhuang, Estibaliz Garrote, Andrew Steele. Other: • Narcisse Bichot • Stan Bileschi • Charles Cadieu • Robert Desimone • Jim DiCarlo • Michelle Fabre-Thorpe • Winrich Freiwald • Estibaliz Garrote • Hueihan Jhuang • Ulf Knoblich • Christof Koch • Minjoon Kouh • Gabriel Kreiman • Timothee Masquelier • Leila Reddy • David Sheinberg • Jed Singer • Andrew Steele • Simon Thorpe • Nao Tsuchyia • Lior Wolf • Ying Zhang

    + tserretserre, 5 months ago

    Presentation given at Yale University in August 200 more

    © All Rights Reserved
