Speaker note (slide 1): I will talk about a method for recognizing objects in images. We will focus in particular on recognizing objects in street scenes.

1. Robust Object Recognition with Cortex-like Mechanisms (PAMI, 06)
   T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio
   Presented by Ala Stolpnik
2. Introduction
   • A general framework for the recognition of complex visual scenes
   • The system follows the organization of the visual cortex
   • Texture- and shape-based object recognition
3. Scene Understanding
   (Example images, captioned "Watch Out!" and "Probably Hanging Out".)
4. The StreetScenes Database
   3,547 images, all taken with the same camera, of the same type of scene,
   and hand-labeled with the same objects, using the same labeling rules.

   Object       # Labeled Examples
   car          5799
   pedestrian   1449
   bicycle       209
   building     5067
   tree         4932
   road         3400
   sky          2562
5. More StreetScenes Examples

6. Even More StreetScenes Examples
7-8. Challenges
   • In-class variability
   • Partial or weak labeling
   • Includes rigid, articulated, and amorphous objects
9. Texture Sample Locations
   Building, tree, road, and sky: hand-drawn labels and training sample locations.
10. Two Slightly Different Pathways
   • Texture-based objects pathway (e.g., trees, road): input image → texture classification → segmented image
   • Shape-based objects pathway (e.g., pedestrians, cars): input image → windowing → crop classification → output (car / pedestrian detections)
11. Texture-based Object Detection
   Input image → Standard Model feature extraction → feature vector → classification (tree / not-tree) → decision, followed by smoothing over a segmentation of the image.
12. Shape-based Object Detection
   Input image → windowing → Standard Model feature extraction → feature vector → statistical learning / classification (car / not-car) → decision → output detections.
13. Standard Model Features, from a neuroscience view
   (Diagram: from the retina upward, units of increasing complexity.)
14. Standard Model Features, from a neuroscience view
   • Simple units (S) increase selectivity
   • Complex units (C) increase invariance
   • In our model we use 4 kinds of units: Image → S1 → C1 → S2 → C2
15. Overview
   • Introduction
   • The model
   • Results
16. S1 - Gabor filter
   • θ represents the orientation
   • λ represents the wavelength of the cosine factor
   • ψ is the phase offset in degrees (ψ = 0)
   • γ is the spatial aspect ratio (γ = 0.3)
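The Gabor equation itself appears only as an image on the slide; in the usual parameterization it is G(x, y) = exp(−(x′² + γ²y′²)/(2σ²)) · cos(2πx′/λ + ψ), with rotated coordinates x′ = x cosθ + y sinθ and y′ = −x sinθ + y cosθ. A minimal NumPy sketch (the 11×11 kernel support and the value of σ are illustrative choices, not taken from the slides):

```python
import numpy as np

def gabor_kernel(theta_deg, lam, sigma, gamma=0.3, psi=0.0, size=11):
    """Build a size x size Gabor kernel.

    theta_deg: orientation in degrees; lam: cosine wavelength;
    gamma: spatial aspect ratio; psi: phase offset (0 on the slides).
    sigma and size are illustrative free parameters here.
    """
    theta = np.deg2rad(theta_deg)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * x_r / lam + psi)

k0 = gabor_kernel(0, lam=3.5, sigma=2.8)             # smallest-scale filter
```

Rotating θ by 90 degrees simply transposes the kernel, which is the effect shown on the next slide.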
17. Gabor filter - rotation
   We use 4 different orientations: θ = 0, 45, 90, 135.
   (Example responses shown for θ = 0 and θ = 90.)
18. Gabor filter - scaling
   We use 16 different scales, from λ = 3.5 to λ = 22.8.
   (Example filters shown for λ = 3.5, 10.3, 22.8.)
19-21. S1
   Apply the Gabor filters to the gray-scale input image.
   • 4 orientations (0, 45, 90, 135)
   • 16 scales for each orientation
   • Total: 64 S1 units
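The full S1 stage can be sketched as below: one filtered map per (orientation, scale) pair. The σ = 0.8λ coupling, the fixed 11×11 kernel support, and the linearly spaced wavelengths are illustrative assumptions (the paper varies filter size with scale):

```python
import numpy as np

def gabor(theta_deg, lam, sigma, gamma=0.3, size=11):
    """2-D Gabor kernel (same form as on the S1 slide)."""
    t = np.deg2rad(theta_deg)
    h = size // 2
    y, x = np.mgrid[-h:h + 1, -h:h + 1]
    xr = x * np.cos(t) + y * np.sin(t)
    yr = -x * np.sin(t) + y * np.cos(t)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xr / lam)

def correlate_same(img, k):
    """'Same'-size cross-correlation with zero padding."""
    h = k.shape[0] // 2
    padded = np.pad(img, h)
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

orientations = [0, 45, 90, 135]
lambdas = np.linspace(3.5, 22.8, 16)          # 16 scales, as on the slides
rng = np.random.default_rng(0)
img = rng.random((24, 24))                    # stand-in gray-scale image

# One S1 map per (orientation, scale) pair: 4 x 16 = 64 maps.
s1 = {(th, lam): correlate_same(img, gabor(th, lam, sigma=0.8 * lam))
      for th in orientations for lam in lambdas}
```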
22. S1 → C1
   Local maximization takes place in each orientation channel separately, and also over nearby scales.
23. C1
   • Local maximum over position and scale
   • 4 orientations
   • 8 scale bands for each orientation
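The C1 pooling can be sketched as below for a single orientation channel: a max over pairs of adjacent scales (16 → 8 bands), then a local spatial max with subsampling. Pooling over non-overlapping 4×4 cells is an illustrative simplification (the model uses overlapping neighborhoods whose size grows with the scale band):

```python
import numpy as np

def local_max_pool(m, cell):
    """Max over non-overlapping cell x cell neighborhoods (subsamples the map)."""
    h, w = (m.shape[0] // cell) * cell, (m.shape[1] // cell) * cell
    blocks = m[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
s1 = [rng.random((32, 32)) for _ in range(16)]   # 16 scales, one orientation

# C1: elementwise max over adjacent scale pairs, then a local spatial max.
c1 = [local_max_pool(np.maximum(s1[2 * b], s1[2 * b + 1]), cell=4)
      for b in range(8)]
```

Because every S1 value falls inside some pooling cell, the strongest response in a scale pair always survives into its C1 band.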
24. S1 → C1 (illustration)
25. S2 response
   • Pi is one of the N prototype features
   • X is an image patch from C1 (in the same format as Pi)
   • The response to a prototype is a Gaussian radial basis function: r = exp(−β‖X − Pi‖²)
   • r is computed for each position in the image, for each scale and each Pi
26. S2
   • Filter the C1 units with N previously seen patches (Pi)
   • Each Pi is in C1 format, with dimensions n × n × 4
   • Each orientation in Pi is matched to the corresponding orientation in C1
   • The result is one response image per C1 band per Pi
27. C2
   • C2 is simply the global maximum of the S2 response image: C2 = max(S2)
   • Each prototype gives rise to one C2 value
   • Patch size, sampling rate, etc. are parameters of the system
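The C2 step reduces each prototype's S2 responses, over all positions and scale bands, to a single number; a sketch (the map sizes are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
# One S2 response map per scale band, for a single prototype Pi
# (map sizes shrink with coarser bands in the real model).
s2_maps = [rng.random((13, 13)), rng.random((6, 6)), rng.random((3, 3))]

# C2: one scalar per prototype -- the global max over all positions and bands.
c2 = max(m.max() for m in s2_maps)

# With N prototypes this yields an N-dimensional C2 feature vector per image.
```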
28. Overview
   • Classification is done on the C2 feature vector using an SVM
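The slides do not specify which SVM implementation is used; as a stand-in, here is a minimal Pegasos-style linear SVM in NumPy, trained on toy data in place of real C2 feature vectors (the hyperparameters and data are illustrative):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM.
    X: (n, d) feature matrix (e.g. C2 vectors); y: labels in {-1, +1}.
    A bias term is folded in as a constant feature."""
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias feature
    n, d = Xa.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Xa[i] @ w)
            w *= (1 - eta * lam)                    # regularization shrink
            if margin < 1:                          # hinge-loss subgradient
                w += eta * y[i] * Xa[i]
    return w[:-1], w[-1]

# Toy separable data standing in for C2 vectors of two object classes.
rng = np.random.default_rng(0)
pos = rng.normal(+2.0, 0.5, size=(30, 5))
neg = rng.normal(-2.0, 0.5, size=(30, 5))
X = np.vstack([pos, neg])
y = np.array([1] * 30 + [-1] * 30)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```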
29. The learning stage
   • Where do we get the Pi from?
   • Input: a collection of images (task-specific or a general dictionary)
   • Each Pi has dimensions n × n × 4, where n can be 4, 8, 12, or 16. Pi selection:
      • Select a random image
      • Convert the image to C1
      • Select an n × n patch at random from its C1 maps; this is our Pi
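The prototype-sampling steps above can be sketched as follows (the C1 stand-in maps and the prototype count are illustrative):

```python
import numpy as np

def sample_prototypes(c1_images, num, sizes=(4, 8, 12, 16), seed=0):
    """Sample num random prototypes; each Pi is an n x n x 4 patch
    cut at a random position from the C1 maps of a random training image."""
    rng = np.random.default_rng(seed)
    protos = []
    for _ in range(num):
        c1 = c1_images[rng.integers(len(c1_images))]  # random image's C1
        n = int(rng.choice(sizes))                    # random patch size
        i = rng.integers(c1.shape[0] - n + 1)
        j = rng.integers(c1.shape[1] - n + 1)
        protos.append(c1[i:i + n, j:j + n, :].copy())
    return protos

rng = np.random.default_rng(1)
# Stand-ins for the C1 maps (H x W x 4 orientations) of training images.
c1_images = [rng.random((24, 24, 4)) for _ in range(5)]
protos = sample_prototypes(c1_images, num=10)
```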
30. Model overview
   • Learning stage: compute N prototype templates (Pi) from training images
   • Object recognition:
      • S1: apply 64 different Gabor filters to the image
      • C1: take local maxima of the filter outputs
      • S2: measure "correlation" (RBF similarity) with each Pi
      • C2: take the maximum over the entire image per Pi
31. Overview
   • Goals
   • The model
   • Results
32-33. StreetScenes Database: Subjective Results
   (Example detections on StreetScenes images.)
34. Results: C2 vs. SIFT, by number of features

35. Results: C2 vs. SIFT, by number of training examples

36. Results: Object-specific vs. universal features
37. Conclusion
   • A general framework for the recognition of complex visual scenes
   • The system follows the organization of the visual cortex
   • Texture- and shape-based object recognition
   • Capable of learning from only a few training examples
38. Thanks!
