
Fcv bio cv_simoncelli


Published in: Technology, Business

  1. 1. Synthesis for understanding and evaluating vision systems. Eero Simoncelli, Howard Hughes Medical Institute, Center for Neural Science, and Courant Institute of Mathematical Sciences, New York University. Frontiers in Computer Vision Workshop, MIT, 21-24 Aug 2011
  2. 2. [Diagram of computer vision and neighboring fields: computer graphics, optics/imaging, visual perception, image processing, visual neuroscience, machine learning, robotics]
  4. 4. Why should computer vision care about biological vision? [Figure: visual pathway with labels retina, optic nerve, optic tract, LGN, visual cortex]
       • Optimized for general-purpose vision
       • Determines/limits what is perceived
       • Useful scientific testing methodologies
  8. 8. Illustrative example: building a classifier
       1. Transform input to some feature space
       2. Use ML to learn parameters on a large (labelled) data set
       3. Test on another data set
       4. Repeat
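The four steps on this slide can be sketched end to end on toy data. This is only an illustration, not anything shown in the talk: the two features, the nearest-prototype learner, and the synthetic "smooth vs. noisy ramp" data are all assumptions chosen to keep the example dependency-free.

```python
import random

def features(x):
    """Step 1: map a raw input (list of numbers) to a feature vector.
    Hypothetical choice: mean value and mean absolute successive difference."""
    n = len(x)
    mean = sum(x) / n
    rough = sum(abs(x[i + 1] - x[i]) for i in range(n - 1)) / (n - 1)
    return [mean, rough]

def fit_prototypes(X, y):
    """Step 2: learn one mean feature vector (prototype) per class label."""
    protos = {}
    for label in set(y):
        rows = [f for f, t in zip(X, y) if t == label]
        protos[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return protos

def predict(protos, f):
    """Return the label whose prototype is nearest in squared Euclidean distance."""
    return min(protos, key=lambda k: sum((a - b) ** 2 for a, b in zip(f, protos[k])))

def make_data(n, rng):
    """Toy labelled data: class 0 is a smooth ramp, class 1 a noisy ramp."""
    data, labels = [], []
    for i in range(n):
        label = i % 2
        noise = 0.05 if label == 0 else 1.0
        data.append([0.1 * t + rng.gauss(0, noise) for t in range(32)])
        labels.append(label)
    return data, labels

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(100, rng)   # Step 3: a separate data set

protos = fit_prototypes([features(x) for x in train_x], train_y)
hits = sum(predict(protos, features(x)) == t for x, t in zip(test_x, test_y))
accuracy = hits / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
# Step 4 (repeat): change features() or the learner and run again.
```

In this loop the only feedback about the feature space is a single accuracy number, which is the limitation the later synthesis-based tests in the deck are aimed at.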
  10. 10. Which features? Oriented filters: capture stimulus-dependency of neural responses in primary visual cortex (area V1). [Figure: simple cell and complex cell models] [Adelson & Bergen, 1985]
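A minimal 1-D sketch of the two cell types named on this slide, assuming the usual textbook reading: a simple cell as a half-squared linear filter response, and a complex cell as the summed squares of a quadrature pair of filters. The Gabor filters and all parameter values are assumptions for illustration; the slide's figure may use different filters.

```python
import math

def gabor(size, freq, phase, sigma):
    """1-D Gabor filter: a sinusoid under a Gaussian window, centered in the array."""
    c = (size - 1) / 2
    return [math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
            * math.cos(2 * math.pi * freq * (i - c) + phase)
            for i in range(size)]

def linear_response(filt, patch):
    return sum(f * p for f, p in zip(filt, patch))

def simple_cell(filt, patch):
    """Half-squared linear response: negative drive gives zero output."""
    return max(linear_response(filt, patch), 0.0) ** 2

def complex_cell(even, odd, patch):
    """Sum of squared responses of two filters 90 degrees apart in phase."""
    return linear_response(even, patch) ** 2 + linear_response(odd, patch) ** 2

size, freq, sigma = 65, 0.1, 8.0              # hypothetical parameter values
even = gabor(size, freq, 0.0, sigma)          # cosine carrier
odd = gabor(size, freq, -math.pi / 2, sigma)  # sine carrier

def grating(phase):
    """Sinusoidal stimulus at the filters' preferred frequency."""
    c = (size - 1) / 2
    return [math.cos(2 * math.pi * freq * (i - c) + phase) for i in range(size)]

phases = [k * math.pi / 4 for k in range(8)]
simple = [simple_cell(even, grating(p)) for p in phases]
cplx = [complex_cell(even, odd, grating(p)) for p in phases]
for p, s, c in zip(phases, simple, cplx):
    print(f"phase {p:4.2f}  simple {s:8.2f}  complex {c:8.2f}")
```

The simple-cell output depends strongly on stimulus phase and is zero for the opposite-polarity grating, while the complex-cell output stays nearly constant across phases.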
  14. 14. The normalization model of simple cells. [Figure: retinal image and other cortical cells as inputs, firing rate as output; second panel: RC circuit implementation with the same inputs and output] [Carandini, Heeger, and Movshon, 1996]
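The core of a normalization model can be sketched as a division of each cell's drive by the pooled activity of other cells. The version below is a static simplification under stated assumptions: it ignores the RC-circuit dynamics on the slide, and the tuning values, the constant `sigma`, and the choice of pooling over the whole population are hypothetical.

```python
def normalize(drives, sigma=1.0):
    """Divide each cell's half-squared drive by a constant plus the summed
    half-squared drive of the whole population (the cell itself included)."""
    energy = [max(d, 0.0) ** 2 for d in drives]
    pool = sigma ** 2 + sum(energy)
    return [e / pool for e in energy]

# Hypothetical linear drives of five cells with different preferred
# orientations, for a stimulus of unit contrast.
tuning = [0.1, 0.5, 1.0, 0.5, 0.1]

for contrast in (0.5, 2.0, 8.0, 32.0):
    r = normalize([contrast * t for t in tuning])
    print(contrast, [round(v, 3) for v in r], "ratio:", round(r[1] / r[2], 3))
```

As contrast grows the responses level off instead of growing without bound, yet the ratio between neighboring cells stays fixed, so the shape of the tuning curve is preserved.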
  16. 16. Dynamic retina/LGN model [Mante, Bonin & Carandini, 2008]
  17. 17. 2-stage MT model [Simoncelli & Heeger, 1998]
       Stage 1. Input: image intensities. Linear receptive field, half-squaring rectification, divisive normalization. Output: V1 neurons tuned for spatio-temporal orientation.
       Stage 2. Input: V1 afferents. Linear receptive field, half-squaring rectification, divisive normalization. Output: MT neurons tuned for local image velocity.
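Written out, the same three operations are applied in each stage. The equations below are a schematic reading of the diagram, not the paper's exact parameterization: the symbols ($w$, $u$, $\sigma_1$, $\sigma_2$, and the response names) are assumptions, gain constants are omitted, and $\lfloor x \rfloor_+ = \max(x, 0)$.

```latex
% Stage 1 (V1): I_i are image intensities, w_{ni} the linear receptive field of neuron n
L_n = \sum_i w_{ni}\, I_i, \qquad
H_n = \lfloor L_n \rfloor_+^{2}, \qquad
V_n = \frac{H_n}{\sigma_1^{2} + \sum_m H_m}

% Stage 2 (MT): the same three operations applied to the V1 outputs V_n
Q_k = \sum_n u_{kn}\, V_n, \qquad
P_k = \lfloor Q_k \rfloor_+^{2}, \qquad
M_k = \frac{P_k}{\sigma_2^{2} + \sum_j P_j}
```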
  19. 19. Biology uses cascades of canonical operations...
       • Linear filters (local integrals and derivatives): selectivity/invariance
       • Static nonlinearities (rectification, exponential, sigmoid): dynamic range control
       • Pooling (sum of squares, max, etc.): invariance
       • Normalization: preservation of tuning curves, suppression by non-optimal stimuli
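One stage of such a cascade can be sketched by composing the four operations in order. This is a toy 1-D illustration, not a model from the talk: the derivative kernel, the window width, the normalization constant, and the step-edge input are assumptions. It shows the invariance attributed to pooling: shifting an edge within a pooling window leaves the output unchanged.

```python
def convolve_valid(signal, kernel):
    """Linear filtering: sliding dot product, no padding."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def half_square(xs):
    """Static nonlinearity: rectify, then square."""
    return [max(x, 0.0) ** 2 for x in xs]

def pool_max(xs, width):
    """Pooling: maximum over non-overlapping windows."""
    return [max(xs[i:i + width]) for i in range(0, len(xs) - width + 1, width)]

def normalize(xs, sigma=0.1):
    """Normalization: divide by a constant plus the summed activity."""
    total = sigma ** 2 + sum(xs)
    return [x / total for x in xs]

def stage(signal, kernel, width):
    return normalize(pool_max(half_square(convolve_valid(signal, kernel)), width))

def step(pos, length=17):
    """Step edge: zeros up to index pos, ones from pos onward."""
    return [0.0] * pos + [1.0] * (length - pos)

kernel = [-1.0, 1.0]   # local derivative: responds to rising edges
for pos in (6, 7, 9):
    print(pos, [round(v, 3) for v in stage(step(pos), kernel, 4)])
```

Edges at positions 6 and 7 fall in the same pooling window and give identical outputs; the edge at position 9 falls in the next window. A deeper cascade would feed the output of one `stage` call into another with its own kernel.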
  20. 20. Improved object recognition? “In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer [...] We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks. We show that two stages of feature extraction yield better accuracy than one....” From the abstract of “What is the Best Multi-Stage Architecture for Object Recognition?”, Kevin Jarrett, Koray Kavukcuoglu, Marc’Aurelio Ranzato and Yann LeCun, ICCV 2009
  21. 21. Using synthesis to test models I: Gender classification [Graf & Wichmann, NIPS*03]
       • 200 face images (100 male, 100 female)
       • Labeled by 27 human subjects
       • Four linear classifiers trained on subject data
  22. 22. Linear classifiers: SVM, RVM, Prot, FLD. Classifier vectors may be visualized as images. [Figure: one image per classifier (SVM, RVM, Prot, FLD), in two rows: trained on true data, trained on subject data]
  25. 25. Validation by “gender-morphing”. [Figure: face images with each classifier image (SVM, RVM, Prot, FLD) subtracted or added in amounts −21, −14, −7, 0, 7, 14, 21] [Wichmann, Graf, Simoncelli, Bülthoff, Schölkopf, NIPS*04]
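The morphing idea can be sketched for any linear classifier: because the decision value is a dot product with a weight vector, adding or subtracting a multiple of that vector moves an input across the decision boundary. The sketch below assumes a mean-difference ("prototype") classifier on synthetic 16-dimensional data; the data, the unit-length scaling, and the name `lam` for the added amount are illustrative assumptions, and the step sizes simply reuse the values shown on the slide.

```python
import random

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototype_classifier(class_a, class_b):
    """Weight vector = difference of class means; the bias places the
    decision boundary at the midpoint between the two means."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    w = [a - b for a, b in zip(ma, mb)]
    mid = [(a + b) / 2 for a, b in zip(ma, mb)]
    return w, -dot(w, mid)

def morph(x, w, lam):
    """Add (lam > 0) or subtract (lam < 0) the unit-length classifier vector."""
    norm = dot(w, w) ** 0.5
    return [xi + lam * wi / norm for xi, wi in zip(x, w)]

rng = random.Random(1)
dim = 16
class_a = [[rng.gauss(+0.5, 1.0) for _ in range(dim)] for _ in range(100)]
class_b = [[rng.gauss(-0.5, 1.0) for _ in range(dim)] for _ in range(100)]
w, b = prototype_classifier(class_a, class_b)

x = class_b[0]
for lam in (-21, -14, -7, 0, 7, 14, 21):
    score = dot(w, morph(x, w, lam)) + b   # > 0 means "class a"
    print(lam, round(score, 2))
```

The classifier's own score changes linearly with the amount added. The perceptual question raised in the deck is whether human judgments of the morphed images change in the same way, which requires showing the synthesized images to observers.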
  26. 26. Perceptual validation: human subject responses. [Plot: % correct (50 to 100) versus amount of classifier image added/subtracted (arbitrary units, 0.25 to 8.0), one curve each for SVM, RVM, Proto, FLD] [Wichmann, Graf, Simoncelli, Bülthoff, Schölkopf, NIPS*04]
  27. 27. Using synthesis to test models II: Ventral stream representation. [Figure: excerpt from DiCarlo & Cox, 2007, partly cut off: “...rates of an IT population of 200 neurons, despite variation in object position and size [19]. It is important to note that using ‘stronger’ (e.g. non-linear) classifiers did not substantially improve recognition performance and the same...” and “...evidence suggests that the ventral stream transfor... (culminating in IT) solves object recognition by unta... object manifolds. For each visual image striking the e... total transformation happens progressively (i.e. st...”] [DiCarlo & Cox, 2007]
  28. 28. [Plot: receptive field size (deg) versus eccentricity of receptive field center (deg), with separate lines for V1, V2, and V4] [Gattass et al., 1981; Gattass et al., 1988]
  29. 29. [Figure: ventral stream areas V1, V2, V4, IT] [Freeman & Simoncelli, Nature Neurosci, Sep 2011]
  30. 30. Canonical computation. [Figure labels: ventral stream receptive fields; V1 cells; “complex” cell (+); example model responses 3.1, 1.4, 12.5, ...] How do we test this? [Freeman & Simoncelli, Nature Neurosci, Sep 2011]
  33. 33. [Figure: original image and synthesized image passed through the same model, giving identical model responses (3.1, 1.4, 12.5, ...)]
       Idea: synthesize random samples from the equivalence class of images with identical model responses.
       Scientific prediction: such images should look the same (“Metamers”). [Freeman & Simoncelli, Nature Neurosci, Sep 2011]
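The synthesis idea can be sketched on a 1-D signal with a deliberately simple stand-in model. Everything here is an assumption for illustration and not the model of the cited paper: the "model" is just the local mean and local mean-square in fixed non-overlapping windows, and the sampling procedure is plain gradient descent from white noise on the squared difference between model responses.

```python
import math
import random

WIDTH = 8  # samples per pooling window (hypothetical value)

def model(signal):
    """Toy model: local mean and local mean-square in non-overlapping windows."""
    out = []
    for start in range(0, len(signal), WIDTH):
        w = signal[start:start + WIDTH]
        out.append(sum(w) / WIDTH)
        out.append(sum(v * v for v in w) / WIDTH)
    return out

def synthesize(target, n, steps=20000, rate=0.2, seed=0):
    """Start from white noise and descend the gradient of the summed squared
    error between model(y) and the target responses. n must be a multiple of WIDTH."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        r = model(y)
        for k, start in enumerate(range(0, n, WIDTH)):
            d_mean = r[2 * k] - target[2 * k]
            d_energy = r[2 * k + 1] - target[2 * k + 1]
            for i in range(start, start + WIDTH):
                # d(mean)/dy_i = 1/WIDTH, d(mean-square)/dy_i = 2*y_i/WIDTH
                grad = 2 * d_mean / WIDTH + 4 * d_energy * y[i] / WIDTH
                y[i] -= rate * grad
    return y

original = [math.sin(0.4 * i) + 0.5 * math.sin(1.3 * i) for i in range(32)]
target = model(original)
metamer = synthesize(target, len(original))

response_gap = max(abs(a - b) for a, b in zip(model(metamer), target))
signal_gap = max(abs(a - b) for a, b in zip(original, metamer))
print("largest response difference:", response_gap)
print("largest sample difference:  ", signal_gap)
```

The synthesized signal matches the original in every model response while differing sample by sample; a different `seed` gives a different member of the same equivalence class. Whether such pairs are indistinguishable to an observer is the empirical test of the model.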
  36. 36. original image
  37. 37. synthesized image: should look the same when you fixate on the red dot
  38. 38. Reading [Figure panels a, b] [Freeman & Simoncelli, Nature Neurosci, Sep 2011]
  39. 39. Camouflage [Figure panel c] [Freeman & Simoncelli, Nature Neurosci, Sep 2011]
  40. 40. Cascades of linear filtering, squaring/products, averaging over local regions... Can this really lead to object recognition? “Perhaps texture, somewhat redefined, is the primitive stuff out of which form is constructed” - Jerome Lettvin, 1976
