
AAAI08 tutorial: visual object recognition


Published in: Education
  • part 1: 21
    part 2: 69
    part 3: 113
    part 4: 138
    part 5: 186
    part 6: 211
    part 7: 252


  1. Visual Object Recognition Tutorial (Perceptual and Sensory Augmented Computing). Bastian Leibe, Computer Vision Laboratory, ETH Zurich & Kristen Grauman, Department of Computer Sciences, University of Texas at Austin. Chicago, 14.07.2008
  2. Identification vs. Categorization
  3. Object Categorization • How to recognize ANY car • How to recognize ANY cow
  4. What could be done with recognition algorithms? There is a wide range of applications, including: autonomous robots; navigation, driver safety; situated search; content-based retrieval and analysis for images and videos; medical image analysis
  5. Object Categorization • Task description: “Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label.” • Which categories are feasible visually? Extensively studied in Cognitive Psychology, e.g. [Brown ’58]. Example hierarchy: “Fido”, German shepherd, dog, animal, living being
  6. Visual Object Categories • Basic Level Categories in human categorization [Rosch 76, Lakoff 87]: the highest level at which category members have similar perceived shape; the highest level at which a single mental image reflects the entire category; the level at which human subjects are usually fastest at identifying category members; the first level named and understood by children; the highest level at which a person uses similar motor actions for interaction with category members
  7. Visual Object Categories • Basic-level categories in humans seem to be defined predominantly visually. • There is evidence that humans (usually) start with basic-level categorization before doing identification. ⇒ Basic-level categorization is easier and faster for humans than object identification! ⇒ Most promising starting point for visual classification. Hierarchy shown: abstract levels (animal, quadruped, …); basic level (dog, cat, cow); below it (German shepherd, Doberman, …); individual level (“Fido”, …)
  8. Other Types of Categories • Functional categories, e.g. chairs = “something you can sit on”
  9. Other Types of Categories • Ad-hoc categories, e.g. “something you can find in an office environment”
  10. Levels of Object Categorization (examples: “cow”, “car”, “motorbike”) • Different levels of recognition: Which object class is in the image? ⇒ Object/image classification. Where is it in the image? ⇒ Detection/localization. Where exactly (which pixels)? ⇒ Figure/ground segmentation
  11. Challenges: robustness. Illumination; object pose; clutter; occlusions; intra-class appearance; viewpoint
  12. Challenges: robustness • Detection in crowded scenes: learn object variability (changes in appearance, scale, and articulation); compensate for clutter, overlap, and occlusion
  13. Challenges: context and human experience
  14. Challenges: context and human experience. Context cues; dynamics. Image credit: D. Hoiem. Video credit: J. Davis
  15. Challenges: scale, efficiency • Thousands to millions of pixels in an image • Estimated 30 Gigapixels of image/video content generated per second • About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991] • 3,000-30,000 human recognizable object categories • 30+ degrees of freedom in the pose of articulated objects (humans) • Billions of images indexed by Google Image Search • 18 billion+ prints produced from digital camera images in 2004 • 295.5 million camera phones sold in 2005
  16. Challenges: learning with minimal supervision (figure axis: less to more)
  17. Rough evolution of focus in recognition research: 1980s; 1990s to early 2000s; currently
  18. This tutorial • Intended for broad AAAI audience: assuming basic familiarity with machine learning, linear algebra, probability; not assuming significant vision background • Our goals: describe main approaches to recognition; highlight past successes and future challenges; provide the pointers (to literature and tools) that would allow you to take advantage of existing techniques in your research • Questions welcome
  19. Outline 1. Detection with Global Appearance & Sliding Windows 2. Local Invariant Features: Detection & Description 3. Specific Object Recognition with Local Features (Coffee Break) 4. Visual Words: Indexing, Bags of Words Categorization 5. Matching Local Features 6. Part-Based Models for Categorization 7. Current Challenges and Research Directions
  21. Outline 1. Detection with Global Appearance & Sliding Windows 2. Local Invariant Features: Detection & Description 3. Specific Object Recognition with Local Features (Coffee Break) 4. Visual Words: Indexing, Bags of Words Categorization 5. Matching Local Features 6. Part-Based Models for Categorization 7. Current Challenges and Research Directions
  22. Detection via classification: Main idea. Basic component: a binary classifier (car/non-car classifier) that answers “No, not a car.” or “Yes, a car.”
  23. Detection via classification: Main idea. If the object may be in a cluttered scene, slide a window around looking for it, applying the car/non-car classifier at each position.
  24. Detection via classification: Main idea. Fleshing out this pipeline a bit more, we need to: 1. Obtain training data 2. Define features 3. Define classifier. Pipeline shown: training examples, feature extraction, car/non-car classifier
  25. Detection via classification: Main idea • Consider all subwindows in an image: sample at multiple scales and positions • Make a decision per window: “Does this contain object category X or not?” • In this section, we’ll focus specifically on methods using a global representation (i.e., not part-based, not local features).
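The window enumeration described on slide 25 can be sketched as follows. This is a minimal illustration, not the tutorial's code: the 24-pixel base window, the three scales, the half-window stride, and the `classify` callback are all assumptions chosen for the example.

```python
from typing import Callable, Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, base: int = 24,
                    scales: Tuple[float, ...] = (1.0, 1.5, 2.0),
                    stride_frac: float = 0.5) -> Iterator[Window]:
    """Yield square windows over all sampled positions and scales."""
    for s in scales:
        size = int(round(base * s))
        step = max(1, int(size * stride_frac))  # stride grows with window size
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield (x, y, size, size)


def detect(img_w: int, img_h: int,
           classify: Callable[[Window], bool]) -> List[Window]:
    """Keep every window for which the (hypothetical) binary classifier fires."""
    return [w for w in sliding_windows(img_w, img_h) if classify(w)]
```

The per-window decision is delegated entirely to `classify`, which is where the feature extraction and classifier of the following slides would plug in.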
  26. Feature extraction: global appearance. Simple holistic descriptions of image content: grayscale / color histogram; vector of pixel intensities
  27. Eigenfaces: global appearance description. An early appearance-based approach to face recognition. Generate a low-dimensional representation of appearance with a linear subspace: the mean and the eigenvectors are computed from the covariance matrix of the training images. Project new images to “face space” (figure: image ≈ mean + combination of eigenvectors). Recognition via nearest neighbors in face space. Turk & Pentland, 1991
  28. Feature extraction: global appearance • Pixel-based representations sensitive to small shifts • Color or grayscale-based appearance description can be sensitive to illumination and intra-class appearance variation. Cartoon example: an albino koala
  29. Gradient-based representations • Consider edges, contours, and (oriented) intensity gradients
  30. Gradient-based representations: Matching edge templates • Example: Chamfer matching. Figure panels: input image, edges detected, distance transform, template shape, best match. At each window position, compute average min distance between points on template (T) and input (I). Gavrila & Philomin ICCV 1999
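The chamfer score on slide 30 (average minimum distance from template points to input edge points) can be sketched by brute force. This is an illustrative assumption-laden version: point lists stand in for edge maps, and the nearest edge point is searched directly, whereas the slide's pipeline precomputes a distance transform so that each lookup is constant time.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def chamfer_score(template: List[Point], edges: List[Point],
                  offset: Point) -> float:
    """Average distance from each shifted template point to its nearest edge point."""
    ox, oy = offset
    total = 0.0
    for tx, ty in template:
        total += min(math.hypot(tx + ox - ex, ty + oy - ey) for ex, ey in edges)
    return total / len(template)


def best_offset(template: List[Point], edges: List[Point],
                offsets: Iterable[Point]) -> Point:
    """Return the window position with the lowest chamfer score."""
    return min(offsets, key=lambda o: chamfer_score(template, edges, o))
```

A lower score means a better match, so detection amounts to thresholding or minimizing this score over window positions.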
  31. Gradient-based representations: Matching edge templates • Chamfer matching: hierarchy of templates. Gavrila & Philomin ICCV 1999
  32. Gradient-based representations • Consider edges, contours, and (oriented) intensity gradients • Summarize local distribution of gradients with histogram. Locally orderless: offers invariance to small shifts and rotations. Contrast-normalization: try to correct for variable illumination
  33. Gradient-based representations: Histograms of oriented gradients (HoG). Map each grid cell in the input window to a histogram counting the gradients per orientation. Code available: http://pascal.inrialpes.fr/soft/olt/ Dalal & Triggs, CVPR 2005
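A per-cell orientation histogram in the spirit of slide 33 might look like the sketch below. The details are assumptions made for illustration, not taken from the slides: central-difference gradients, 9 bins over unsigned orientations, votes weighted by gradient magnitude, and a simple L2 contrast normalization.

```python
import math
from typing import List


def cell_orientation_histogram(cell: List[List[float]],
                               n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    return hist


def l2_normalize(hist: List[float], eps: float = 1e-6) -> List[float]:
    """Contrast-normalize a histogram so illumination scaling has less effect."""
    norm = math.sqrt(sum(v * v for v in hist) + eps * eps)
    return [v / norm for v in hist]
```

Concatenating the normalized histograms of all cells in a window would give one global descriptor for that window.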
  34. Gradient-based representations: SIFT descriptor. Local patch descriptor (more on this later). Code: http://vision.ucla.edu/~vedaldi/code/sift/sift.html Binary: http://www.cs.ubc.ca/~lowe/keypoints/ Lowe, ICCV 1999
  35. Gradient-based representations: Biologically inspired features. Convolve with Gabor filters at multiple orientations. Pool nearby units (max). Intermediate layers compare input to prototype patches. Serre, Wolf, Poggio, CVPR 2005; Mutch & Lowe, CVPR 2006
  36. Gradient-based representations: Rectangular features. Compute differences between sums of pixels in rectangles. Captures contrast in adjacent spatial regions. Similar to Haar wavelets, efficient to compute. Viola & Jones, CVPR 2001
  37. Gradient-based representations: Shape context descriptor. Count the number of points inside each bin, e.g.: Count = 4, ..., Count = 10. Log-polar binning: more precision for nearby points, more flexibility for farther points. Local descriptor (more on this later). Belongie, Malik & Puzicha, ICCV 2001
  38. Classifier construction • How to compute a decision for each subwindow? (Figure: image feature)
  39. Discriminative vs. generative models • Generative: separately model class-conditional and prior densities (plots of Pr(image, car) and Pr(image, ¬car) against the image feature) • Discriminative: directly model the posterior (plots of Pr(car | image) and Pr(¬car | image) against the image feature; x = data). Plots from Antonio Torralba 2007
  40. Discriminative vs. generative models • Generative: + possibly interpretable; + can draw samples; - models variability unimportant to classification task; - often hard to build good model with few parameters • Discriminative: + appealing when infeasible to model data itself; + excel in practice; - often can’t provide uncertainty in predictions; - non-interpretable
  41. Discriminative methods • Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ... • Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; … • Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio, 2001; … • Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006; … • Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; … Slide adapted from Antonio Torralba
  42. Boosting • Build a strong classifier by combining a number of “weak classifiers”, which need only be better than chance • Sequential learning process: at each iteration, add a weak classifier • Flexible to choice of weak learner, including fast simple classifiers that alone may be inaccurate • We’ll look at Freund & Schapire’s AdaBoost algorithm: easy to implement; base learning algorithm for Viola-Jones face detector
  43. AdaBoost: Intuition. Consider a 2-d feature space with positive and negative examples. Each weak classifier splits the training examples with at least 50% accuracy. Examples misclassified by a previous weak learner are given more emphasis at future rounds. Figure adapted from Freund and Schapire
  44. AdaBoost: Intuition
  45. AdaBoost: Intuition. Final classifier is combination of the weak classifiers
  46. AdaBoost Algorithm • Start with uniform weights on training examples {x1, …, xn} • Evaluate weighted error for each feature, pick best • Reweight: incorrectly classified -> more weight; correctly classified -> less weight • Final classifier is combination of the weak ones, weighted according to the error they had. Freund & Schapire 1995
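The steps on slide 46 can be sketched as discrete AdaBoost over threshold "stumps". This is a minimal sketch under stated assumptions: one scalar feature per example, labels in {+1, -1}, candidate thresholds taken from the training values, and the standard weight alpha = 0.5 ln((1 - err) / err); none of these specifics come from the slides.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity): polarity if x > threshold, else -polarity


def stump_predict(stump: Stump, x: float) -> int:
    thr, pol = stump
    return pol if x > thr else -pol


def adaboost(xs: List[float], ys: List[int],
             rounds: int) -> List[Tuple[float, Stump]]:
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform weights
    candidates = [(t, p) for t in sorted(set(xs)) for p in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(s: Stump) -> float:
            return sum(wi for wi, x, y in zip(w, xs, ys)
                       if stump_predict(s, x) != y)

        best = min(candidates, key=weighted_error)  # pick the best weak classifier
        err = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)  # low error -> large vote
        ensemble.append((alpha, best))
        # Misclassified examples gain weight, correct ones lose weight.
        w = [wi * math.exp(-alpha * y * stump_predict(best, x))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def strong_predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1
```

On data such as `xs = [1, 2, 3, 4, 5, 6]` with labels `[1, 1, -1, -1, 1, 1]`, no single stump is correct everywhere, but three boosting rounds combine into a classifier that is.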
  47. Cascading classifiers for detection. For efficiency, apply less accurate but faster classifiers first to immediately discard windows that clearly appear to be negative; e.g., filter for promising regions with an initial inexpensive classifier; build a chain of classifiers, choosing cheap ones with low false negative rates early in the chain. Fleuret & Geman, IJCV 2001; Rowley et al., PAMI 1998; Viola & Jones, CVPR 2001. Figure from Viola & Jones CVPR 2001
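The early-rejection idea on slide 47 reduces to a short loop. In this sketch the stages are hypothetical callables ordered from cheapest to most expensive; how each stage is trained and thresholded is outside the example.

```python
from typing import Callable, Sequence

Stage = Callable[[Sequence[float]], bool]


def cascade_accepts(window: Sequence[float], stages: Sequence[Stage]) -> bool:
    """Run stages in order; reject as soon as any stage says 'negative'."""
    for stage in stages:
        if not stage(window):
            return False  # later, costlier stages never run for this window
    return True
```

Because most windows in an image are negatives, the average cost per window is dominated by the first, cheapest stages, provided those stages rarely reject true positives.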
  48. Example: Face detection • Frontal faces are a good example of a class where global appearance models + a sliding window detection approach fit well: regular 2D structure; center of face almost shaped like a “patch”/window • Now we’ll take AdaBoost and see how the Viola-Jones face detector works
  49. Feature extraction • “Rectangular” filters: feature output is difference between adjacent regions • Efficiently computable with integral image: any sum can be computed in constant time • Integral image: value at (x,y) is sum of pixels above and to the left of (x,y) • Avoid scaling images: scale features directly for same cost. Viola & Jones, CVPR 2001
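The constant-time rectangle sum on slide 49 can be sketched as below. One convention here is an assumption of the example: the table is padded with a leading row and column of zeros, so `ii[y][x]` holds the sum of pixels strictly above and to the left of (x, y). The two-rectangle feature is one illustrative filter type, not the full Viola-Jones filter set.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Padded summed-area table: ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Left half minus right half of a w-by-h region (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Scaling a feature only changes the `w` and `h` arguments, so a larger filter costs the same four lookups per rectangle, which is why the image itself never needs rescaling.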
  50. Large library of filters. Considering all possible filter parameters (position, scale, and type): 180,000+ possible features associated with each 24 x 24 window. Use AdaBoost both to select the informative features and to form the classifier. Viola & Jones, CVPR 2001
  51. AdaBoost for feature+classifier selection • Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error. • Resulting weak classifier: (formula shown as an image on the slide) • For next round, reweight the examples according to errors, choose another filter/threshold combo. Figure: outputs of a possible rectangle feature on faces and non-faces. Viola & Jones, CVPR 2001
  52. Viola-Jones Face Detector: Summary. Train cascade of classifiers with AdaBoost on faces and non-faces, giving selected features, thresholds, and weights; apply to each subwindow of a new image. • Train with 5K positives, 350M negatives • Real-time detector using 38 layer cascade • 6061 features in final layer • [Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/]
  53. Viola-Jones Face Detector: Results. First two features selected
  54. Viola-Jones Face Detector: Results
  55. Viola-Jones Face Detector: Results
  56. Viola-Jones Face Detector: Results
  57. Profile Features. Detecting profile faces requires training a separate detector with profile examples.
  58. Viola-Jones Face Detector: Results. Paul Viola, ICCV tutorial
  59. Example application. Frontal faces detected and then tracked, character names inferred with alignment of script and subtitles. Everingham, M., Sivic, J. and Zisserman, A. "Hello! My name is... Buffy" - Automatic naming of characters in TV video, BMVC 2006. http://www.robots.ox.ac.uk/~vgg/research/nface/index.html
  60. Pedestrian detection • Detecting upright, walking humans also possible using sliding window’s appearance/texture; e.g., SVM with Haar wavelets [Papageorgiou & Poggio, IJCV 2000]; space-time rectangle features [Viola, Jones & Snow, ICCV 2003]; SVM with HoGs [Dalal & Triggs, CVPR 2005]
  61. Highlights • Sliding window detection and global appearance descriptors: simple detection protocol to implement; good feature choices critical; past successes for certain classes
  62. Limitations • High computational complexity. For example: 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations! If training binary detectors independently, cost increases linearly with number of classes • With so many windows, false positive rate better be low
  63. Limitations (continued) • Not all objects are “box” shaped
  64. Limitations (continued) • Non-rigid, deformable objects not captured well with representations assuming a fixed 2d structure; or must assume fixed viewpoint • Objects with less-regular textures not captured well with holistic appearance-based descriptions
  65. Limitations (continued) • If considering windows in isolation, context is lost. Figure: sliding window vs. detector’s view. Figure credit: Derek Hoiem
  66. Limitations (continued) • In practice, often entails large, cropped training set (expensive) • Requiring good match to a global appearance description can lead to sensitivity to partial occlusions. Image credit: Adam, Rivlin, & Shimshoni
  69. Outline 1. Detection with Global Appearance & Sliding Windows 2. Local Invariant Features: Detection & Description 3. Specific Object Recognition with Local Features (Coffee Break) 4. Visual Words: Indexing, Bags of Words Categorization 5. Matching Local Features 6. Part-Based Models for Categorization 7. Current Challenges and Research Directions
  70. Motivation • Global representations have major limitations • Instead, describe and match only local regions • Increased robustness to: occlusions; articulation; intra-category variations
  71. Approach 1. Find a set of distinctive keypoints 2. Define a region around each keypoint 3. Extract and normalize the region content (e.g. to N x N pixels) 4. Compute a local descriptor from the normalized region (e.g. color), giving $f_A$ and $f_B$ 5. Match local descriptors with a similarity measure: $d(f_A, f_B) < T$
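Step 5 on slide 71, accepting a match when d(f_A, f_B) < T, can be sketched with a nearest-neighbor search. The Euclidean distance and the (index, index, distance) output format are assumptions made for this example; the slide leaves the similarity measure open.

```python
import math
from typing import List, Sequence, Tuple


def match_descriptors(desc_a: List[Sequence[float]],
                      desc_b: List[Sequence[float]],
                      threshold: float) -> List[Tuple[int, int, float]]:
    """For each descriptor in A, keep its nearest neighbor in B if d < threshold."""
    matches = []
    for i, fa in enumerate(desc_a):
        j, d = min(((j, math.dist(fa, fb)) for j, fb in enumerate(desc_b)),
                   key=lambda pair: pair[1])
        if d < threshold:
            matches.append((i, j, d))
    return matches
```

The threshold T trades off missed matches against false matches, so in practice it would be tuned per descriptor type.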
  72. Requirements • Region extraction needs to be repeatable and precise under: translation, rotation, scale changes; (limited) out-of-plane (≈ affine) transformations; lighting variations • We need a sufficient number of regions to cover the object • The regions should contain “interesting” structure
  73. Many Existing Detectors Available • Hessian & Harris [Beaudet ‘78], [Harris ‘88] • Laplacian, DoG [Lindeberg ‘98], [Lowe 1999] • Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01] • Harris-/Hessian-Affine [Mikolajczyk & Schmid ‘04] • EBR and IBR [Tuytelaars & Van Gool ‘04] • MSER [Matas ‘02] • Salient Regions [Kadir & Brady ‘01] • Others…
  74. Keypoint Localization • Goals: repeatable detection; precise localization; interesting content ⇒ Look for two-dimensional signal changes
  75. Hessian Detector [Beaudet78] • Hessian determinant, built from the second derivatives $I_{xx}$, $I_{yy}$, $I_{xy}$: $\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$ Intuition: search for strong derivatives in two orthogonal directions
  76. Hessian Detector [Beaudet78] • Hessian determinant: $\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$, $\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2$. In Matlab (element-wise): Ixx .* Iyy - (Ixy).^2
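The determinant on slide 76 can be evaluated per pixel as in the sketch below. It assumes plain finite differences on a list-of-lists image with no Gaussian smoothing and leaves border pixels at zero; a practical detector would compute the derivatives at a chosen scale.

```python
from typing import List


def hessian_determinant(img: List[List[float]]) -> List[List[float]]:
    """det(Hessian) = Ixx * Iyy - Ixy^2 at each interior pixel."""
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
            iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
            ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
                   - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
            det[y][x] = ixx * iyy - ixy ** 2
    return det
```

A bowl-shaped patch (curvature in both directions) gives a positive response, while a straight intensity ramp gives zero, matching the intuition of strong derivatives in two orthogonal directions.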
  77. Hessian Detector – Responses [Beaudet78]. Effect: responses mainly on corners and strongly textured areas.
  78. Hessian Detector – Responses [Beaudet78]
  79. Harris Detector [Harris88] • Second moment matrix (autocorrelation matrix): $\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$ Intuition: search for local neighborhoods where the image content has two main directions (eigenvectors).
  80. Harris Detector [Harris88] • Second moment matrix as on slide 79. Step 1: image derivatives $I_x$, $I_y$, computed with $g_x(\sigma_D)$, $g_y(\sigma_D)$
  81. Harris Detector [Harris88] • Second moment matrix as on slide 79. Step 1: image derivatives $I_x$, $I_y$. Step 2: square of derivatives: $I_x^2$, $I_y^2$, $I_x I_y$
  82. Harris Detector [Harris88] • Second moment matrix as on slide 79. Step 1: image derivatives. Step 2: square of derivatives. Step 3: Gaussian filter $g(\sigma_I)$, giving $g(I_x^2)$, $g(I_y^2)$, $g(I_x I_y)$
  83. Harris Detector [Harris88] • Second moment matrix as on slide 79. Step 1: image derivatives. Step 2: square of derivatives. Step 3: Gaussian filter $g(\sigma_I)$. Step 4: cornerness function (both eigenvalues are strong): $har = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2 = g(I_x^2)\, g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$. Step 5: non-maxima suppression
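Steps 1 to 4 of slide 83 can be sketched in a few loops. Several simplifications are assumptions of this example: central differences replace the Gaussian derivatives, a square box window of the given radius stands in for the Gaussian filter g(σ_I), α = 0.04 is a commonly used value rather than one stated on the slides, and step 5 (non-maxima suppression) is omitted.

```python
from typing import List


def harris_response(img: List[List[float]], alpha: float = 0.04,
                    radius: int = 1) -> List[List[float]]:
    """Cornerness det(mu) - alpha * trace(mu)^2 with a box window."""
    h, w = len(img), len(img[0])
    # Step 1: image derivatives (borders left at zero).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Steps 2 and 3: derivative products, accumulated over the window.
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Step 4: cornerness function.
            resp[y][x] = sxx * syy - sxy * sxy - alpha * (sxx + syy) ** 2
    return resp
```

With this measure, corners score positive (both eigenvalues large), straight edges score negative (one eigenvalue dominates), and flat regions score zero.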
  84. Harris Detector – Responses [Harris88]. Effect: a very precise corner detector.
  85. Harris Detector – Responses [Harris88]
  86. Automatic Scale Selection. $f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I_{i_1 \ldots i_m}(x', \sigma'))$: same operator responses if the patch contains the same image up to scale factor. How to find corresponding patch sizes?
  87-91. Automatic Scale Selection • Function responses for increasing scale (scale signature): $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma))$ (animation over five slides)
  92. Automatic Scale Selection • Function responses for increasing scale (scale signature): $f(I_{i_1 \ldots i_m}(x, \sigma))$ and $f(I_{i_1 \ldots i_m}(x', \sigma'))$
  93. What Is A Useful Signature Function? • Laplacian-of-Gaussian = “blob” detector
  94. Laplacian-of-Gaussian (LoG) • Local maxima in scale space of Laplacian-of-Gaussian $L_{xx}(\sigma) + L_{yy}(\sigma)$, evaluated over scale levels $\sigma, \sigma^2, \ldots, \sigma^5$ ⇒ List of (x, y, s)
  95. Results: Laplacian-of-Gaussian
  96. Difference-of-Gaussian (DoG) • Difference of Gaussians as approximation of the Laplacian-of-Gaussian (figure: one Gaussian minus another gives the DoG kernel)
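Why a difference of Gaussians approximates the Laplacian-of-Gaussian can be seen from the diffusion equation satisfied by the Gaussian kernel. The derivation below is a standard sketch; the symbol $k$ for the ratio between adjacent scales is an assumption of the example and does not appear on the slide.

```latex
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
\quad\Rightarrow\quad
\frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma} \approx \sigma \nabla^2 G
\quad\Rightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G
```

Since the factor $(k - 1)$ is constant across scales, it does not change where the extrema are, and the $\sigma^2$ factor is the scale normalization that makes responses comparable between levels.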
  97. DoG – Efficient Computation • Computation in Gaussian scale pyramid (figure: original image smoothed at successive scale levels; sampling with step $\sigma^4 = 2$, i.e. $\sigma = 2^{1/4}$ between levels)
  98. Results: Lowe’s DoG
  99. Harris-Laplace [Mikolajczyk ‘01] 1. Initialization: multiscale Harris corner detection (at scale levels $\sigma, \sigma^2, \sigma^3, \sigma^4$): computing Harris function; detecting local maxima
  100. Harris-Laplace [Mikolajczyk ‘01] 1. Initialization: multiscale Harris corner detection 2. Scale selection based on Laplacian (same procedure with Hessian ⇒ Hessian-Laplace). Figure: Harris points vs. Harris-Laplace points
  101. Maximally Stable Extremal Regions [Matas ‘02] • Based on Watershed segmentation algorithm • Select regions that stay stable over a large parameter range
  102. Example Results: MSER
  103. You Can Try It At Home… • For most local feature detectors, executables are available online: • http://robots.ox.ac.uk/~vgg/research/affine • http://www.cs.ubc.ca/~lowe/keypoints/ • http://www.vision.ee.ethz.ch/~surf
  104. Orientation Normalization • Compute orientation histogram (over 0 to 2π) [Lowe, SIFT, 1999] • Select dominant orientation • Normalize: rotate to fixed orientation. T. Tuytelaars, B. Leibe
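The first two bullets of slide 104 can be sketched as a magnitude-weighted histogram over the full circle followed by an argmax. The 36 bins, the central-difference gradients, and returning the bin center are assumptions of this example; refinements such as Gaussian weighting or peak interpolation are left out.

```python
import math
from typing import List


def dominant_orientation(patch: List[List[float]], n_bins: int = 36) -> float:
    """Return the center angle (radians, in [0, 2*pi)) of the strongest bin."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    two_pi = 2.0 * math.pi
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.atan2(gy, gx) % two_pi  # signed direction, full circle
            b = int(angle / two_pi * n_bins) % n_bins
            hist[b] += math.hypot(gx, gy)  # vote weighted by gradient magnitude
    best = max(range(n_bins), key=hist.__getitem__)
    return (best + 0.5) * two_pi / n_bins
```

Rotating the patch by the negative of this angle before computing the descriptor is what makes the descriptor invariant to in-plane rotation.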
  105. Local Descriptors • The ideal descriptor should be: repeatable; distinctive; compact; efficient • Most available descriptors focus on edge/gradient information: capture texture information; color still relatively seldom used (more suitable for homogeneous regions)
