SIFT: Scale Invariant Feature Transform. Presenter: Michal Erel. David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2), 2004, pp. 91-110
Object Recognition Find a particular object we've encountered before. Search for local features based on the appearance of the object at particular interest points
Why do we care about matching features? Object Recognition Location Recognition Image Alignment & Matching Stereo Matching Robot self localization Image retrieval by similarity (from large database)
Location Recognition
Panoramic Image Matching
We want invariance!!! Good features should be robust to all sorts of nastiness that can occur between images.
Types of invariance Illumination
Types of invariance Illumination Scale
Types of invariance Illumination Scale Rotation
Types of invariance Illumination Scale Rotation Affine
Types of invariance Illumination Scale Rotation Affine Perspective
SIFT- Scale Invariant Feature Transform The features are: Invariant to image scaling Invariant to rotation Partially invariant to: Change in illumination  Change in 3D camera viewpoint Occlusion, clutter, or noise
Step I: Detection of Scale-Space Extrema Identify locations and scales that can be repeatably assigned under differing views of the same object. Scale-Space function!
Scale-Space
Scale-Space To scale: take every second pixel in each row and column (another approach: average 4 pixels)
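Both downsampling options mentioned above can be sketched in a few lines of NumPy (an illustrative sketch, not the paper's implementation; function names are made up):

```python
import numpy as np

def downsample(img):
    """Halve resolution by keeping every second pixel in each row and column."""
    return img[::2, ::2]

def downsample_avg(img):
    """Alternative approach: average each 2x2 block of 4 pixels."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # crop to even size
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2]
            + img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
```

Averaging trades a little blur for less aliasing than plain subsampling.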
Difference of Gaussians (DOG) (figure: image blurred with sigma 2, image blurred with sigma 4, and their difference, sigma2 - sigma4)
Scale-Space with DOG
Scale-Space with DOG
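The DoG images above are simply pixelwise differences of two Gaussian-blurred copies of the image. A minimal sketch, assuming SciPy's `gaussian_filter` is available:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(img, sigma_low, sigma_high):
    """Blur the image at two nearby scales and subtract:
    DoG approximates the scale-normalized Laplacian of Gaussian."""
    return gaussian_filter(img, sigma_low) - gaussian_filter(img, sigma_high)
```

On a constant image the two blurs are identical, so the DoG response is zero; responses appear only where intensity changes across scales.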
Local Extrema Detection Compare each pixel to: 8 neighbours in current image 9 neighbours in scale above 9 neighbours in scale below Take pixel if larger / smaller than all of them   This is called a  Keypoint
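The 26-neighbour comparison above can be sketched directly (an illustrative helper, assuming three adjacent DoG images of the same size):

```python
import numpy as np

def is_extremum(below, current, above, i, j):
    """True if current[i, j] is strictly larger or strictly smaller than all
    26 neighbours: 8 in its own DoG image, 9 in the scale below, 9 above."""
    patch = np.stack([below[i-1:i+2, j-1:j+2],
                      current[i-1:i+2, j-1:j+2],
                      above[i-1:i+2, j-1:j+2]])
    v = current[i, j]
    others = np.delete(patch.ravel(), 13)  # flat index 13 is the centre pixel
    return bool(np.all(v > others) or np.all(v < others))
```

In practice most pixels fail after a few comparisons, so the full check is cheap on average.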
Keypoints Too many keypoints, some are unstable
Step II: Keypoint Localization Reject points with low contrast  Reject points that are localized along an edge.
Step II: Keypoint Localization Fit keypoint to nearby data for location, scale and ratio of principal curvatures. Reject points with low contrast & points that are localized along an edge.
Keypoint Localization Initial approach: locate keypoints at location and scale of the central sample point. New approach: try to calculate the interpolated location of the maximum. Improves matching and stability
Keypoint Localization Use a quadratic Taylor expansion of the scale-space function, with the origin at the sample point (x is the offset from this point). Calculate the extremum offset x̂: if any component of x̂ is larger than 0.5, the extremum lies closer to a different sample point (need to recalculate there). Otherwise: add the offset x̂ to the sample point location to get the estimated extremum.
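The interpolation step can be sketched in NumPy (a minimal illustration; in a real implementation the 3D gradient and Hessian of D would come from finite differences on the DoG stack):

```python
import numpy as np

def interpolate_offset(grad, hessian):
    """Offset x_hat of the interpolated extremum from the sample point.
    Setting the derivative of the quadratic Taylor expansion of D to zero
    gives x_hat = -H^(-1) * grad."""
    return -np.linalg.solve(hessian, grad)
```

If any component of the returned offset exceeds 0.5, the extremum is closer to a neighbouring sample point and the fit is repeated there.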
Reject Low Contrast Keypoints Calculate the value of D at the extremum point x̂: if |D(x̂)| < 0.03, discard the keypoint for having low contrast
Reject Low Contrast Keypoints
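The contrast test can be sketched as follows (hypothetical helper names; D, its gradient, and the offset x̂ come from the interpolation step):

```python
import numpy as np

def contrast_at_extremum(d_value, grad, offset):
    """Value of the DoG function at the interpolated extremum:
    D(x_hat) = D + 0.5 * grad . x_hat, from the Taylor expansion."""
    return d_value + 0.5 * np.dot(grad, offset)

def keep_keypoint(d_value, grad, offset, threshold=0.03):
    """Keep the keypoint only if |D(x_hat)| reaches the contrast threshold
    (assuming pixel values normalized to [0, 1])."""
    return abs(contrast_at_extremum(d_value, grad, offset)) >= threshold
```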
Eliminate Edge Responses: The DoG function can have strong responses along edges, even when the location along the edge is unstable to small amounts of noise. Edge identification: large principal curvature across the edge, but a small one in the perpendicular direction. Note ♥: it is easy to show that the two principal curvatures (i.e., the min and max curvatures) are always along directions perpendicular to each other. In general, finding the principal directions amounts to solving an n×n eigenvalue problem.
Eliminate Edge Responses: No need to explicitly calculate the eigenvalues – we only need their ratio!! a = small eigenvalue, b = large eigenvalue, r = ratio of the large to the small eigenvalue (r = b/a). (r+1)^2/r is at its minimum when a = b, and increases as the ratio increases
Eliminate Edge Responses: To check whether the ratio of the principal curvatures is below a threshold r, we only need to check that Tr(H)^2 / Det(H) < (r+1)^2 / r, where H is the 2×2 Hessian of D at the keypoint. Use r = 10 to reject keypoints that lie along an edge
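The edge test reduces to one inequality on the trace and determinant of the 2×2 Hessian of D, with no eigenvalue computation. A minimal sketch (second derivatives would come from finite differences on the DoG image):

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Reject edge-like keypoints: keep only if
    Tr(H)^2 / Det(H) < (r+1)^2 / r for the 2x2 Hessian H of D."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have opposite signs: not a well-formed extremum, discard.
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

With r = 10, a keypoint passes only when the two principal curvatures differ by less than a factor of 10.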
Reject Near-Edge Keypoints
832 keypoints → 729 keypoints (after eliminating low contrast) → 536 keypoints (after eliminating edge keypoints)
Step III: Orientation Assignment Each keypoint is assigned 1 or more orientations, based on local image gradient directions. Data is transformed relative to the assigned orientation, scale and location, hence providing invariance to these transformations
Gradient Calculation The scale of the keypoint is used to select the Gaussian image L we’ll work on (the image with the closest scale) – all computations are performed in a scale-invariant manner. We calculate gradient magnitude and orientation using pixel differences:
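The pixel-difference gradient can be written out directly (an illustrative sketch; L is the selected Gaussian-blurred image, indexed as L[row, column]):

```python
import numpy as np

def gradient(L, x, y):
    """Gradient magnitude and orientation at (x, y) from pixel differences:
    m = sqrt((L(x+1,y) - L(x-1,y))^2 + (L(x,y+1) - L(x,y-1))^2)."""
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    magnitude = np.hypot(dx, dy)
    orientation = np.arctan2(dy, dx)  # radians in (-pi, pi]
    return magnitude, orientation
```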
Gradient Calculation
Gradient Calculation
Orientation Histogram Orientation histogram with 36 bins (each bin covers 10 degrees) Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with sigma equal to 1.5 times the scale of the keypoint
Orientation Histogram: Detect highest peak and local peaks that are within 80% of the highest peak. Use these to assign (1 or more) orientations
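The histogram and the 80% peak rule can be sketched as follows (illustrative; the caller is assumed to supply weights that already combine gradient magnitude with the Gaussian window of sigma = 1.5 × scale):

```python
import numpy as np

def dominant_orientations(orientations, weights, n_bins=36, peak_ratio=0.8):
    """Build a 36-bin orientation histogram (10 degrees per bin) from
    weighted gradient samples (orientations in degrees), then return the
    bin centres of the highest peak and of every local peak within 80%
    of it; each returned orientation yields a keypoint."""
    bins = np.floor((orientations % 360.0) / (360.0 / n_bins)).astype(int)
    hist = np.bincount(bins, weights=weights, minlength=n_bins)
    threshold = peak_ratio * hist.max()
    peaks = []
    for i in range(n_bins):
        left, right = hist[(i - 1) % n_bins], hist[(i + 1) % n_bins]
        if hist[i] >= threshold and hist[i] > left and hist[i] > right:
            peaks.append((i + 0.5) * (360.0 / n_bins))
    return peaks
```

The paper additionally interpolates each peak with a parabola over the three nearest bins for sub-bin accuracy; that refinement is omitted here.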
Step IV: Local Image Descriptor Previous operations imposed a local 2D coordinate system, which provides invariance to image location, scale and orientation We wish to compute descriptors for the local image regions that are: 1. Highly distinctive 2. As invariant as possible to the remaining variations (illumination, 3D viewpoint…)
Descriptor Representation Use the scale of the keypoint to select the level of Gaussian blur. Sample the gradient magnitude and orientation around the keypoint Weight each magnitude by a Gaussian function with sigma equal to ½ the width of the descriptor window (provides gradual change & gives less emphasis to gradients far from the keypoint) Use a 4×4 descriptor array of histograms with 8 orientation bins each (a 128-element vector)
Descriptor Representation
Descriptor Representation :
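The descriptor layout can be sketched as a 4×4 grid of 8-bin orientation histograms over a 16×16 gradient patch (an illustrative sketch: it assumes the patch is already rotated to the keypoint orientation and Gaussian-weighted, and it omits the trilinear interpolation between bins used in the paper):

```python
import numpy as np

def descriptor(magnitudes, orientations):
    """128-value descriptor from a 16x16 patch of gradient magnitudes and
    orientations (degrees): a 4x4 grid of cells, each an 8-bin histogram
    (45 degrees per bin) of magnitude-weighted orientations."""
    desc = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            cell_y, cell_x = y // 4, x // 4          # which 4x4 cell
            b = int((orientations[y, x] % 360.0) // 45) % 8  # orientation bin
            desc[cell_y, cell_x, b] += magnitudes[y, x]
    return desc.ravel()  # 4 * 4 * 8 = 128 dimensions
```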
Invariance to Affine Illumination Changes: * Multiplication by a constant: a change pixel -> a * pixel in each pixel scales every gradient by the same factor, gradient -> a * gradient; normalizing the descriptor vector to unit length cancels this. * Addition of a constant: pixel -> pixel + a has no effect on the gradient
Partial Invariance To Non-Affine Illumination Changes: These can cause large changes in relative magnitude, but are unlikely to affect gradient orientations. Solution: reduce the influence of large gradient magnitudes by thresholding the values to be no larger than 0.2, then renormalize to unit length.
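The full normalize–clip–renormalize sequence is short enough to show in full (a minimal NumPy sketch):

```python
import numpy as np

def normalize_descriptor(v, clip=0.2):
    """Normalize to unit length (cancels multiplication of all pixels by a
    constant), clip entries at 0.2 (limits the influence of large gradient
    magnitudes under non-affine illumination), then renormalize."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    v = np.minimum(v, clip)
    return v / np.linalg.norm(v)
```

Scaling the input by any positive constant leaves the result unchanged, which is exactly the affine-gain invariance claimed above.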
Partial Invariance To Affine Change In Viewpoint Angle:
Object Recognition: The best candidate match for each keypoint is its nearest neighbour in the database Problem: many background features have no matching pair in the database, resulting in false matches A global threshold on descriptor distance does not perform well, since some descriptors are more discriminating than others Solution: compare the distance to the closest neighbour with the distance to the second-closest neighbour (that comes from a different object)
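This distance-ratio test can be sketched as follows (illustrative; the paper rejects matches whose ratio exceeds 0.8, and uses an approximate nearest-neighbour search rather than the brute-force scan shown here):

```python
import numpy as np

def match(desc, database, ratio=0.8):
    """Return the index of the nearest database descriptor, but only if it
    is significantly closer than the second nearest; otherwise None."""
    dists = np.linalg.norm(database - desc, axis=1)  # brute-force distances
    order = np.argsort(dists)
    best, second = order[0], order[1]
    if dists[best] < ratio * dists[second]:
        return int(best)
    return None  # ambiguous match: likely a background feature
```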
Results:
More Results:
More Results  (not as successful…):
Image matching:
Sources / Web Sources: Article: David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60(2), 2004, pp. 91-110. http://citeseer.ist.psu.edu/654168.html Some slides were adapted from: Matching with Invariant Features: Darya Frolova, Denis Simakov www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt
Slide / Web Sources Continued: Matching Features: Prof. Bill Freeman courses.csail.mit.edu/6.869/lectnotes/lect8/lect8-slides-6up.pdf Object Recognition Using Local Descriptors:  Javier Ruiz-del-Solar  www.ciw.cl/material/compression2005/ruiz.pdf Scale Invariant Feature Transform: Tom Duerig www-cse.ucsd.edu/classes/fa06/cse252c/tduerig1.ppt
Slide / Web Sources Continued: Object Recognition with Invariant Features: David Lowe www.cs.ubc.ca/~lowe/425/slides/10-sift-6up.pdf Local Feature Tutorial: courses.csail.mit.edu/6.869/handouts/tutSIFT04.pdf F. Estrada et al Introduction to SIFT features: www.danet.dk/sensor_fusion/SIFT features.ppt   More on Features: Yung-Yu Chaung www.csie.ntu.edu.tw/~cyy/courses/vfx/06spring/lectures/handouts/lec05_feature_4up.pdf
The  End…
