The document discusses the SIFT (Scale-Invariant Feature Transform) algorithm. SIFT detects and describes local features in images to allow object recognition despite changes in scale, rotation, and illumination. It works by detecting keypoints, assigning orientations based on local image gradients, and generating descriptors of the gradients around each keypoint. Keypoints are matched between images by comparing their descriptors. SIFT has become widely used for tasks like object recognition, image stitching, and 3D reconstruction due to its robustness to transformations.
2. Presenters
Roll No Name
1. 23201009-011 Sheraz ul Hassan
2. 23201009-004 Ali Imran Cheema
3. 23201009-002 Khuram Shahzad
Let’s Start
3. SIFT Algorithm
SIFT Stands for
Scale-Invariant Feature Transform
Definition
It is an algorithm used in computer vision for
detecting and describing local features in images.
4. SIFT Algorithm
Algorithm Proposed by : David G. Lowe in 1999
Importance of Algorithm : It has become one of the
most widely used and robust algorithms for feature extraction
in various computer vision applications.
5. SIFT Algorithm
The Key Properties of SIFT Algorithm
1. Scale Invariance
2. Distinctive Feature Description
1. Scale Invariance :
Scale invariance means that the SIFT
algorithm can detect and describe image features at different scales
(sizes) without affecting the accuracy of the feature extraction. It allows
the algorithm to recognize the same visual pattern at various sizes,
making it robust to changes in the scale of objects or scenes in the
images.
6. SIFT Algorithm
2. Distinctive Feature Description :
The goal of distinctive feature
description is to create a compact and informative representation of
local image patterns that is invariant to various transformations such as
scale, rotation, and changes in illumination. This ensures that the same
feature can be recognized in different images, even when the viewpoint
or conditions vary.
7. SIFT Advantages
• Scale and Rotation Invariance.
• Illumination Invariance.
• Distinctive and Robust Features.
• Robustness to Noise.
• Versatile Real-World Applicability.
9. Working of SIFT Algorithm Stages
1. Scale-space extrema detection :
SIFT first constructs a scale-space
representation of the input image by applying Gaussian blurring at different
scales. It then identifies the local extrema (keypoint candidates) in this scale-space
representation, which correspond to potential keypoint locations in the image.
Explanation of Scale-Space & Local Extrema :
Scale-space in the first stage
of the SIFT algorithm refers to the creation of a multi-scale representation of the
input image by applying Gaussian blurring at different scales. Adjacent blurred images
are subtracted to give difference-of-Gaussians (DoG) images. Local extrema are
keypoint candidates: points whose DoG value is higher or lower than all 26 neighbors,
that is, 8 in the same scale and 9 in each of the scales above and below. These local
extrema represent potential feature locations in the image.
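The neighbor comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not Lowe's implementation: the function name `find_scale_space_extrema`, the `threshold` parameter, and the nested-list layout of `stack` are assumptions made for the example, and the stack is taken as already computed.

```python
def find_scale_space_extrema(stack, threshold=0.0):
    """Return (scale, row, col) for samples that beat all 26 neighbours.

    `stack` is a list of equally sized 2-D lists, one per scale level
    (hypothetical layout chosen for this sketch).
    """
    keypoints = []
    num_scales = len(stack)
    height, width = len(stack[0]), len(stack[0][0])
    # Border samples and the first/last scale lack a full neighbourhood.
    for s in range(1, num_scales - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = stack[s][y][x]
                if abs(value) <= threshold:
                    continue  # response too weak to be a candidate
                neighbours = [
                    stack[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                ]
                if value > max(neighbours) or value < min(neighbours):
                    keypoints.append((s, y, x))
    return keypoints


# Tiny synthetic stack: 3 scales of 5x5 zeros with one strong sample.
stack = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
stack[1][2][2] = 1.0
print(find_scale_space_extrema(stack))  # [(1, 2, 2)]
```

Both maxima and minima are kept, which is why the test uses `max` and `min` rather than only looking for bright spots.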
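10. The 1st equation is a mathematical expression for a Gaussian blur:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
The equation has three terms (the symbol * denotes convolution):
• L(x, y, σ) is the blurred image.
• G(x, y, σ) is the Gaussian kernel.
• I(x, y) is the original image.
The 2nd equation defines the Gaussian kernel:
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-(x^2 + y^2) / (2\sigma^2)}$$
It has three terms:
• 1/(2πσ²) is a normalization constant. It ensures that the Gaussian function integrates to 1.
• e is the mathematical constant e, which is approximately 2.71828.
• −(x² + y²)/(2σ²) is the quadratic exponent. It determines the shape of the Gaussian function, with σ controlling its width.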
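The Gaussian kernel and the blur described above can be checked numerically. The sketch below assumes NumPy is available; the helper names `gaussian_kernel` and `blur`, the 3σ truncation radius, and the edge padding are choices made for this example rather than part of SIFT itself.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y, sigma) on a square grid and renormalise it."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))  # assumption: truncate at 3 sigma
    axis = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(axis, axis)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    # Truncating the kernel loses a little mass, so rescale to sum to 1.
    return g / g.sum()


def blur(image, sigma):
    """Compute L = G * I by direct summation (slow but easy to follow)."""
    image = np.asarray(image, dtype=float)
    kernel = gaussian_kernel(sigma)
    radius = kernel.shape[0] // 2
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    for ky in range(kernel.shape[0]):
        for kx in range(kernel.shape[1]):
            out += kernel[ky, kx] * padded[ky:ky + rows, kx:kx + cols]
    return out


# A single bright pixel spreads into a Gaussian-shaped spot.
image = np.zeros((9, 9))
image[4, 4] = 1.0
blurred = blur(image, sigma=1.0)
print(blurred[4, 4] < 1.0, np.isclose(blurred.sum(), 1.0))
```

Because the kernel is symmetric, the summation above gives the same result as a true convolution. Blurring the same image with two nearby values of σ and subtracting the results would give one difference-of-Gaussians image.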
14. Working of SIFT Algorithm Stages
2. Keypoint localization :
The algorithm refines the keypoint locations to sub-
pixel accuracy to make them more stable and accurate. It removes low-contrast
keypoints and those located on edges to enhance the robustness of the feature
extraction process.
Explanation :
In simple terms, "refining keypoint locations to sub-pixel
accuracy" means that the algorithm makes very precise adjustments to the positions
of the keypoints, going beyond just whole pixel coordinates. It ensures that the
keypoints are located more accurately, allowing them to be stable and reliable, even
if they are not exactly aligned with the grid of pixels in the image. This sub-pixel
accuracy helps to improve the overall performance and robustness of the algorithm
in detecting and matching features in images.
15. For Example, consider a 3x3 image grid with pixel coordinates (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and
(2,2):
Pixel Coordinates:
(0,0) (0,1) (0,2)
(1,0) (1,1) (1,2)
(2,0) (2,1) (2,2)
The intensity values at these pixel coordinates may look like this:
Pixel Intensity Values:
150 80 200
50 180 70
230 120 100
Now, let's talk about sub-pixel accuracy. In some cases, the features we want to
detect might not be located precisely at a pixel coordinate, but they could be
somewhere in between pixels. Sub-pixel accuracy allows us to estimate the position
of these features with higher precision, beyond just the whole pixel values.
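One common way to obtain a sub-pixel position is to fit a quadratic surface to the samples around the detected extremum and solve for where that surface peaks. The sketch below does this in 2D on a 3x3 patch; SIFT applies the same idea in three dimensions (x, y and scale). The function name `refine_subpixel` and the test surface are hypothetical, chosen only to illustrate the calculation.

```python
def refine_subpixel(patch):
    """Return the (dx, dy) offset of the fitted peak from the centre pixel.

    `patch` is a 3x3 nested list indexed as patch[row][col].
    Returns None when the quadratic fit is degenerate.
    """
    # First derivatives by central differences.
    gx = (patch[1][2] - patch[1][0]) / 2.0
    gy = (patch[2][1] - patch[0][1]) / 2.0
    # Second derivatives (entries of the 2x2 Hessian).
    fxx = patch[1][2] - 2.0 * patch[1][1] + patch[1][0]
    fyy = patch[2][1] - 2.0 * patch[1][1] + patch[0][1]
    fxy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    det = fxx * fyy - fxy * fxy
    if det == 0:
        return None
    # Offset = -H^{-1} g, written out for the 2x2 case.
    dx = -(fyy * gx - fxy * gy) / det
    dy = -(fxx * gy - fxy * gx) / det
    return dx, dy


def surface(x, y):
    """Hypothetical smooth response whose true peak is at (0.3, -0.2)."""
    return 10.0 - (x - 0.3) ** 2 - 2.0 * (y + 0.2) ** 2


patch = [[surface(c - 1, r - 1) for c in range(3)] for r in range(3)]
dx, dy = refine_subpixel(patch)
print(round(dx, 6), round(dy, 6))  # 0.3 -0.2
```

The centre pixel holds the largest sampled value, yet the fit recovers a peak that lies 0.3 pixels to the right and 0.2 pixels up, between the grid points.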
16. The line "It removes low-contrast keypoints and those located on edges" means that
during the SIFT algorithm's keypoint detection process, certain keypoints are
discarded or not considered as reliable keypoints for further processing.
Low-contrast keypoints: These are keypoints located in regions of the image
where there is very little variation in intensity or color. Such keypoints do not have
well-defined features, and matching them across different images might be
challenging or less informative. Therefore, they are removed from the set of
detected keypoints to improve the algorithm's performance.
Keypoints on edges: These are keypoints located on sharp edges or boundaries in
the image. The response across an edge is strong, but a point on an edge is poorly
localized along it: it can slide along the edge with little change in its local
appearance, so small amounts of noise move it. Since these keypoints do not
provide consistent and reliable feature representations, they are also discarded from
the set of keypoints.
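Both rejection tests can be written as a short check on the 3x3 neighborhood of a candidate. The sketch below uses the values suggested in Lowe's 2004 paper (a contrast threshold of 0.03 for pixel values scaled to [0, 1], and a curvature ratio of 10). The function name `is_stable_keypoint` and the test patches are hypothetical, and the contrast is read at the centre pixel rather than at a refined sub-pixel position.

```python
def is_stable_keypoint(patch, contrast_threshold=0.03, edge_ratio=10.0):
    """Apply the low-contrast and edge tests to a 3x3 response patch."""
    # Low-contrast test: the response itself must be strong enough.
    if abs(patch[1][1]) < contrast_threshold:
        return False
    # Edge test: compare the two principal curvatures via the Hessian.
    dxx = patch[1][2] - 2.0 * patch[1][1] + patch[1][0]
    dyy = patch[2][1] - 2.0 * patch[1][1] + patch[0][1]
    dxy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures have opposite signs or one is zero
    # trace^2 / det grows as the curvatures become more unequal.
    return trace * trace / det < (edge_ratio + 1.0) ** 2 / edge_ratio


def make_patch(f):
    """Sample a function f(x, y) on the 3x3 grid around the origin."""
    return [[f(c - 1, r - 1) for c in range(3)] for r in range(3)]


blob = make_patch(lambda x, y: 0.5 - 0.1 * (x * x + y * y))
edge = make_patch(lambda x, y: 0.5 - 0.1 * x * x - 0.001 * y * y)
faint = make_patch(lambda x, y: 0.01 - 0.001 * (x * x + y * y))
print(is_stable_keypoint(blob), is_stable_keypoint(edge), is_stable_keypoint(faint))
```

The blob curves equally in both directions and passes, the edge curves strongly in x but barely in y and fails the ratio test, and the faint patch fails the contrast test.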
18. Working of SIFT Algorithm Stages
3. Orientation assignment:
For each keypoint, SIFT computes a dominant orientation
based on the gradient directions of the image in the region surrounding the
keypoint. This step helps in making the descriptors rotation-invariant.
Explanation :
In simple terms, for each keypoint detected in the image, the
SIFT algorithm calculates the main direction in which the image intensity changes
the most around that keypoint. This direction is called the "dominant orientation." It
is determined by looking at how the image's brightness or color changes in the local
area surrounding the keypoint. Because the descriptor is later measured relative to
this orientation, it becomes rotation-invariant: the same features can still be
recognized when the image is rotated by any angle in the image plane.
19. For Example : Imagine you have a grayscale image with a keypoint located at the center of a circle. The brightness
of the image changes as you move from the center of the circle towards its edge. The SIFT algorithm wants to find the
main direction in which this brightness changes.
30 40 50 60
20 O O O 70
10 O X O 80
0 O O O 90
10 20 30 40
In this example, "X" represents the keypoint location. The numbers represent the pixel intensities in the image.
To compute the dominant orientation, SIFT looks at the image gradient, which is the direction of the most rapid
change in intensity. In this case, it will calculate the gradient directions at each pixel around the keypoint.
Gradient Directions (Approximate):
30 40 50 60
20 NW N NE 70
10 W X E 80
0 SW S SE 90
10 20 30 40
20. Next, SIFT accumulates the gradient directions into an orientation histogram for the region around the keypoint.
Lowe's method uses 36 bins of 10 degrees each and weights every sample by its gradient magnitude. This example uses
eight 45-degree bins and plain counts to keep the numbers small.
Orientation Histogram (illustrative counts):
0-45°: 1
45-90°: 2
90-135°: 1
135-180°: 2
180-225°: 0
225-270°: 0
270-315°: 0
315-360°: 0
In this example, the 45-90 degree and 135-180 degree bins tie for the highest count. SIFT takes the highest peak as the
dominant orientation and also creates an additional keypoint for any other peak within 80% of it, so this location
would yield two keypoints, oriented at roughly 67.5 and 157.5 degrees (the centers of the two bins).
By computing the dominant orientation, SIFT ensures that the descriptor captures the local image structure
consistently, even if the image is rotated. This rotation-invariant property makes SIFT more robust and useful
for various computer vision tasks, such as object recognition and image matching.
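The histogram step can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions: the function name `dominant_orientations` is hypothetical, the Gaussian weighting window and the parabolic peak interpolation used in Lowe's paper are omitted, and every bin within 80% of the maximum is reported without checking that it is a local peak.

```python
import numpy as np


def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Return the bin-centre angles (degrees) of the strongest orientations."""
    patch = np.asarray(patch, dtype=float)
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    dy, dx = np.gradient(patch)
    magnitude = np.hypot(dx, dy)
    # Angles in [0, 360). Rows grow downwards, so angles run clockwise on screen.
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    hist, edges = np.histogram(
        angle, bins=num_bins, range=(0.0, 360.0), weights=magnitude
    )
    if hist.max() == 0:
        return []  # flat patch: no gradient, no orientation
    centers = (edges[:-1] + edges[1:]) / 2.0
    return [float(c) for c, h in zip(centers, hist) if h >= peak_ratio * hist.max()]


# A brightness ramp that increases equally along x and y.
ys, xs = np.mgrid[0:9, 0:9]
ramp = xs + ys
print(dominant_orientations(ramp))
```

For the ramp, every pixel has the same gradient direction of 45 degrees, so all the weight falls into the 40-50 degree bin and a single orientation is returned.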
21. Working of SIFT Algorithm Stages
4. Feature descriptor calculation:
A descriptor is computed for each keypoint
based on the local image gradient information. The descriptor captures the intensity
gradient distribution around the keypoint and is robust to changes in scale, rotation, and
illumination.
Explanation :
In simple terms, for each keypoint detected in the image, the SIFT
algorithm calculates a unique and compact representation called a "descriptor." This
descriptor is derived from the information about how the image intensity changes in the
local area surrounding the keypoint, which is determined by the image gradient.
The image gradient represents the direction and magnitude of the rapid changes in pixel
intensity in different directions around the keypoint. The descriptor captures this local
structure by summarizing the distribution of these gradients in a way that is invariant to
translation, rotation, and changes in illumination. This makes the SIFT descriptor a
powerful and distinctive feature that can be used to identify and match keypoints across
different images, even under various transformations and conditions.
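In Lowe's 2004 paper the descriptor is built from a 4x4 grid of cells around the keypoint, each holding an 8-bin orientation histogram, which gives 4 x 4 x 8 = 128 values. The vector is normalized, clipped at 0.2, and normalized again to reduce the effect of illumination changes. The sketch below follows that outline with NumPy but leaves out several parts of the full method (rotating the patch to the dominant orientation, Gaussian weighting, and interpolation between bins). The function name `simple_sift_descriptor` is hypothetical.

```python
import numpy as np


def simple_sift_descriptor(patch):
    """Build a simplified 128-value descriptor from a 16x16 patch."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    dy, dx = np.gradient(patch)
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    cells = []
    # 4x4 grid of 4x4-pixel cells, one 8-bin histogram per cell.
    for r in range(0, 16, 4):
        for c in range(0, 16, 4):
            hist, _ = np.histogram(
                angle[r:r + 4, c:c + 4],
                bins=8,
                range=(0.0, 360.0),
                weights=magnitude[r:r + 4, c:c + 4],
            )
            cells.append(hist)
    desc = np.concatenate(cells)
    # Normalise, clip large values at 0.2, then normalise again.
    norm = np.linalg.norm(desc)
    if norm > 0:
        desc = desc / norm
    desc = np.minimum(desc, 0.2)
    norm = np.linalg.norm(desc)
    if norm > 0:
        desc = desc / norm
    return desc


rng = np.random.default_rng(0)
descriptor = simple_sift_descriptor(rng.random((16, 16)))
print(descriptor.shape)  # (128,)
```

The first normalization cancels a uniform change in contrast, and the clipping step limits the influence of a few very strong gradients, which is where much of the illumination robustness comes from.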
23. Working of SIFT Algorithm Stages
5. Keypoint matching:
Finally, SIFT descriptors from different images are
compared using a distance metric, usually Euclidean distance (Hamming distance is
used for binary descriptors such as ORB, not for SIFT's real-valued vectors), and
keypoints with similar descriptors are matched, forming correspondences between
different images.
Explanation :
Euclidean distance is a distance metric used to measure the
straight-line distance between two points in a multidimensional space.
Mathematically, the Euclidean distance between two points p = (p1, p2, ..., pn) and
q = (q1, q2, ..., qn) in an n-dimensional space is given by the formula:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
For standard SIFT descriptors, n = 128.
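A nearest-neighbor matcher based on Euclidean distance can be sketched as below. It also applies the ratio test from Lowe's 2004 paper, which keeps a match only when the best candidate is clearly closer than the second best (a ratio of 0.8 is the value suggested there). The function names and the toy 4-dimensional descriptors are hypothetical; real SIFT descriptors have 128 values, and the sketch assumes the second image has at least two descriptors.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        # Rank every descriptor in the second image by distance to d.
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


desc_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
desc_b = [[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
print(match_descriptors(desc_a, desc_b))  # [(0, 2), (1, 0), (2, 1)]

# A descriptor that is equally close to two candidates is rejected.
print(match_descriptors([[1, 1, 0, 0]], [[1, 0, 0, 0], [0, 1, 0, 0]]))  # []
```

This brute-force search compares every pair, which is fine for a small example; larger descriptor sets are usually matched with an approximate nearest-neighbor index instead.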
24. Applications of SIFT Algorithm
The SIFT algorithm has been used for a variety of computer vision tasks, including:
Object Recognition
Object Tracking
Image Stitching
3D Reconstruction
Face Recognition
Action Recognition
25. Object Recognition :
Object recognition is the task of identifying objects in
images or videos. It is a challenging problem because objects can appear in different
sizes, orientations, and lighting conditions. However, object recognition is a
valuable tool for a variety of applications, such as robotics, surveillance, and image
search.
The SIFT algorithm can be used for object recognition by matching SIFT
descriptors between images. The SIFT descriptors are invariant to changes in scale,
rotation, and illumination, making them well-suited for matching images of the
same object taken from different viewpoints or under different lighting conditions.
To match SIFT descriptors, the descriptors from two images are first compared. The
descriptors that are most similar are then used to establish a correspondence
between the two images. This correspondence can then be used to identify the
objects in the images.
27. SIFT Algorithm
Conclusion :
The SIFT algorithm is a powerful tool for detecting and describing local
features in images. SIFT features are invariant to changes in scale, rotation, and illumination, making
them useful for a variety of computer vision tasks. The SIFT algorithm has been shown to be
effective for object recognition, and it is a widely used technique in the field of computer vision.
28. Future Work
Here is some of the follow-up work that has built on SIFT since David Lowe's original papers:
Improved feature detection:
Several researchers have proposed improvements to the SIFT
feature detector. For example, the SURF feature detector is inspired by SIFT and uses integral
images and box filters to make detection considerably faster.
New feature descriptors:
Several new feature descriptors have been proposed that are
cheaper to compute and match than the original SIFT descriptor. For example, the ORB descriptor is a binary
descriptor that is more efficient to compute than the SIFT descriptor.
Applications to new tasks:
SIFT has been used for a variety of tasks in computer vision,
including object recognition, image matching, and scene understanding. Researchers have also
proposed using SIFT for new tasks, such as 3D reconstruction and action recognition.
29. Future Work
Here are some of the researchers who have built on SIFT :
David Lowe:
Lowe himself continued to work on SIFT after his original paper, and he published
a number of follow-up papers that improved the algorithm.
Herbert Bay:
Bay, with Tinne Tuytelaars and Luc Van Gool, developed the SURF feature detector, which is
inspired by SIFT and designed to be faster to compute.
Ethan Rublee:
Rublee developed the ORB descriptor, which is a binary descriptor that is
more efficient to compute than the SIFT descriptor.
Stefan Leutenegger:
Leutenegger, with Margarita Chli and Roland Siegwart, developed the BRISK descriptor, which is another
binary descriptor that is efficient to compute and has been shown to be effective for a variety of
tasks.
31. Credits
• Zhou, H., Yuan, Y., & Shi, C. (2009). Object tracking using SIFT features and mean shift.
Computer Vision and Image Understanding, 113(3), 345–352.
https://doi.org/10.1016/j.cviu.2008.08.006
• Bay, H., Tuytelaars, T., & van Gool, L. (2006). SURF: Speeded Up Robust Features (pp. 404–417).
https://doi.org/10.1007/11744023_32
• Lowe, D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International
Journal of Computer Vision, 60(2), 91–110.
https://doi.org/10.1023/B:VISI.0000029664.99615.94