2. Outline
− REGION FEATURE DESCRIPTORS
• Spectral Approaches
• Moment invariants
− Principal Components as feature descriptors
− Whole image features
• THE HARRIS-STEPHENS CORNER DETECTOR
• MAXIMALLY STABLE EXTREMAL REGIONS (MSERs)
− SCALE-INVARIANT FEATURE TRANSFORM (SIFT)
• SCALE SPACE
• DETECTING LOCAL EXTREMA (Finding the Initial Keypoints; Improving the Accuracy of Keypoint Locations; Eliminating Edge Responses)
• KEYPOINT ORIENTATION
• KEYPOINT DESCRIPTORS
• SUMMARY OF THE SIFT ALGORITHM
4. Spectral Approaches
− The Fourier spectrum is ideally suited for describing the directionality of periodic or semiperiodic 2-D patterns in an image.
− These global texture patterns are easily distinguishable as concentrations of high-energy bursts in the spectrum.
− Three features of the Fourier spectrum are useful for texture description:
• Prominent peaks in the spectrum give the principal direction of the texture patterns.
• The location of the peaks in the frequency plane gives the fundamental spatial period of the patterns.
• Eliminating any periodic components via filtering leaves nonperiodic image elements, which can then be described by statistical techniques.
− The spectrum is symmetric about the origin, so only half of the frequency plane needs to be considered.
− Thus, for the purpose of analysis, every periodic pattern is associated with only one peak in the spectrum, rather than two.
5. Spectral Approaches
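− The spectrum is expressed in polar coordinates as $S(r, \theta)$, where S is the spectrum function, r is the frequency variable, and θ is the direction variable. For a fixed direction θ it reduces to a 1-D function $S_\theta(r)$; for a fixed frequency r it reduces to a 1-D function $S_r(\theta)$.
− Analyzing $S_\theta(r)$ for a fixed value of θ yields the behavior of the spectrum (e.g., the presence of peaks) along a radial direction from the origin, whereas analyzing $S_r(\theta)$ for a fixed value of r yields the behavior along a circle centered on the origin.
− Summing these functions gives a more global description:
$S(r) = \sum_{\theta=0}^{\pi} S_\theta(r)$   (11-32)
$S(\theta) = \sum_{r=1}^{R_0} S_r(\theta)$   (11-33)
where $R_0$ is the radius of a circle centered at the origin.
− Together, S(r) and S(θ) constitute a spectral-energy description of texture for an entire image or region under consideration.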
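The sums in Eqs. (11-32) and (11-33) can be sketched in a few lines. This is a minimal illustration, not the method used to produce the figures: it assumes the input is a centered magnitude spectrum stored as a list of lists, rounds r to the nearest integer and θ to the nearest degree, and uses only the upper half of the frequency plane. The function name `spectral_signatures` is made up for this example.

```python
import math

def spectral_signatures(spectrum):
    """Return (S_r, S_theta) for a centered magnitude spectrum (list of lists).

    S_r[r] sums the spectrum over the half-circle of radius r (r rounded to an integer).
    S_theta[t] sums the spectrum along direction t degrees, t = 0..179.
    """
    rows, cols = len(spectrum), len(spectrum[0])
    cy, cx = rows // 2, cols // 2          # assumed location of the origin
    r0 = min(cy, cx)                       # R_0: largest radius that fits in the array
    s_r = [0.0] * (r0 + 1)
    s_theta = [0.0] * 180
    for i in range(rows):
        for j in range(cols):
            dy, dx = cy - i, j - cx        # dy points "up" so angles are counterclockwise
            # Keep only half of the plane (the spectrum is symmetric about the origin).
            if dy < 0 or (dy == 0 and dx <= 0):
                continue
            r = int(round(math.hypot(dx, dy)))
            if r > r0:
                continue
            theta = int(round(math.degrees(math.atan2(dy, dx)))) % 180
            s_r[r] += spectrum[i][j]
            s_theta[theta] += spectrum[i][j]
    return s_r, s_theta

# Synthetic spectrum with two symmetric peak pairs (hypothetical values).
size = 33
spec = [[0.0] * size for _ in range(size)]
c = size // 2
spec[c][c + 5] = spec[c][c - 5] = 10.0     # peak pair along the 0-degree direction
spec[c - 4][c] = spec[c + 4][c] = 3.0      # peak pair along the 90-degree direction
s_r, s_theta = spectral_signatures(spec)
print(s_r[5], s_r[4], s_theta[0], s_theta[90])
```

Because only half of the plane is scanned, each symmetric peak pair contributes once, which matches the "one peak per periodic pattern" convention stated earlier.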
6. Spectral Approaches
FIGURE 11.35
(a) and (b) Images of random and ordered objects. (c) and (d) Corresponding Fourier spectra. All images are of size 600 × 600 pixels.
7. Spectral Approaches
FIGURE 11.36
(a) and (b) Plots of S(r) and S(θ) for Fig. 11.35(a). (c) and (d) Plots of S(r) and S(θ) for Fig. 11.35(b). All vertical axes are ×10^5.
corresponding to the
periodic horizontal
repetition of the ligh
random nature
at 90° and 180°
8. MOMENT INVARIANTS
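− Moment invariants are features of an image that are unchanged under translation, rotation, or scaling of the image, and are very useful in pattern-recognition problems.
− The 2-D moment of order (p + q) of an M × N digital image f(x, y) is
$m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x^p y^q f(x, y)$   (11-34)
− The corresponding central moment is
$\mu_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (x - \bar{x})^p (y - \bar{y})^q f(x, y)$, where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.
− The normalized central moment is
$\eta_{pq} = \mu_{pq} / \mu_{00}^{\gamma}$, where $\gamma = (p + q)/2 + 1$ for p + q = 2, 3, …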
9. MOMENT INVARIANTS
9
A set of seven, 2-D moment invariants can be derived from the second and third normalized central
moments
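As a small illustration, the first two of the seven invariants, $\phi_1 = \eta_{20} + \eta_{02}$ and $\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$, can be computed directly from the normalized central moments. The sketch below assumes the image is a list of rows, with x as the row index and y as the column index; the helper names and the test shape are illustrative only.

```python
def raw_moment(img, p, q):
    """m_pq: sum of x^p * y^q * f(x, y), with x = row index and y = column index."""
    return sum((x ** p) * (y ** q) * v
               for x, row in enumerate(img) for y, v in enumerate(row))

def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00^gamma, with gamma = (p + q)/2 + 1."""
    m00 = raw_moment(img, 0, 0)
    xbar = raw_moment(img, 1, 0) / m00
    ybar = raw_moment(img, 0, 1) / m00
    mu = sum(((x - xbar) ** p) * ((y - ybar) ** q) * v
             for x, row in enumerate(img) for y, v in enumerate(row))
    return mu / m00 ** ((p + q) / 2 + 1)   # mu_00 equals m_00

def first_two_invariants(img):
    n20 = normalized_central_moment(img, 2, 0)
    n02 = normalized_central_moment(img, 0, 2)
    n11 = normalized_central_moment(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2

# A small L-shaped object, a translated copy, and a copy rotated by 90 degrees.
shape = [[1, 1, 1, 1],
         [1, 0, 0, 0],
         [1, 0, 0, 0]]
shifted = [[0] * 8 for _ in range(8)]
for r, row in enumerate(shape):
    for c, v in enumerate(row):
        shifted[r + 3][c + 2] = v
rotated = [list(r) for r in zip(*shape[::-1])]

for name, img in (("original", shape), ("translated", shifted), ("rotated 90", rotated)):
    print(name, first_two_invariants(img))
```

Translation and rotation by 90° leave both values unchanged up to floating-point rounding. Scaling and arbitrary rotations of a digital image involve resampling, so the invariants then agree only approximately.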
10. MOMENT INVARIANTS
10
FIGURE 11.37
(a) Original image. (b)–(f) Images translated, scaled by one-half, mirrored, rotated by 45°, and
rotated by 90°, respectively.
14. Principal Components
− Suppose that we are given the three component images of a color image.
− The three images can be treated as a unit by expressing each group of three corresponding pixels (those at the same spatial location) as a vector.
− In general, each vector has the form $\mathbf{x} = (x_1, x_2, x_3, \ldots)^T$, where T indicates the transpose.
− A population of such vectors is characterized by its mean vector $\mathbf{m}_x$ and its covariance matrix $\mathbf{C}_x$.
− Element $c_{ii}$ of $\mathbf{C}_x$ is the variance of $x_i$, the ith component of the x vectors in the population, and element $c_{ij}$ of $\mathbf{C}_x$ is the covariance between elements $x_i$ and $x_j$ of these vectors.
− If elements $x_i$ and $x_j$ are uncorrelated, their covariance is zero. If this holds for every pair of elements, the covariance matrix is diagonal.
− Principal components can also be used to normalize regions or boundaries for variations in size, translation, and rotation.
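15. Principal Components
− Let $\mathbf{e}_i$ and $\lambda_i$ be the eigenvectors and corresponding eigenvalues of $\mathbf{C}_x$.
− Let A be a matrix whose rows are formed from the eigenvectors of $\mathbf{C}_x$, ordered so that the first row of A is the eigenvector corresponding to the largest eigenvalue.
− Use A as a transformation matrix to map the x's into vectors denoted by y's. This is the Hotelling transform:
$\mathbf{y} = \mathbf{A}(\mathbf{x} - \mathbf{m}_x)$   (11-49)
− The covariance matrix of the y's is $\mathbf{C}_y = \mathbf{A}\mathbf{C}_x\mathbf{A}^T$, a diagonal matrix whose main-diagonal elements are the eigenvalues of $\mathbf{C}_x$. Thus, $\mathbf{C}_x$ and $\mathbf{C}_y$ have the same eigenvalues.
− Because the rows of A are orthonormal vectors, $\mathbf{A}^{-1} = \mathbf{A}^T$, and any x can be reconstructed from its corresponding y:
$\mathbf{x} = \mathbf{A}^T\mathbf{y} + \mathbf{m}_x$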
16. Principal Components
− Suppose that, instead of using all the eigenvectors of $\mathbf{C}_x$, we form a matrix $\mathbf{A}_k$ from the k eigenvectors corresponding to the k largest eigenvalues, yielding a transformation matrix of order k × n. The y vectors would then be k-dimensional.
− Reconstruction using $\mathbf{A}_k$ is no longer exact:
$\hat{\mathbf{x}} = \mathbf{A}_k^T\mathbf{y} + \mathbf{m}_x$
− The mean squared error between x and $\hat{\mathbf{x}}$ is
$e_{ms} = \sum_{j=1}^{n} \lambda_j - \sum_{j=1}^{k} \lambda_j = \sum_{j=k+1}^{n} \lambda_j$
− For k = n the error is zero.
− The error can be minimized by selecting the k eigenvectors associated with the largest eigenvalues. Thus, the Hotelling transform is optimal in the sense that it minimizes the mean squared error between the vectors x and their approximations $\hat{\mathbf{x}}$.
− Because it uses the eigenvectors corresponding to the largest eigenvalues, the Hotelling transform is also known as the principal components transform.
17. Principal Components
FIGURE 11.38
Multispectral images in the (a) visible blue (450–520 nm), (b) visible green (520–600 nm), (c) visible red (630–690 nm), (d) near infrared (760–900 nm), (e) middle infrared (1,550–1,750 nm), and (f) thermal infrared (10,400–12,500 nm) bands. Each image is of size 564 × 564 pixels. (Images courtesy of NASA.)
18. Principal Components
− The images are of size 564 × 564 pixels, so the population consisted of (564)² = 318,096 vectors, from which the mean vector, covariance matrix, and corresponding eigenvalues and eigenvectors were computed.
19. Principal Components
− The eigenvectors were then used as the rows of matrix A, and a set of y vectors was obtained using Eq. (11-49).
− A set of principal component images was generated from the y vectors (images are constructed from vectors by applying Fig. 11.39 in reverse). Figure 11.40 shows the results.
20. Principal Components
FIGURE 11.40
The six principal component images obtained from vectors computed using Eq. (11-49). Vectors are converted to images by applying Fig. 11.39 in reverse. Image 1 is formed from the first component of the 318,096 y vectors, image 2 from the second component, and so on.
− The most obvious feature in the principal component images is that a significant portion of the contrast detail is contained in the first two images, and it decreases rapidly from there.
− This is expected: because the eigenvalues are the variances of the elements of the y vectors, the images associated with the largest eigenvalues exhibit the most contrast.
21. Principal Components
21
− The first two images account for about 89% of the total variance. The other four images have low contrast
detail because they account for only the remaining 11%.
− If we used all the eigenvectors in matrix A we could reconstruct the original images from the principal
component images with zero error between the original and reconstructed images (i.e., the images would
be identical). If the objective is to store and/or transmit the principal component images and the
transformation matrix for later reconstruction of the original images, it would make no sense to store
and/or transmit all the principal component images because nothing would be gained. Suppose, however,
that we keep and/or transmit only the two principal component images. Then there would be significant
savings in storage and/or transmission
22. Principal Components
22
FIGURE 11.41
Multispectral images reconstructed using only the two principal component images corresponding to the two principal
component vectors with the largest eigenvalues. Compare these images with the originals in Fig. 11.38.
The reason is that the original sixth
image is actually blurry, but the two
principal component images used in
the reconstruction are sharp,
therefore, the blurry “detail” is lost
23. Principal Components
23
FIGURE 11.42
Differences between the original and reconstructed images. All images were enhanced by scaling them to the full [0, 255]
range to facilitate visual analysis.
24. Principal Components
FIGURE 11.43
(a) An object. (b) Object showing eigenvectors of its covariance matrix. (c) Transformed object, obtained using Eq. (11-49). (d) Object translated so that all its coordinate values are greater than 0.
25. Principal Components
FIGURE 11.44
A manual example. (a) Original points. (b) Eigenvectors of the covariance matrix of the points in (a). (c) Transformed points obtained using Eq. (11-49). (d) Points from (c), rounded and translated so that all coordinate values are integers greater than 0. The dashed lines are included to facilitate viewing. They are not part of the data.
− Original points: (1, 1), (2, 4), (4, 2), (5, 5)
− Eigenvalues of the covariance matrix: λ1 = 5.333, λ2 = 1.333
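The manual example can be checked numerically. The sketch below is one possible way to do it for 2-D points only: it uses the closed-form eigen-decomposition of a symmetric 2 × 2 matrix and assumes the covariance matrix is normalized by K − 1 (K is the number of points), which is the normalization that reproduces the eigenvalues 5.333 and 1.333 quoted above. The function names are illustrative.

```python
import math

def hotelling_2d(points):
    """Return (mean, eigenvalues in descending order, A) with A's rows = unit eigenvectors."""
    K = len(points)
    mx = sum(x for x, _ in points) / K
    my = sum(y for _, y in points) / K
    # Covariance matrix [[a, b], [b, c]], normalized by K - 1 (assumption, see lead-in).
    a = sum((x - mx) ** 2 for x, _ in points) / (K - 1)
    c = sum((y - my) ** 2 for _, y in points) / (K - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (K - 1)
    mid, root = (a + c) / 2, math.hypot((a - c) / 2, b)
    eigvals = [mid + root, mid - root]
    if b == 0:                                   # already diagonal: axes are the eigenvectors
        A = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        A = []
        for lam in eigvals:
            vx, vy = b, lam - a                  # satisfies (C - lam*I) v = 0
            n = math.hypot(vx, vy)
            A.append((vx / n, vy / n))
    return (mx, my), eigvals, A

def transform(points, mean, A, k=2):
    """y = A_k (x - m_x), using the first k rows of A."""
    return [[sum(A[i][d] * (p[d] - mean[d]) for d in range(2)) for i in range(k)]
            for p in points]

def reconstruct(ys, mean, A):
    """x_hat = A_k^T y + m_x, with k inferred from the length of each y."""
    k = len(ys[0])
    return [tuple(mean[d] + sum(A[i][d] * y[i] for i in range(k)) for d in range(2))
            for y in ys]

points = [(1, 1), (2, 4), (4, 2), (5, 5)]
mean, eigvals, A = hotelling_2d(points)
approx = reconstruct(transform(points, mean, A, k=1), mean, A)
mse = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
          for p, q in zip(points, approx)) / (len(points) - 1)
print(eigvals, mse)
```

Keeping only the first principal component (k = 1) gives a mean squared error equal to the discarded eigenvalue, λ2 ≈ 1.333, when the error is normalized the same way as the covariance matrix.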
29. THE HARRIS-STEPHENS (HS) CORNER DETECTOR
− The state of the art in image processing is such that as the complexity of the task increases, the number of techniques suitable for addressing those tasks decreases.
− Intuitively, we think of a corner as a rapid change of direction in a curve.
− Corners are highly effective features because they are distinctive and reasonably invariant to viewpoint.
− Because of these characteristics, corners are used routinely for matching image features in applications such as tracking for autonomous navigation, stereo machine vision algorithms, and image database queries.
− The basic approach of the HS corner detector is this: corners are detected by running a small window over an image.
− The detector window is designed to compute intensity changes. There are three scenarios of interest:
− (A) Areas of zero (or small) intensity changes in all directions, which happens when the window is located in a constant (or nearly constant) region, as in location A;
− (B) areas of changes in one direction but no (or small) changes in the orthogonal direction, which happens when the window spans a boundary between two regions, as in location B; and
− (C) areas of significant changes in all directions, a condition that happens when the window contains a corner (or isolated points), as in location C.
30. THE HARRIS-STEPHENS CORNER DETECTOR
FIGURE 11.45
Illustration of how the Harris-Stephens corner detector operates in the three types of subregions indicated by A (flat), B (edge), and C (corner). The wiggly arrows indicate graphically a directional response in the detector as it moves in the three areas shown.
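31. THE HARRIS-STEPHENS (HS) CORNER DETECTOR
− Let f denote an image, and let f(s, t) denote a patch of the image defined by the values of (s, t).
− A patch of the same size, but shifted by (x, y), is given by f(s + x, t + y).
− The weighted sum of squared differences between the two patches is given by
$C(x, y) = \sum_s \sum_t w(s, t)\,[f(s + x, t + y) - f(s, t)]^2$
where w(s, t) is a weighting function.
− The shifted patch can be approximated by the linear terms of a Taylor expansion:
$f(s + x, t + y) \approx f(s, t) + x f_x(s, t) + y f_y(s, t)$, with $f_x = \partial f/\partial x$ and $f_y = \partial f/\partial y$,
so that
$C(x, y) = \sum_s \sum_t w(s, t)\,[x f_x(s, t) + y f_y(s, t)]^2$
− In matrix form:
$C(x, y) = [x \;\; y]\, \mathbf{M}\, [x \;\; y]^T$, where $\mathbf{M} = \sum_s \sum_t w(s, t) \begin{bmatrix} f_x^2 & f_x f_y \\ f_x f_y & f_y^2 \end{bmatrix}$
M is sometimes called the Harris matrix.
− The weighting function is typically either an exponential (Gaussian-shaped) function, used when data smoothing is important, or a box lowpass kernel, used when computational speed is needed and the noise level is low.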
32. THE HARRIS-STEPHENS CORNER DETECTOR
FIGURE 11.46
(a)–(c) Noisy images and image patches (small squares) encompassing image regions similar in content to those in Fig. 11.45. (d)–(f) Plots of value pairs $(f_x, f_y)$ showing the characteristics of the eigenvalues of M that are useful for detecting the presence of a corner in an image patch.
− The derivatives were computed using the kernels $w_y = [-1 \;\; 0 \;\; 1]$ and $w_x = w_y^T$.
33. THE HARRIS-STEPHENS (HS) CORNER DETECTOR
− We conclude that: (1) two small eigenvalues indicate nearly constant intensity; (2) one small and one large eigenvalue imply the presence of a vertical or horizontal boundary; and (3) two large eigenvalues imply the presence of a corner or (unfortunately) isolated bright points.
− Thus, the eigenvalues of the matrix formed from derivatives in the image patch can be used to differentiate between the three scenarios of interest.
− However, instead of using the eigenvalues (which are expensive to compute), the HS detector utilizes a measure of corner response based on the fact that the trace of a square matrix is equal to the sum of its eigenvalues, and its determinant is equal to the product of its eigenvalues:
$R = \lambda_x \lambda_y - k(\lambda_x + \lambda_y)^2 = \det(\mathbf{M}) - k\,\mathrm{trace}^2(\mathbf{M})$
where k is a constant called the sensitivity factor.
− Measure R has large positive values when both eigenvalues are large, indicating the presence of a corner; it has large negative values when one eigenvalue is large and the other small, indicating an edge; and its absolute value is small when both eigenvalues are small, indicating that the image patch under consideration is flat.
− You can interpret k as a "sensitivity factor": the smaller it is, the more likely the detector is to find corners.
− The advantage of this formulation is that the trace is the sum of the main diagonal terms of M (just two numbers), and the determinant of a 2 × 2 matrix is the product of the main diagonal elements minus the product of the cross elements. These are trivial computations.
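The corner response can be sketched for a single patch as follows. This is a minimal illustration under stated assumptions, not a full detector: it uses a box weighting function (w = 1 inside the window), derivatives computed as [-1 0 1] differences, x taken as the row direction, and a synthetic noise-free test image. The function name `harris_response` and the window half-width `half` are made up for this example.

```python
def harris_response(img, row, col, half=2, k=0.04):
    """R = det(M) - k * trace(M)^2 for the (2*half+1)-square patch centered at (row, col).

    The patch must lie at least one pixel inside the image so the differences exist.
    """
    sxx = syy = sxy = 0.0
    for i in range(row - half, row + half + 1):
        for j in range(col - half, col + half + 1):
            fx = img[i + 1][j] - img[i - 1][j]   # derivative along rows, kernel [-1 0 1]^T
            fy = img[i][j + 1] - img[i][j - 1]   # derivative along columns, kernel [-1 0 1]
            sxx += fx * fx
            syy += fy * fy
            sxy += fx * fy
    det = sxx * syy - sxy * sxy                  # product of the eigenvalues of M
    trace = sxx + syy                            # sum of the eigenvalues of M
    return det - k * trace * trace

# 15 x 15 test image: a bright square occupying the lower-right quadrant.
img = [[1.0 if i >= 7 and j >= 7 else 0.0 for j in range(15)] for i in range(15)]

print("corner:", harris_response(img, 7, 7))     # both eigenvalues large  -> R > 0
print("edge:  ", harris_response(img, 11, 7))    # one eigenvalue large    -> R < 0
print("flat:  ", harris_response(img, 3, 3))     # both eigenvalues small  -> R near 0
```

A detector would evaluate R at every pixel and keep locations where R exceeds a threshold T; that step is omitted here.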
34. THE HARRIS-STEPHENS CORNER DETECTOR
− A corner at an image location is detected only if R > T for a patch at that location.
− Test image: 600 × 600 pixels, corrupted by additive Gaussian noise.
− Parameter settings shown: (k = 0.04, T = 0.01), (k = 0.1, T = 0.01), (k = 0.1, T = 0.1), (k = 0.04, T = 0.1), and (k = 0.04, T = 0.3).
− Some settings produce false detections.
− Note that all errors occurred on the right side of the image, where the difference in intensity between squares is smaller.
− In one case, only the corners of the squares with larger intensity differences were detected.
35. THE HARRIS-STEPHENS CORNER DETECTOR
35
k = 0.04,T = 0.01
k = 0.25,T = 0.1 k = 0.04,T = 0.15
much worse than before
37. THE HARRIS-STEPHENS CORNER DETECTOR
37
FIGURE 11.50
(a) Image rotated 5°. (b)
Corners detected using
the parameters used to
obtain Fig. 11.49(f).
Rotated
38. MAXIMALLY STABLE EXTREMAL REGIONS (MSERs)
− The Harris-Stephens corner detector discussed in the previous section is useful in applications characterized by sharp transitions of intensities, such as the intersection of straight edges, that result in corner-like features in an image. Conversely, maximally stable extremal regions (MSERs) are more "blob" oriented. As with the HS corner detector, MSERs are intended to yield whole image features for the purpose of establishing correspondence between two or more images.
− Imagine that we start thresholding an 8-bit grayscale image one intensity level at a time.
− The result of each thresholding is a binary image in which we show the pixels at or above the threshold in white, and the pixels below the threshold in black.
− When the threshold, T, is 0, the result is a white image (all pixel values are at or above 0). As we start increasing T in increments of one intensity level, we will begin to see black components in the resulting binary images. These correspond to local minima in the topographic map view of the image. These black regions may begin to grow and merge, but they never get smaller from image to image.
− Finally, when we reach T = 255, the resulting image will be black (there are no pixel values above this level).
− Because each stage of thresholding results in a binary image, there will be one or more connected components of white pixels in each image.
− The set of all such components resulting from all thresholdings is the set of extremal regions. Extremal regions that do not change size (number of pixels) appreciably over a range of threshold values are called maximally stable extremal regions.
39. MAXIMALLY STABLE EXTREMAL REGIONS (MSERs)
− The procedure just discussed can be cast in the form of a rooted, connected tree called a component tree, where each level of the tree corresponds to a value of the threshold discussed in the previous slide. Each node of this tree represents an extremal region, R, defined as
$\forall p \in R \text{ and } \forall q \in \text{boundary}(R): \; I(p) > I(q)$
− Here I is the image under consideration, and p and q are image points. This equation indicates that an extremal region R is a region of I with the property that the intensity of any point in the region is higher than the intensity at any point in the boundary of the region.
− The stability measure of a region is
$\psi(R_j^T) = \dfrac{|R_i^{T - \Delta T}| - |R_k^{T + \Delta T}|}{|R_j^T|}$
where |R| is the size of the area (number of pixels) of region R, T is a threshold value, and ΔT is a specified threshold increment. In the component tree, $R_i$ and $R_k$ are the parent and child, respectively, of $R_j$.
− MSERs are the regions corresponding to the nodes in the tree that have a stability value that is a local minimum along the path of the tree containing that region.
− In other words, maximally stable regions are regions whose sizes do not change appreciably across two (2ΔT) neighboring thresholded images.
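The stability measure and the local-minimum rule can be illustrated on a single path of the component tree. The sketch below assumes the region sizes along one path have already been collected into a list indexed by threshold step (building the component tree itself is not shown), and the sizes used are hypothetical.

```python
def stability(sizes, delta=1):
    """psi[t] = (|R at t - delta| - |R at t + delta|) / |R at t| along one tree path.

    sizes[t] is the pixel count of the region on this path at threshold step t.
    Sizes are assumed positive and non-increasing as the threshold grows.
    """
    return {t: (sizes[t - delta] - sizes[t + delta]) / sizes[t]
            for t in range(delta, len(sizes) - delta)}

def local_minima(psi):
    """Threshold steps whose stability value is smaller than both neighbors on the path."""
    steps = sorted(psi)
    return [t for a, t, b in zip(steps, steps[1:], steps[2:])
            if psi[t] < psi[a] and psi[t] < psi[b]]

# Hypothetical region sizes: the region barely changes around steps 3 to 5.
sizes = [900, 700, 520, 500, 495, 490, 300, 120]
psi = stability(sizes)
print(psi)
print("MSER candidates at steps:", local_minima(psi))
```

In practice, candidates are usually also filtered by a valid size range, as in the region size limits mentioned for Fig. 11.55.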
40. MAXIMALLY STABLE EXTREMAL REGIONS (MSERs)
FIGURE 11.51
Detecting MSERs. Top: Grayscale image. Left: Thresholded images using T = 10 and ΔT = 50. Right: Component tree, showing the individual regions. Only one MSER was detected (see dashed tree node on the rightmost branch of the tree). Each level of the tree is formed from the thresholded image on the left, at that same level. Each node of the tree contains one extremal region (connected component) shown in white, and denoted by a subscripted R.
− Although MSERs are based on intensity, they also depend on the nature of the background surrounding a region. In this case, R6 was surrounded by a darker background than R7.
Because we need to check size
variations between parent and child
regions to determine stability, only the
two middle regions
has a parent
and child of
similar size
43. MAXIMALLY STABLE EXTREMAL REGIONS (MSERs)
FIGURE 11.54
(a) Building image rotated 5° counterclockwise. (b) Smoothed image using the same kernel as in Fig. 11.53(b). (c) Composite MSER detected using the same parameters we used to obtain Fig. 11.53(e). The MSERs of the original and rotated images are almost identical.
44. MAXIMALLY STABLE EXTREMAL REGIONS (MSERs)
FIGURE 11.55
(a) Building image reduced to half-size. (b) Image smoothed with a 3 × 3 box kernel. (c) Composite MSER obtained with the same parameters as Fig. 11.53(e), but using a valid MSER region size range of 2,500–7,500 pixels.
47. Scale-Invariant Feature Transform (SIFT)
− SIFT is an algorithm developed by Lowe [2004] for extracting invariant features from an image. It is called a transform because it transforms image data into scale-invariant coordinates relative to local image features.
− When images are similar in nature (same scale, similar orientation, etc.), corner detection and MSERs are suitable as whole image features. However, in the presence of variables such as scale changes, rotation, changes in illumination, and changes in viewpoint, we are forced to use methods like SIFT.
− SIFT features (called keypoints) are invariant to image scale and rotation, and are robust across a range of affine distortions, changes in 3-D viewpoint, noise, and changes of illumination.
− The input to SIFT is an image. Its output is an n-dimensional feature vector whose elements are the invariant feature descriptors.
− The first stage of the SIFT algorithm is to find image locations that are invariant to scale change.
− This is achieved by searching for stable features across all possible scales, using a function of scale known as scale space, which is a multi-scale representation suitable for handling image structures at different scales in a consistent manner.
− Scale space represents an image as a one-parameter family of smoothed images, with the objective of simulating the loss of detail that would occur as the scale of an image decreases. The parameter controlling the smoothing is referred to as the scale parameter.
− In SIFT, Gaussian kernels are used to implement smoothing, so the scale parameter is the standard deviation. The Gaussian lowpass kernel is the only smoothing kernel that meets a set of important constraints, such as linearity and shift-invariance.
48. SCALE SPACE
FIGURE 11.56
Scale space, showing three octaves. Because s = 2 in this case, each octave has s + 3 = 5 smoothed images. A Gaussian kernel was used for smoothing, so the space parameter is σ. Each octave is a "stack" of Gaussian-filtered (smoothed) images.
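49. SCALE SPACE
− The scale space, L(x, y, σ), of a grayscale image, f(x, y), is produced by convolving f with a variable-scale Gaussian kernel, G(x, y, σ):
$L(x, y, \sigma) = G(x, y, \sigma) \star f(x, y)$, where $G(x, y, \sigma) = \dfrac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2}$
− Within an octave (see Fig. 11.56), successive images are smoothed with standard deviations σ, kσ, k²σ, …, where $k = 2^{1/s}$.
− The first image of the new octave is formed by: (1) downsampling the original image enough times to achieve half the size of the image in the previous octave, and (2) smoothing the downsampled image with a new standard deviation that is twice the standard deviation of the previous octave.
− The rest of the images in the new octave are obtained by smoothing the downsampled image with the new standard deviation multiplied by the same sequence of values of k as before.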
50. Finding the Initial Keypoints
FIGURE 11.57
Illustration using images of the first three octaves of scale space in SIFT, with s = 2. The entries in the table are values of standard deviation used at each scale of each octave. For example, the standard deviation used in scale 2 of octave 1 is $k\sigma_1$, which is equal to 1.0. (The images of octave 1 are shown slightly overlapped to fit in the figure space.) Images at higher scales are more blurred and show fewer details.
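The table of standard deviations described in Fig. 11.57 follows from the rules on the previous slide and can be generated with a short sketch. It assumes $k = 2^{1/s}$, s + 3 images per octave, and a doubling of the base standard deviation from one octave to the next. The starting value $\sigma_1 = \sqrt{2}/2$ is inferred from the caption's statement that $k\sigma_1 = 1.0$ when s = 2.

```python
def sigma_table(sigma1, s, octaves):
    """Standard deviation used at each scale (column) of each octave (row)."""
    k = 2 ** (1 / s)                       # multiplicative step between adjacent scales
    return [[(2 ** o) * sigma1 * k ** i    # base sigma doubles with each octave
             for i in range(s + 3)]        # s + 3 smoothed images per octave
            for o in range(octaves)]

table = sigma_table(sigma1=2 ** 0.5 / 2, s=2, octaves=3)
for o, row in enumerate(table, start=1):
    print("octave", o, [round(v, 3) for v in row])
```

With these settings, the first entry of each octave equals the third entry of the previous one, which is consistent with starting each new octave at twice the previous base standard deviation.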
51. DETECTING LOCAL EXTREMA
− SIFT initially finds the locations of keypoints using the Gaussian-filtered images, then refines the locations and validity of those keypoints.
− Keypoint locations in scale space are found initially by detecting extrema in the difference of Gaussians of two adjacent scale-space images in an octave, convolved with the input image that corresponds to that octave:
$D(x, y, \sigma) = [G(x, y, k\sigma) - G(x, y, \sigma)] \star f(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$   (11-69)
− For example, to find keypoint locations related to the first two levels of octave 1 in scale space, form the function D(x, y, σ) by subtracting the first two images of octave 1.
− The difference of Gaussians is an approximation to the Laplacian of a Gaussian (LoG).
53. Finding the Initial Keypoints
FIGURE 11.59
Extrema (maxima or minima) of the D(x, y, σ) images in an octave are detected by comparing a pixel (shown in black) to its 26 neighbors (shown shaded) in 3 × 3 regions at the current and adjacent scale images: eight neighbors in the current image and nine neighbors in each of the two adjacent images.
− The point is selected as an extremum (maximum or minimum) point if its value is larger than the values of all its neighbors, or smaller than all of them.
− No extrema can be detected in the first (last) scale of an octave because it has no lower (upper) scale image of the same size.
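The 26-neighbor comparison can be written as a short sketch. It assumes the three adjacent difference-of-Gaussians images are available as lists of lists of equal size and that the tested pixel is not on the image border; the function name `is_extremum` is illustrative.

```python
def is_extremum(below, current, above, row, col):
    """True if current[row][col] is larger (or smaller) than all 26 of its neighbors."""
    center = current[row][col]
    neighbors = []
    for level, layer in enumerate((below, current, above)):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if level == 1 and dr == 0 and dc == 0:
                    continue                       # skip the pixel itself
                neighbors.append(layer[row + dr][col + dc])
    return center > max(neighbors) or center < min(neighbors)

flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
higher = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]

print(is_extremum(flat, peak, flat, 1, 1))     # isolated maximum across all three scales
print(is_extremum(flat, peak, higher, 1, 1))   # a larger value exists in the scale above
```

The comparisons are strict, so a pixel that ties with any neighbor is not reported as an extremum.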
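54. Improving the Accuracy of Keypoint Locations
− When a continuous function is sampled, its true maximum or minimum may actually be located between sample points. The usual approach used to get closer to the true extremum (to achieve subpixel accuracy) is to fit an interpolating function at each extremum point found in the digital function, then look for an improved extremum location in the interpolated function.
− SIFT uses the linear and quadratic terms of a Taylor series expansion of D(x, y, σ), shifted so that the origin is located at the sample point being examined:
$D(\mathbf{x}) = D + (\nabla D)^T \mathbf{x} + \tfrac{1}{2}\, \mathbf{x}^T \mathbf{H}\, \mathbf{x}$
where $\mathbf{x} = (x, y, \sigma)^T$ is the offset from the sample point, ∇ is the gradient operator, and H is the Hessian matrix of D, with D and its derivatives evaluated at the sample point.
− Setting the derivative of this expression to zero gives the location of the extremum:
$\hat{\mathbf{x}} = -\mathbf{H}^{-1} (\nabla D)$   (11-74)
− The function value at the extremum is
$D(\hat{\mathbf{x}}) = D + \tfrac{1}{2} (\nabla D)^T \hat{\mathbf{x}}$   (11-75)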
55. Eliminating Edge Responses
− Keypoints of interest in SIFT are "corner-like" features, which are significantly more localized than edges.
− To quantify the difference between edges and corners, we can look at local curvature. An edge is characterized by high curvature in one direction, and low curvature in the orthogonal direction.
− Curvature at a point in an image can be estimated from the 2 × 2 Hessian matrix evaluated at that point.
− Thus, to estimate local curvature of the DoG at any level in scale space, we compute the Hessian of D at that level:
$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
H is symmetric and of size 2 × 2.
− Let α be the eigenvalue of H with the largest magnitude and β the eigenvalue with the smallest magnitude. As in the HS corner detector, we can avoid the computational load of finding the eigenvalues by working with the trace and determinant instead:
$\mathrm{Tr}(\mathbf{H}) = D_{xx} + D_{yy} = \alpha + \beta$
$\mathrm{Det}(\mathbf{H}) = D_{xx} D_{yy} - D_{xy}^2 = \alpha\beta$
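56. Eliminating Edge Responses
− If the determinant is negative, the curvatures have different signs and the keypoint in question cannot be an extremum, so it is discarded.
− Let r denote the ratio of the largest to the smallest eigenvalue, so that α = rβ. Then
$\dfrac{[\mathrm{Tr}(\mathbf{H})]^2}{\mathrm{Det}(\mathbf{H})} = \dfrac{(\alpha + \beta)^2}{\alpha\beta} = \dfrac{(r\beta + \beta)^2}{r\beta^2} = \dfrac{(r + 1)^2}{r}$
which depends on the ratio of the eigenvalues, rather than their individual values.
− To check that the ratio of principal curvatures is below some threshold, r, we only need to check that
$\dfrac{[\mathrm{Tr}(\mathbf{H})]^2}{\mathrm{Det}(\mathbf{H})} < \dfrac{(r + 1)^2}{r}$   (11-79)
− A value of r = 10 was used, meaning that keypoints with ratios of curvature greater than 10 were eliminated.
− Advantage: this test is easy to compute.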
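The edge test amounts to a few arithmetic operations per keypoint. In the sketch below, the second derivatives of D are assumed to have been estimated already (for instance by finite differences, which are not shown), and a zero determinant is treated as a rejection to avoid dividing by zero; that last choice is an assumption of this example rather than something stated on the slide.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if the keypoint's ratio of principal curvatures is below r.

    dxx, dyy, dxy are the entries of the 2 x 2 Hessian of D at the keypoint.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                 # curvatures of different sign (or degenerate): discard
        return False
    return trace * trace / det < (r + 1) ** 2 / r

print(passes_edge_test(1.0, 1.0, 0.0))     # equal curvatures: corner-like, kept
print(passes_edge_test(100.0, 1.0, 0.0))   # curvature ratio of 100: edge-like, rejected
print(passes_edge_test(1.0, -1.0, 0.0))    # negative determinant: rejected
```

For r = 10 the right-hand side of the test is 12.1, so any keypoint whose trace-squared-to-determinant ratio reaches that value is dropped.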
57. Eliminating Edge Responses
FIGURE 11.60
SIFT keypoints detected in the building image. The points were enlarged slightly to make them easier to see.
58. KEYPOINT ORIENTATION
− At this point, we have computed keypoints that SIFT considers stable. Because we know the location of each keypoint in scale space, we have achieved scale independence.
− The next step is to assign a consistent orientation to each keypoint based on local image properties. This allows us to represent a keypoint relative to its orientation and thus achieve invariance to image rotation.
− The scale of the keypoint is used to select the Gaussian smoothed image, L, that is closest to that scale. In this way, all orientation computations are performed in a scale-invariant manner.
− For each image sample, L(x, y), at this scale, the gradient magnitude, M(x, y), and orientation angle, θ(x, y), are computed using pixel differences:
$M(x, y) = \left[ (L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2 \right]^{1/2}$   (11-80)
$\theta(x, y) = \tan^{-1}\!\left[ \dfrac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right]$   (11-81)
− A histogram of orientations is formed from the gradient orientations of sample points in a neighborhood of each keypoint. The histogram has 36 bins covering the 360° range of orientations on the image plane. Each sample added to the histogram is weighted by its gradient magnitude, and by a circular Gaussian function with a standard deviation 1.5 times the scale of the keypoint.
− Peaks in the histogram correspond to dominant local directions of local gradients. The highest peak in the histogram is detected, and any other local peak that is within 80% of the highest peak is also used to create another keypoint with that orientation. Thus, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale, but with different orientations.
− Finally, a parabola is fit to the three histogram values closest to each peak to interpolate the peak position for better accuracy.
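The orientation histogram can be sketched as follows. This is a simplified illustration under several assumptions: x is taken as the row index, the neighborhood is a fixed square of half-width `radius` (a hypothetical parameter), samples too close to the image border are skipped, and the final parabola fit is omitted, so orientations are reported only as bin indices.

```python
import math

def orientation_histogram(L, row, col, scale, radius=4):
    """36-bin histogram of gradient orientations around (row, col) in smoothed image L."""
    hist = [0.0] * 36
    sigma_w = 1.5 * scale                        # std. dev. of the circular Gaussian weight
    for i in range(row - radius, row + radius + 1):
        for j in range(col - radius, col + radius + 1):
            if not (1 <= i < len(L) - 1 and 1 <= j < len(L[0]) - 1):
                continue                         # pixel differences need both neighbors
            gx = L[i + 1][j] - L[i - 1][j]
            gy = L[i][j + 1] - L[i][j - 1]
            magnitude = math.hypot(gx, gy)
            theta = math.degrees(math.atan2(gy, gx)) % 360.0
            weight = math.exp(-((i - row) ** 2 + (j - col) ** 2) / (2 * sigma_w ** 2))
            hist[int(theta // 10) % 36] += magnitude * weight
    return hist

def dominant_bins(hist, ratio=0.8):
    """Bins that are local peaks and reach at least `ratio` of the highest peak."""
    top = max(hist)
    return [b for b, v in enumerate(hist)
            if v >= ratio * top and v > hist[b - 1] and v > hist[(b + 1) % 36]]

# Synthetic ramp: intensity grows equally along rows and columns (45-degree gradient).
ramp = [[float(i + j) for j in range(12)] for i in range(12)]
hist = orientation_histogram(ramp, 6, 6, scale=1.0)
print(dominant_bins(hist))                       # bin 4 covers orientations in [40, 50) degrees
```

Each bin index returned by `dominant_bins` would give rise to one keypoint at the same location and scale.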
59. KEYPOINT ORIENTATION
FIGURE 11.61
The keypoints from Fig. 11.60 superimposed on the original image. The arrows indicate keypoint orientations.
− The lengths of the arrows vary, depending on illumination and image content, but their direction is unmistakably consistent.
60. KEYPOINT DESCRIPTORS
60
− The procedures discussed up to this point are used for assigning an image location, scale, and
orientation to each keypoint, thus providing invariance to these three variables. The next step is to
compute a descriptor for a local region around each keypoint that is highly distinctive, but is at the
same time as invariant as possible to changes in scale, orientation, illumination, and image viewpoint.
The idea is to be able to use these descriptors to identify matches (similarities) between local regions in
two or more images.
− SIFT performs interpolation that distributes a histogram entry among all bins proportionally, depending
on the distance from that value to the center of each bin.
− In order to reduce the effects of illumination, a feature vector is normalized in two stages. First, the
vector is normalized to unit length by dividing each component by the vector norm. A change in image
contrast resulting from each pixel value being multiplied by a constant will multiply the gradients by the
same constant, so the change in contrast will be cancelled by the first normalization. A brightness
change caused by a constant being added to each pixel will not affect the gradient values because they
are computed from pixel differences
− SIFT reduces the influence of large gradient magnitudes by thresholding the values of the normalized
feature vector
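The two-stage normalization can be sketched as below. The slide does not give the threshold value or say what follows the thresholding; this example assumes the clipping value of 0.2 and the final renormalization to unit length described by Lowe, and the function name is illustrative.

```python
import math

def normalize_descriptor(vec, clip=0.2):
    """Normalize to unit length, clip large components, then renormalize."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return list(vec)                         # nothing to normalize in a zero vector
    unit = [v / norm for v in vec]               # stage 1: cancels contrast changes
    clipped = [min(v, clip) for v in unit]       # limit the influence of large gradients
    norm2 = math.sqrt(sum(v * v for v in clipped))
    return [v / norm2 for v in clipped]          # stage 2: back to unit length (assumed)

a = normalize_descriptor([1.0, 2.0, 3.0, 4.0])
b = normalize_descriptor([10.0, 20.0, 30.0, 40.0])   # same vector with 10x the contrast
print(a)
print(b)
```

Multiplying every component by a constant leaves the result unchanged, which is the contrast invariance described above. A real SIFT descriptor has 128 components, and the four-element vectors here are only for illustration.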
62. 62
− 1. Construct the scale space. This is done using the procedure outlined in Figs. 11.56 and 11.57. The parameters that
need to be specified are , s, (k is computed from s), and the number of octaves. Suggested values are = 1 6. , s
= 2, and three octaves.
− 2. Obtain the initial keypoints. Compute the difference of Gaussians, D(x, y, ) from the smoothed images in scale
space, as explained in Fig. 11.58 and Eq. (11-69). Find the extrema in each D(x, y, ) image using the method
explained in Fig. 11.59. These are the initial keypoints.
− 3. Improve the accuracy of the location of the keypoints. Interpolate the values of D(x, y, ) via a Taylor expansion.
The improved key point locations are given by Eq. (11-74).
− 4. Delete unsuitable keypoints. Eliminate keypoints that have low contrast and/or are poorly localized. This is done
by evaluating D from Step 3 at the improved locations, using Eq. (11-75). All keypoints whose values of D are lower
than a threshold are deleted. A suggested threshold value is 0.03. Keypoints associated with edges are deleted also,
using Eq. (11-79). A value of 10 is suggested for r.
− 5. Compute keypoint orientations. Use Eqs. (11-80) and (11-81) to compute the magnitude and orientation of each
keypoint using the histogram-based procedure discussed in connection with these equations.
− 6. Compute keypoint descriptors. Use the method summarized in Fig. 11.62 to compute a feature (descriptor)
vector for each keypoint. If a region of size 16 × 16 around each keypoint is used, the result will be a 128-
dimensional feature vector for each keypoint.
SIFT SUMMARY
63. SIFT SUMMARY
63
FIGURE 11.63
(a) Keypoints and their directions (shown as gray arrows) for the building image and for a section of the right corner of
the building. The subimage is a separate image and was processed as such. (b) Corresponding key points between the
building and the subimage (the straight lines shown connect pairs of matching points). Only three of the 36 matches
found are incorrect.
643 54
Original
64. SIFT SUMMARY
64
FIGURE 11.64
(a) Keypoints for the rotated (by 5°) building image and for a section of the right corner of the building. The subimage is a
separate image and was processed as such. (b) Corresponding keypoints between the corner and the building. Of the 26
matches found, only two are in error.
Rotated
547
49
Rotated
65. SIFT SUMMARY
65
FIGURE 11.65
(a) Keypoints for the half-sized building and a section of the right corner. (b) Corresponding keypoints between the corner and
the building. Of the seven matches found, only one is in error.
Despite the fact that SIFT has the capability to
handle some degree of changes in intensity,
this example indicates that performance can
be improved by enhancing the contrast of an
image prior to processing
AT FIRST
no matches
were found
Then
Increased
Contrast
Scaled
195
24
Scaled
66. SIFT SUMMARY
66
FIGURE 11.66
(a) Matches between the original building image and a rotated version of a segment of its right corner. Ten matches were found,
of which two are incorrect. (b) Matches between the original image and a half-scaled version of a segment of its right corner.
Here, 11 matches were found, of which four were incorrect.
Original
Rotated
Scaled
Original
67. Summary, References, and Further Reading
− Feature extraction is a fundamental process in the operation of most automated image processing applications. As indicated by the range of feature detection and description techniques covered in this chapter, the choice of one method over another is determined by the problem under consideration.
− The objective is to choose feature descriptors that "capture" essential differences between objects, or classes of objects, while maintaining as much independence as possible to changes in variables such as location, scale, orientation, illumination, and viewing angle.