2. Content
• Image Features
• Types of Features
• Components of Feature Detection and Matching
• Feature detection (extraction)
• Feature Description
• Feature Matching
• Haar wavelets
• Feature tracking
• Invariant Local Features
• Point Features / Corners
• Eigenvalue/eigenvector in feature detection
• Harris operator / Harris Corner Detector
• Adaptive non-maximal suppression
3. Content
• Feature descriptors
• Bias and gain normalization or multi-scale oriented patches
• Scale Invariant Feature Transform (SIFT)
• PCA-SIFT
• Gradient location-orientation histogram (GLOH)
• Edge Detection
• Gradient-based operators
– Sobel operator, Prewitt operator, Roberts operator
• Gaussian-based operators
4. Image Features
• A feature is a piece of information about the content of an image.
Two classes: 1. Keypoint features 2. Edges
1. Keypoint features are specific locations in the images, such as mountain
peaks, building corners, doorways, or interestingly shaped patches.
• These localized features are called keypoint features, interest points, or corners.
• Keypoint features are described by the appearance of the patch of pixels
surrounding the point location.
(Figure: a keypoint and its surrounding patch)
5. Image Features
2. Edges.
• These features can be matched based on their orientation and local
appearance (edge profiles).
• Edge features are good indicators of object boundaries and
occlusion (blockage) events in image sequences.
• Edges can be grouped into longer curves and straight line
segments, which can be directly matched or analyzed to find
vanishing points, internal and external camera parameters.
• Point, edge, and line features provide information to both
keypoint and region-based descriptors for describing object boundaries
and man-made objects.
6. Components of Feature Detection and Matching
Feature detection (extraction)
• Each image is searched for locations (interest points) that are likely
to match well in other images.
Feature Description
• Each region around a detected keypoint location is converted into a
more compact and stable (invariant) descriptor, robust to changes in
illumination, scale, translation, and rotation, that can be matched against
other descriptors.
Feature Matching
• Efficient searches of likely matching candidates (feature
descriptors) in other images.
Feature tracking
• An alternative to feature matching that only searches a small
neighborhood around each detected feature; more suitable for video
processing.
7. Invariant Local Features
• Features should be invariant to two classes of transformations:
1. Geometric Invariance: translation, rotation, scale
2. Photometric Invariance: brightness, exposure, etc.
Advantages of Local Features:
• Locality
– features are local, so robust to occlusion and clutter
• Distinctiveness:
– can differentiate a large database of objects
• Quantity
– hundreds or thousands in a single image
• Efficiency
– real-time performance achievable
• Generality
– exploit different types of features in different situations
8. (Interest) Point Features / Corners
• Interest point is the point at which the direction of the boundary of the
object changes abruptly or intersection point between two or more edge
segments.
• Point features are used to find a sparse set of corresponding locations in
different images, as a precursor to computing camera pose.
• It is a prerequisite for computing a denser set of correspondences using
stereo matching.
• A denser set of correspondences is used to align different images, e.g.,
when stitching image mosaics or performing video stabilization.
• They are also used extensively to perform object instance and category
recognition.
• Advantage of keypoints is matching the images even in the presence of
clutter (occlusion), large scale and orientation changes.
• Feature-based correspondence techniques are used in stereo matching,
image stitching, and automated 3D modeling applications.
9. Point Features
There are two approaches to finding feature points and their correspondences:
1. Find features in one image that can be accurately tracked using a local
search technique, such as correlation or least squares.
• more suitable when images are taken from nearby viewpoints or
in rapid succession.
• e.g., video sequences.
2. Independently detect features in all the images under consideration
and then match features based on their local appearance.
• more suitable when a large amount of motion or appearance
change is expected.
• e.g. stitching together panoramas, establishing correspondences
in wide baseline stereo (Estimation of the fundamental matrix),
or performing object recognition.
10. Feature detectors
Find the good features for matching with other images:
• The figure shows three sample patches and how well they might be matched
or tracked.
• From the figure, observe that textureless patches are nearly impossible to localize,
while patches with large contrast changes (gradients) are easier to localize.
11. Feature detectors
The figure shows the aperture problem for various image patches.
• The two images I0 (yellow) and I1 (red) are overlaid.
• The red vector u indicates the displacement between the patch centers.
• The weighting function (patch window) w(xi) is shown as a dark circle.
• Patches with gradients in at least two (significantly) different orientations are the
easiest to localize (Fig. a: a corner, giving stable, point-like flow).
• Straight line segments at a single orientation suffer from the classic aperture
problem (the barber-pole illusion), i.e., it is only possible to align the patches
along the direction normal to the edge direction (Fig. b).
• Textureless regions cannot be localized.
12. Feature detectors
Find unique features, i.e., look for unusual regions in images.
• Consider a small window of pixels in the image and shift it:
– "Flat" region: no change in any direction.
– "Edge": no change along the edge direction.
– "Corner": significant change in all directions.
• A good feature is one where shifting the window in any direction causes a big change.
13. Feature detection
Shifting the window W by (u, v):
• Compare each pixel before and after the shift by summing the
squared differences (SSD).
• I0 and I1 are the two images being compared, (u, v) is the
displacement vector, w(x, y) is a spatially varying
weighting (or window) function, and the summation is over
all the pixels i in the patch.
• This defines a weighted SSD "error" E_WSSD(u, v):

E_WSSD(u, v) = Σi w(xi, yi) [I1(xi + u, yi + v) − I0(xi, yi)]²

• Auto-correlation function (or surface): compute the stability of the image
with respect to small variations in position Δu = (u, v) by comparing an image
patch against itself:

E_AC(Δu) = Σi w(xi, yi) [I0(xi + u, yi + v) − I0(xi, yi)]²
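As a minimal sketch, the weighted SSD error above can be written directly in Python. The nested-list images, the window, and the shift values here are toy examples (real code would use NumPy arrays and a Gaussian window):

```python
def wssd(I0, I1, u, v, w):
    """E_WSSD(u, v): weighted sum of squared differences between the
    patch of I0 under the window w and the patch of I1 shifted by (u, v)."""
    e = 0.0
    for y in range(len(w)):
        for x in range(len(w[0])):
            d = I1[y + v][x + u] - I0[y][x]  # per-pixel difference after the shift
            e += w[y][x] * d * d             # weighted squared difference
    return e

# Identical patches give zero error; a shift that misaligns them does not.
I0 = [[1, 2], [3, 4]]
I1 = [[1, 2, 0], [3, 4, 0], [0, 0, 0]]
w = [[1, 1], [1, 1]]
```

Replacing I1 by I0 (compared against itself) turns the same routine into the auto-correlation surface E_AC.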
14. Feature detection
Approximate the auto-correlation function using a Taylor series expansion of the image I0:

I0(xi + u, yi + v) ≈ I0(xi, yi) + ∇I0(xi, yi) · (u, v)

where ∇I0(xi, yi) = (∂I0/∂x, ∂I0/∂y)(xi, yi) is the image gradient at (xi, yi).
If the motion (u, v) is small, this first-order approximation is sufficient, so:

E_AC(Δu) = Σi w(xi, yi) [I0(xi + u, yi + v) − I0(xi, yi)]²
         ≈ Σi w(xi, yi) [I0(xi, yi) + ∇I0(xi, yi) · (u, v) − I0(xi, yi)]²
         = Σi w(xi, yi) [∇I0(xi, yi) · (u, v)]²
         = Δuᵀ A Δu
15. Feature detection
The auto-correlation matrix A can be written as

A = H = Σi w(xi, yi) [ Ix²   IxIy ]
                     [ IxIy  Iy²  ]

where Ix and Iy are the partial derivatives of I0 at (xi, yi), so that

E_AC(u, v) ≈ [u v] H [u v]ᵀ

In this example:
• We can move the center of the green window anywhere on the blue unit circle.
• Which directions will result in the largest and smallest E values?
• Find these directions by looking at the eigenvectors of H.
16. Quick eigenvalue/eigenvector review
• The eigenvectors of a matrix A are the vectors x that satisfy Ax = λx.
• The scalar λ is the eigenvalue corresponding to x.
– The eigenvalues are found by solving det(A − λI) = 0.
– Assume A = H is a 2x2 matrix, so det(H − λI) = (h11 − λ)(h22 − λ) − h12h21 = 0.
– The solution: λ± = ½ [(h11 + h22) ± √((h11 − h22)² + 4 h12h21)].
• Once λ is found, find x by solving (H − λI)x = 0.
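The 2x2 closed-form solution above is easy to sketch in code (a direct transcription of the quadratic formula, not a general eigensolver):

```python
import math

def eig2x2(h11, h12, h21, h22):
    """Closed-form eigenvalues of a 2x2 matrix, from the quadratic
    formula applied to det(H - lambda*I) = 0."""
    tr = h11 + h22                                     # trace
    disc = math.sqrt((h11 - h22) ** 2 + 4 * h12 * h21)  # discriminant
    return 0.5 * (tr + disc), 0.5 * (tr - disc)         # (lambda_max, lambda_min)
```

For a symmetric H (as in the auto-correlation matrix, where h12 = h21) the discriminant is never negative, so both eigenvalues are real.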
17. Feature detection
Eigenvalues and eigenvectors of H:
• They define the shifts with the smallest and largest change in E.
• xmax = direction of largest increase in E.
• λmax = amount of increase in direction xmax.
• xmin = direction of smallest increase in E.
• λmin = amount of increase in direction xmin.
This can be rewritten using

H = Σi w(xi, yi) [ Ix²   IxIy ]
                 [ IxIy  Iy²  ]

so that E_AC(u, v) ≈ [u v] H [u v]ᵀ.
18. Feature detection
How are λmax, xmax, λmin, and xmin relevant for feature detection?
E(u, v) needs to be large for small shifts in all directions:
• The minimum of E(u, v), over all unit vectors [u v], should be large.
• This minimum is given by the smaller eigenvalue λmin of H.
Auto-correlation-based keypoint detector algorithm
• Compute the gradient at each point in the image.
• Create the H matrix from the gradient entries in a window around each point.
• Compute the eigenvalues of H.
• Find points with a large response (λmin > threshold).
• Choose those points where λmin is a local maximum as features.
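The detector algorithm above can be sketched as a minimal pure-Python routine. Nested lists stand in for image arrays, a fixed square window stands in for Gaussian weighting, and thresholding/non-maximal suppression are left out; the example image is a toy bright square whose corner should score higher than a flat region:

```python
import math

def lambda_min_map(img, win=1):
    """Smaller eigenvalue of the auto-correlation matrix H at each
    interior pixel, computed from central-difference gradients."""
    h_, w_ = len(img), len(img[0])
    # Central-difference gradients (clamped at the borders)
    Ix = [[(img[y][min(x + 1, w_ - 1)] - img[y][max(x - 1, 0)]) / 2.0
           for x in range(w_)] for y in range(h_)]
    Iy = [[(img[min(y + 1, h_ - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
           for x in range(w_)] for y in range(h_)]
    out = [[0.0] * w_ for _ in range(h_)]
    for y in range(win, h_ - win):
        for x in range(win, w_ - win):
            a = b = c = 0.0  # H = [[a, b], [b, c]] summed over the window
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    gx, gy = Ix[y + dy][x + dx], Iy[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            # Smaller eigenvalue via the closed form for a symmetric 2x2 matrix
            out[y][x] = 0.5 * ((a + c) - math.sqrt((a - c) ** 2 + 4 * b * b))
    return out
```

A corner (gradients in two directions inside the window) yields λmin > 0, while a flat or purely edge-like window yields λmin ≈ 0.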
19. Harris operator / Harris Corner Detector
λmin is a variant of the "Harris operator" for feature (f) detection:

f = det(H) / trace(H) = (h11h22 − h12h21) / (h11 + h22)

• The trace is the sum of the diagonal entries, i.e., trace(H) = h11 + h22.
• f is very similar to λmin but less expensive to compute (no square root).
• The Harris Corner Detector uses local maxima in rotationally invariant scalar
measures, derived from the auto-correlation matrix, to locate keypoints for sparse
feature matching.
• It uses a Gaussian weighting window instead of square patches, which
makes the detector response insensitive to in-plane image rotations.
• The minimum eigenvalue λ0 is not the only quantity that can be used to find
keypoints.
21. Harris detector example
The circle sizes and colors indicate the scale at which each interest point was detected.
(Left: sample image; right: Harris response)
22. Feature detector properties
Adaptive non-maximal suppression (ANMS)
• Most feature detectors simply look for local maxima in the
interest function.
• This leads to an uneven distribution of feature points across the image,
e.g., points will be denser in regions of higher contrast.
• To mitigate this problem, ANMS only detects features that are both local
maxima and whose response value is significantly (10%) greater
than that of all of their neighbors within a radius r.
• Suppression radii can be associated with all local maxima efficiently
by first sorting the maxima by response strength and then creating a
second list sorted by decreasing suppression radius.
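The ANMS idea above can be sketched as follows. This is a deliberately simple quadratic-time version with hypothetical (x, y, response) triples; an efficient implementation would sort by strength as described above:

```python
def anms(points, n, c=0.9):
    """Adaptive non-maximal suppression sketch.
    points: list of (x, y, response). Keep the n points with the largest
    suppression radius, where a point's radius is the squared distance to
    the nearest point whose scaled response c*s2 still exceeds its own."""
    radii = []
    for (x, y, s) in points:
        r = float("inf")  # no sufficiently stronger neighbor -> never suppressed
        for (x2, y2, s2) in points:
            if c * s2 > s:
                r = min(r, (x - x2) ** 2 + (y - y2) ** 2)
        radii.append((r, (x, y, s)))
    radii.sort(key=lambda t: -t[0])     # largest suppression radius first
    return [p for _, p in radii[:n]]
```

Note how a moderately strong but spatially isolated point survives, while a strong point crowded next to an even stronger one is suppressed, giving the more uniform spatial distribution described above.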
23. Feature detector properties
• The upper two images show the strongest 250 and 500 interest points.
• Lower two images show the interest points selected with ANMS along with the
corresponding suppression radius r.
• In ANMS, features have a much more uniform spatial distribution across the
image.
(Figure: qualitative comparison of selecting the top n features using ANMS)
24. Feature detector properties
Measuring repeatability
• Repeatability is the ratio between the number of keypoints
simultaneously present in all the images of the series (repeated keypoints)
and the total number of detections.
• It is used to assess keypoint detection performance i.e. feature detector
performance.
• Measures the detector’s ability to identify the same features (i.e., repeated
detections) despite variations in the viewing conditions.
Scale invariance
• Scale invariance is a property of objects that do not change when the
scale of length is varied.
• In image matching, sufficient features may not exist in images with little
high-frequency detail (e.g., clouds).
• So, extract features at a variety of scales, e.g., by performing the same
operations at multiple resolutions.
25. Feature detector properties
Rotational invariance and orientation estimation
• Orientation is estimated from the average gradient within a region around the
keypoint, computed from a histogram of gradient orientations.
• A dominant orientation is computed by creating a histogram of all the gradient
orientations (weighted by their magnitudes, or after thresholding out small
gradients) and then finding the significant peaks in this distribution.
Affine invariance
• Surfaces are considered the same under affine transformations, i.e.,
transformations x ↦ Ax + b (an affine function is the composition of a linear
function with a translation, so while the linear part fixes the origin, the
translation can map it somewhere else), including squeezing and shearing.
26. Feature descriptors
• A feature descriptor is an algorithm which takes an image and
outputs feature descriptors/feature vectors.
• Feature descriptors encode interesting information of an image into a series
of numbers that can be used to differentiate one feature from another.
• After detecting features (keypoints), must match them, i.e., determine
which features come from corresponding locations in different images.
Some of Feature descriptors are:
1. Bias and gain normalization or multi-scale oriented patches
2. Scale Invariant Feature Transform (SIFT)
3. PCA-SIFT
4. Gradient location-orientation histogram (GLOH)
Bias and gain normalization, or multi-scale oriented patches (MOPS), are suited
to applications that do not exhibit large amounts of foreshortening, such as
image stitching. Patch intensities are re-scaled to compensate for affine
photometric variations, i.e., so that the mean is zero and the variance is one.
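The bias/gain normalization described above amounts to standardizing the patch intensities; a minimal sketch on a toy nested-list patch:

```python
import math

def normalize_patch(patch):
    """Re-scale patch intensities to zero mean and unit variance,
    removing additive (bias) and multiplicative (gain) variations."""
    vals = [v for row in patch for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    std = math.sqrt(var) or 1.0  # guard constant patches against divide-by-zero
    return [[(v - mean) / std for v in row] for row in patch]
```

After normalization, two patches that differ only by brightness offset and contrast scaling become identical.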
27. Feature descriptors
Scale Invariant Feature Transform (SIFT)
• SIFT is invariant to image scale and rotation. SIFT features are formed by
computing the gradient at each pixel in a 16x16 window around the detected
keypoint, using the appropriate level of the Gaussian pyramid at which the
keypoint was detected.
• The gradient magnitudes are down-weighted by a Gaussian fall-off function to
reduce the influence of gradients far from the center, as these are more affected
by small mis-registrations.
• In each 4x4 quadrant, a gradient orientation histogram is formed by
(conceptually) adding the weighted gradient value to one of eight orientation
histogram bins.
28. Scale Invariant Feature Transform (SIFT)
• To reduce the effects of location and dominant orientation mis-estimation, each of
the original 256 weighted gradient magnitudes is softly added to 2x2x2 adjacent
histogram bins using trilinear interpolation.
• It gives 128 non-negative values that form a raw version of the SIFT descriptor
vector.
• To reduce the effects of contrast or gain (additive variations are already removed
by the gradient), the 128-D vector is normalized to unit length.
PCA-SIFT - It computes the x and y (gradient) derivatives over a 39 x 39 patch and
then reduces the resulting 3042-dimensional vector to 36 dimensions using principal
component analysis.
Gradient location-orientation histogram (GLOH) – GLOH is a variant on SIFT
that uses a log-polar binning structure instead of the four quadrants.
29. Feature Matching
• Once features and their descriptors are extracted from two or more
images, match the preliminary features between the images.
Matching strategy and error rates
• Determining which feature matches are reasonable to process further
depends on the context in which the matching is being performed.
• Consider two images that overlap to a fair amount: most features in one
image are likely to match the other image, although some may not match
because they are occluded or their appearance has changed too much.
Euclidean distance
• Euclidean (vector magnitude) distances in feature space are used for
ranking potential matches.
• If certain parameters (axes) in a descriptor are more reliable than
others, re-scale these axes ahead of time, e.g., by determining how much
they vary when compared against other known good matches.
30. Feature Matching
• The simplest matching strategy is to set a threshold (maximum distance)
and to return all matches from other images within this threshold.
• Setting the threshold too high results in too many false positives, i.e.,
incorrect matches being returned. Setting the threshold too low results in
too many false negatives, i.e., too many correct matches being missed.
• Evaluate the performance of a matching algorithm at a particular threshold
by first counting the number of true and false matches and match failures
using the following metrics.
• TP: true positives, i.e., number of correct matches;
• FN: false negatives, matches that were not correctly detected;
• FP: false positives, proposed matches that are incorrect;
• TN: true negatives, non-matches that were correctly rejected.
• Accuracy, Receiver Operating Characteristic (ROC) Curve, Area Under the
Curve
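The thresholded matching strategy above can be sketched in a few lines of Python. The two-dimensional descriptors are hypothetical toy values; a real system would use high-dimensional NumPy arrays and an indexing structure rather than brute force:

```python
import math

def match_by_threshold(desc0, desc1, max_dist):
    """Return index pairs (i, j) of descriptors from two images whose
    Euclidean distance is within max_dist (brute-force comparison)."""
    matches = []
    for i, a in enumerate(desc0):
        for j, b in enumerate(desc1):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if d <= max_dist:
                matches.append((i, j))
    return matches
```

Raising max_dist admits more false positives; lowering it misses more true matches (false negatives), exactly the trade-off described above.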
31. Feature Matching
• True positive rate: TPR = TP / (TP + FN)
• False positive rate: FPR = FP / (FP + TN)
• Positive predictive value: PPV = TP / (TP + FP)
• Accuracy: ACC = (TP + TN) / (TP + FN + FP + TN)
• Confusion matrix: describes the performance of a classification model (or
"classifier") on a set of test data for which the true values are known.
• When varying the feature matching threshold, we obtain a family of (FPR, TPR) points.
These are collectively known as the Receiver Operating Characteristic (ROC) curve.
• The closer this curve lies to the upper left corner, i.e., the larger the area under the
curve (AUC), the better the performance.
                 Actual YES    Actual NO
Predicted YES        TP            FP
Predicted NO         FN            TN
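The metrics above follow directly from the four confusion-matrix counts; a minimal sketch with made-up example counts:

```python
def matching_metrics(tp, fn, fp, tn):
    """TPR, FPR, PPV, and accuracy from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),              # true positive rate (recall)
        "FPR": fp / (fp + tn),              # false positive rate
        "PPV": tp / (tp + fp),              # positive predictive value (precision)
        "ACC": (tp + tn) / (tp + fn + fp + tn),
    }
```

Sweeping the matching threshold and recording (FPR, TPR) at each setting traces out the ROC curve.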
32. Feature Matching
Efficient matching using an indexing structure
• An indexing structure, such as a multi-dimensional search tree or a
hash table, is used to rapidly search for features near a given feature.
• Such indexing structures can be built either for each image
independently or globally for all the images in a given database;
the global option can potentially be faster, since it removes the need
to iterate over each image.
• The structure maps descriptors into fixed-size buckets based on some
function applied to each descriptor vector.
• At matching time, each new feature is hashed into a bucket, and a
search of nearby buckets is used to return potential candidates, which
can then be sorted or graded to determine which are valid matches.
e.g. Haar wavelets.
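The bucket-based indexing idea can be illustrated with a deliberately simplified one-dimensional hash that quantizes only the first descriptor component (real schemes, such as Haar-wavelet indexing, quantize several projections of the descriptor):

```python
def build_index(descs, bucket_size):
    """Hash each descriptor into a bucket keyed by its quantized
    first component."""
    index = {}
    for i, d in enumerate(descs):
        key = int(d[0] // bucket_size)
        index.setdefault(key, []).append(i)
    return index

def query(index, d, bucket_size):
    """Gather candidate matches from the query's bucket and its
    two neighboring buckets."""
    key = int(d[0] // bucket_size)
    candidates = []
    for k in (key - 1, key, key + 1):
        candidates.extend(index.get(k, []))
    return candidates
```

The candidates returned from the nearby buckets would then be ranked by full descriptor distance to select the valid matches.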
33. Feature Tracking
• Selecting good features to track is closely related to selecting good
features for more general recognition applications.
• In practice, regions containing high gradients in both directions, i.e.,
which have high eigenvalues in the auto-correlation matrix, provide
stable locations at which to find correspondences.
• One of the applications of fast feature tracking is performance-
driven animation, i.e., the interactive deformation of a 3D graphics
model based on tracking a user’s motions.
34. Feature Matching
Haar wavelets
• During the matching structure construction, each 8x8 scaled, oriented, and
normalized MOPS patch is converted into a three-element index by performing
sums over different quadrants of the patch.
• The resulting three values are normalized by their expected standard deviations and
then mapped to the two nearest 1D bins.
• The three-dimensional indices formed by concatenating the three quantized values
are used to index the 2³ = 8 bins where the feature is stored (added). At query time,
only the primary (closest) indices are used, so only a single three-dimensional bin
needs to be examined. The coefficients in the bin can then be used to select k
approximate nearest neighbors for further processing
Locality sensitive hashing
• It uses unions of independently computed hashing functions to index
the features.
35. Edge Detection
• Edges are significant local changes of intensity in a digital image. An edge can be
defined as a set of connected pixels that forms a boundary between two disjoint
regions.
• There are three types of edges:
1. Horizontal edges
2. Vertical edges
3. Diagonal edges
• Edge Detection is a method of segmenting an image into regions of
discontinuity.
• It is used for pattern recognition, image morphology, feature extraction techniques.
• Edge detection allows users to observe the features of an image where there is a
significant change in the gray level. This discontinuity indicates the end of one
region in the image and the beginning of another. Edge detection reduces the amount
of data in an image while preserving its structural properties.
36. Edge Detection
• Edge Detection Operators are of two types:
• Gradient-based operators compute first-order derivatives of a digital
image, e.g., the Sobel, Prewitt, and Roberts operators.
• Gaussian-based operators compute second-order derivatives of a digital
image, e.g., the Canny edge detector and the Laplacian of Gaussian.
37. Edge Detection - Gradient based operator
Sobel Operator:
• It is a discrete differentiation operator.
• It computes an approximation of the gradient of the image intensity function for edge
detection. At each pixel of the image, the Sobel operator produces either the
corresponding gradient vector or its norm.
• It uses two 3 x 3 kernels (derivative masks) which are convolved with the input image
to calculate the horizontal and vertical derivative approximations respectively.
Advantages:
• Simple and time efficient computation
• Works well for smooth edges
Limitations:
• Diagonal direction points are not preserved always.
• Highly sensitive to noise
• Not very accurate in edge detection
• Detecting thick and rough edges does not give appropriate results
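The Sobel operator described above can be sketched in pure Python. Nested lists stand in for image arrays and border pixels are left at zero; real code would use a convolution routine from NumPy, SciPy, or OpenCV:

```python
def sobel(img):
    """Convolve the image with the two 3x3 Sobel kernels and return the
    gradient magnitude at each interior pixel (borders left as 0)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative mask
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative mask
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag
```

On a vertical step edge the response peaks along the transition and is zero in the flat regions on either side.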
38. Edge Detection - Gradient based operator
Prewitt Operator: This operator is very similar to the Sobel operator.
It also detects the vertical and horizontal edges of an image, and is one of the best
ways to estimate the orientation and magnitude of edges. It also uses a pair of kernels
or masks.
Advantages:
• Good performance on detecting vertical and horizontal edges
• Best operator to detect the orientation of an image
Limitations:
• The magnitude of coefficient is fixed and cannot be changed
• Diagonal direction points are not preserved always
39. Edge Detection - Gradient based operator
Roberts Operator: This gradient-based operator computes, through discrete
differentiation, the sum of the squares of the differences between diagonally adjacent
pixels in an image, from which the gradient approximation is made. It uses a pair of
2 x 2 kernels or masks.
Advantages:
• Detection of edges and orientation are very easy
• Diagonal direction points are preserved
Limitations:
• Very sensitive to noise
• Not very accurate in edge detection
40. Edge Detection - Gaussian based operator
Marr-Hildreth Operator or Laplacian of Gaussian (LoG):
• It uses the Laplacian to take the second derivative of an image. This works well
when the grey-level transition is abrupt.
• It works by the zero-crossing method: where the second-order derivative crosses
zero, the first derivative reaches a maximum, and that location is marked as an edge.
• Here the Gaussian operator reduces the noise and the Laplacian operator detects
the sharp edges.
• The Gaussian function is defined as
G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
• σ (sigma) is the standard deviation.
• The LoG operator is computed as the Laplacian of this Gaussian:
LoG(x, y) = −(1 / (πσ⁴)) [1 − (x² + y²) / (2σ²)] exp(−(x² + y²) / (2σ²))
Advantages:
• Easy to detect edges and their various orientations
• There is fixed characteristics in all directions
Limitations:
• Very sensitive to noise
• The localization error may be severe at curved edges
• It generates noisy responses that do not correspond to edges, so-called “false edges”
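The LoG formula above can be evaluated directly to build a filter kernel. A small sketch (note that sign conventions for the LoG differ between texts; this follows the negative-center form, so the response is negative at the kernel center and changes sign at radius √2·σ):

```python
import math

def log_kernel(x, y, sigma):
    """Analytic Laplacian-of-Gaussian value at offset (x, y) for a
    given standard deviation sigma."""
    s2 = sigma * sigma
    r2 = x * x + y * y
    return (-(1.0 / (math.pi * s2 * s2))
            * (1.0 - r2 / (2.0 * s2))
            * math.exp(-r2 / (2.0 * s2)))
```

Sampling this function on a grid of (x, y) offsets yields the discrete LoG mask whose zero crossings mark edge locations.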
41. Edge Detection - Gaussian based operator
Canny Operator:
• This operator is less susceptible to noise than the other operators.
• It extracts image features without affecting or altering the features.
• The Canny edge detector is an advanced algorithm derived from the earlier work on
the Laplacian of Gaussian operator.
• It is widely used as an optimal edge detection technique.
• It detects edges based on three criteria:
1. Low error rate
2. Edge points must be accurately localized
3. There should be just one single edge response
Advantages:
• It has good localization
• It extracts image features without altering the features
• Less sensitive to noise
Limitations:
• There are false zero crossings
• Complex and time-consuming computation
Real-world Applications of Image Edge Detection:
• Medical imaging (study of anatomical structure), Locate an object in satellite images,
Automatic traffic controlling systems, Face recognition, and Fingerprint recognition