Name: Muhammad Irsyadi Firdaus
Student ID: P66067055
Course: Digital Photogrammetry
Questions for Digital Photogrammetry Course
Chapter 4
1. Please explain why and how SIFT/SURF can achieve scale, rotation, and illumination
invariance during image matching.
SIFT (Scale-Invariant Feature Transform), proposed by Lowe, handles image
rotation, affine transformation, intensity change, and viewpoint change when matching
features. The SIFT algorithm has four basic steps. The first is scale-space
extrema detection, which uses the Difference of Gaussians (DoG) to identify potential interest
points that are invariant to scale and orientation. The DoG is used instead of the
scale-normalized Laplacian of Gaussian to improve computation speed.
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ)
Given a digital image I(x, y), its scale-space representation is L(x, y, σ) = G(x, y, σ) ∗ I(x, y),
where G(x, y, σ) is the variable-scale Gaussian kernel with standard deviation σ. Each pixel
is compared with its 26 neighbors in a 3×3×3 region, spanning the current and the two
adjacent scales, to detect the local maxima and minima of D(x, y, σ).
Second is keypoint localization, where the keypoint candidates are localized
and refined by eliminating low-contrast points. A Hessian matrix is used to
compute the principal curvatures, and keypoints whose ratio of principal
curvatures exceeds a threshold are eliminated as edge responses. Third, a keypoint
orientation is assigned based on the local image gradient, and lastly a descriptor
generator computes a local image descriptor for each keypoint from the image
gradient magnitudes and orientations at each image sample point in a region centered
on the keypoint.
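The scale-space extrema step above can be sketched in pure NumPy. The separable Gaussian blur, the toy blob image, σ = 1.6, and k = √2 are illustrative choices for the sketch, not values fixed by this text:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with a truncated kernel (radius = 3*sigma)."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()
    # blur along rows, then along columns
    tmp = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, tmp)

def difference_of_gaussians(img, sigma, k=2**0.5):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma)."""
    img = img.astype(float)
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)

# toy image: a single bright blob
img = np.zeros((32, 32))
img[14:18, 14:18] = 1.0
dog = difference_of_gaussians(img, sigma=1.6)
# the blob centre gives a strong (negative) DoG response; flat areas give none
```

In a full SIFT implementation this DoG image is stacked over several scales, and each pixel is then tested against its 26 neighbors in the 3×3×3 scale-space block described above.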
The SURF (Speeded-Up Robust Features) algorithm is based on multi-scale space
theory, and its feature detector is based on the Hessian matrix, which offers good
performance and accuracy. For a given point x = (x, y) in an image I, the Hessian
matrix H(x, σ) at x and scale σ is defined as:
H(x, σ) = | Lxx(x, σ)  Lxy(x, σ) |
          | Lxy(x, σ)  Lyy(x, σ) |
where Lxx(x, σ) is the convolution of the second-order Gaussian derivative
∂²g(σ)/∂x² with the image I at the point x, and similarly for Lxy(x, σ) and Lyy(x, σ).
SURF approximates these Gaussian derivative filters with box filters. Instead of
Gaussian smoothing, box-shaped averages are used, since convolution with a box
filter is very fast when an integral image is used; moreover, this can be done in
parallel for different scales. The SURF approach can be divided into three main
steps. First, keypoints are selected at distinctive locations in the image, such as
corners, blobs, and T-junctions. Next, the neighborhood of every keypoint is
represented by a feature vector. This descriptor has to be distinctive and, at the
same time, robust to noise, detection errors, and geometric and photometric
deformations. Finally, the descriptor vectors are matched among the different
images. Keypoints are found using the so-called Fast-Hessian detector, which is
based on the approximation of the Hessian matrix at a given image point.
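The speed-up from box filters rests on the integral image: once it is built, the sum over any axis-aligned rectangle costs four array look-ups, regardless of the rectangle's size. A minimal sketch (the array and the rectangle are illustrative):

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[0:y, 0:x]; padded with a zero row and column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) via four look-ups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
# box over rows 1..2, cols 1..2 equals img[1:3, 1:3].sum()
```

Because `box_sum` is constant-time, evaluating the box-filter approximations of Lxx, Lyy, and Lxy costs the same at every scale, which is what allows SURF to process the scales in parallel without rescaling the image.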
Figure 1. The matching of varying intensity images using (a) SIFT (b) SURF
Figure 2. The matching of the original image with its rotated image using: (a) SIFT (b) SURF
Figure 3. The matching of the original image with its scaled image using: (a) SIFT (b) SURF
2. Please describe the concept and procedure of Semi-Global Matching.
Semi-Global Matching (SGM) is a robust stereo method that has proven its
usefulness in various applications ranging from aerial image matching to driver
assistance systems. It supports pixelwise matching for maintaining sharp object
boundaries and fine structures and can be implemented efficiently on different
computation hardware. Furthermore, the method is not sensitive to the choice of
parameters.
The Semi-Global Matching method performs a pixel-wise matching that efficiently
preserves object boundaries and fine details. The algorithm works on a pair
of images with known interior and exterior orientation and epipolar geometry (i.e.
it assumes that corresponding points lie on the same horizontal image line). It
minimizes a global smoothness-constrained cost by combining matching costs along
independent one-dimensional paths through the image.
The first scanline-based approaches, which exploited a single global matching cost
for each individual image line, were prone to streaking effects, since the optimal
solution of each scanline is not connected with the neighboring ones. The SGM
algorithm overcomes this problem with the idea of symmetrically computing
the pixel matching cost along several paths through the image. The costs
extracted along each path are summed for each pixel and disparity value.
Finally, the algorithm chooses the matching solution with the minimum
cost, usually using a dynamic programming approach. The cost L'_r(p, d) of the pixel
p at disparity d, along the path direction r, is defined as:

L'_r(p, d) = C(p, d) + min( L_r(p − r, d),
                            L_r(p − r, d − 1) + P1,
                            L_r(p − r, d + 1) + P1,
                            min_i L_r(p − r, i) + P2 ) − min_k L_r(p − r, k)
where the first term is the similarity cost (i.e. a value that penalizes, using an
appropriate metric, solutions where different radiometric values are encountered in
the neighborhoods of the corresponding points), whereas the remaining terms evaluate
the regularity of the disparity field, adding a small penalty P1 for disparity changes
of one level and a larger penalty P2 for all larger disparity changes with respect to
the previous point along the considered matching path. The two penalty values allow
the method to describe slanted or curved surfaces and to preserve disparity
discontinuities, respectively. Since the cost would otherwise grow steadily during
aggregation along the path, the last term subtracts the minimum path cost of the
previous pixel; this keeps the values bounded without changing the position of the minimum.
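The per-path recursion can be sketched for a single left-to-right path in pure NumPy. The cost volume C, P1, and P2 below are illustrative; a real implementation repeats this along 8 or 16 directions and sums the results:

```python
import numpy as np

def path_cost_lr(C, P1=1.0, P2=4.0):
    """L'_r along one left-to-right path for a W x D matching-cost slice C."""
    W, D = C.shape
    L = np.zeros_like(C, dtype=float)
    L[0] = C[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        # candidate predecessors: same d, d-1 and d+1 (penalty P1), any d (penalty P2)
        same = prev
        minus = np.roll(prev, 1) + P1    # from disparity d-1
        minus[0] = np.inf                # no d-1 below the first disparity
        plus = np.roll(prev, -1) + P1    # from disparity d+1
        plus[-1] = np.inf                # no d+1 above the last disparity
        jump = np.full(D, m + P2)        # from any disparity, large change
        L[x] = C[x] + np.minimum.reduce([same, minus, plus, jump]) - m
    return L

# tiny example: 5 pixels, 4 disparity levels, true disparity = 1 everywhere
C = np.full((5, 4), 5.0)
C[:, 1] = 0.0
L = path_cost_lr(C)
# the minimum of the aggregated cost stays at disparity 1 for every pixel
```

The final `- m` line is exactly the subtraction of the previous pixel's minimum path cost described above: it bounds the accumulated values without shifting where the minimum lies.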
The minimization is performed efficiently with dynamic programming (Van
Meerbergen, et al., 2002) but, in order to avoid streaking effects, the SGM strategy
introduced the idea of computing the optimization by combining several
individual paths, symmetrically from all directions through the image. Summing the
path costs over all directions and searching for the disparity with the minimal cost
at each image pixel produces the final disparity map. The aggregated cost is defined as

S(p, d) = Σ_r L_r(p, d)
and, for a sub-pixel estimate of the final disparity, the position of the
minimum is computed by fitting a quadratic curve through the cost values of the
neighboring disparities. A similar approach, in which surface reconstruction is solved
as an energy minimization problem, was evaluated in (Pierrot-Deseilligny
& Paparoditis, 2006), who implemented a Semi-Global Matching-like method
based on an energy function E(d) described as:

E(d) = Σ A(x, y, d(x, y)) + α · F(∇d)
where
o d is the disparity function;
o A(x, y, d(x, y)) represents the similarity term;
o F(∇d) is the positive function expressing the prior constraints which
characterize the surface regularity;
o α represents the weight that adapts the regularization to the image content (i.e.
the weight of the disparity regularization enforcement).
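The quadratic sub-pixel refinement mentioned above reduces, for equally spaced disparity levels, to a closed-form vertex formula over the three costs around the integer minimum. A small sketch (the sample costs and disparity are illustrative):

```python
def subpixel_disparity(c_minus, c_0, c_plus, d):
    """Vertex of the parabola through (d-1, c_minus), (d, c_0), (d+1, c_plus)."""
    denom = c_minus - 2.0 * c_0 + c_plus
    if denom == 0:          # flat cost profile: no refinement possible
        return float(d)
    return d + 0.5 * (c_minus - c_plus) / denom

# costs 4.0, 1.0, 3.0 around the integer disparity d = 7
d_sub = subpixel_disparity(4.0, 1.0, 3.0, 7)
```

Because the left cost (4.0) is higher than the right one (3.0), the fitted parabola's vertex sits slightly to the right of the integer minimum, giving a disparity a little above 7.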
3. Please compare the differences between Feature-based Matching and Dense Image
Matching, including their characteristics and suitability to certain kinds of
applications.
The real innovation introduced in several dense image matching
methods in recent years is the integration of different basic correlation
algorithms, consistency measures, and constraints into a multi-step procedure, which
in many cases works through a multi-resolution approach. Indeed, local correlation
algorithms assume constant disparity within a correlation window. The larger this
window, the more robust the matching; but the implicit assumption of constant
disparity inside the window is violated at geometric discontinuities, which leads to
blurred object boundaries and smoothed results. Furthermore, the matching phase,
being commonly based on intensity differences, is very sensitive to recording and
illumination differences and is not reliable in poorly textured or homogeneous
regions.
A dense matching algorithm should be able to extract 3D points at a resolution
sufficient to describe the object's surface and its discontinuities. Two critical issues
should be considered for an optimal approach: (i) the point resolution must be
adaptively tuned to preserve edges and to avoid too many points in flat areas; (ii) the
reconstruction must also be guaranteed in regions with poor texture or with
illumination and scale changes.
A rough surface model of the object is often required by some techniques in
order to initialize the matching procedure. Such models can be derived in different
ways, e.g. by using a point cloud interpolated on the basis of tie points obtained
from the orientation stage, from already existing 3D models, or from low-resolution
range data. Other methods are organized in a hierarchical framework which
generates first a rough surface reconstruction, which is refined and made denser at a
later stage.
Many algorithms are based on normalized, distortion-free images, whose
adoption simplifies and speeds up the search for correspondences. Possible outliers
are generally removed following one of two opposite strategies: (i) using multi-image
techniques to discard possible blunders by intersecting the homologous rays of the
matched points in object space; or (ii) computing a surface model as dense as
possible without any special care for outliers and then applying different
filtering/smoothing methods.
The most intuitive classification of image matching algorithms is based on the
primitives used: image intensity patterns (windows of grey values around a point of
interest) or features (e.g. edges and regions), which are then transformed into 3D
information through a mathematical model (e.g. the collinearity model or the camera
projection matrix). According to these primitives, the resulting matching algorithms
are generally classified as area-based matching (ABM) or feature-based matching
(FBM). FBM is often used as an alternative to, or in combination with, ABM. FBM
techniques are more flexible with respect to surface discontinuities, less sensitive
to image noise, and require fewer approximate values. Because of the sparse and
irregularly distributed nature of the extracted features, the matching results are in
general sparse point clouds, which are then often used as seeds to grow additional
matches.
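The sparse FBM stage can be sketched as nearest-neighbour descriptor matching with Lowe's ratio test. The toy 2-D "descriptors" below are illustrative; real pipelines match 128-D SIFT or 64-D SURF descriptors, usually with a k-d tree rather than brute force:

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Return (i, j) index pairs where desc1[i]'s nearest neighbour in desc2
    is clearly better than its second-nearest (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, j2 = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[j2]:
            matches.append((i, j))
    return matches

# toy descriptors: the first two have clear partners, the third is ambiguous
desc1 = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
desc2 = np.array([[0.1, 0.0], [10.1, 0.0], [5.0, 4.9], [5.0, 5.1]])
m = match_descriptors(desc1, desc2)
```

The ratio test discards the third descriptor because its two nearest candidates are almost equally close; such rejection of ambiguous matches is one reason FBM output is sparse but reliable enough to seed further dense matching.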
The following properties are important for utilizing a feature detector in
computer vision applications:
• Robustness: the feature detection algorithm should be able to detect the
same feature locations independent of scaling, rotation, shifting, photometric
deformations, compression artifacts, and noise.
• Repeatability: the feature detection algorithm should be able to detect the
same features of the same scene or object repeatedly under a variety of
viewing conditions.
• Accuracy: the feature detection algorithm should accurately localize the
image features (same pixel locations), especially for image matching tasks,
where precise correspondences are needed to estimate the epipolar
geometry.
• Generality: the feature detection algorithm should be able to detect features
that can be used in different applications.
• Efficiency: the feature detection algorithm should be able to detect features
in new images quickly enough to support real-time applications.
• Quantity: the feature detection algorithm should be able to detect all or most
of the features in the image, and the density of detected features should
reflect the information content of the image, providing a compact image
representation.