Image Analysis and Pattern Recognition
for Remote Sensing
with Algorithms in ENVI/IDL

Morton John Canty
Forschungszentrum Jülich GmbH
m.canty@fz-juelich.de
March 21, 2005
Chapter 1
Images, Arrays and Vectors
1.1 Multispectral satellite images
There are a number of multispectral satellite-based sensors currently in orbit which are used
for earth observation. Representative of these we mention here the Landsat ETM+ system.
The ETM+ instrument on the Landsat 7 spacecraft contains sensors to measure radiance
in three spectral intervals:
• visible and near infrared (VNIR) bands: bands 1, 2, 3, 4 and 8 (PAN), with a spectral
range between 0.4 and 1.0 micrometers.
• short wavelength infrared (SWIR) bands: bands 5 and 7, with a spectral range between
1.0 and 3.0 micrometers.
• thermal long wavelength infrared (LWIR) band: band 6, with a spectral range between
8.0 and 12.0 micrometers.
In addition a panchromatic (PAN) image (band 8) covering the visible spectrum is provided.
Ground resolutions are 15m (PAN), 30m (VNIR,SWIR) and 60m (LWIR). Figure 1.1 shows
a color composite image of a Landsat 7 scene over Morocco acquired in 1999.
A single multispectral image can be represented as an array of gray-scale values or digital
numbers
\[ g_k(i, j), \quad 1 \le i \le c,\ 1 \le j \le r, \]
where $c$ is the number of pixel columns and $r$ is the number of pixel rows. If we are dealing
with an $N$-band multispectral image, then the index $k$, $1 \le k \le N$, denotes the spectral
band. Often a pixel intensity is stored in a single byte, so that $0 \le g_k \le 255$.
The gray-scale values are the result of sampling, along an array of sensors, the at-sensor
radiance $f_\lambda(x, y)$ at wavelength $\lambda$ due to sunlight reflected from some point $(x, y)$ on the
Earth's surface and focused by the satellite's optical system onto the sensors. Ignoring
atmospheric effects, this radiance is given roughly by
\[ f_\lambda(x, y) \sim i_\lambda(x, y)\, r_\lambda(x, y), \]
where $i_\lambda(x, y)$ is the sun's irradiance at the surface in units of $\mathrm{W\,m^{-2}\,\mu m^{-1}}$, and $r_\lambda(x, y)$
is the surface reflectance, a number between 0 and 1. The conversion between gray-scale
Figure 1.1: Color composite of bands 4 (red), 5 (green) and 7 (blue) for a Landsat ETM+
image over Morocco.
or digital number $g$ and at-sensor radiance $f$ is determined by the sensor calibration as
measured (and maintained) by the satellite image provider:
\[ f = C g(i, j) + f_{min}, \]
where $C = (f_{max} - f_{min})/255$, in which $f_{max}$ and $f_{min}$ are the maximum and minimum
measurable radiances at the sensor.
Atmospheric scattering and absorption models are used to calculate surface reflectance
from the observed at-sensor radiance, as it is the reflectance which is directly related to the
physical properties of the surface being examined.
Various conventions can be used for storing the image array g(i, j) in computer memory
or on storage media. In band interleaved by pixel (BIP) format, for example, a two-channel,
3 × 3 pixel image would be stored as
g1(1,1) g2(1,1) g1(2,1) g2(2,1) g1(3,1) g2(3,1)
g1(1,2) g2(1,2) g1(2,2) g2(2,2) g1(3,2) g2(3,2)
g1(1,3) g2(1,3) g1(2,3) g2(2,3) g1(3,3) g2(3,3),
whereas in band interleaved by line (BIL) format it would be stored as
g1(1,1) g1(2,1) g1(3,1) g2(1,1) g2(2,1) g2(3,1)
g1(1,2) g1(2,2) g1(3,2) g2(1,2) g2(2,2) g2(3,2)
g1(1,3) g1(2,3) g1(3,3) g2(1,3) g2(2,3) g2(3,3),
and in band sequential (BSQ) format as
g1(1,1) g1(2,1) g1(3,1)
g1(1,2) g1(2,2) g1(3,2)
g1(1,3) g1(2,3) g1(3,3)
g2(1,1) g2(2,1) g2(3,1)
g2(1,2) g2(2,2) g2(3,2)
g2(1,3) g2(2,3) g2(3,3).
In the computer language IDL, so-called row major indexing is used for arrays and the
elements of an array are numbered from zero. This means that, if a gray-scale image $g$ is
stored in an IDL array variable G, then the intensity value $g(i, j)$ is addressed as G[i-1,j-1].
An $N$-band multispectral image is stored in BIP format as an $N \times c \times r$ array in IDL, in
BIL format as a $c \times N \times r$ array, and in BSQ format as a $c \times r \times N$ array.
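As a minimal sketch (the array contents here are arbitrary), the three orderings can be
interconverted in IDL with transpose() and a permutation of the dimension indices:

; a hypothetical 3-band, 4-column, 3-row image in BSQ ordering (c x r x N)
bsq = indgen(4,3,3)
; convert to BIP ordering (N x c x r) by permuting the dimensions
bip = transpose(bsq, [2,0,1])
print, size(bip, /dimensions)   ; -> 3 4 3
; convert to BIL ordering (c x N x r)
bil = transpose(bsq, [0,2,1])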
Auxiliary information, such as image acquisition parameters and georeferencing, is normally
included with the image data in the same file, and the format may or may not make
use of compression algorithms. Examples are the geoTIFF¹ file format used, for example, by
Space Imaging Inc. for distributing Carterra(c) imagery, which includes lossless compression;
the HDF (Hierarchical Data Format), in which, for example, ASTER images are distributed;
and the cross-platform PCIDSK format employed by PCI Geomatics with its image processing
software, which is in plain ASCII code and not compressed. ENVI uses a simple "flat
binary" file structure with an additional ASCII header file.

¹geoTIFF refers to TIFF files which have geographic (or cartographic) data embedded as tags within the
TIFF file. The geographic data can then be used to position the image in the correct location and geometry
on the screen of a geographic information display.
1.2 Algebra of vectors and matrices
It is very convenient to use a vector representation for multispectral images, namely
\[ g(i,j) = \begin{pmatrix} g_1(i,j) \\ \vdots \\ g_N(i,j) \end{pmatrix}, \tag{1.1} \]
which is a column vector of multispectral gray-scale values at the position $(i, j)$.
Since we will be making extensive use of the vector notation of Eq. (1.1) we review
here some of the basic properties of vectors and matrices. We can illustrate most of these
properties in just two dimensions.
Figure 1.2: A vector in two dimensions.
The transpose of the two-dimensional column vector shown in Fig. 1.2,
\[ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \]
is the row vector
\[ x^T = (x_1, x_2). \]
The sum of two vectors is given by
\[ x + y = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix}, \]
and the inner product by
\[ x^T y = (x_1, x_2) \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = x_1 y_1 + x_2 y_2. \]
The length or norm of the vector $x$ is
\[ \|x\| = |x| = \sqrt{x_1^2 + x_2^2} = \sqrt{x^T x}. \]
The programming language IDL is especially good at manipulating vectors and matrices:
13. 1.2. ALGEBRA OF VECTORS AND MATRICES 5
IDL> x=[[1],[2]]
IDL> print,x
       1
       2
IDL> print,transpose(x)
       1       2
Figure 1.3: The inner product.

The inner product can be written in terms of the vector lengths and the angle $\theta$ between
the two vectors as
\[ x^T y = |x|\,|y|\cos\theta, \]
see Fig. 1.3. If $\theta = 90°$ the vectors are orthogonal, so that
\[ x^T y = 0. \]
Any vector can be decomposed into orthogonal unit vectors:
\[ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]
A two-by-two matrix is written
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \]
When a matrix multiplies a vector, the result is another vector, e.g.
\[ Ax = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{pmatrix}. \]
The IDL operator for matrix and vector multiplication is ##.
IDL> a=[[1,2],[3,4]]
IDL> print,a
       1       2
       3       4
IDL> print,a##x
       5
      11
Matrices also have a transposed form, obtained by interchanging their rows and columns:
\[ A^T = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{pmatrix}. \]
The product of two matrices is given by
\[ AB = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & \cdots \\ \cdots & \cdots \end{pmatrix} \]
and is another matrix. The determinant of a two-dimensional matrix is
\[ |A| = \det A = a_{11}a_{22} - a_{12}a_{21}. \]
The outer product of two vectors is a matrix:
\[ x y^T = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} (y_1, y_2) = \begin{pmatrix} x_1 y_1 & x_1 y_2 \\ x_2 y_1 & x_2 y_2 \end{pmatrix}. \]
The identity matrix is given by
\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad IA = AI = A. \]
The matrix inverse $A^{-1}$ is defined in terms of the identity matrix according to
\[ A^{-1} A = A A^{-1} = I. \]
In two dimensions it is easy to verify that
\[ A^{-1} = \frac{1}{|A|} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}. \]
IDL> print, determ(float(a))
      -2.00000
IDL> print, invert(a)
      -2.00000      1.00000
       1.50000    -0.500000
IDL> print, a##invert(a)
       1.00000      0.00000
       0.00000      1.00000
If $|A| = 0$, then $A$ has no inverse and is said to be a singular matrix. The trace of a
square matrix is the sum of its diagonal elements:
\[ \mathrm{Tr}\, A = a_{11} + a_{22}. \]
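In IDL the trace can be computed, for instance, with the built-in trace() function, or by
summing the diagonal directly (a small sketch using the array a defined above):

print, trace(a)                ; 1 + 4 = 5
print, total(a[[0,1],[0,1]])   ; equivalent: sum of the diagonal elements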
1.3 Eigenvalues and eigenvectors
The statistical properties of ensembles of pixel intensities (for example entire images or
specific land-cover classes) are often approximated by their mean values and covariance
matrices. As we will see later, covariance matrices are always symmetric. A matrix $A$ is
symmetric if it doesn't change when it is transposed, i.e. if
\[ A = A^T. \]
Very often we have to solve the so-called eigenvalue problem, which is to find eigenvectors $x$
and eigenvalues $\lambda$ that satisfy the equation
\[ Ax = \lambda x \]
or, equivalently,
\[ \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]
This is the same as the two equations
\[ \begin{aligned} (a_{11} - \lambda)x_1 + a_{12}x_2 &= 0 \\ a_{21}x_1 + (a_{22} - \lambda)x_2 &= 0. \end{aligned} \tag{1.2} \]
If we eliminate $x_1$ and make use of the symmetry $a_{12} = a_{21}$, we obtain
\[ \left[ (a_{11} - \lambda)(a_{22} - \lambda) - a_{12}^2 \right] x_2 = 0. \]
In general $x_2 \ne 0$, so we must have
\[ (a_{11} - \lambda)(a_{22} - \lambda) - a_{12}^2 = 0, \]
which is known as the characteristic equation for the eigenvalue problem. It is a quadratic
equation in $\lambda$ with solutions
\[ \begin{aligned}
\lambda^{(1)} &= \frac12 \left[ a_{11} + a_{22} + \sqrt{(a_{11} + a_{22})^2 - 4(a_{11}a_{22} - a_{12}^2)} \right] \\
\lambda^{(2)} &= \frac12 \left[ a_{11} + a_{22} - \sqrt{(a_{11} + a_{22})^2 - 4(a_{11}a_{22} - a_{12}^2)} \right].
\end{aligned} \tag{1.3} \]
Thus there are two eigenvalues and, correspondingly, two eigenvectors $x^{(1)}$ and $x^{(2)}$, which
can be obtained by substituting $\lambda^{(1)}$ and $\lambda^{(2)}$ into (1.2) and solving for $x_1$ and $x_2$. It is easy
to show that the eigenvectors are orthogonal:
\[ (x^{(1)})^T x^{(2)} = 0. \]
The matrix formed by the two eigenvectors,
\[ u = (x^{(1)}, x^{(2)}) = \begin{pmatrix} x_1^{(1)} & x_1^{(2)} \\ x_2^{(1)} & x_2^{(2)} \end{pmatrix}, \]
is said to diagonalize the matrix $A$. That is,
\[ u^T A u = \begin{pmatrix} \lambda^{(1)} & 0 \\ 0 & \lambda^{(2)} \end{pmatrix}. \tag{1.4} \]
We can illustrate the whole procedure in IDL as follows:
We can illustrate the whole procedure in IDL as follows:
IDL> a=float([[1,2],[2,3]])
IDL> print,a
       1.00000      2.00000
       2.00000      3.00000
IDL> print,eigenql(a,eigenvectors=u,/double)
       4.2360680    -0.23606798
IDL> print,transpose(u)##a##u
       4.2360680  -2.2204460e-016
  -1.6653345e-016   -0.23606798
Note that, after diagonalization, the off-diagonal elements are not precisely zero due to
rounding errors in the computation.
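The eigenvectors returned by eigenql are orthonormal, which is easy to check by continuing
the same session (a small verification sketch):

IDL> print, u##transpose(u)   ; orthogonal matrix: product with its transpose is I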
All of the above properties generalize easily to N dimensions.
1.4 Finding minima and maxima
In order to maximize some desirable property of a multispectral image, such as signal to
noise ratio or spread in intensity, we often need to take derivatives with respect to vectors.
The vector (partial) derivative in two dimensions is written $\frac{\partial}{\partial x}$ and is defined as the vector
\[ \frac{\partial}{\partial x} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \frac{\partial}{\partial x_1} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} \frac{\partial}{\partial x_2}. \]
Many of the operations with vector derivatives correspond exactly to operations with ordinary
scalar derivatives (they can all be verified easily by writing out the expressions
component by component):
\[ \frac{\partial}{\partial x}(x^T y) = y \quad\text{analogous to}\quad \frac{d}{dx}(xy) = y \]
\[ \frac{\partial}{\partial x}(x^T x) = 2x \quad\text{analogous to}\quad \frac{d}{dx}\,x^2 = 2x. \]
The scalar expression
\[ x^T A y, \]
where $A$ is a matrix, is called a quadratic form. We have
\[ \frac{\partial}{\partial x}(x^T A y) = Ay, \qquad \frac{\partial}{\partial y}(x^T A y) = A^T x \]
and
\[ \frac{\partial}{\partial x}(x^T A x) = Ax + A^T x. \]
Note that, if $A$ is a symmetric matrix, this last equation can be written
\[ \frac{\partial}{\partial x}(x^T A x) = 2Ax. \]
Suppose $x^*$ is a critical point of the function $f(x)$, i.e.
\[ \frac{d}{dx} f(x^*) = \left. \frac{d}{dx} f(x) \right|_{x = x^*} = 0, \tag{1.5} \]

Figure 1.4: A function of one variable.

see Fig. 1.4. Then $f(x^*)$ is a local minimum if $\frac{d^2}{dx^2} f(x^*) > 0$. This becomes obvious if we
express $f(x)$ as a Taylor series about $x^*$:
\[ f(x) = f(x^*) + (x - x^*)\,\frac{d}{dx} f(x^*) + \frac12 (x - x^*)^2\, \frac{d^2}{dx^2} f(x^*) + \ldots \]
For $|x - x^*|$ sufficiently small this is equivalent to
\[ f(x) \approx f(x^*) + \frac12 (x - x^*)^2\, \frac{d^2}{dx^2} f(x^*). \]
The situation is similar for scalar functions of a vector:
\[ f(x) \approx f(x^*) + (x - x^*)^T \frac{\partial f(x^*)}{\partial x} + \frac12 (x - x^*)^T H (x - x^*), \tag{1.6} \]
where $H$ is called the Hessian matrix:
\[ (H)_{ij} = \frac{\partial^2}{\partial x_i \partial x_j} f(x^*). \tag{1.7} \]
In the neighborhood of the critical point, since $\frac{\partial f(x^*)}{\partial x} = 0$, we get the approximation
\[ f(x) \approx f(x^*) + \frac12 (x - x^*)^T H (x - x^*). \]
Now the condition for a local minimum is that the Hessian matrix be positive definite at the
point $x^*$. Positive definiteness means that
\[ x^T H x > 0 \quad\text{for all } x \ne 0. \tag{1.8} \]
Suppose we want to find a minimum (or maximum) of a scalar function $f(x)$ of the
vector $x$. If there are no constraints, then we solve the set of equations
\[ \frac{\partial f(x)}{\partial x_i} = 0, \quad i = 1, 2, \]
or, in terms of our notation for vector derivatives,
\[ \frac{\partial f(x)}{\partial x} = \mathbf{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
However, suppose that $x$ is constrained by the equation
\[ g(x) = 0. \]
For example, we might have
\[ g(x) = x_1^2 + x_2^2 - 1 = 0, \]
which constrains $x$ to lie on a circle of radius 1.
Finding a minimum of $f$ subject to $g = 0$ is equivalent to finding an unconstrained
minimum of
\[ f(x) + \lambda g(x), \tag{1.9} \]
where $\lambda$ is called a Lagrange multiplier and is treated like an additional variable, see [Mil99].
That is, we solve the set of equations
\[ \begin{aligned} \frac{\partial}{\partial x_i}\big(f(x) + \lambda g(x)\big) &= 0, \quad i = 1, 2 \\ \frac{\partial}{\partial \lambda}\big(f(x) + \lambda g(x)\big) &= 0. \end{aligned} \tag{1.10} \]
The latter equation is just $g(x) = 0$.
For example, let $f(x) = a x_1^2 + b x_2^2$ and $g(x) = x_1 + x_2 - 1$. Then we get the three
equations
\[ \begin{aligned}
\frac{\partial}{\partial x_1}\big(f(x) + \lambda g(x)\big) &= 2a x_1 + \lambda = 0 \\
\frac{\partial}{\partial x_2}\big(f(x) + \lambda g(x)\big) &= 2b x_2 + \lambda = 0 \\
\frac{\partial}{\partial \lambda}\big(f(x) + \lambda g(x)\big) &= x_1 + x_2 - 1 = 0.
\end{aligned} \]
The solution is
\[ x_1 = \frac{b}{a+b}, \qquad x_2 = \frac{a}{a+b}. \]
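A quick numerical check in IDL (a sketch; the values of a and b are arbitrary) samples
$f$ along the constraint line $x_1 + x_2 = 1$ and locates the minimum:

a = 1.0 & b = 3.0
x1 = findgen(1001)/1000        ; sample the constraint line x1 + x2 = 1
x2 = 1.0 - x1
f = a*x1^2 + b*x2^2
fmin = min(f, pos)
print, x1[pos], x2[pos]        ; approximately b/(a+b) = 0.75 and a/(a+b) = 0.25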
Exercises
1. Show that the outer product of two 2-dimensional vectors is a singular matrix.
2. Prove that the eigenvectors of a $2 \times 2$ symmetric matrix are orthogonal.
3. Differentiate the function
\[ \frac{1}{x^T A y} \]
with respect to $y$.
4. Verify the following matrix identity in IDL:
\[ (AB)^T = B^T A^T. \]
5. Calculate the eigenvalues and eigenvectors of a non-symmetric matrix with IDL.
6. Plot the function $f(x) = x_1^2 - x_2^2$ with IDL. Find its minima and maxima subject to
the constraint $g(x) = x_1^2 + x_2^2 - 1 = 0$.
Chapter 2
Image Statistics
It is useful to think of image pixel intensities g(x) as realizations of a random vector G(x)
drawn independently from some probability distribution.
2.1 Random variables
A random variable can be used to represent some quantity which changes in an unpredictable
way each time it is observed. If there is a discrete set of $M$ possible events $\{E_i\},\ i = 1 \ldots M$,
associated with some random process, let $p_i$ be the probability that the $i$th event $E_i$ will
occur. If $n_i$ represents the number of times $E_i$ occurs in $n$ trials, we expect that $p_i \to n_i/n$
in the limit $n \to \infty$ and that
\[ \sum_{i=1}^M p_i = 1. \]
For example, on the throw of a pair of dice,
\[ \{E_i\} = (1,1), (1,2), (2,1) \ldots (6,6) \]
and each event is equally probable:
\[ p_i = 1/36, \quad i = 1 \ldots 36. \]
Formally, a random variable $X$ is a real function on the set of possible events:
\[ X = f(E_i). \]
If, for example, $X$ is the sum of the points on the dice,
\[ X = f(E_1) = 2, \quad X = f(E_2) = 3, \quad X = f(E_3) = 3, \ \ldots\ X = f(E_{36}) = 12. \]
On the basis of the probabilities of the individual events, we can associate a distribution
function $P(x)$ with the random variable $X$, defined by
\[ P(x) = \Pr(X \le x). \]
For the dice example,
\[ P(1) = 0, \quad P(2) = 1/36, \quad P(3) = 1/12, \ \ldots\ P(12) = 1. \]
For continuous random variables, such as the measured radiance at a satellite sensor, the
distribution function is not expressed in terms of discrete probabilities, but rather in terms
of a probability density function $p(x)$, where $p(x)dx$ is the probability that the value of the
random variable $X$ lies in the interval $[x, x+dx]$. Then
\[ P(x) = \Pr(X \le x) = \int_{-\infty}^{x} p(t)\, dt \]
and, of course,
\[ P(-\infty) = 0, \quad P(\infty) = 1. \]
Two random variables $X$ and $Y$ are said to be independent when
\[ \Pr(X \le x \text{ and } Y \le y) = \Pr(X \le x,\, Y \le y) = P(x)P(y). \]
The mean or expected value of a random variable $X$ is written $\langle X \rangle$ and is defined in
terms of the probability density function:
\[ \langle X \rangle = \int_{-\infty}^{\infty} x\, p(x)\, dx. \]
The variance of $X$, written $\mathrm{var}(X)$, is defined as the expected value of the random variable
$(X - \langle X \rangle)^2$, i.e.
\[ \mathrm{var}(X) = \langle (X - \langle X \rangle)^2 \rangle. \]
In terms of the probability density function, it is given by
\[ \mathrm{var}(X) = \int_{-\infty}^{\infty} (x - \langle X \rangle)^2\, p(x)\, dx. \]
Two simple but very useful identities follow from the definition of variance:
\[ \begin{aligned} \mathrm{var}(X) &= \langle X^2 \rangle - \langle X \rangle^2 \\ \mathrm{var}(aX) &= a^2\, \mathrm{var}(X). \end{aligned} \tag{2.1} \]
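Both identities in (2.1) are easy to confirm empirically in IDL (a small sketch with
synthetic data):

x = randomn(seed, 100000L)                  ; standard normal sample
a = 3.0
print, variance(a*x), a^2*variance(x)       ; var(aX) = a^2 var(X)
print, variance(x), mean(x^2) - mean(x)^2   ; var(X) = <X^2> - <X>^2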
2.2 The normal distribution
It is often the case that random variables are well described by the normal or Gaussian
probability density function
\[ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2}(x - \mu)^2 \right). \]
In that case
\[ \langle X \rangle = \mu, \qquad \mathrm{var}(X) = \sigma^2. \]
The expected value of the vector of pixel intensities
\[ G(x) = \begin{pmatrix} G_1(x) \\ G_2(x) \\ \vdots \\ G_N(x) \end{pmatrix}, \]
where $x$ denotes the pixel coordinates, i.e. $x = (i, j)$, is estimated by averaging over all of
the pixels in the image,
\[ \langle G(x) \rangle \approx \frac{1}{cr} \sum_{i,j=1}^{c,r} g(i, j), \]
referred to as the sample mean vector. It is usually assumed to be independent of $x$, i.e.
\[ \langle G(x) \rangle = \langle G \rangle. \]
The covariance between bands $k$ and $\ell$ is defined according to
\[ \mathrm{cov}(G_k, G_\ell) = \langle (G_k - \langle G_k \rangle)(G_\ell - \langle G_\ell \rangle) \rangle \]
and is estimated, again, by averaging over the pixels:
\[ \mathrm{cov}(G_k, G_\ell) \approx \frac{1}{cr} \sum_{i,j=1}^{c,r} (g_k(i, j) - \langle G_k \rangle)(g_\ell(i, j) - \langle G_\ell \rangle), \]
which is called the sample covariance. The covariance is also usually assumed to be independent
of $x$. The variance of band $k$ is given by
\[ \mathrm{var}(G_k) = \mathrm{cov}(G_k, G_k) = \langle (G_k - \langle G_k \rangle)^2 \rangle. \]
The random vector $G$ is often assumed to be described by a multivariate normal probability
density function $p(g)$, given by
\[ p(g) = \frac{1}{(2\pi)^{N/2} \sqrt{|\Sigma|}} \exp\left( -\frac12 (g - \mu)^T \Sigma^{-1} (g - \mu) \right). \]
We indicate this by writing
\[ G \sim N(\mu, \Sigma). \]
The distribution function of the multispectral pixels is then completely determined by the
expected value $\langle G \rangle = \mu$ and by the covariance matrix $\Sigma$. In two dimensions, for example,
\[ \Sigma = \begin{pmatrix} \mathrm{var}(G_1) & \mathrm{cov}(G_1, G_2) \\ \mathrm{cov}(G_2, G_1) & \mathrm{var}(G_2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}. \]
Note that, since $\mathrm{cov}(G_k, G_\ell) = \mathrm{cov}(G_\ell, G_k)$, the covariance matrix is symmetric, $\Sigma = \Sigma^T$.
The covariance matrix can also be written as an outer product:
\[ \Sigma = \langle (G - \langle G \rangle)(G - \langle G \rangle)^T \rangle, \]
as can its estimated value:
\[ \Sigma \approx \frac{1}{cr} \sum_{i,j=1}^{c,r} (g(i, j) - \langle G \rangle)(g(i, j) - \langle G \rangle)^T. \]
If $\langle G \rangle = 0$, we can write simply
\[ \Sigma = \langle G G^T \rangle. \]
Another useful identity applies to any linear combination $a^T G$ of the random vector $G$,
namely
\[ \mathrm{var}(a^T G) = a^T \Sigma a. \tag{2.2} \]
This is obvious in two dimensions, since we have
\[ \begin{aligned}
\mathrm{var}(a^T G) &= \mathrm{cov}(a_1 G_1 + a_2 G_2,\ a_1 G_1 + a_2 G_2) \\
&= a_1^2\, \mathrm{var}(G_1) + a_1 a_2\, \mathrm{cov}(G_1, G_2) + a_1 a_2\, \mathrm{cov}(G_2, G_1) + a_2^2\, \mathrm{var}(G_2) \\
&= (a_1, a_2) \begin{pmatrix} \mathrm{var}(G_1) & \mathrm{cov}(G_1, G_2) \\ \mathrm{cov}(G_2, G_1) & \mathrm{var}(G_2) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}.
\end{aligned} \]
Variance is always nonnegative and the vector $a$ in (2.2) is arbitrary, so we have
\[ a^T \Sigma a \ge 0 \quad\text{for all } a. \]
The covariance matrix is therefore said to be positive semi-definite.
The correlation matrix $C$ is similar to the covariance matrix, except that each matrix
element $(i, j)$ is normalized by $\sqrt{\mathrm{var}(G_i)\,\mathrm{var}(G_j)}$. In two dimensions
\[ C = \begin{pmatrix} 1 & \rho_{12} \\ \rho_{21} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & \frac{\mathrm{cov}(G_1, G_2)}{\sqrt{\mathrm{var}(G_1)\mathrm{var}(G_2)}} \\ \frac{\mathrm{cov}(G_2, G_1)}{\sqrt{\mathrm{var}(G_1)\mathrm{var}(G_2)}} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & \frac{\sigma_{12}}{\sigma_1 \sigma_2} \\ \frac{\sigma_{21}}{\sigma_1 \sigma_2} & 1 \end{pmatrix}. \]
The following ENVI/IDL program calculates and prints out the covariance matrix of a
multispectral image:
envi_select, title='Choose multispectral image',fid=fid,dims=dims,pos=pos
if (fid eq -1) then return
num_cols = dims[2]-dims[1]+1
num_rows = dims[4]-dims[3]+1
num_pixels = num_cols*num_rows
num_bands = n_elements(pos)
; one row per band, one column per pixel
samples = intarr(num_bands,num_pixels)
for i=0,num_bands-1 do samples[i,*]=envi_get_data(fid=fid,dims=dims,pos=pos[i])
print, correlate(samples,/covariance,/double)
end

ENVI> .GO
     111.46663     82.123236     159.58377     133.80637
     82.123236     64.532431     124.84815     104.45298
     159.58377     124.84815     246.18004     205.63420
     133.80637     104.45298     205.63420     192.70367
2.3 A special function
If $n$ is an integer, the factorial of $n$ is defined by
\[ n! = n(n-1)\cdots 1, \qquad 1! = 0! = 1. \]
The generalization of this to non-integers $z$ is the gamma function
\[ \Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt. \]
It has the property
\[ \Gamma(z+1) = z\,\Gamma(z). \]
The factorial is a special case, i.e. for integer $n$,
\[ \Gamma(n+1) = n! \]
A further generalization is the incomplete gamma function
\[ \Gamma_P(a, x) = \frac{1}{\Gamma(a)} \int_0^x t^{a-1} e^{-t}\, dt. \]
It has the properties
\[ \Gamma_P(a, 0) = 0, \qquad \Gamma_P(a, \infty) = 1. \]
Here is a plot of $\Gamma_P$ for $a = 3$ in IDL:

x = findgen(100)/10
envi_plot_data, x, igamma(3,x)

Figure 2.1: The incomplete gamma function.
We are interested in this function for the following reason. Suppose that the random
variables $X_i,\ i = 1 \ldots n$, are independent and normally distributed with zero mean and
variances $\sigma_i^2$. Then the random variable
\[ Z = \sum_{i=1}^n \left( \frac{X_i}{\sigma_i} \right)^2 \]
has the distribution function
\[ P(z) = \Pr(Z \le z) = \Gamma_P(n/2,\, z/2), \]
and is said to be chi-square distributed with $n$ degrees of freedom.
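This relationship is easy to check by simulation with IDL's built-in igamma function
(a sketch; the sample size and the test point z = 2 are arbitrary):

n = 3
z = total(randomn(seed, n, 10000L)^2, 1)    ; 10000 realizations of Z with sigma_i = 1
dum = where(z le 2.0, count)
print, count/10000.0, igamma(n/2.0, 2.0/2)  ; empirical and theoretical Pr(Z <= 2)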
2.4 Conditional probabilities and Bayes
Theorem
If A and B are two events such that the probability of A andB occurring simultaneously is
P(A, B), then the conditional probability of A occuring given that B has occurred is
P(A | B) =
P(A, B)
P(B)
.
Bayes' Theorem (named after Rev. Thomas Bayes, an 18th century mathematician who
derived a special case) is the basic starting point for inference problems using probability
theory as logic. We will use it in the following form. Let $X$ be a random variable describing
a pixel intensity, and let $\{C_k \mid k = 1 \ldots M\}$ be a set of possible classes for the pixels. Then
the a posteriori conditional probability for class $C_k$, given the measured pixel intensity $x$, is
\[ P(C_k \mid x) = \frac{P(x \mid C_k)\, P(C_k)}{P(x)}, \tag{2.3} \]
where
$P(C_k)$ is the prior probability for class $C_k$,
$P(x \mid C_k)$ is the conditional probability of observing the value $x$ if it belongs to class $C_k$,
$P(x) = \sum_{k=1}^M P(x \mid C_k)\, P(C_k)$ is the total probability for $x$.
2.5 Linear regression
Applying radiometric corrections to digital images often involves fitting a set of $m$ data
points $(x_i, y_i)$ to a straight line:
\[ y(x) = a + bx + \epsilon. \]
Suppose that the measurements $y_i$ include a random error $\epsilon$ with variance $\sigma^2$ and that the
measurements $x_i$ are exact. Define a "goodness of fit" function
\[ \chi^2(a, b) = \sum_{i=1}^m \left( \frac{y_i - a - b x_i}{\sigma} \right)^2. \tag{2.4} \]
If the random error is normally distributed, then we obtain the most likely (i.e. best)
values for $a$ and $b$ by minimizing this function, that is, by solving the equations
\[ \frac{\partial \chi^2}{\partial a} = \frac{\partial \chi^2}{\partial b} = 0. \]
The solution is
\[ \hat b = \frac{s_{xy}}{s_{xx}^2}, \qquad \hat a = \bar y - \hat b \bar x, \tag{2.5} \]
where
\[ \begin{aligned}
s_{xy} &= \frac{1}{m} \sum_{i=1}^m (x_i - \bar x)(y_i - \bar y) \\
s_{xx}^2 &= \frac{1}{m} \sum_{i=1}^m (x_i - \bar x)^2 \\
\bar x &= \frac{1}{m} \sum_{i=1}^m x_i, \qquad \bar y = \frac{1}{m} \sum_{i=1}^m y_i.
\end{aligned} \]
The uncertainties in the estimates $\hat a$ and $\hat b$ are given by
\[ \sigma_a^2 = \frac{\sigma^2 \sum_i x_i^2}{m \sum_i x_i^2 - \left( \sum_i x_i \right)^2}, \qquad
\sigma_b^2 = \frac{\sigma^2\, m}{m \sum_i x_i^2 - \left( \sum_i x_i \right)^2}. \tag{2.6} \]
If $\sigma^2$ is not known a priori, then it can be estimated by
\[ \hat\sigma^2 = \frac{1}{m-2} \sum_{i=1}^m (y_i - \hat a - \hat b x_i)^2. \]
Generalized and orthogonal least squares methods are described in Appendix A. A
recursive procedure is described in Appendix C.
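The estimates (2.5) are straightforward to compute in IDL. A minimal sketch with
synthetic data (the true intercept 1.0 and slope 0.5 are arbitrary choices):

m = 100
x = findgen(m)/10.0
y = 1.0 + 0.5*x + 0.2*randomn(seed, m)   ; noisy straight line
sxy = total((x - mean(x))*(y - mean(y)))/m
sxx = total((x - mean(x))^2)/m
b = sxy/sxx
a = mean(y) - b*mean(x)
print, a, b                              ; should be close to 1.0 and 0.5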
Exercises
1. Write the multivariate normal probability density function $p(g)$ for the case $\Sigma = \sigma^2 I$.
Show that the probability density function for a one-dimensional random variable $G$ is a
special case. Prove that $\langle G \rangle = \mu$.
2. In the Monty Hall game a contestant is asked to choose between one of three doors.
Behind one of the doors is an automobile as prize for choosing the correct door. After
the contestant has chosen, Monty Hall opens one of the other two doors to show that
the automobile is not there. He then asks the contestant if she wishes to change her
mind and choose the other unopened door. Use Bayes' theorem to prove that her
correct answer is "yes".
3. Derive the uncertainty for $a$ in (2.6) from the formula for error propagation
\[ \sigma_a^2 = \sum_{i=1}^N \sigma^2 \left( \frac{\partial f}{\partial y_i} \right)^2. \]
Chapter 3
Transformations
Up until now we have thought of multispectral images as (r × c × N)-dimensional arrays
of measured pixel intensities. In the present chapter we consider other representations of
images which are often useful in image analysis.
3.1 Fourier transforms
Figure 3.1: Fourier series approximation of a sawtooth function. The series was truncated
at $k = \pm 4$. The left hand side shows the intensities $|\hat x(k)|^2$.
A periodic function $x(t)$ with period $T$,
\[ x(t) = x(t + T), \]
can always be expressed as the infinite Fourier series
\[ x(t) = \sum_{k=-\infty}^{\infty} \hat x(k)\, e^{i 2\pi (k f) t}, \tag{3.1} \]
where $f = 1/T = \omega/2\pi$ and $e^{ix} = \cos x + i \sin x$. From the orthogonality of the e-functions,
the coefficients $\hat x(k)$ in the expansion are given by
\[ \hat x(k) = f \int_{-1/2f}^{1/2f} x(t)\, e^{-i 2\pi (k f) t}\, dt. \tag{3.2} \]
Figure 3.1 shows an example for the sawtooth function with period $T = 1$:
\[ x(t) = t, \quad -1/2 \le t < 1/2. \]
Parseval's formula follows directly from (3.2):
\[ \sum_k |\hat x(k)|^2 = f \int_{-1/2f}^{1/2f} (x(t))^2\, dt. \]
3.1.1 Discrete Fourier transform
Let $g(j)$ be a discrete sample of the real function $g(x)$ (a row of pixels), sampled $c$ times at
the sampling interval $\Delta$ over a complete period $T$, i.e.
\[ g(j) = g(x = j\Delta), \quad j = 0 \ldots c-1. \]
The corresponding discrete Fourier series is written
\[ g(j) = \frac{1}{c} \sum_{k=-c/2}^{c/2} \hat g(k)\, e^{i 2\pi (k f)(j \Delta)}, \quad j = 0 \ldots c-1, \tag{3.3} \]
where the truncation frequency $\pm \frac{c}{2} f$ is the highest frequency component that can be
determined by the sampling. This frequency is called the Nyquist critical frequency and is
given by $1/2\Delta$, so that $f$ is determined by
\[ \frac{c f}{2} = \frac{1}{2\Delta} \quad\text{or}\quad f = \frac{1}{c\Delta}. \]
(This corresponds to sampling over one complete period: $c\Delta = T$.) Thus (3.3) becomes
\[ g(j) = \frac{1}{c} \sum_{k=-c/2}^{c/2} \hat g(k)\, e^{i 2\pi k j / c}, \quad j = 0 \ldots c-1. \]
With the observation
\[ e^{i 2\pi (-c/2) j / c} = e^{-i\pi j} = (-1)^j = e^{i\pi j} = e^{i 2\pi (c/2) j / c}, \]
we can write this as
\[ g(j) = \frac{1}{c} \sum_{k=-c/2}^{c/2-1} \hat g(k)\, e^{i 2\pi k j / c}, \quad j = 0 \ldots c-1, \]
a set of $c$ equations in the $c$ unknown frequency components $\hat g(k)$. Equivalently,
\[ \begin{aligned}
g(j) &= \frac{1}{c} \sum_{k=0}^{c/2-1} \hat g(k)\, e^{i 2\pi k j / c} + \frac{1}{c} \sum_{k=-c/2}^{-1} \hat g(k)\, e^{i 2\pi k j / c} \\
&= \frac{1}{c} \sum_{k=0}^{c/2-1} \hat g(k)\, e^{i 2\pi k j / c} + \frac{1}{c} \sum_{k'=c/2}^{c-1} \hat g(k' - c)\, e^{i 2\pi (k' - c) j / c} \\
&= \frac{1}{c} \sum_{k=0}^{c/2-1} \hat g(k)\, e^{i 2\pi k j / c} + \frac{1}{c} \sum_{k=c/2}^{c-1} \hat g(k - c)\, e^{i 2\pi k j / c}.
\end{aligned} \]
Thus we can write
\[ g(j) = \frac{1}{c} \sum_{k=0}^{c-1} \hat g(k)\, e^{i 2\pi k j / c}, \quad j = 0 \ldots c-1, \tag{3.4} \]
if we interpret $\hat g(k) \to \hat g(k - c)$ for $k \ge c/2$.
The solution to (3.4) for the complex frequency components $\hat g(k)$ is called the discrete
Fourier transform and is given by
\[ \hat g(k) = \sum_{j=0}^{c-1} g(j)\, e^{-i 2\pi k j / c}, \quad k = 0 \ldots c-1. \tag{3.5} \]
This follows from the orthogonality property
\[ \sum_{j=0}^{c-1} e^{i 2\pi (k - k') j / c} = c\, \delta_{k,k'}. \tag{3.6} \]
Eq. (3.4) itself is the discrete inverse Fourier transform. The discrete analog of Parseval's
formula is
\[ \frac{1}{c} \sum_{k=0}^{c-1} |\hat g(k)|^2 = \sum_{j=0}^{c-1} g(j)^2. \tag{3.7} \]
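The transform (3.5) is easy to evaluate directly in IDL and to compare with the built-in
FFT, whose forward transform includes an extra factor $1/c$ (a small sketch):

g = [2.0, 4.0, 6.0, 8.0]
c = n_elements(g)
ghat = complexarr(c)
for k=0,c-1 do ghat[k] = total(g*exp(-complex(0,1)*2*!pi*k*findgen(c)/c))
print, ghat
print, c*fft(g)    ; agrees with ghat, since IDL's fft divides by c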
Determining the frequency components in (3.5) would appear to involve, in all, $c^2$ floating
point multiplication operations. The fast Fourier transform (FFT) exploits the structure of
the complex e-functions to reduce this to order $c \log c$, see for example [PFTV86].
3.1.2 Discrete Fourier transform of an image
The discrete Fourier transform is easily generalized to two dimensions for the purpose of
image analysis. Let $g(i, j),\ i, j = 0 \ldots c-1$, represent a (quadratic) gray scale image. Its
discrete Fourier transform is
\[ \hat g(k, \ell) = \sum_{i=0}^{c-1} \sum_{j=0}^{c-1} g(i, j)\, e^{-i 2\pi (i k + j \ell) / c} \tag{3.8} \]
and the corresponding inverse transform is
\[ g(i, j) = \frac{1}{c^2} \sum_{k=0}^{c-1} \sum_{\ell=0}^{c-1} \hat g(k, \ell)\, e^{i 2\pi (i k + j \ell) / c}. \tag{3.9} \]
3.2 Wavelets
Unlike the Fourier transform, which represents a signal (array of pixel intensities) in terms
of pure frequency functions, the wavelet transform expresses the signal in terms of functions
which are restricted both in terms of frequency and spatial extent. In many applications,
this turns out to be particularly efficient and useful. We’ll see an example of this in Chapter
7, where we discuss image fusion in more detail. The wavelet transform is discussed in
Appendix B.
3.3 Principal components
The principal components transformation forms linear combinations of multispectral pixel
intensities which are mutually uncorrelated and which have maximum variance.
We assume without loss of generality that $\langle G \rangle = 0$, so that the covariance matrix of a
multispectral image is $\Sigma = \langle G G^T \rangle$, and look for a linear combination $Y = a^T G$ with
maximum variance, subject to the normalization condition $a^T a = 1$. Since the variance of
$Y$ is $a^T \Sigma a$, this is equivalent to maximizing the unconstrained Lagrange function (see
Section 1.4)
\[ L = a^T \Sigma a - \lambda (a^T a - 1). \]
The maximum of $L$ occurs at that value of $a$ for which $\frac{\partial L}{\partial a} = 0$. Recalling the rules for
vector differentiation,
\[ \frac{\partial L}{\partial a} = 2\Sigma a - 2\lambda a = 0, \]
which is the eigenvalue problem
\[ \Sigma a = \lambda a. \]
Since $\Sigma$ is real and symmetric, the eigenvectors are orthogonal (and normalized). Denote
them $a_1 \ldots a_N$ for eigenvalues $\lambda_1 \ge \ldots \ge \lambda_N$. Define the matrix
\[ A = (a_1 \ldots a_N), \qquad A^T A = I, \]
and let the transformed principal component vector be $Y = A^T G$ with covariance matrix
$\Sigma'$. Then we have
\[ \Sigma' = \langle Y Y^T \rangle = A^T \langle G G^T \rangle A = A^T \Sigma A = \mathrm{Diag}(\lambda_1 \ldots \lambda_N) =
\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_N \end{pmatrix} =: \Lambda. \]
The fraction of the total variance in the original multispectral image which is described by
the first $i$ principal components is
\[ \frac{\lambda_1 + \ldots + \lambda_i}{\lambda_1 + \ldots + \lambda_i + \ldots + \lambda_N}. \]
If the original multispectral channels are highly correlated, as is usually the case, the first
few principal components will account for a very high percentage of the variance in the image.
For example, a color composite of the first 3 principal components of a LANDSAT TM
scene displays essentially all of the information contained in the 6 spectral components in
one single image. Nevertheless, because of the approximation involved in the assumption
of a normal distribution, higher order principal components may also contain significant
information [JRR99].
The principal components transformation can be performed directly from the ENVI main
menu. However the following IDL program illustrates the procedure in detail:

; Principal components analysis
envi_select, title='Choose multispectral image', $
             fid=fid, dims=dims, pos=pos
if (fid eq -1) then return
num_cols = dims[2]+1
num_lines = dims[4]+1
num_pixels = (num_cols*num_lines)
num_channels = n_elements(pos)
image=intarr(num_channels,num_pixels)
; read each band and remove its mean
for i=0,num_channels-1 do begin
   temp=envi_get_data(fid=fid,dims=dims,pos=pos[i])
   m = mean(temp)
   image[i,*]=temp-m
endfor
; calculate the transformation matrix A
sigma = correlate(image,/covariance,/double)
lambda = eigenql(sigma,eigenvectors=A,/double)
print,'Covariance matrix'
print, sigma
print,'Eigenvalues'
print, lambda
print,'Eigenvectors'
print, A
; transform the image
image = image##transpose(A)
; reform to BSQ format
PC_array = bytarr(num_cols,num_lines,num_channels)
for i = 0,num_channels-1 do PC_array[*,*,i] = $
   reform(image[i,*],num_cols,num_lines,/overwrite)
; output the result to memory
envi_enter_data, PC_array
end
3.4 Minimum noise fraction
Principal components analysis maximizes variance. This doesn't always order the
components by decreasing image quality (i.e. by increasing noise). The MNF transformation
minimizes the noise content rather than maximizing variance, so, if this is the desired
criterion, it is to be preferred over PCA.
Suppose we can represent a gray scale image $G$ with covariance matrix $\Sigma$ and zero mean
as a sum of uncorrelated signal and noise components
\[ G = S + N, \]
both normally distributed, with covariance matrices $\Sigma_S$ and $\Sigma_N$ and zero mean. Then we
have
\[ \Sigma = \langle G G^T \rangle = \langle (S + N)(S + N)^T \rangle = \langle S S^T \rangle + \langle N N^T \rangle, \]
since noise and signal are uncorrelated, i.e. $\langle S N^T \rangle = \langle N S^T \rangle = 0$. Thus
\[ \Sigma = \Sigma_S + \Sigma_N. \tag{3.10} \]
Now let us seek a linear combination $a^T G$ for which the signal to noise ratio
\[ \mathrm{SNR} = \frac{\mathrm{var}(a^T S)}{\mathrm{var}(a^T N)} = \frac{a^T \Sigma_S a}{a^T \Sigma_N a} \]
is maximized. From (3.10) we can write this in the form
\[ \mathrm{SNR} = \frac{a^T \Sigma a}{a^T \Sigma_N a} - 1. \tag{3.11} \]
Differentiating, we get
\[ \frac{\partial}{\partial a}\, \mathrm{SNR} = \frac{1}{a^T \Sigma_N a}\, 2\Sigma a - \frac{a^T \Sigma a}{(a^T \Sigma_N a)^2}\, 2\Sigma_N a = 0 \]
or, equivalently,
\[ (a^T \Sigma_N a)\, \Sigma a = (a^T \Sigma a)\, \Sigma_N a. \]
This condition is met when $a$ solves the generalized eigenvalue problem
\[ \Sigma_N a = \lambda \Sigma a. \tag{3.12} \]
Both $\Sigma_N$ and $\Sigma$ are symmetric and the latter is also positive definite. Its Cholesky
factorization is
\[ \Sigma = L L^T, \]
where $L$ is a lower triangular matrix which can be thought of as the "square root" of $\Sigma$. Such
an $L$ always exists if $\Sigma$ is positive definite. With this, we can write (3.12) as
\[ \Sigma_N a = \lambda L L^T a \]
or, equivalently,
\[ L^{-1} \Sigma_N (L^T)^{-1} L^T a = \lambda L^T a \]
or, with $b = L^T a$ and using the fact that inverse and transpose commute,
\[ [L^{-1} \Sigma_N (L^{-1})^T]\, b = \lambda b, \]
a standard eigenproblem for the real, symmetric matrix $L^{-1} \Sigma_N (L^{-1})^T$.
From (3.11) we see that the SNR for eigenvalue $\lambda_i$ is just
\[ \mathrm{SNR}_i = \frac{a_i^T \Sigma a_i}{a_i^T \Sigma_N a_i} - 1 = \frac{a_i^T \Sigma a_i}{\lambda_i\, a_i^T \Sigma a_i} - 1 = \frac{1}{\lambda_i} - 1. \]
Thus the eigenvector $a_i$ corresponding to the smallest eigenvalue $\lambda_i$ will maximize the signal
to noise ratio. Note that (3.12) can be written in the form
\[ \Sigma_N A = \Sigma A \Lambda, \tag{3.13} \]
where $A = (a_1 \ldots a_N)$ and $\Lambda = \mathrm{Diag}(\lambda_1 \ldots \lambda_N)$.
The MNF transformation is available in the ENVI environment. It is carried out in
two steps which are equivalent to the above. First of all the noise contribution to $G$ is
"whitened", i.e. transformed so that the random vector $N$ has covariance matrix $I$, the
identity matrix. Since $\Sigma_N$ can be assumed to be diagonal anyway (the noise in any band
is uncorrelated with the noise in any other band), we accomplish this with a transformation
which divides the components of $G$ by the standard deviations of the noise,
\[ X = \Sigma_N^{-1/2} G, \]
where
\[ \Sigma_N^{-1/2}\, \Sigma_N\, \Sigma_N^{-1/2} = I. \]
The transformed random vector $X$ thus has covariance matrix
\[ \Sigma_X = \Sigma_N^{-1/2}\, \Sigma\, \Sigma_N^{-1/2}. \tag{3.14} \]
Next we do an ordinary principal components transformation on $X$, i.e.
\[ Y = B^T X, \]
where
\[ B^T \Sigma_X B = \Lambda_X, \qquad B^T B = I. \tag{3.15} \]
The overall transformation is thus
\[ Y = B^T \Sigma_N^{-1/2} G = A^T G, \]
where $A = \Sigma_N^{-1/2} B$ is not an orthogonal transformation. To see that this transformation is
equivalent to solving the generalized eigenvalue problem, consider
\[ \begin{aligned}
\Sigma_N A &= \Sigma_N \Sigma_N^{-1/2} B \\
&= \Sigma_N^{1/2} \Sigma_X B \Lambda_X^{-1} \\
&= \Sigma_N^{1/2} \Sigma_N^{-1/2} \Sigma \Sigma_N^{-1/2} B \Lambda_X^{-1} \\
&= \Sigma A \Lambda_X^{-1}.
\end{aligned} \]
This is equivalent to (3.13) with
\[ \lambda_{Xi} = \frac{1}{\lambda_i} = \mathrm{SNR}_i + 1. \]
Thus an eigenvalue in the second transformation equal to one corresponds to "pure noise".
Thus an eigenvalue in the second transformation equal to one corresponds to “pure noise”.
Before the transformation can be performed, it is of course necessary to estimate the
noise covariance matrix ΣN . This can be done for example by differencing with respect to
the local mean:
(ΣN )k ≈
1
cr
c,r
i,j
(gk(i, j) − mk(i, j))(g (i, j) − m (i, j))
where mk(i, j) is the local mean of pixels in some neighborhood of (i, j).
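A minimal sketch of this estimate in IDL for two bands (the float arrays b1 and b2 are
assumed to be already loaded, and a 3x3 boxcar is used for the local mean):

n = n_elements(b1)
X = fltarr(2, n)
X[0,*] = reform(b1 - smooth(b1, 3), n)   ; noise estimate for band 1
X[1,*] = reform(b2 - smooth(b2, 3), n)   ; noise estimate for band 2
print, correlate(X, /covariance, /double)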
3.5 Maximum autocorrelation factor (MAF)
Let $x$ represent the coordinates of a pixel within image $G$, i.e. $x = (i, j)$. We consider the
covariance matrix $\Gamma$ between the original image, represented by $G(x)$, and the same image
$G(x + \Delta)$ shifted by an amount $\Delta = (\Delta x, \Delta y)$:
\[ \Gamma(\Delta) = \langle G(x)\, G(x + \Delta)^T \rangle, \]
assumed to be independent of $x$. Then
\[ \Gamma(0) = \Sigma, \]
and furthermore
\[ \Gamma(-\Delta) = \langle G(x)\, G(x - \Delta)^T \rangle = \langle G(x + \Delta)\, G(x)^T \rangle
= \langle G(x)\, G(x + \Delta)^T \rangle^T = \Gamma(\Delta)^T. \]
Now we consider the covariance of projections of the original and shifted images:
\[ \begin{aligned}
\mathrm{cov}(a^T G(x),\, a^T G(x + \Delta)) &= a^T \langle G(x)\, G(x + \Delta)^T \rangle\, a \\
&= a^T \Gamma(\Delta)\, a = a^T \Gamma(-\Delta)\, a \\
&= \frac12\, a^T \big( \Gamma(\Delta) + \Gamma(-\Delta) \big)\, a.
\end{aligned} \tag{3.16} \]
Define $\Sigma_\Delta$ as the covariance matrix of the difference image $G(x) - G(x + \Delta)$, i.e.
\[ \begin{aligned}
\Sigma_\Delta &= \langle (G(x) - G(x + \Delta))(G(x) - G(x + \Delta))^T \rangle \\
&= \langle G(x) G(x)^T \rangle + \langle G(x + \Delta) G(x + \Delta)^T \rangle - \langle G(x) G(x + \Delta)^T \rangle - \langle G(x + \Delta) G(x)^T \rangle \\
&= 2\Sigma - \Gamma(\Delta) - \Gamma(-\Delta).
\end{aligned} \]
Hence $\Gamma(\Delta) + \Gamma(-\Delta) = 2\Sigma - \Sigma_\Delta$ and we can write (3.16) in the form
\[ \mathrm{cov}(a^T G(x),\, a^T G(x + \Delta)) = a^T \Sigma a - \frac12\, a^T \Sigma_\Delta a. \]
The correlation of the projections is therefore given by
\[ \begin{aligned}
\mathrm{corr}(a^T G(x),\, a^T G(x + \Delta)) &= \frac{a^T \Sigma a - \frac12 a^T \Sigma_\Delta a}{\sqrt{\mathrm{var}(a^T G(x))\, \mathrm{var}(a^T G(x + \Delta))}} \\
&= \frac{a^T \Sigma a - \frac12 a^T \Sigma_\Delta a}{\sqrt{(a^T \Sigma a)(a^T \Sigma a)}} = 1 - \frac12 \frac{a^T \Sigma_\Delta a}{a^T \Sigma a}.
\end{aligned} \tag{3.17} \]
We want to determine the vector $a$ which extremalizes this correlation, so we wish to
extremalize the function
\[ R(a) = \frac{a^T \Sigma_\Delta a}{a^T \Sigma a}. \]
Differentiating,
\[ \frac{\partial R}{\partial a} = \frac{1}{a^T \Sigma a}\, 2\Sigma_\Delta a - \frac{a^T \Sigma_\Delta a}{(a^T \Sigma a)^2}\, 2\Sigma a = 0 \]
or
\[ (a^T \Sigma a)\, \Sigma_\Delta a = (a^T \Sigma_\Delta a)\, \Sigma a. \]
This condition is met when $a$ solves the generalized eigenvalue problem
\[ \Sigma_\Delta a = \lambda \Sigma a, \tag{3.18} \]
which is seen to have the same form as (3.12). Again both $\Sigma_\Delta$ and $\Sigma$ are symmetric and
the latter is also positive definite, and we obtain the standard eigenproblem
\[ [L^{-1} \Sigma_\Delta (L^{-1})^T]\, b = \lambda b \]
for the real, symmetric matrix $L^{-1} \Sigma_\Delta (L^{-1})^T$.
Let the eigenvalues be $\lambda_1 \ge \ldots \ge \lambda_N$ and the corresponding (orthogonal) eigenvectors
be $b_i$. We have
\[ 0 = b_i^T b_j = a_i^T L L^T a_j = a_i^T \Sigma a_j, \quad i \ne j, \]
and therefore
\[ \mathrm{cov}(a_i^T G(x),\, a_j^T G(x)) = a_i^T \Sigma a_j = 0, \quad i \ne j, \]
so that the MAF components are orthogonal (uncorrelated). Moreover, with equation (3.17)
we have
\[ \mathrm{corr}(a_i^T G(x),\, a_i^T G(x + \Delta)) = 1 - \frac12 \lambda_i, \]
and the first MAF component has minimum autocorrelation.
An ENVI plug-in for performing the MAF transformation is given in Appendix D.5.2.
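For illustration, the difference-image covariance $\Sigma_\Delta$ for a one-pixel horizontal shift can
be estimated in IDL as follows (a sketch for two bands; the float arrays g1 and g2 are
assumed to be already loaded):

d1 = g1 - shift(g1, 1, 0)     ; horizontal difference image, band 1
d2 = g2 - shift(g2, 1, 0)     ; horizontal difference image, band 2
n = n_elements(d1)
X = fltarr(2, n)
X[0,*] = reform(d1, n)
X[1,*] = reform(d2, n)
sigma_delta = correlate(X, /covariance, /double)
print, sigma_delta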
Exercises
1. Show that, for $x(t) = \sin(2\pi t)$ in Eq. (3.2),
\[ \hat x(-1) = -\frac{1}{2i}, \qquad \hat x(1) = \frac{1}{2i}, \]
and $\hat x(k) = 0$ otherwise.
2. Calculate the discrete Fourier transform of the sequence 2, 4, 6, 8 from (3.4). You have
to solve four simultaneous equations, the first of which is
\[ 2 = \frac14 \big( \hat g(0) + \hat g(1) + \hat g(2) + \hat g(3) \big). \]
Verify your result in IDL with the command
print, FFT([2,4,6,8])
Chapter 4
Radiometric enhancement
4.1 Lookup tables
Figure 4.1: Contrast enhancement with a lookup table represented as the continuous function
$f(x)$ [JRR99].

Intensity enhancement of an image is easily accomplished by means of lookup tables. For
byte-encoded data, the pixel intensities $g$ are used to index an array
\[ \mathrm{LUT}[k], \quad k = 0 \ldots 255, \]
the entries of which also lie between 0 and 255. These entries can be chosen to implement
linear stretching, saturation, histogram equalization, etc. according to
\[ \hat g_k(i, j) = \mathrm{LUT}[g_k(i, j)], \quad 0 \le i \le c-1,\ 0 \le j \le r-1. \]
It is also useful to think of the lookup table as an approximately continuous function
$y = f(x)$.
If $h_{in}(x)$ is the histogram of the original image and $h_{out}(y)$ is the histogram of the image
after transformation through the lookup table then, since the number of pixels is constant,
\[ h_{out}(y)\, dy = h_{in}(x)\, dx, \]
see Fig. 4.1.
4.1.1 Histogram equalization
For histogram equalization, we want $h_{out}(y)$ to be constant, independent of $y$. Hence
\[ dy \sim h_{in}(x)\, dx \]
and
\[ y = f(x) \sim \int_0^x h_{in}(t)\, dt. \]
The lookup table $y$ for histogram equalization is thus proportional to the cumulative sum
of the original histogram.
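In IDL the cumulative sum can be generated with total(..., /cumulative), so a sketch of
the equalization lookup table for a byte image g reads as follows (the built-in function
hist_equal performs the same task):

h = histogram(g, min=0, max=255)
cdf = total(h, /cumulative)                ; cumulative histogram
lut = byte(round(255.0*cdf/cdf[255]))      ; normalize to the range 0..255
g_eq = lut[g]                              ; apply the lookup table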
4.1.2 Histogram matching
Figure 4.2: Steps required for histogram matching [JRR99].

It is often desirable to match the histogram of one image to that of another so as to
make their apparent brightnesses as similar as possible, for example when the two images
are combined in a mosaic. We can do this by first equalizing both the input histogram
$h_{in}(x)$ and the reference histogram $h_{ref}(y)$ with the cumulative lookup tables $z = f(x)$ and
$z = g(y)$, respectively. The required lookup table is then
\[ y = g^{-1}(z) = g^{-1}(f(x)). \]
The necessary steps for implementing this function are illustrated in Fig. 4.2, taken from
[JRR99].
4.2 Convolutions
With the convention
\[ \omega = 2\pi k / c \]
we can write (3.5) in the form
\[ \hat g(\omega) = \sum_{j=0}^{c-1} g(j)\, e^{-i\omega j}. \tag{4.1} \]
The convolution of $g$ with a filter $h = (h(0), h(1), \ldots)$ is defined by
\[ f(j) = \sum_k h(k)\, g(j - k) =: h * g, \tag{4.2} \]
where the sum is over all nonzero elements of the filter $h$. If the number of nonzero elements
is finite, we speak of a finite impulse response (FIR) filter.

Theorem 1 (Convolution theorem) In the frequency domain, convolution is replaced by
multiplication: $\hat f(\omega) = \hat h(\omega)\, \hat g(\omega)$.

Proof:
\[ \hat f(\omega) = \sum_j f(j)\, e^{-i\omega j} = \sum_{j,k} h(k)\, g(j - k)\, e^{-i\omega j} \]
\[ \hat h(\omega)\, \hat g(\omega) = \sum_k h(k)\, e^{-i\omega k} \sum_\ell g(\ell)\, e^{-i\omega \ell}
= \sum_{k,\ell} h(k)\, g(\ell)\, e^{-i\omega (k + \ell)}
= \sum_{k,j} h(k)\, g(j - k)\, e^{-i\omega j} = \hat f(\omega). \]
This can of course be generalized to two dimensional images, so that there are three
basic steps involved in image filtering:
1. The image and the convolution filter are transformed from the spatial domain to the
frequency domain using the FFT.
2. The transformed image is multiplied with the frequency filter.
3. The filtered image is transformed back to the spatial domain.
We often distinguish between low-pass and high-pass filters. Low-pass filters perform
some sort of averaging. The simplest example is
\[ h = (1/2,\ 1/2,\ 0 \ldots), \]
which computes the average of two consecutive pixels. A high-pass filter computes differences
of nearby pixels, e.g.
\[ h = (1/2,\ -1/2,\ 0 \ldots). \]
Figure 4.3 shows the Fourier transforms of these two simple filters, generated by the IDL
program

; Hi-Lo pass filters
x = fltarr(64)
x[0]=0.5
x[1]=-0.5
p1 =abs(FFT(x))
x[1]=0.5
p2 =abs(FFT(x))
envi_plot_data,lindgen(64),[[p1],[p2]]
end
Figure 4.3: Low-pass (red) and high-pass (white) filters in the frequency domain. The
quantity $|\hat h(k)|^2$ is plotted as a function of $k$. The highest frequency is at the center of the
plots, $k = c/2 = 32$.
4.2.1 Laplacian of Gaussian filter
We shall illustrate image filtering with the so-called Laplacian of Gaussian (LoG) filter,
which will be used in Chapter 6 to implement contour matching for automatic determination
of ground control points. To begin with, consider the gradient operator for a two-dimensional
image:
\[ \nabla = \frac{\partial}{\partial x} = \mathbf{i}\, \frac{\partial}{\partial x_1} + \mathbf{j}\, \frac{\partial}{\partial x_2}, \]
where $\mathbf{i}$ and $\mathbf{j}$ are unit vectors in the vertical and horizontal directions, respectively. $\nabla g(x)$
is a vector in the direction of the maximum rate of change of gray scale intensity. Since the
intensity values are discrete, the partial derivatives must be approximated. For example we
can use the Sobel operators:
\[ \begin{aligned}
\frac{\partial g(x)}{\partial x_1} &\approx [g(i-1, j-1) + 2g(i, j-1) + g(i+1, j-1)] \\
&\quad - [g(i-1, j+1) + 2g(i, j+1) + g(i+1, j+1)] = \nabla_2(i, j) \\
\frac{\partial g(x)}{\partial x_2} &\approx [g(i-1, j-1) + 2g(i-1, j) + g(i-1, j+1)] \\
&\quad - [g(i+1, j-1) + 2g(i+1, j) + g(i+1, j+1)] = \nabla_1(i, j),
\end{aligned} \]
which are equivalent to the two-dimensional FIR filters
\[ h_1 = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
h_2 = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}, \]
respectively. The magnitude of the gradient is
\[ |\nabla| = \sqrt{\nabla_1^2 + \nabla_2^2}. \]
Edge detection can be achieved by calculating the filtered image
\[ f(i, j) = |\nabla|(i, j) \]
and setting an appropriate threshold.
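In IDL a quick way to experiment with this is the built-in sobel() function, which returns
an approximation to the gradient magnitude (a sketch; the threshold value is arbitrary and
image is assumed to be a loaded gray scale array):

mag = sobel(image)       ; approximate gradient magnitude
edges = mag gt 100       ; binary edge map by thresholding
tvscl, edges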
Figure 4.4: Laplacian of Gaussian filter.
Now consider the second derivatives of the image intensities, which can be represented
formally by the Laplacian
\[ \nabla^2 = \nabla \cdot \nabla = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}. \]
$\nabla^2 g(x)$ is a scalar quantity which is zero whenever the gradient is maximum. Therefore
changes in intensity from dark to light or vice versa correspond to sign changes in the
Laplacian, and these can also be used for edge detection. The Laplacian can also be
approximated by a FIR filter; however such filters tend to be very sensitive to image noise.
Usually a low-pass Gauss filter is first used to smooth the image before the Laplacian filter
is applied. It is more efficient, however, to calculate the Laplacian of the Gauss function
itself and then use the resulting function to derive a high-pass filter. The Gauss function in
two dimensions is given by
\[ \frac{1}{2\pi\sigma^2} \exp\left( -\frac{1}{2\sigma^2}(x_1^2 + x_2^2) \right), \]
where the parameter $\sigma$ determines its extent. Its Laplacian is
\[ \frac{1}{2\pi\sigma^6} (x_1^2 + x_2^2 - 2\sigma^2) \exp\left( -\frac{1}{2\sigma^2}(x_1^2 + x_2^2) \right), \]
a plot of which is shown in Fig. 4.4.
The following program illustrates the application of the filter to a gray scale image, see
Fig. 4.5:

pro LoG
sigma = 2.0
filter = fltarr(17,17)
; Laplacian of Gaussian kernel centered at (8,8)
for i=0L,16 do for j=0L,16 do $
   filter[i,j] = (1/(2*!pi*sigma^6))*((i-8)^2+(j-8)^2-2*sigma^2) $
                 *exp(-((i-8)^2+(j-8)^2)/(2*sigma^2))
; output as EPS file
thisDevice =!D.Name
set_plot, 'PS'
Device, Filename='c:\temp\LoG.eps',xsize=4,ysize=4,/inches,/Encapsulated
shade_surf,filter
device,/close_file
set_plot, thisDevice
; read a jpg image
filename = Dialog_Pickfile(Filter='*.jpg',/Read)
OK = Query_JPEG(filename,fileinfo)
if not OK then return
xsize = fileinfo.dimensions[0]
ysize = fileinfo.dimensions[1]
window,11,xsize=xsize,ysize=ysize
Read_JPEG,filename,image1
image = bytarr(xsize,ysize)
image[*,*] = image1[0,*,*]
tvscl,image
; run the filter in the frequency domain
filt = image*0.0
filt[0:16,0:16]=filter[*,*]
image1= float(fft(fft(image)*fft(filt),1))
; get zero-crossings and display
image2 = bytarr(xsize,ysize)
indices = where( (image1*shift(image1,1,0) lt 0) or (image1*shift(image1,0,1) lt 0) )
image2[indices]=255
wset, 11
tv, image2
end
Figure 4.5: Image filtered with the Laplacian of Gaussian filter.
Chapter 5
Topographic modelling
Satellite images are two-dimensional representations of the three-dimensional earth surface.
The correct treatment of the third dimension – the elevation – is essential for terrain mod-
elling and accurate georeferencing.
5.1 RST transformation
Transformations of spatial coordinates¹ in 3 dimensions which involve only rotations, scaling
and translations can be represented by a $4 \times 4$ transformation matrix $A$,
\[ v^* = A v, \tag{5.1} \]
where $v$ is the column vector containing the original coordinates,
\[ v = (X, Y, Z, 1)^T, \]
and $v^*$ contains the transformed coordinates,
\[ v^* = (X^*, Y^*, Z^*, 1)^T. \]
For example the translation
\[ X^* = X + X_0, \quad Y^* = Y + Y_0, \quad Z^* = Z + Z_0 \]
corresponds to the transformation matrix
\[ T = \begin{pmatrix} 1 & 0 & 0 & X_0 \\ 0 & 1 & 0 & Y_0 \\ 0 & 0 & 1 & Z_0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]
a uniform scaling by 50% to
\[ S = \begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 1/2 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]

¹The following treatment closely follows Chapter 2 of Gonzalez and Woods [GW02].
and a simple rotation $\theta$ about the $Z$-axis to
\[ R_\theta = \begin{pmatrix} \cos\theta & \sin\theta & 0 & 0 \\ -\sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \]
etc. The complete RST transformation is then
\[ v^* = R S T v = A v. \tag{5.2} \]
The inverse transformation is of course represented by $A^{-1}$.
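A small IDL sketch of Eq. (5.2), with arbitrary translation, scaling and rotation parameters
(recall that ## performs ordinary matrix multiplication):

X0 = 10.0 & Y0 = 20.0 & Z0 = 5.0
T = [[1,0,0,X0],[0,1,0,Y0],[0,0,1,Z0],[0,0,0,1]]
S = [[0.5,0,0,0],[0,0.5,0,0],[0,0,0.5,0],[0,0,0,1]]
th = 30*!dtor
R = [[cos(th),sin(th),0,0],[-sin(th),cos(th),0,0],[0,0,1,0],[0,0,0,1]]
A = R##S##T                         ; complete RST transformation
v = [[100.0],[200.0],[50.0],[1.0]]  ; homogeneous coordinates of a point
print, A##v                         ; transformed coordinates v*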
5.2 Imaging transformations
An imaging (or perspective) transformation projects 3D points onto a plane. It is used to
describe the formation of a camera image and, unlike the RST transformation, is non-linear
since it involves division by coordinate values.
Figure 5.1: Basic imaging process, from [GW02].
In Figure 5.1, the camera coordinate system $(x, y, z)$ is aligned with the world coordinate
system $(X, Y, Z)$, which describes the terrain to be imaged. The camera focal length is $\lambda$.
From simple geometry we obtain expressions for the image plane coordinates in terms of the
world coordinates:
\[ x = \frac{\lambda X}{\lambda - Z}, \qquad y = \frac{\lambda Y}{\lambda - Z}. \tag{5.3} \]
Solving for the $X$ and $Y$ world coordinates:
\[ X = \frac{x}{\lambda}(\lambda - Z), \qquad Y = \frac{y}{\lambda}(\lambda - Z). \tag{5.4} \]
Thus, in order to extract the geographical coordinates (X, Y ) of a point on the earth’s
surface from its image coordinates, we require knowledge of the elevation Z. Correcting for
the elevation in this way constitutes the process of orthorectification.
5.3 Camera models and RFM approximations
Equation (5.3) is overly simplified, as it assumes that the origin of world and image coordi-
nates coincide. In order to apply it, one has first to transform the image coordinate system
from the satellite to the world coordinate system. This is done in a straightforward way
with the rotation and translation transformations introduced in Section 5.1. However it
requires accurate knowledge of the height and orientation of the satellite imaging system at
the time of the image acquisition (or, more exactly, during the acquisition, since the latter
is normally not instantaneous). The resulting non-linear equations that relate image and
world coordinates are what constitute the camera or sensor model for that particular image.
Direct use of the camera model for image processing is complicated as it requires ex-
tremely exact, sometimes proprietary information about the sensor system and its orbit.
An alternative exists if the image provider also supplies a so-called rational function model
(RFM) which approximates the camera model for each acquisition as a ratio of rational
polynomials, see e.g. [TH01]. Such RFMs have the form
\[ r' = f(X', Y', Z') = \frac{a(X', Y', Z')}{b(X', Y', Z')}, \qquad
c' = g(X', Y', Z') = \frac{c(X', Y', Z')}{d(X', Y', Z')}, \tag{5.5} \]
where $c'$ and $r'$ are the column and row (XY) coordinates in the image plane relative to an
origin $(c_0, r_0)$ and scaled by factors $c_s$ resp. $r_s$:
\[ c' = \frac{c - c_0}{c_s}, \qquad r' = \frac{r - r_0}{r_s}. \]
Similarly $X'$, $Y'$ and $Z'$ are relative, scaled world coordinates:
\[ X' = \frac{X - X_0}{X_s}, \qquad Y' = \frac{Y - Y_0}{Y_s}, \qquad Z' = \frac{Z - Z_0}{Z_s}. \]
The polynomials $a$, $b$, $c$ and $d$ are typically to third order in the world coordinates, e.g.
\[ \begin{aligned}
a(X, Y, Z) = {} & a_0 + a_1 X + a_2 Y + a_3 Z + a_4 XY + a_5 XZ + a_6 YZ + a_7 X^2 + a_8 Y^2 + a_9 Z^2 \\
& + a_{10} XYZ + a_{11} X^3 + a_{12} XY^2 + a_{13} XZ^2 + a_{14} X^2 Y + a_{15} Y^3 \\
& + a_{16} YZ^2 + a_{17} X^2 Z + a_{18} Y^2 Z + a_{19} Z^3.
\end{aligned} \]
The advantage of using ratios of polynomials is that these are less subject to interpolation
error.
For a given acquisition the provider fits the RFM to his camera model using a three-
dimensional grid of points covering the image and world spaces with a least squares fitting
procedure. The RFM is capable of representing the camera model extremely well and can
be used as a replacement for it. Both Space Imaging and Digital Globe provide RFMs with
their high resolution IKONOS and QuickBird imagery. Below is a sample Quickbird RFM
file giving the origins, scaling factors and polynomial coefficients needed in Eq. (5.5).
END_GROUP = IMAGE
END;
To illustrate a simple use of the RFM data, consider a vertical structure in a high-resolution
image, such as a chimney or building facade. Suppose we determine the image
coordinates of the bottom and top of the structure to be $(r_b, c_b)$ and $(r_t, c_t)$, respectively.
Then from (5.5),
\[ \begin{aligned}
r_b &= f(X, Y, Z_b) \\
c_b &= g(X, Y, Z_b) \\
r_t &= f(X, Y, Z_t) \\
c_t &= g(X, Y, Z_t),
\end{aligned} \tag{5.6} \]
since the $(X, Y)$ coordinates must be the same. This would appear to constitute a set of
four equations in the four unknowns $X$, $Y$, $Z_b$ and $Z_t$; however the solution is unstable
because of the close similarity of $Z_t$ to $Z_b$. Nevertheless the object height $Z_t - Z_b$ can be
obtained by the following procedure:
1. Get $(r_b, c_b)$ and $(r_t, c_t)$ from the image.
2. Solve the first two equations in (5.6) (e.g. with Newton's method) for $X$ and $Y$, with $Z_b$
set equal to the average elevation in the scene if no DEM is available, otherwise to the
true elevation.
3. For a spanning range of $Z_t$ values, calculate $(r_t, c_t)$ from the second two equations in
(5.6) and choose for $Z_t$ the value which gives closest agreement with the values read in.
Quite generally, the RFM can approximate the camera model very well and can be used
as an alternative for providing end users with the necessary information to perform their
own photogrammetric processing. An ENVI plug-in for object height determination from
RFM data is given in Appendix D.2.1.
5.4 Stereo imaging, elevation models and orthorectification
The missing elevation information Z in (5.3) or in (5.5) can be obtained with stereoscopic
imaging techniques. Figure 5.2 shows two cameras viewing the same world point w from
two positions. The separation of the lens centers is the baseline. The objective is to find
the coordinates (X, Y, Z) of w if its image points have coordinates (x1, y1) and (x2, y2). We
assume that the cameras are identical and that their image coordinate systems are perfectly
aligned, differing only in the location of their origins. The Z coordinate of w is the same for
both coordinate systems.
In Figure 5.3 the first camera is brought into coincidence with the world coordinate
system. Then from (5.4),
\[ X_1 = \frac{x_1}{\lambda}(\lambda - Z). \]
Alternatively, if the second camera is brought to the origin of the world coordinate system,
\[ X_2 = \frac{x_2}{\lambda}(\lambda - Z). \]

Figure 5.2: The stereo imaging process, from [GW02].

Figure 5.3: Top view of Figure 5.2, from [GW02].
But, from the figures,
\[ X_2 = X_1 + B, \]
where $B$ is the baseline. We have from the above three equations:
\[ Z = \lambda - \frac{\lambda B}{x_2 - x_1}. \tag{5.7} \]
Thus if the displacement $x_2 - x_1$ of the image coordinates of the point $w$ can be
determined, the $Z$ coordinate can be calculated. The task is then to find two corresponding
points in different images of the same scene. This is usually accomplished by spatial
correlation techniques and is closely related to the problem of image-to-image registration
discussed in the next chapter.
Figure 5.4: ASTER stereo acquisition geometry.
Because the stereo images must be correlated, best results are obtained if they are acquired
within a very short time of each other, preferably "along track" if a single platform is used,
see Figure 5.4. This figure shows the orientation and imaging geometry of the VNIR 3N and
3B cameras on the ASTER platform for acquiring a stereo full scene. The satellite travels at
a speed of 6.7 km/sec at a height of 705 km. A $60 \times 60\ \mathrm{km}^2$ full scene is scanned in 9 seconds.
55 seconds later the same scene is scanned by the back-looking camera, corresponding to a
baseline of 370 km. The along-track geometry means that the stereo pair is unipolar, that
is, the displacements due to viewing angle are only along the $y$ axis in the imaging plane.
Therefore the spatial correlation algorithm used to match points can be one dimensional. If
carried out on a pixel for pixel basis, one obtains a digital elevation model (DEM).

Figure 5.5: ASTER 3N nadir camera image.

Figure 5.6: ASTER 3B back-looking camera image.

As an example, Figures 5.5 and 5.6 show an ASTER stereo pair. Both images have been
rotated so as to make them unipolar.
The following IDL program calculates a very rudimentary DEM:

pro test_correl_images
height = 705.0      ; sensor height in km
base = 370.0        ; baseline in km
pixel_size = 15.0   ; ground resolution in meters
envi_select, title='Choose 1st image', fid=fid1, dims=dims1, pos=pos1, /band_only
envi_select, title='Choose 2nd image', fid=fid2, dims=dims2, pos=pos2, /band_only
im1 = envi_get_data(fid=fid1,dims=dims1,pos=pos1)
im2 = envi_get_data(fid=fid2,dims=dims2,pos=pos2)
n_cols = dims1[2]-dims1[1]+1
n_rows = dims1[4]-dims1[3]+1
parallax = fltarr(n_cols,n_rows)
progressbar = Obj_New('progressbar', Color='blue', Text='0',$
              title='Cross correlation, column ...',xsize=250,ysize=20)
progressbar->Start
for i=7L,n_cols-8 do begin
   if progressbar->CheckCancel() then begin
      envi_enter_data,pixel_size*parallax*(height/base)
      progressbar->Destroy
      return
   endif
   progressbar->Update,(i*100)/n_cols,text=strtrim(i,2)
   for j=25L,n_rows-26 do begin
      cim = correl_images(im1[i-5:i+5,j-5:j+5],im2[i-7:i+7,j-25:j+25], $
            xoffset_b=0,yoffset_b=-20,xshift=0,yshift=20)
      corrmat_analyze,cim,xoff,yoff,m,e,p
      parallax[i,j] = yoff > (-5.0)   ; > is IDL's maximum operator
   endfor
endfor
progressbar->Destroy
envi_enter_data,pixel_size*parallax*(height/base)
end
This program makes use of the routines correl_images and corrmat_analyze from the IDL
Astronomy User's Library² to calculate the cross-correlation of the two images. For each
pixel in the nadir image an 11 × 11 window is moved along an 11 × 51 window in the
back-looking image centered at the same position. The point of maximum correlation defines
the parallax or displacement $p$. This is related to the relative elevation $e$ of the pixel
according to
\[ e = \frac{h}{b}\, p \times 15\,\mathrm{m}, \]
where $h$ is the height of the sensor and $b$ is the baseline, see Figure 5.7.
Figure 5.8 shows the result. Clearly there are many problems due to the correlation
errors, however the relative elevations are approximately correct when compared to the
DEM determined with the ENVI commercial add-on AsterDTM, see Figure 5.9.
²www.astro.washington.edu/deutsch/idl/htmlhelp/index.html
Figure 5.7: Relating parallax p to elevation e by similar triangles: e/p = (h − e)/b ≈ h/b.
Figure 5.8: A rudimentary DEM.
Figure 5.9: DEM generated with the commercial product AsterDTM.
Either the complete camera model or an RFM can be used, but usually neither is sufficient
for an absolute DEM relative to mean sea level. Most often additional ground reference
points within the image whose elevations are known are also required for absolute calibration.
The orthorectification of the image is carried out on the basis of a suitable DEM and
consists of projecting the (X, Y, Z) coordinates of each pixel onto the (X, Y ) coordinates of
a given map projection.
5.5 Slope and aspect
Terrain analysis involves the processing of elevation data. Specifically we consider here
the generation of slope images, which give the steepness of the terrain at each pixel, and
aspect images, which give the prevailing direction relative to north of a vector normal to the
landscape at each pixel.
A $3 \times 3$ pixel window can be used to determine both slope and aspect, see Figure 5.10.
Define
\[ \begin{aligned}
\Delta x_1 = c - a, &\qquad \Delta y_1 = a - g \\
\Delta x_2 = f - d, &\qquad \Delta y_2 = b - h \\
\Delta x_3 = i - g, &\qquad \Delta y_3 = c - i
\end{aligned} \]
and
\[ \Delta x = (\Delta x_1 + \Delta x_2 + \Delta x_3)/(3 x_s), \qquad
\Delta y = (\Delta y_1 + \Delta y_2 + \Delta y_3)/(3 y_s), \]
where $x_s$, $y_s$ give the pixel dimensions in meters. Then the slope in % at the central pixel
position is given by
\[ s = \sqrt{(\Delta x)^2 + (\Delta y)^2} \times 100, \]
whereas the aspect in radians measured clockwise from north is
\[ \theta = \tan^{-1}\frac{\Delta x}{\Delta y}. \]
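A compact IDL sketch of the same idea, using simple central differences over the whole
DEM array instead of the $3 \times 3$ window (dem, xs and ys are assumed given):

dx = (shift(dem,-1,0) - shift(dem,1,0))/(2*xs)   ; elevation change per meter, x direction
dy = (shift(dem,0,1) - shift(dem,0,-1))/(2*ys)   ; elevation change per meter, y direction
slope = sqrt(dx^2 + dy^2)*100    ; slope in percent
aspect = atan(dx, dy)            ; aspect in radians, cf. the formula above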
a b c
d e f
g h i
Figure 5.10: Pixel elevations in an 8-neighborhood. The letters represent elevations.
Slope/aspect determinations from a DEM are available in the ENVI main menu under
Topographic/Topographic Modelling.
5.6 Illumination correction
Figure 5.11: Angles involved in computation of local solar elevation, taken from [RCSA03].
Topographic modelling can be used to correct images for the effects of local solar illumination,
which depends not only upon the sun's position (elevation and azimuth) but also
upon the local slope and aspect of the terrain being illuminated. Figure 5.11 shows the
angles involved [RCSA03]. Solar elevation is $\theta_i$, solar azimuth is $\phi_a$, $\theta_p$ is the slope and $\phi_0$
is the aspect. The quantity to be calculated is the local solar elevation $\gamma_i$, which determines
the local irradiance. From trigonometry we have
\[ \cos\gamma_i = \cos\theta_p \cos\theta_i + \sin\theta_p \sin\theta_i \cos(\phi_a - \phi_0). \tag{5.8} \]
An example of a $\cos\gamma_i$ image in hilly terrain is shown in Figure 5.12.

Figure 5.12: Cosine of local solar illumination angle stretched across a DEM.

Let $\rho_T$ represent the reflectance of the inclined surface in Figure 5.11. Then for a
Lambertian surface, i.e. a surface which scatters reflected radiation uniformly in all directions,
the reflectance of the corresponding horizontal surface $\rho_H$ would be
\[ \rho_H = \rho_T\, \frac{\cos\theta_i}{\cos\gamma_i}. \tag{5.9} \]
The Lambertian assumption is in general not correct, the actual reflectance being described
by a complicated bidirectional reflectance distribution function (BRDF). An empirical
approach which gives a better approximation to the BRDF is the C-correction [TGG82].
Let $m$ and $b$ be the slope and intercept of a regression line for reflectance vs. $\cos\gamma_i$ for a
particular image band. Then instead of (5.9) one uses
\[ \rho_H = \rho_T\, \frac{\cos\theta_i + b/m}{\cos\gamma_i + b/m}. \tag{5.10} \]
An ENVI plug-in for illumination correction with the C-correction approximation
is given in Appendix D.2.2.
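The bare arithmetic of equations (5.8) and (5.10) can be sketched in a few lines of IDL. This is not the Appendix D.2.2 plug-in; the function name is hypothetical, the slope and aspect arrays and the solar angles θi, φa are assumed to be in radians, and IDL's LINFIT supplies the regression intercept b and slope m.

function c_correction_sketch, band, slope, aspect, theta_i, phi_a
; cosine of the local solar illumination angle, equation (5.8)
   cos_gamma = cos(slope)*cos(theta_i) + $
               sin(slope)*sin(theta_i)*cos(phi_a - aspect)
; regression of reflectance on cos(gamma): LINFIT returns [b, m]
   fit = linfit(cos_gamma[*], float(band[*]))
   b = fit[0] & m = fit[1]
; C-correction, equation (5.10)
   return, band*(cos(theta_i) + b/m)/(cos_gamma + b/m)
end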
Chapter 6
Image Registration
Image registration, either to another image or to a map, is a fundamental task in image
processing. It is required for georeferencing, stereo imaging, accurate change detection, or
any kind of multitemporal image analysis.
Image-to-image registration methods can be divided into roughly four classes [RC96]:
1. algorithms that use pixel values directly, i.e. correlation methods
2. frequency- or wavelet-domain methods that use e.g. the fast Fourier transform (FFT)
3. feature-based methods that use low-level features such as edges and corners
4. algorithms that use high level features and the relations between them, e.g. object-
oriented methods
We consider examples of frequency-domain and feature-based methods here.
6.1 Frequency domain registration
Consider two N × N gray-scale images g1(i, j) and g2(i, j), where g2 is offset relative to g1
by an integer number of pixels:

g2(i, j) = g1(i′, j′) = g1(i − i0, j − j0),    i0, j0 ∈ ℕ.

Taking the Fourier transform we have

ĝ2(k, l) = Σij g1(i − i0, j − j0) e^(−i2π(ik+jl)/N),

or with a change of indices to i′, j′,

ĝ2(k, l) = [Σi′j′ g1(i′, j′) e^(−i2π(i′k+j′l)/N)] e^(−i2π(i0k+j0l)/N) = ĝ1(k, l) e^(−i2π(i0k+j0l)/N).
(This is referred to as the Fourier translation property.) Therefore we can write
ĝ2(k, l) ĝ1*(k, l) / |ĝ2(k, l) ĝ1*(k, l)| = e^(−i2π(i0k+j0l)/N),    (6.1)
Figure 6.1: Phase correlation of two identical images shifted by 10 pixels.
where ĝ1*(k, l) is the complex conjugate of ĝ1(k, l). The inverse transform of the right hand side
exhibits a Dirac delta function (spike) at the coordinates (i0, j0). Thus if two otherwise
identical images are offset by an integer number of pixels, the offset can be found by taking
their Fourier transforms, computing the ratio on the left hand side of (6.1) (the so-called
cross-power spectrum) and then taking the inverse transform of the result. The position of
the maximum value in the inverse transform gives the values of i0 and j0. The following
IDL program illustrates the procedure, see Fig. 6.1.
; Image matching by phase correlation
; read a bitmap image and cut out two 512x512 pixel arrays
filename = Dialog_Pickfile(Filter='*.jpg',/Read)
if filename eq '' then print, 'cancelled' else begin
   Read_JPEG, filename, image
   g1 = reform(image[0,10:521,10:521])
   g2 = reform(image[0,0:511,0:511])
; perform Fourier transforms
   f1 = fft(g1, /double)
   f2 = fft(g2, /double)
; determine the offset from the maximum of the phase correlation,
; i.e. the inverse transform of the cross-power spectrum (6.1)
   g = abs(fft( f2*conj(f1)/abs(f2*conj(f1)), /inverse, /double ))
   pos = where(g eq max(g))
   print, 'Offset = ' + strtrim(pos mod 512, 2) + ' ' + strtrim(pos/512, 2)
; output as EPS file
   thisDevice = !D.Name
   set_plot, 'PS'
   Device, Filename='c:\temp\phasecorr.eps', xsize=4, ysize=4, /inches, /Encapsulated
   shade_surf, g[0:50,0:50]
   device, /close_file
   set_plot, thisDevice
endelse
end
Images which differ not only by an offset but also by a rigid rotation and change of scale
can in principle be registered similarly, see [RC96].
6.2 Feature matching
A tedious task associated with image-image registration using low-level image features is
the setting of ground control points (GCPs) since, in general, it is necessary to resort to
manual entry. However, various techniques for automatic determination of GCPs have
been suggested in the literature. We will discuss one such method, namely contour matching
[LMM95]. This technique has been found to function reliably in bitemporal scenes in which
vegetation changes do not dominate.
vegetation changes do not dominate. It can of course be augmented (or replaced) by other
automatic methods or by manual determination. The procedures involved in image-image
registration using contour matching are shown in Fig. 6.2 [LMM95].
[Flowchart: Image 1 and Image 2 each pass through LoG / Zero Crossing / Edge Strength → Contour Finder → Chain Code Encoder → Closed Contour Matching → Consistency Check → Warping → Image 2 (registered).]

Figure 6.2: Image-image registration with contour matching.
6.2.1 Contour detection
The first step involves the application of a Laplacian of Gaussian filter to both images. After
determining the contours by examining zero-crossings of the LoG-filtered image, the contour
strengths are encoded in the pixel intensities. Strengths are taken to be proportional to the
magnitude of the gradient at the zero-crossing.
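As an illustration (not the plug-in of Appendix D.3), the LoG filtering and zero-crossing step might be sketched as follows; the kernel half-width of three standard deviations and the use of IDL's SOBEL function as a stand-in for the gradient magnitude are our own choices.

function log_contours_sketch, band, sigma
; build a Laplacian of Gaussian kernel of half-width 3 sigma
   n = 2*ceil(3*sigma) + 1
   x = (findgen(n) - n/2) # replicate(1.0, n)
   r2 = x^2 + transpose(x)^2
   kernel = (r2 - 2*sigma^2)/sigma^4*exp(-r2/(2*sigma^2))
   kernel = kernel - mean(kernel)
   filtered = convol(float(band), kernel, /edge_truncate)
; zero-crossings: sign changes between horizontal or vertical neighbors
   s = filtered ge 0
   zc = (s ne shift(s,1,0)) or (s ne shift(s,0,1))
; contour strength taken proportional to the gradient magnitude
   return, zc*sobel(filtered)
end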
6.2.2 Closed contours
In the next step, all closed contours with strengths above some given threshold are deter-
mined by tracing the contours. Pixels which have been visited during tracing are set to zero
so that they will not be visited again.
6.2.3 Chain codes
For subsequent matching purposes, all significant closed contours found in the preceding
step are chain encoded. Any digital curve can be represented by an integer sequence
{a1, a2, . . . , ai, . . .}, ai ∈ {0, 1, 2, 3, 4, 5, 6, 7}, depending on the relative position of the current
pixel with respect to the previous pixel in the curve. This simple code has the drawback
that some contours produce wrap-around. For example the line in the direction −22.5°
has the chain code {707070 . . .}. Li et al. [LMM95] suggest the smoothing operation

{a1 a2 . . . an} → {b1 b2 . . . bn},

where b1 = a1 and bi = qi, in which qi is the integer satisfying (qi − ai) mod 8 = 0 and
|qi − bi−1| → min, i = 2, 3 . . . n.
They also suggest applying the Gaussian smoothing filter {0.1, 0.2, 0.4, 0.2, 0.1} to the
result. Two chain codes can be compared by “sliding” one over the other and determining
the maximum correlation between them.
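A minimal IDL sketch of this smoothing operation (the function name is ours) unwraps each code to the equivalent integer nearest its smoothed predecessor and then applies the Gaussian filter:

function smooth_chain_code, a
   n = n_elements(a)
   b = float(a)
   for i = 1, n-1 do begin
      q = long(a[i])
; shift q in steps of 8 until it is as close as possible to b[i-1]
      while q - b[i-1] gt 4 do q = q - 8
      while q - b[i-1] lt -4 do q = q + 8
      b[i] = q
   endfor
; Gaussian smoothing filter suggested in [LMM95]
   return, convol(b, [0.1,0.2,0.4,0.2,0.1], /edge_truncate)
end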
6.2.4 Invariant moments
The closed contours are first matched according to their invariant moments. These are
defined as follows, see [Hab95, GW02]. Let the set C denote the set of pixels defining a
contour, with |C| = n, that is, n is the number of pixels on the contour. The moment of
order p, q of the contour is defined as
mpq = Σ(i,j)∈C j^p i^q.    (6.2)
Note that n = m00. The center of gravity (xc, yc) of the contour is thus

xc = m10/m00,    yc = m01/m00.
The centralized moments are then given by

μpq = Σ(i,j)∈C (j − xc)^p (i − yc)^q,    (6.3)
and the normalized centralized moments by

ηpq = μpq / μ00^((p+q)/2+1).    (6.4)
For example,

η20 = μ20/μ00² = (1/n²) Σ(i,j)∈C (j − xc)².
The normalized centralized moments are, apart from effects of digital quantization, invariant
under scale changes and translations of the contours.
Finally, we can define moments which are also invariant under rotations, see [Hu62]. The
first two such invariant moments are

h1 = η20 + η02,
h2 = (η20 − η02)² + 4η11².    (6.5)
For example, consider a general rotation of the coordinate axes with origin at the center of
gravity of a contour:

( j′ )   (  cos θ   sin θ ) ( j )       ( j )
( i′ ) = ( −sin θ   cos θ ) ( i )  =  A ( i ).
The first invariant moment in the rotated coordinate system is

h1′ = (1/n²) Σ(i′,j′)∈C (j′² + i′²) = (1/n²) Σ(i′,j′)∈C (j′, i′)(j′, i′)ᵀ
    = (1/n²) Σ(i,j)∈C (j, i) AᵀA (j, i)ᵀ = (1/n²) Σ(i,j)∈C (j² + i²),

since AᵀA = I.
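In IDL, the moments h1 and h2 can be computed directly from the column and row coordinates js, is of the contour pixels; the following hypothetical function is a straightforward transcription of equations (6.2)–(6.5), not the code of Appendix D.3.

function hu_moments_sketch, js, is
   n = n_elements(js)
; center of gravity of the contour
   xc = mean(float(js)) & yc = mean(float(is))
   u = float(js) - xc & v = float(is) - yc
; normalized centralized moments of order 2, equation (6.4)
   eta20 = total(u^2)/n^2
   eta02 = total(v^2)/n^2
   eta11 = total(u*v)/n^2
; rotation invariant moments, equation (6.5)
   return, [eta20 + eta02, (eta20 - eta02)^2 + 4*eta11^2]
end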
6.2.5 Contour matching
Each significant contour in one image is first matched with contours in the second image
according to their invariant moments h1, h2. This is done by setting a threshold on the
allowed differences, for instance 1 standard deviation. If one or more matches is found, the
best candidate for a GCP pair is then chosen to be that matched contour in the second
image for which the chain code correlation with the contour in the first image is maximum.
If the maximum correlation is less than some threshold, e.g. 0.9, then no match is found.
The actual GCP coordinates are taken to be the centers of gravity of the matched contours.
6.2.6 Consistency check
The contour matching procedure invariably generates false GCP pairs, so a further process-
ing step is required. In [LMM95] use is made of the fact that distances are preserved under
a rigid transformation. Let A1A2 represent the distance between two points A1 and A2 in
an image. For two sets of m matched contour centers {Ai} and {Bi} in images 1 and 2, the
ratios

AiAj / BiBj,    i = 1 . . . m,  j = i + 1 . . . m,
are calculated. These should form a cluster, so that pairs scattered away from the cluster
center can be rejected as false matches.
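A rough IDL sketch of this check (the names and the tolerance argument are arbitrary choices, not the Appendix D.3 code) computes all pairwise distance ratios and keeps those GCP pairs whose ratios mostly lie near the median ratio:

function consistent_gcps_sketch, ax, ay, bx, by, tol
   m = n_elements(ax)
   ratios = fltarr(m,m)
   for i = 0, m-2 do for j = i+1, m-1 do begin
      dA = sqrt((ax[i]-ax[j])^2 + (ay[i]-ay[j])^2)
      dB = sqrt((bx[i]-bx[j])^2 + (by[i]-by[j])^2)
      ratios[i,j] = dA/dB
   endfor
; cluster center estimated by the median ratio
   center = median(ratios[where(ratios gt 0)])
; keep GCPs for which most ratios lie within tol of the center
   score = fltarr(m)
   for i = 0, m-1 do begin
      r = [reform(ratios[i,*]), reform(ratios[*,i])]
      r = r[where(r gt 0)]
      score[i] = mean(abs(r/center - 1.0) le tol)
   endfor
   return, where(score gt 0.5)
end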
An ENVI plug-in for GCP determination via contour matching is given in
Appendix D.3.
6.3 Re-sampling and warping
We represent with (x, y) the coordinates of a point in image 1 and the corresponding point
in image 2 with (u, v). A second order polynomial map of image 2 to image 1, for example,
is given by
u = a0 + a1x + a2y + a3xy + a4x² + a5y²
v = b0 + b1x + b2y + b3xy + b4x² + b5y².
Since there are 12 unknown coefficients, we require at least 6 GCP pairs to determine the
map (each pair generates 2 equations). If more than 6 pairs are available, the coefficients can
be found by least squares fitting. This has the advantage that an RMS error for the mapping
can be estimated. Similar considerations apply for lower or higher order polynomial maps.
Having determined the map coefficients, image 2 can be registered to image 1 by re-
sampling. Nearest neighbor resampling simply chooses the actual pixel in image 2 that has
its center nearest the calculated coordinates (u, v) and transfers it to location (x, y). This
is the preferred technique for classification or change detection, since the registered image
consists of the original pixel brightnesses, simply rearranged in position to give a correct
image geometry. Other commonly used resampling methods are bilinear interpolation and
cubic convolution interpolation, see [JRR99] for details. These methods mix the spectral
intensities of neighboring pixels.
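For illustration, the least squares determination of the second order map coefficients from n ≥ 6 GCP pairs can be sketched with IDL's LA_LEAST_SQUARES; the function name and argument conventions here are our own.

function poly_map_sketch, x, y, u, v
; one design matrix row [1, x, y, xy, x^2, y^2] per GCP
   n = n_elements(x)
   A = fltarr(6, n)
   A[0,*] = 1.0
   A[1,*] = x & A[2,*] = y & A[3,*] = x*y
   A[4,*] = x^2 & A[5,*] = y^2
; least squares solutions for the u and v maps
   a_coef = la_least_squares(A, u)
   b_coef = la_least_squares(A, v)
   return, [[a_coef], [b_coef]]
end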
Exercises
1. We can approximate the centralized moments (6.3) of a contour by the integral

   μpq = ∫∫ (x − xc)^p (y − yc)^q f(x, y) dx dy,

   where the integration is over the whole image and where f(x, y) = 1 if the point
   (x, y) lies on the contour and f(x, y) = 0 otherwise. Use this approximation to prove
   that the normalized centralized moments ηpq given in (6.4) are invariant under scaling
   transformations of the form

   ( x′ )   ( α  0 ) ( x )
   ( y′ ) = ( 0  α ) ( y ).
Chapter 7
Image Sharpening
The change detection and classification algorithms that we will meet in the next chapters
exploit of course not only the spatial but also the spectral information of satellite imagery.
Many common platforms (Landsat 7 ETM+, IKONOS, SPOT, QuickBird) offer panchromatic
images with higher ground resolution than that of the spectral channels. Application of
multispectral change detection or classification methods is therefore restricted to the lower
resolution. Conventional image fusion techniques, such as the well-known HSV transformation,
can be used to sharpen the spectral components; however, the effect of mixing in the
panchromatic image is often to “dilute” the spectral information. Another disadvantage of
the HSV transformation is that one is restricted to using three of the available spectral
channels. In the following we will outline the HSV method and then consider alternative
fusion techniques.
7.1 HSV fusion
In computers with 24-bit graphics (true color), any three channels of a multispectral image
can be displayed with 8 bits for each of the additive primary colors red, green and blue. The
monitor displays this as an RGB color composite image which, depending on the choice of
image channels and their relative intensities, may or may not appear to be natural. There
are 2²⁴ ≈ 16 million colors possible.
Another means of color definition is in terms of hue, saturation and value (HSV). Value
(or intensity) can be thought of as an axis equidistant from the three orthogonal primary
color axes. Hue refers to the actual color and is defined as an angle on a circle perpendicular
to the value axis. Saturation is the “amount” of color present and is represented by the
radius of the circle described by the hue,
A commonly used method for fusion of two images (for example a lower resolution multi-
spectral image with a higher resolution panchromatic image) is to transform the first image
from RGB to HSV space, replace the V component with the grayscale values of the second
image after performing a radiometric normalization, and then transform back to RGB space.
The forward transformation begins by rotating the RGB coordinate axes into the diagonal
axis of the RGB color cube. The coordinates in the new reference system are given by
( m1 )   ( 2/√6   −1/√6   −1/√6 ) ( R )
( m2 ) = (  0      1/√2   −1/√2 ) ( G )
( i1 )   ( 1/√3    1/√3    1/√3 ) ( B )

Then the rectangular coordinates (m1, m2, i1) are transformed into the cylindrical HSV
coordinates:

H = arctan(m1/m2),    S = √(m1² + m2²),    I = √3 · i1.
The following IDL code illustrates the necessary steps for HSV fusion, making use of ENVI
batch procedures. The same operations can also be invoked directly from the ENVI main menu.
pro HSVFusion, event
; get MS image
envi_select, title=’Select low resolution three-band input file’, $
fid=fid1, dims=dims1, pos=pos1
if (fid1 eq -1) or (n_elements(pos1) ne 3) then return
; get PAN image
envi_select, title=’Select panchromatic image’, $
fid=fid2, pos=pos2, dims=dims2, /band_only
if (fid2 eq -1) then return
envi_check_save, /transform
; linear stretch the images and convert to byte format
envi_doit,’stretch_doit’, fid=fid1, dims=dims1, pos=pos1, method=1, $
r_fid=r_fid1, out_min=0, out_max=255, $
range_by=0, i_min=0, i_max=100, out_dt=1, out_name='c:\temp\hsv_temp'
envi_doit,’stretch_doit’, fid=fid2, dims=dims2, pos=pos2, method=1, $
r_fid=r_fid2, out_min=0, out_max=255, $
range_by=0, i_min=0, i_max=100, out_dt=1, /in_memory
envi_file_query, r_fid2, ns=f_ns, nl=f_nl
f_dims = [-1l, 0, f_ns-1, 0, f_nl-1]
; HSV sharpening
envi_doit, ’sharpen_doit’, $
fid=[r_fid1,r_fid1,r_fid1], pos=[0,1,2], f_fid=r_fid2, $
f_dims=f_dims, f_pos=[0], method=0, interp=0, /in_memory
; remove temporary files from ENVI
envi_file_mng, id=r_fid1, /remove, /delete
envi_file_mng, id=r_fid2, /remove
end
7.2 Brovey fusion
In its simplest form this method multiplies each re-sampled multispectral pixel by the ratio
of the corresponding panchromatic pixel intensity to the sum of all of the multispectral
intensities. The corrected pixel intensities ¯gk(i, j) in the kth fused multispectral channel are
given by
ḡk(i, j) = gk(i, j) · gp(i, j) / Σk′ gk′(i, j),    (7.1)
where gk(i, j) is the (re-sampled) pixel intensity in the kth channel and gp(i, j) is the
corresponding pixel intensity in the panchromatic image. (The ENVI environment offers Brovey
fusion in its main menu.) This technique assumes that the spectral range spanned by the
panchromatic image is essentially the same as that covered by the multispectral channels.
This is seldom the case. Moreover, to avoid bias, the intensities used should be the radiances
at the satellite sensors, implying use of the sensors’ calibration.
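A minimal sketch of (7.1) in IDL, assuming a re-sampled multispectral cube ms with dimensions [N, cols, rows] and a co-registered panchromatic band of dimensions [cols, rows] (the function name is ours, and zero denominators should be masked in practice):

function brovey_sketch, ms, pan
   sz = size(ms, /dimensions)
; sum over the N spectral channels at each pixel
   denom = total(ms, 1)
   fused = float(ms)
   for k = 0, sz[0]-1 do fused[k,*,*] = float(ms[k,*,*])*pan/denom
   return, fused
end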
7.3 PCA fusion
Panchromatic sharpening using principal components analysis (PCA) is similar to the HSV
method. After the PCA transformation, the first principal component is replaced by the
panchromatic image, again after radiometric normalization, see Figure 7.1.
Figure 7.1: Panchromatic fusion with the principal components transformation.
Image sharpening using PCA and the closely related Gram-Schmidt transformation is
available from the ENVI main menu.
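The essence of the method can be sketched in IDL as follows: a simplified stand-in for the ENVI function, with explicit loops to keep the linear algebra unambiguous. The cube is transformed to principal components via the eigenvectors of its covariance matrix, PC1 is replaced by the pan band normalized to the mean and variance of PC1, and the transformation is inverted.

function pca_fusion_sketch, ms, pan
   sz = size(ms, /dimensions)
   n = sz[0] & npix = sz[1]*sz[2]
   X = reform(float(ms), n, npix)
; center the bands, remembering the means
   mu = fltarr(n)
   for k = 0, n-1 do begin
      mu[k] = mean(X[k,*])
      X[k,*] = X[k,*] - mu[k]
   endfor
; covariance matrix and its eigenvectors (rows of V)
   C = fltarr(n,n)
   for a = 0, n-1 do for b = 0, n-1 do $
      C[a,b] = total(X[a,*]*X[b,*])/(npix-1)
   void = eigenql(C, eigenvectors=V)
; project onto the principal axes
   P = fltarr(n, npix)
   for i = 0, n-1 do for k = 0, n-1 do $
      P[i,*] = P[i,*] + V[k,i]*X[k,*]
; replace PC1 by the pan band, normalized to PC1
   p = reform(float(pan), npix)
   P[0,*] = (p - mean(p))*stddev(P[0,*])/stddev(p)
; invert the transformation and restore the means
   for k = 0, n-1 do begin
      X[k,*] = mu[k]
      for i = 0, n-1 do X[k,*] = X[k,*] + V[k,i]*P[i,*]
   endfor
   return, reform(X, sz[0], sz[1], sz[2])
end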
7.4 Wavelet fusion
Wavelets provide an efficient means of representing high and low frequency components of
multispectral images and can be used to perform image sharpening. Two examples are given
here.
7.4.1 Discrete wavelet transform
The discrete wavelet transform (DWT) of a two-dimensional image is shown in Appendix
B to be equivalent to an iterative application of the high-low-pass filter bank illustrated in
Figure 7.2.
[Filter bank diagram: gk(i, j) is filtered along columns and rows by H and G and downsampled, yielding gk+1(i, j) and the wavelet coefficients C^H_k+1(i, j), C^V_k+1(i, j), C^D_k+1(i, j).]

Figure 7.2: Wavelet filter bank. H is a low-pass and G a high-pass filter derived from the
coefficients of the wavelet transformation. The symbol ↓ indicates downsampling by a factor
of 2. The original image gk(i, j) can be reconstructed by inverting the filter.
A single application of the filter corresponding to the Daubechies D4 wavelet to a satellite
image g1(i, j) (1 m resolution) is shown in Figure B.12. The high frequency information
(wavelet coefficients) is stored in the arrays C^H_2, C^V_2 and C^D_2 and displayed in the
upper right, lower left and lower right quadrants, respectively. The original image with its
resolution degraded by a factor of two, g2(i, j), is in the upper left quadrant. Applying the
filter bank iteratively to the upper left quadrant yields a further reduction by a factor of 2.
The fusion procedure for IKONOS or QuickBird imagery, for instance, in which the
resolutions of the panchromatic and the 4 multispectral components differ by exactly a factor
of 4, is then as follows: both the degraded panchromatic image and the four multispectral
images are compressed once again (e.g. to 8 m resolution in the case of IKONOS) and the high
frequency components C^z_4, z = H, V, D, are sampled to estimate the correction coefficients

a^z = σ^z_ms / σ^z_pan
b^z = m^z_ms − a^z m^z_pan,    (7.2)
where m^z and σ^z denote mean and standard deviation, respectively. These coefficients are
then used to normalize the wavelet coefficients for the panchromatic image to those of the
multispectral image:

C^z_i(i, j) → a^z C^z_i(i, j) + b^z,    z = H, V, D,  i = 2, 3.    (7.3)
The degraded panchromatic image g3(i, j) is then replaced by each of the four multispectral
images in turn, and the normalized wavelet coefficients are used to reconstruct the original
1 m resolution. We thus obtain what would be seen if the multispectral sensors had the
resolution of the panchromatic sensor [RW00].
An ENVI plug-in for panchromatic sharpening with the DWT is given in
Appendix D.4.1.
7.4.2 À trous filtering
The radiometric fidelity obtained with the discrete wavelet transform is excellent, as will be
shown in the next section. However the lack of translational invariance of the DWT often
leads to spatial artifacts (blurring, shadowing, staircase effect) in the sharpened product.
This is illustrated in the following program, in which an image is transformed once with the
DWT and the low-pass quadrant shifted by one pixel relative to the high-pass quadrants
(i.e. the wavelet coefficients). After inverting the transformation, serious degradation is
apparent, see Figure 7.3.
pro translate_wavelet
; get an image band
   envi_select, title='Select input file', $
                fid=fid, dims=dims, pos=pos, /band_only
   if fid eq -1 then return
; create a DWT object
   aDWT = Obj_New('DWT', envi_get_data(fid=fid,dims=dims,pos=pos))
; compress
   aDWT->compress
; shift the compressed quadrant, suppressing phase correlation matching
   aDWT->inject, shift(aDWT->Get_Quadrant(0),[1,1]), pc=0
; restore
   aDWT->expand
; return result to ENVI
   envi_enter_data, aDWT->get_image()
end
As an alternative to the DWT, the à trous wavelet transform (ATWT) has been proposed
for image sharpening [AABG02]. The ATWT is a multiresolution decomposition defined
formally by a low-pass filter H = {h(0), h(1), . . .} and a high-pass filter G = δ − H, where
δ denotes an all-pass filter. Thus the high frequency part is just the difference between the
original image and the low-pass filtered image. Not surprisingly, this transformation does not
allow perfect reconstruction if the output is downsampled. Therefore downsampling is not
performed at all. Rather, at the kth iteration of the low-pass filter, 2^(k−1) zeroes are inserted
between the elements of H. This means that every other pixel is interpolated on the first
iteration:
iteration:
H = {h(0), 0, h(1), 0, . . .},
while on the second iteration
H = {h(0), 0, 0, h(1), 0, 0, . . .}
etc. (hence the name à trous = with holes). The low-pass filter is usually chosen to be
symmetric (unlike the Daubechies wavelet filters for example). The prototype filter chosen
here is the cubic B-spline filter
H = {1/16, 1/4, 3/8, 1/4, 1/16}.
The transformation is highly redundant and requires considerably more computer storage
to implement. However when used for image sharpening it is much less sensitive to mis-
alignment between the multispectral and panchromatic images.
Figure 7.3: Artifacts due to lack of translational invariance of the DWT.
Figure 7.4 outlines the scheme implemented in the ENVI plug-in for ATWT panchromatic
sharpening. The MS band is nearest-neighbor upsampled by a factor of 2 to match the
dimensions of the high resolution band. The `a trous transformation is applied to both bands
(columns and rows are filtered with the upsampled cubic spline filter, with the difference
determining the high-pass result). The high frequency component of the pan image is
normalized to that of the MS image in the same way as for DWT sharpening, equations
(7.2) and (7.3). Then the low frequency pan component is replaced by the filtered MS
image and the transformation inverted. An ENVI plug-in for ATWT sharpening is
described in Appendix D.4.2.
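One level of the decomposition might be sketched in IDL as follows (a hypothetical helper, not the Appendix D.4.2 plug-in), using the upsampled cubic B-spline filter and the definition G = δ − H:

function atrous_level_sketch, image, k
; prototype cubic B-spline filter
   h0 = [1/16., 1/4., 3/8., 1/4., 1/16.]
; insert 2^(k-1) zeroes between the filter elements
   nz = 2^(k-1)
   step = nz + 1
   h = fltarr(4*step + 1)
   h[indgen(5)*step] = h0
; separable two-dimensional low-pass filter
   lowpass = convol(float(image), h # h, /edge_truncate)
; the high-pass part is the difference G = delta - H
   return, {low: lowpass, high: float(image) - lowpass}
end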
7.5 Quality indices
Wang and Bovik [WB02] suggest the following measure of radiometric fidelity between two
image bands f and g:
[Diagram: the upsampled MS band and the pan band are each decomposed with the upsampled low-pass filter ↑H and the high-pass G; the pan high-frequency component is normalized to that of the MS band, inserted, and the transformation inverted to give the sharpened MS band.]

Figure 7.4: À trous image sharpening scheme for an MS to panchromatic resolution ratio of
two. The symbol ↑H denotes the upsampled low-pass filter.
Figure 7.5: Comparison of three image sharpening methods with the Wang-Bovik quality
index. Left to right: Gram-Schmidt, ATWT, DWT.
Q = (σfg/(σf σg)) · (2f̄ḡ/(f̄² + ḡ²)) · (2σf σg/(σf² + σg²)) = 4σfg f̄ḡ / ((f̄² + ḡ²)(σf² + σg²)),    (7.4)
where f̄ and σf² are the mean and variance of band f and σfg is the covariance of the two
bands. The first term in (7.4) is seen to be the correlation coefficient between the two
images, with values in [−1, 1]; the second term compares their average brightness, with
values in [0, 1]; and the third term compares their contrasts, also in [0, 1]. Thus perfect
radiometric correspondence would give a value Q = 1.
Since image quality is usually not spatially invariant, it is usual to compute Q in, say,
M sliding windows and then average over all such windows:

Q̄ = (1/M) Σ(j=1..M) Qj.
An ENVI plug-in for determining the quality index for pansharpened images is
given in Appendix D.4.3.
Figure 7.5 shows a comparison of three image sharpening methods applied to a QuickBird
image, namely the Gram-Schmidt, ATWT and DWT transformations. The latter achieves
by far the best radiometric fidelity, but spatial artifacts are apparent.
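A global version of (7.4) takes only a few lines of IDL; the windowed variant would apply the same function to sliding windows and average the results (the function name is ours, not the Appendix D.4.3 plug-in):

function wang_bovik_sketch, f, g
   ff = float(f[*]) & gg = float(g[*])
   fbar = mean(ff) & gbar = mean(gg)
   sf2 = variance(ff) & sg2 = variance(gg)
; sample covariance of the two bands
   sfg = total((ff-fbar)*(gg-gbar))/(n_elements(ff)-1)
; equation (7.4)
   return, 4*sfg*fbar*gbar/((fbar^2 + gbar^2)*(sf2 + sg2))
end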
Chapter 8
Change Detection
To quote Singh’s review article on change detection [Sin89],
“The basic premise in using remote sensing data for change detection is that
changes in land cover must result in changes in radiance values ... [which] must
be large with respect to radiance changes from other factors.”
In the present chapter we will mention briefly the most commonly used digital techniques for
enhancing this “change signal” in bitemporal satellite images, and then focus our attention
on the so-called multivariate alteration detection algorithm of Nielsen et al. [NCS98].
8.1 Algebraic methods
In order to see changes in the two multispectral images represented by N-dimensional ran-
dom vectors F and G, a simple procedure is to subtract them from each other component-
by-component, examining the N differenced images characterized by
F − G = (F1 − G1, F2 − G2 . . . FN − GN ) (8.1)
for significant changes. Pixel intensity differences near zero indicate no change, large positive
or negative values indicate change, and decision thresholds can be set to define significant
changes. If the difference signatures in the spectral channels are used to classify the kind of
change that has taken place, one speaks of change vector analysis. Thresholds are usually
expressed in standard deviations from the mean difference value, which is taken to correspond
to no change.
Alternatively, ratios of intensities of the form

Fk/Gk,    k = 1 . . . N,    (8.2)

can be built between successive images. Ratios near unity correspond to no change, while
small and large values indicate change. A disadvantage of this method is that random
variables of the form (8.2) are not normally distributed, so simple threshold values defined
in terms of standard deviations are not valid.
Other algebraic combinations, such as differences in vegetation indices (Section 2.1) are
also in use. All of these “band math” operations can of course be performed conveniently
within the ENVI/IDL environment.
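For example, a simple difference-and-threshold operation might look as follows in IDL; the function name and the choice of threshold argument are our own:

function change_mask_sketch, band1, band2, nsigma
; difference image; the mean difference corresponds to no change
   d = float(band1) - float(band2)
; flag pixels more than nsigma standard deviations from the mean
   return, abs(d - mean(d)) gt nsigma*stddev(d)
end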
8.2 Principal components
Figure 8.1: Change detection with principal components.
Consider the bitemporal feature space for a single spectral band m in which each pixel
is denoted by a point (fm, gm), a realization of the random vector (Fm, Gm). Since the
unchanged pixels are highly correlated, they will lie in a narrow, elongated cluster along the
principal axis, whereas change pixels will lie some distance away from it, see Fig. 8.1. The
second principal component will thus quantify the degree of change associated with a given
pixel. Since the principal axes are determined by diagonalization of the covariance matrix for
all of the pixels, the no-change axis may be poorly determined. To avoid this problem, the
principal components can be determined iteratively using weights for each pixel according
to the magnitude of the second principal component. This method can be generalized to
treat all multispectral bands simultaneously [Wie97].
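For a single band, the second principal component of the tuples (f, g) is easily computed; the following sketch (without the iterative re-weighting of [Wie97]) illustrates the idea:

function pc2_change_sketch, f, g
   x = float(f[*]) - mean(f) & y = float(g[*]) - mean(g)
; covariance matrix of the bitemporal tuples
   C = [[total(x*x), total(x*y)], $
        [total(x*y), total(y*y)]]/(n_elements(x)-1)
   void = eigenql(C, eigenvectors=V)
; the second row of V is the eigenvector of the smaller eigenvalue
   pc2 = V[0,1]*x + V[1,1]*y
   return, reform(pc2, size(f, /dimensions))
end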
8.3 Post-classification comparison
If two co-registered satellite images have been classified, then the class labels can be com-
pared to determine land cover changes. If classification is carried out at the pixel level (as
opposed to segments or objects), then classification errors (typically 5%) may dominate
the true changes, depending on the magnitude of the latter. ENVI offers functions for
statistical analysis of post-classification change detection.