Kernel spectral matched filter improves hyperspectral target detection

International Journal of Computer Vision 71(2), 127–141, 2007
c© 2006 Springer Science + Business Media, LLC.
Manufactured in the United States.
DOI: 10.1007/s11263-006-6689-3
Kernel Spectral Matched Filter for Hyperspectral Imagery
HEESUNG KWON AND NASSER M. NASRABADI
Army Research Laboratory, ATTN: AMSRD-ARL-SE-SE, 2800
Powder Mill Rd., Adelphi, MD 20783
[email protected]
[email protected]
Received March 31, 2005; Revised December 12, 2005;
Accepted December 14, 2005
First online version published in June, 2006
Abstract. In this paper a kernel-based nonlinear spectral
matched filter is introduced for target detection in
hyperspectral imagery, which is implemented by using the ideas
in kernel-based learning theory. A spectral matched
filter is defined in a feature space of high dimensionality, which
is implicitly generated by a nonlinear mapping
associated with a kernel function. A kernel version of the
matched filter is derived by expressing the spectral matched
filter in terms of the vector dot products form and replacing
each dot product with a kernel function using the so
called kernel trick property of the Mercer kernels. The proposed
kernel spectral matched filter is equivalent to a
nonlinear matched filter in the original input space, which is

capable of generating nonlinear decision boundaries.
The kernel version of the linear spectral matched filter is
implemented and simulation results on hyperspectral
imagery show that the kernel spectral matched filter
outperforms the conventional linear matched filter.
Keywords: Matched filter· hyperspectral· kernel· nonlinear
detection· target detection
1. Introduction
Target detection using linear matched filtering is a well-
known approach in detecting objects of interest in hy-
perspectral imagery (Manolakis et al., 2000; Robey et
al., 1992; Kraut and Scharf, 1999; Kraut et al., 2001;
Chang, 2003; Van Veen and Buckley, 1988; Johnson
and Dudgeon, 1993; Capon, 1969). Typically, the tar-
get spectral signature is obtained from a spectral library
or from a set of training data, which is used in con-
junction with the correlation (covariance) matrix of the
data as the spectral matched filter. However, the linear
spectral matched filter detector (MFD) does not ex-
ploit the higher order statistical correlations between
the spectral bands since it is based only on the sec-
ond order statistics. Therefore, its performance is not
optimal for non-Gaussian data. Furthermore, the deci-
sion boundaries obtained by the conventional MFD is
linear, which is not optimal for non-Gaussian data. The
motivation behind designing the nonlinear matched fil-
ter is to obtain nonlinear decision boundaries as well
as exploiting the higher order statistical correlation be-
tween the spectral bands for non-Gaussian data in order
to improve the performance of the conventional linear
matched filter. Spectral MFDs are based on the assump-
tion of a linear model where the spectral signature of

the target and the background covariance matrix are as-
sumed to be known. A nonlinear spectral matched filter
can easily be developed by defining a matched filter like
model in a feature space of high dimensionality. How-
ever, to implement such a nonlinear matched filter in
the feature space may not be computationally tractable
due to the high dimensionality of the feature space.
Recently, using the ideas of kernel-based learning
theory it has been shown in Müller et al. (2001),
Schölkopf et al. (1999), Ruiz and Lopez-de Teruel
128 Kwon and Nasrabadi
(2001), Kwon and Nasrabadi (2004), Baudat and
Anouar (2000), Kwon and Nasrabadi (2005) that a
number of linear algorithms can easily be extended
to nonlinear versions by implementing them in terms
of kernel functions, thus avoiding the implementation
of the algorithm in the feature space. The input data
is in fact implicitly mapped into a kernel feature space
where the algorithm is then implemented in terms of
certain kernels such as the Mercer kernels (Schölkopf
and Smola, 2002).
In this paper, we introduce a kernel-based spectral
matched filter in the feature space and drive an expres-
sion for its kernel version. We define a matched filter
in a kernel feature space, which is equivalent to a non-
linear matched filter in the original input space. The
matched filter problem is first formulated in a particu-
lar kernel feature space, which is implicitly generated
by a nonlinear mapping associated with a kernel func-
tion. The matched filter expression derived in that fea-

ture space is then rewritten in terms of the vector dot
products form and by using the so called kernel trick,
see (12) in Section 2.2, it is converted in terms of the
kernel function. We refer to this process as kernelizing
the expression for the nonlinear matched filter and the
resulting matched filter is called the kernel-based spec-
tral matched filter detector (KMFD). Using the kernel
trick idea we avoid implementing the algorithm in the
high dimensional feature space. Furthermore, we do not
need to know the explicit expression for the nonlinear
map that produced the kernel feature space. However,
an appropriate kernel function with dot product prop-
erty (Mercer kernel) needs to be selected, which has
a particular nonlinear mapping associated with it that
models the data implicitly in the kernel feature space.
This paper is organized as follows. Section 2 intro-
duces the linear matched filter and the idea of kernel
trick when using Mercer kernels. In Section 3 the non-
linear matched filter is described, which is reformu-
lated in terms of the kernel function to obtain the ker-
nel matched filter. Performance of the kernel matched
filter on hyperspectral imagery is provided in Section
4 and conclusions are given in Section 5.
2. Preliminaries: Linear Matched Filter,
Introduction to Kernel Feature Space and
Kernel Trick
2.1. Linear Matched Filter
In this section, we introduce the concept of linear spec-
tral matched filter. The constrained least squares ap-
proach is used to derive the linear matched filter. Let the
input spectral signal x be x = [x (1), x (2), . . . , x ( J )]T

consisting of J spectral bands. We can model each
spectral observation as a linear combination of the tar-
get spectral signature and noise
x = as + n, (1)
where a is an attenuation constant (target abundance
measure). When a = 0 no target is present and
when a > 0 target is present, vector s = [s(1),
s(2), . . . , s( J )]T contains the spectral signature of the
target and vector n contains the additive background
clutter noise.
We can design a linear matched filter w = [w(1),
w(2), . . . , w( J )]T such that the desired target sig-
nal s is passed through while the average filter out-
put energy is minimized. Define X to be a J ×
N matrix of the N reference pixels obtained from
the test input image. Let each observation spectral
pixel to be represented as a column in the sample
matrix X
X = [x1 x2 · · · xN ]. (2)
The output of the filter for the input xi is given by
y(xi ) = wT xi = xTi w. (3)
The average output power of the filter for the reference
data X is given by
1
N
N∑

i =1
y(xi )
2 = wT
(
1
N
N∑
i =1
xi x
T
i
)
w = wT R
̂ w, (4)
where the R
̂ is the estimated correlation matrix of the
reference data. This constrained filter design is equiv-
alent to a constrained least squares minimization prob-
lem, as was shown in Scharf (1991), Van Veen and
Buckley (1988), Harsanyi (1993), Chang (2003), which
is given by
min
w
{wT R
̂ w} subject to sT w = 1 (5)
where minimization of minw{wT R
̂ w} ensures the back-
ground clutter noise is suppressed by the filter w and
the constrain condition sT w = 1 makes sure that the
filter gives an output of unity when a target is detected.

aliab
Highlight
aliab
Highlight
Kernel Spectral Matched Filter for Hyperspectral Imagery 129
The solution to this quadratic minimization problem is
given by
w = R
̂
−1s
sT R
̂ −1s
(6)
where correlation matrix R
̂ is usually estimated
from the input image. The expression (6) is referred
to as the minimum variance distortionless response
(MVDR) beamformer in the array processing litera-
ture (Van Veen and Buckley, 1988; Johnson and Dud-
geon, 1993). More recently, the same expression was
derived in the hyperspectral target detection literature
and was called the constrained energy minimization
(CEM) filter or the spectral correlation-based matched
filter (Harsanyi, 1993; Chang, 2003). The output of the
linear correlation-based matched filter for a test input
r, given the estimated correlation matrix, is given by
y(r) = wT r = s
T R
̂ −1r

sT R
̂ −1s
. (7)
If the mean of the observation data is removed (cen-
tered) a similar expression is obtained for the centered
data which is given by
yr = wT r =
sT Ĉ−1r
sT Ĉ−1s
(8)
where Ĉ represents the estimated covariance matrix
for the centered reference image data. In the above
derivation of the correlation-based matched filter no
assumption was made about the distribution of the ad-
ditive noise n in the linear model (1). However, if n
is assumed to be a Gaussian random noise distributed
as N (0, C), it has been shown in Robey et al. (1992)
and Kraut and Scharf (1999) that using the Generalized
Likelihood Ratio Test (GLRT) a similar expression to
(8), as in MVDR or CEM, can be obtained for the es-
timated abundance measure â given by
â = s
T C−1r
sT C−1s
. (9)
It should be noted that C is now the expected covari-
ance matrix of the background noise only and does not
include any target data. When the estimated covari-
ance matrix Ĉ of the background data is used in (9)
this filter is referred to as the adaptive matched filter

(Robey et al., 1992) in the signal processing literature
or Capon method (Capon, 1969) in the array processing
literature. In Robey et al. (1992) it was shown that the
CFAR behavior of this filter is given by
α(r) = |s
T Ĉ−1r|2
sT Ĉ−1s
(10)
which is proportional to the estimated squared magni-
tude of the output matched filter referred in Kraut et al.
(2001) as the signal-to-noise ratio (SNR).
In this paper, we only present the experimental re-
sults for the linear correlation-based matched filter
given by the expression (7). Similar results are obtained
by using (8) for the centered data.
2.2. Kernel Feature Space and Kernel Trick
In this subsection an introduction to kernel feature
map and kernel learning is provided, which is used
in the next section to convert a nonlinear version of
the matched filter into its corresponding kernel format.
Suppose the input hyperspectral data is represented by
the data space (X ⊆ RJ ) and F be a nonlinear fea-
ture space associated with X by a nonlinear mapping
function φ
φ : X → F,
x �→ φ(x), (11)
where x is an input vector in X , which is mapped into

a potentially much higher dimensional feature space.
Any linear algorithm can be remodeled in this high
dimensional feature space by replacing the original in-
put data x with the mapped data φ(x). Implementing
any linear algorithm (e.g., matched filter) in the feature
space is equivalent to performing a nonlinear version
of that algorithm (i.e., nonlinear matched filter) in the
original data space. Due to the high dimensionality of
the feature space F it is computationally not feasible to
implement the algorithm in the feature space. However,
in the kernel-based learning algorithms the task is first
formulated in terms of dot products in the feature space
and then the kernel trick (12) is used to implement the
dot products in terms of kernel functions (Schölkopf
and Smola, 2002). The kernel representation for the dot
products in F (known as the kernel trick) is expressed
as
k(xi , x j ) = 〈φ(xi ), φ(x j )〉
= φ(xi ) · φ(x j ), (12)
Anonymous
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab

Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab

Highlight
where k is a positive definite kernel, such as a Mercer
kernel (Schölkopf and Smola, 2002).
Using this kernel trick allows us to implicitly com-
pute the dot products in F without mapping the input
vectors into F; therefore, in the kernel-based learn-
ing methods, the mapping φ does not need to be iden-
tified. However, an appropriate kernel has to be de-
fined, which has a nonlinear mapping associated with
it. Equation (12) shows that the dot products in F can
be avoided and replaced with a kernel, which can be
easily calculated without identifying the nonlinear map
φ. Experimental results for three different kernels are
reported in this paper. The three kernels are
(i) the Gaussian Radial Bases Function kernel (RBF
kernel): k(x, y) = exp( −‖x−y‖2
c
),
(ii) spectral angle-based kernel: k(x, y) = x·y‖x‖‖y‖ , and
(iii) polynomial kernel: k(x, y) = ((x · y) + θ )d
where c represents the width of the Gaussian RBF ker-
nel, d is a positive integer, and θ ≥ 0 is a constant. See
Schölkopf and Smola (2002) for more detailed infor-
mation about the properties of kernels and kernel-based
learning theory.

3. Nonlinear Matched Filter and Kernel
Matched Filter
In this section, we show how to formulate a kernel
version of the linear matched filter. This is achieved by
modeling the data in a kernel feature space where the
corresponding linear matched filter in the feature space
is equivalent to a nonlinear matched filter in the input
space. We then show how to implement the nonlinear
matched filter in terms of the kernel function.
3.1. Introduction to Nonlinear Matched Filter
Consider a linear model of the input data in a kernel
feature space given by
φ(x) = aφ φ(s) + nφ , (13)
where φ is a nonlinear mapping associated with a ker-
nel function, aφ is an attenuation constant (target abun-
dance measure), the high dimensional vector φ(s) con-
tains the spectral signature of the target in the feature
space, and vector nφ contains the additive noise in the
feature space. The above linear model (13) in the fea-
ture space is not the same as the nonlinearly mapped
version of the additive model given in (1). However,
this linear model in the feature space is equivalent to a
specific nonlinear model in the input space. Therefore,
defining a matched filter using the linear model (13) is
the same as developing a nonlinear matched filter for a
specific nonlinear model in the input space.
Using the constrained least squares approach that
was explained in the previous section it can easily be
shown that the equivalent correlation-based matched

filter wφ in the feature space is given by
wφ =
R
̂ −1φ φ(s)
φ(s)T R
̂ −1φ φ(s)
, (14)
where R
̂ φ is the estimated correlation of pixels in the
feature space. The estimated correlation matrix is given
by
R
̂ φ =
1
N
Xφ Xφ
T (15)
where Xφ = [φ(x1) φ(x2) . . . φ(xN )] is a matrix
whose columns are the mapped input reference data
in the feature space. The matched filter in the feature
space (14) is equivalent to a nonlinear matched filter in
the input space and its output for the input φ(r) is given
by
y(φ(r)) = wTφ φ(r) =
φ(s)T R
̂ −1φ φ(r)
φ(s)T R
̂ −1φ φ(s)
. (16)
Due to the high dimensionality of the feature space
the expression (16) is not tractable. Therefore, we can-
not directly implement it in the feature space. We need

to first convert this expression in terms of the dot prod-
ucts of the input vectors in the feature space. The kernel
trick is then used to convert the dot products in the fea-
ture space in terms of the kernel functions.
3.2. Kernel Matched Filter
In this subsection, we show how to kernelize the
matched filter in the feature space. The estimated back-
ground correlation matrix can be represented by its
eigenvalue decomposition or so called spectral decom-
position (Strang, 1986) given by
R
̂ φ = Vφ �Vφ T , (17)
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight

aliab
Highlight
Figure 1. Two-dimensional toy data sets: (a) A Gaussian
mixture and (b) nonlinearly mapped data.
Figure 2. Contour and surface plots of MSD and KMSD: (a)
MFD on the data shown in Fig. 1(a), (b) KMFD on the data
shown in Fig. 1(a),
(c) MFD on the data shown in Figs. 1(b) and (d) KMFD on the
data shown in Fig. 1(b).
where � is a diagonal matrix consisting of the nonzero
eigenvalues of R
̂ φ and Vφ is a matrix whose columns
are the eigenvectors of R
̂ φ in the feature space. The
eigenvector matrix is represented by
Vφ = [v1φ v2φ · · · vNφ ], (18)
where N is the maximum number of eigenvectors with
nonzero eigenvalue.
The sample correlation (covariance) matrix in the
feature space is rank deficient because the number of
samples are usually much less than the dimensional-
ity of the feature space. Therefore, the inverse of R
̂ φ
cannot be obtained and we have to resort to the pseudo-
inverse of the sample correlation matrix, which is the
minimum length least squares solution for the inverse
sample correlation matrix (see p. 450 in Strang, 1986).
The pseudo-inverse of the estimated background cor-

relation matrix can be written in terms of its eigen-
value decomposition or singular value decomposition
(Strang, 1986) as
R
̂ #φ = Vφ �−1Vφ T . (19)
Furthermore, the diagonal matrix �−1 can be replaced
with a truncated version of �−1 by only including
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
Figure 3. Sample band images (48th) from the HYDICE and
mine images. (a) Desert Radiance II image, (b) Forest Radiance
I image and (c)
mine image.
the eigenvalues that are above a small threshold in
order to obtain what is called the effective rank of
the matrix (see p. 445 in Strang, 1986). Truncation

of the diagonal matrix �−1 will numerically provide
a more stable pseudo-inverse since, due to round-off
errors, it is not easy to identify the true non-zero
eigenvalues.
Each eigenvector v
j
φ in the feature space, as shown
in Schölkopf et al. (1999), can be expressed as a linear
combination of the input reference vectors in the feature
space given by
v
j
φ =
N∑
i =1
λ j
−1/2
β
j
i φ(xi ) = Xφβ j λ j −1/2, (20)
where the expansion coefficient vectors β j =
(β
j
1 , β
j
2 , . . . , β

j
N )
T , for j = 1, . . . , N1, N1 ≤ N , are
the eigenvectors with nonzero eigenvalues of the ker-
nel (Gram) matrix K(X, X) as shown in Appendix I,
K(X, X) = (K)i j is an N × N matrix whose entries
are the dot products k(xi , x j ) =< φ(xi ), φ(x j ) > for
xi , x j ∈ X; and λ j , j = 1, . . . , N1, N1 ≤ N are the
corresponding nonzero eigenvalues associated with the
eigenvectors. For all the eigenvectors Vφ in the feature
space we have
Vφ = XφB�−1/2, (21)
where B = [β1 β2 . . . β N1 ]. Substituting (21) into
(19) yields
R
̂ #φ = XφB�−2BT XTφ . (22)
Inserting Eq. (22) into (16) it can be rewritten as
y(φ(r)) =
φ(s)T XφB�−2BT XTφ φ(r)
φ(s)T XφB�−2BT XTφ φ(s)
. (23)
aliab
Highlight
aliab
Highlight

aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight

Figure 4. Examples of spectral curves from various regions in
the Forest Radiance I image: (a) tree, (b) grass, (c) shadow, and
(d) target regions.
In general, the spectral curves show a wide range of spectral
variability. Especially, the target spectral curves are remarkably
different in a local
region.
The dot product term φ(s)T Xφ in the feature space can
be represented in terms of the kernel function, which is
referred to as its empirical kernel map (Schölkopf and
Smola, 2002)
φ(s)T Xφ = (k(x1, s), k(x2, s), . . . , k(xN , s))
= kT (X, s) = kTs . (24)
Similarly ,
φ(r)T Xφ = (k(x1, r), k(x2, r), . . . , k(xN , r))
= kT (X, r) = kTr . (25)
Also using the properties of the Kernel Principal Com-
ponent Analysis (PCA) as shown in Appendix I, we
have the relationship
K−2 = 1
N 2
B�−2BT . (26)
Substituting (24), (25), and (26) into (23) the kernelized

version of the matched filter is given by
y(kr) =
k(X, s)T K−2k(X, r)
k(X, s)T K−2k(X, s)
= k
T
s K
−2kr
kTs K
−2ks
, (27)
which can now be implemented with no knowledge of
the mapping function φ. The only requirement is a good
choice for the kernel function k.
In the expression (27) the inverse K−2 may not be
numerically stable if the background spectral samples
are not independent. Therefore, the pseudo-inverse of
K is used, which is based on eigenvalue decomposition
in (42) where eigenvectors with non-zero eigenvalues
are used. In the experimental section, expression (42) is
used to obtain the pseudo-inverse of K where only the
eigenvectors with eigenvalues above a small threshold
are kept. A similar procedure has been used in Ruiz
and Lopez-de Teruel (2001) to obtain a stable pseudo-
inverse of the Gram matrix by discarding the lowest
eigenvalues that are below 10−5. The number of eigen-
vectors that are kept will determine the effective rank
of the matrix K.
In the derivation of the correlation-based kernel

matched filter we assumed that the data was not cen-
tered in the feature space. To obtain covariance-based
kernel matched filter, expression (8), we need to cen-
ter the data by removing the sample mean in the
feature space. However, removing the sample mean
is not computationally tractable in the feature space
due to the high dimensionality of F. Therefore, the
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight

aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
aliab
Highlight
Figure 5. Examples of spectral curves from various regions in
the Desert Radiance II image: (a) vegetation, (b) dirt road, (c)
soil, and (d) target
regions. The target spectral curves are also characterized by a
wide range of spectral variability.
kernel matrix K needs to be properly centered, as shown
in Schölkopf and Smola (2002). The effect of center-
ing on the kernel matched filter can be incorporated by

replacing the uncentered Xφ with the centered Xφ −μφ
(where μφ = 1N
∑N
i =1 φ(xi ) is the mean of the refer-
ence input data) in the estimation of the centered cor-
relation (covariance) matrix expression (15) as well as
(24) and (25) for the empirical kernel mappings of the
target and input data, respectively. The resulting cen-
tered K
̂ is shown in Schölkopf and Smola (2002) to be
given by
K
̂ = (K − 1N K − K1N + 1N K1N ), (28)
where the elements of the N × N matrix (1N )i j = 1/N .
The properly centered kernel matched filter output for
(27) corresponding to the kernel version of (8) is now
given by
y(k̂ r) =
k̂ Ts K
̂
−2k̂ r
k̂ Ts K
̂
−2k̂ s
, (29)
where k̂ Ts = kTs − 1N
∑N
i =1 k(xi , s)�1 and k̂ Tr = kTr −
1
N

∑N
i =1 k(xi , r)�1, which are obtained by replacing Xφ
with Xφ − μφ in (24), and (25), respectively. The col-
umn vector �1 denotes an N-dimensional vector with all
its components equal to 1.
Furthermore, the CFAR behavior of the kernel
matched filter (29) is given by
α(k̂ r) =
|k̂ Ts K
̂ −2k̂ r|2
k̂ Ts K
̂
−2k̂ s
. (30)
4. Simulation Results
In this section, we implemented both the proposed
KMFD described by (27) and the conventional MFD
described by (10) on simulated data as well as real
hyperspectral imagery. We implemented KMFD with
three kernel functions, each kernel function being as-
sociated with a different feature space. The three dif-
ferent kernels used were (i) the Gaussian RBF kernel,
exp(
−‖x−y‖2
30
), (ii) spectral angle-based kernel,
x·y

‖x‖‖y‖ ,
and (iii) 5th order polynomial kernel, ((x · y) + 1)5.
The simulated data consists of two illustrative two-
dimensional toy data sets; a Gaussian mixture, as
shown in Fig. 1(a), and a nonlinearly mapped data
shown in Fig. 1(b). In Fig. 1 data points for the desired
target were represented by the star-shaped symbol and
aliab
Highlight
aliab
Highlight
Figure 6. Detection results for the Desert Radiance II image
using the kernel matched filter detectors (KMFD) and the
matched filter detector
(MFD): (a) KMFD with the Gaussian RBF kernel, (b) KMFD
with the polynomial kernel, (c) KMFD with the spectral angle-
based kernel, and
(d) MFD in the original input domain.
the background points were represented by the circles.
In Fig. 1(b) the two-dimensional data points x = (x , y)
for each class was obtained by nonlinearly mapping the
original Gaussian mixture data points x0 = (x0, y0) in
Fig. 1(a). All the data points in Fig. 1(a) were nonlin-
early mapped by x = (x , y) = (x0, x 20 + y0), therefore,
the second component of each data point is nonlinearly

related to its first component.
Figure 2 shows contour and surface plots of the con-
ventional MFD and KMFD on the toy data sets as
shown in Fig. 1. For both data sets, the contours gen-
erated by KMFD are highly nonlinear and naturally
follow the dispersion of the data, thus successfully sep-
arating the two classes, as opposed to the linear con-
tours obtained by MSD. For both the Gaussian mixture
and nonlinearly mapped data, KMFD clearly provided
significantly improved discrimination over the conven-
tional MFD.
The real hyperspectral images are from a HY-
DICE (HYperspectral Digital Imagery Collection
Figure 7. Detection results for the Forest Radiance I image
using the kernel matched filter detectors (KMFDs) and the
matched filter detector
(MFD): (a) KMFD with the Gaussian RBF kernel, (b) KMFD
with the polynomial kernel, (c) KMFD with the spectral angle-
based kernel and
Experiment) sensor and a mine detection sensor. The
HYDICE imaging sensor generates 210 bands across
the whole spectral range (0.4–2.5 μm). But we only
use 150 bands by discarding water absorption and low
SNR bands; the spectral bands used are the 23rd–
101st, 109th–136th, and 152nd–194th. The hyperspec-

tral mine image consists of 70 bands over the spectral
range of 8–11.5 μm which includes the long-wave in-
frared (LWIR) band. Two HYDICE images from the
Desert Radiance II data collection and the Forest Ra-
diance I data collection, and the mine image were used
to test both the kernel-based and conventional matched
filter detectors. The Desert Radiance II (DR-II) image
contains 6 military targets located in the dirt road; the
Forest Radiance I (FR-I) image includes total of 14
military targets along the tree line; and the hyperspec-
tral mine image contains a total of 33 surface mines,
as shown in the sample band images in Fig. 3. While
spectral curves from various regions of different mate-
rials (or terrain types) show somewhat different spectral
Figure 8. Detection results for the mine image using the kernel
matched filter detectors (KMFDs) and the matched filter
detector (MFD): (a)
KMFD with the Gaussian RBF kernel, (b) KMFD with the
polynomial kernel kernel, (c) KMFD with the spectral angle-
based kernel kernel and
characteristics, even in a local region of the same ma-
terial types they show a wide range of spectral variabil-
ity (especially for targets and vegetation), thus making
target detection a challenging task, as shown in Figs. 4
and 5.
All the pixel vectors in a test image are first normal-

ized by a constant, which is a maximum value obtained
from all the spectral components of the spectral vec-
tors in the corresponding test image, so that the entries
of the normalized pixel vectors fit into the interval of
spectral values between zero and one. The rescaling of
pixel vectors was mainly performed to effectively uti-
lize the dynamic range of Gaussian RBF kernel. The
rescaling does not affect the performance of KMFDs
or MFD when the other two kernel functions are used.
4.1. Algorithm Implementation
Background statistics—the kernel matrix K
̂ and the
correlation matrix R
̂ —are globally estimated for the
kernel-based and conventional implementation of the
Figure 9. ROC curves obtained from the detection results for the
Desert Radiance II image shown in Fig. 6.
Figure 10. ROC curves obtained from the detection results for
the Forest Radiance I image shown in Fig. 7.
matched filter detectors, respectively. Global estima-
tion must be performed prior to detection and normally
needs a large amount of data samples to successfully
represent all the background types present in a given
data set. In this paper, we use a large number of spectral
vectors obtained from a given test image that best rep-
resent the spectral characteristics of the background. A
well-known data clustering algorithm, k-means (Jain

et al., 1999), is used on the spectral vectors in order to
generate a significantly less number of spectral vectors
Figure 11. ROC curves obtained from the detection results for
the mine image shown in Fig. 8.
(centroids) from which appropriate background statis-
tics are estimated. The number of the representative
spectral vectors obtained from the k-means procedure
was set to 300 in the experiment. The correlation and
kernel matrices were obtained from these 300 spectral
vectors. In all the experiments the same target spectral
signature s is obtained by averaging the target samples
collected from one of the targets in the test image. For
both the DR-II and FR-I images it is the left most target.
Since a relatively small set of the background sam-
ples was used to represent the background statistics of
the given test images, the kernel matrix K
̂ and the cor-
relation matrix R
̂ need to be regularized in order to
obtain a numerically stable pseudo-inverse. In this pa-
per, in order to regularize both K
̂ and R
̂ the eigenvalues
decomposition of each of the two matrices is first per-
formed. The eigenvectors with large eigenvalues con-
vey more relevant information to the prior knowledge
of the background than the ones with very small eigen-
values that could represent sensor noise. The regular-
ized versions of K
̂ and R
̂ are obtained by discarding
all the eigenvalues below a small threshold, 10−5. In
our experimental results we did not have to regularize
the correlation matrix since the number of the back-
ground data was sufficient to obtain the correct inverse.

However, in the case of inverse Gram matrix the small
eigenvalues were always discarded.
4.2. Performance Comparison
The receiver operating characteristics (ROC) curves
representing detection probability Pd versus false
alarm rates N f were generated to provide quantitative
performance comparison as well as qualitative perfor-
mance comparison. For ROC curves generation, based
on the ground truth information for the HYDICE im-
ages, we obtain the coordinates of all the rectangular
target regions. Each target was considered to be de-
tected if at least one pixel within the corresponding
target region was detected at a given false alarm rate.
Pd and N f are defined as
Pd := Nhi tNt and N f :=
Nmi ss
Nt ot
, (31)
where Nhi t represents the number of targets detected
given a certain threshold; Nt represents the total num-
ber of targets in the hyperspectral image; Nmi ss rep-
resents the number of background pixels detected as
targets; and Nt ot represents the total number of pixels
in the hyperspectral image. Pd becomes one only when
all the targets are detected.
Figures 6–8 show the detection results for the DR-
II, FR-I and mine images using KMFD with the three
different kernels and correlation-based MFD, respec-
tively. The corresponding ROC curves for the detection

Anonymous
Highlight
results are shown in Fig. 9–11. For the DR-II image, as
shown in Fig. 9, KMFD with any choice of the three
kernels clearly outperformed the conventional MFD at
almost every false alarm rate. For the FR-I image the
background structure is much more complex than that
of DR-I. It includes the tree area where most irregular
illumination effects occur, the long shadowy transition
and the region filled mostly with grass. KMFD using
any choice of the kernels still showed improved de-
tection results over the conventional MFD for both the
FR-I and the mine images, as shown in Figs. 10 and
11, respectively.
5. Conclusions
We have extended the conventional matched filter de-
tector to a nonlinear version by implicitly mapping the
input data into a much higher dimensional feature space
in order to make use of high-order nonlinear correla-
tions between the spectral bands of a hyperspectral im-
age. The expression of the nonlinear matched filter de-
tector in the feature space, which is basically intractable
due to high (potentially infinite) dimensionality, is con-
verted in terms of kernels using the kernel eigenvector
representation as well as the kernel trick to derive a
tractable algorithmic expression.
KMFD, the kernel counterpart of MFD, was im-
plemented with several different kernels, each with

different characteristics. In general, KMFD with all
the kernels showed a superior detection performance
when compared to the conventional MFD for the HY-
DICE and mine images tested in this paper. The detec-
tion results show that the kernel-based nonlinear detec-
tion method is quite suitable for identifying underlying
structures of complex data such as hyperspectral data,
thus they are more powerful in discriminating targets
of interest.
Appendix I: Kernel PCA
In this Appendix, we present derivation of Kernel PCA
and its properties providing the relationship between
the correlation (covariance) matrix and the correspond-
ing Gram (centered) matrix. Our goal is to prove (26).
To drive the Kernel PCA consider the estimated back-
ground clutter correlation matrix in the feature space
and assume that the input data is not centered . The
estimated correlation matrix in the feature space is
given by
R
̂ φ =
1
N
Xφ X
T
φ . (32)
The PCA eigenvectors are computed by solving the
eigenvalue problem
λvφ = R
̂ φ vφ

= 1
N
N∑
i =1
φ(xi )φ(xi )
T vφ
= 1
N
N∑
i =1
〈φ(xi ), vφ 〉φ(xi ), (33)
where vφ is an eigenvector in F with a corresponding
nonzero eigenvalue λ. Equation (33) indicates that each
eigenvector vφ with corresponding λ �= 0 are spanned
by φ(x1), . . . , φ(xN )—i.e.
vφ =
N∑
i =1
λ
−1/2
βi φ(xi ) = λ−1/2Xφβ, (34)
where Xφ = [ φ(x1) φ(x2) . . . φ(xN ) ] and β =
(β1, β2, . . . , βN )
T . Substituting (34) into (33) and mul-

tiplying with φ(xn )
T , n = 1, . . . , N , yields
λ
N∑
i =1
βi < φ(xn ), φ(xi ) >
= 1
N
N∑
i =1
βi φ(xn )φ(xi )φ(xi )
T
N∑
i =1
φ(xi )
= 1
N
N∑
i =1
βi 〈φ(xn ),
N∑
j =1
φ(x j )〈φ(x j ), φ(xi )〉〉,

for all n = 1, . . . , N . (35)
We denote by K = K(X, X) = (K)i j the N ×
N kernel matrix whose entries are the dot products
〈φ(xi ), φ(x j )〉. Equation (35) can now be rewritten as
N λβ = Kβ, (36)
where β turn out to be the eigenvectors with nonzero
eigenvalues of the kernel matrix K. Therefore, the
Gram matrix can be written in terms of its eigenvector
decomposition as
K = B�BT , (37)
where B = [β1 β2 . . . β N ] are the eigenvectors of the
kernel matrix and � is a diagonal matrix with diagonal
values equal to the eigenvalues of the kernel matrix
K. Similarly, from the definition of PCA in the feature
space (33) the estimated background correlation matrix
is decomposed as
R
̂ φ = Vφ �Vφ T , (38)
where Vφ = [v1φ v2φ . . . vNφ ] and � is a diagonal ma-
trix with its diagonal elements being the eigenvalues of
Ĉφ . From (33) and (36) the eigenvalues of the correla-
tion matrix � in the feature space and the eigenvalues
of the kernel matrix � are related by
� = 1

N
�. (39)
Substituting (39) into (37) we obtain the relationship
K = N B�BT , (40)
where N is a constant representing the total num-
ber of background clutter samples, which can be
ignored.
The sample correlation matrix in the feature space is
rank deficient, therefore, its inverse cannot be obtained
but its pseudo-inverse can be written as (Strang, 1986)
R
̂ #φ = Vφ �−1Vφ T = XφB�−2BT Xφ T . (41)
The maximum number of eigenvectors in the pseudo-
inverse is equal to the number of non-zero eigenvalues
(or the number of independent data samples), which
cannot be exactly determined due to round-off er-
ror in the calculations. Therefore, the effective rank
(Strang, 1986) is determined by only including the
eigenvalues that are above a small threshold. Simi-
larly, the inverse Gram matrix K−1 can also be written
as
K−1 = 1
N
B�−1BT . (42)
If the data samples are not independent then the pseudo-
inverse of the Gram matrix has to be used, which is the
same as (42) except only the eigenvectors with eigen-
values above a small threshold are included in order

to obtain a numerically stable inverse. Using (42) it is
obvious that the squared inverse Gram matrix can also
be written as
K−2 = 1
N 2
B�−2BT . (43)
Acknowledgment
The authors would like to thank the anonymous review-
ers for their insightful comments and suggestions that
helped us to improve the quality of our paper.
References
Baudat, G. and Anouar, F. 2000. Generalized discriminant
analysis
using a kernel approach. Neural Computat., 12:2385–2404.
Capon, J. 1969. High-resolution frequency-wavenumber
spectrum
analysis. Proc. of the IEEE, 57:1408–1418.
Chang, C.-I. 2003. Hyperspectral Imaging: Techniques for
Detection
and Classification. Kluwer Academic/Plenum Publishers.
Harsanyi, J.C. 1993. Detection and Classification of Subpixel
Spec-
tral Signatures in Hyperspectral Image Sequences. Ph.D. disser-
tation, Dept. Elect. Eng., Univ. of Maryland, Baltimore County.

Jain, A.K., Murty, M.N., and Flynn, P.J. 1999. Data clustering:
A
review. ACM Computing Surveys, 31(3):264–323.
Johnson, D.H. and Dudgeon, D.E. 1993. Array Signal
Processing.
Prentice Hall.
Kraut, S. and Scharf, L.L. 1999. The CFAR adaptive subspace
detec-
tor is a scale invariant-invariant GLRT. IEEE Trans. Signal Pro-
cess., 47(9):2538–2541.
Kraut, S., Scharf, L.L., and McWhorter, T. 2001. Adaptive
subspace
detectors. IEEE Trans. Signal Process., 49(1):208–216.
Kwon, H. and Nasrabadi, N.M. 2004. Kernel-based subpixel
target
detection in hyperspectral images. In Proc. of IEEE Joint
Confer-
ence on Neural Networks, Budapest, Hungary, pp. 717–722.
Kwon, H. and Nasrabadi, N.M. 2005. Kernel RX-algorithm: A
non-
linear anomaly detector for hyperspectral imagery. IEEE Trans.
Geosci. Remote Sensing, 43(2):388–397.
Manolakis, D., Shaw, G., and Keshava, N. 2000. Comparative
analy-
sis of hyperspectral adaptive matched filter detector. In Proc.

SPIE,
vol. 4049, pp. 2–17.
Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., and Schölkopf,
B.
2001. An introduction to kernel-based learning algorithms.
IEEE
Trans. Neural Networks., 2:181–202.
Robey, F.C., Fuhrmann, D.R., and Kelly, E.J. 1992. A CFAR
adaptive
matched filter detector. IEEE Trans. on Aerospace and Elect.
Syst.,
28(1):208–216.
Ruiz, A. and Lopez-de Teruel, E. 2001. Nonlinear kernel-based
sta-
tistical patten analysis. IEEE Trans. Neural Networks., 12:16–
32.
Scharf, L.L. 1991. Statistical Signal Processing. Addison-
Wesley.
Schölkopf, B. and Smola, A.J. 2002. Learning with Kernels. The
MIT Press.
Schölkopf, B., Smola, A.J., and Müller, K.-R. 1999. Kernel
principal
component analysis. Neural Computat., (10):1299–1319 .
Strang, G. 1986. Linear Algebra and Its Applications. Harcourt
Brace
& Company.

Van Veen, B.D. and Buckley, K.M. 1988. Beamforming: A
versatile
approach to spatial filtering. IEEE ASSP Magazine, pp. 4–24.
College of Administrative and Financial Sciences
MGT 312
Assignment 3
Deadline: 30/11/2019 @ 23:59
Course Name: Decision Making and Problem Solving
Student’s Name:
Course Code: MGT 312
Student’s ID Number:
Semester: I
CRN:
Academic Year: 1440/1441 H
For Instructor’s Use only
Instructor’s Name:
Students’ Grade: Marks Obtained/Out of
Level of Marks: High/Middle/Low
Instructions – PLEASE READ THEM CAREFULLY
· The Assignment must be submitted on Blackboard (WORD
format only) via allocated folder.
· Assignments submitted through email will not be accepted.
· Students are advised to make their work clear and well
presented, marks may be reduced for poor presentation. This

includes filling your information on the cover page.
· Students must mention question number clearly in their
answer.
· Late submission will NOT be accepted.
· Avoid plagiarism, the work should be in your own words,
copying from students or other resources without proper
referencing will result in ZERO marks. No exceptions.
· All answered must be typed using Times New Roman (size 12,
double-spaced) font. No pictures containing text will be
accepted and will be considered plagiarism).
· Submissions without this cover page will NOT be accepted.
Course Learning Outcomes-Covered
CLO Number
Course Learning Outcome
10
Develop numerical skills required for quantitative decisions in
complex situations. (4.1)
13
Create a Decision Making and Problem Solving worksheet
document. (4.5)
14
Utilize technological tools and graphics as an aid to give visual
display to decision making data. (4.6)
15
Identify organized alternatives and select among
possible alternative to evaluate business options. (2.10)
Critical Thinking Questions: (Marks
05)
1. You are planning to start an online marketing for your KSA
based restaurant chain named as “Al-Mataam.”
a. Create a mind map to identify the essential procedures to
carry out an Internet marketing for “Al-Mataam”. [Note: Mind
map must be up to three levels]

b. Develop a decision tree to help your online customer to
determine what meal to buy from “Al-Mataam”. You may use
any attribute you wish (cost, calories,) type of food (Veg, Non-
Veg (chicken, Mutton, beef, or Fish)) Regional Food (Arabian,
India, Turkish, Chinese, etc).
c. As a CEO of “Al-Mataam” you have to hire a Restaurant
Manager for its Riyadh location. You have shortlisted the
following five candidates with their skill set.
Yousuf
Amal
John
Ahmad
Julie
Experience in Year
Experience of Restaurant
5
2
3
0
1
Knowledge of service industry
5
4
4
5
3
Business Development Experience
1
3
2
4
4

Leadership Experience
2
2
1
2
3
Computer proficiency
Good
Average
Good
Poor
Excellent
a. Rate each criteria on a scale of 1-5. In this case, 1 is less
important and 5 is most important. Based on the weightage and
criteria identify the most suitable candidate.
[Please answer in the next page]
ANSWER

Kernel Matched Signal Detectors for Hyperspectral Target
Detection
Heesung Kwon and Nasser M. Nasrabadi
U.S. Army Research Laboratory, 2800 Powder Mill Rd.,
Adelphi, MD 20783-1197
Abstract
In this paper, we compare several detection algorithms that
are based on spectral matched (subspace) filters. Nonlin-
ear (kernel) versions of these spectral matched (subspace)
detectors are also discussed and their performance is com-
pared with the linear versions. These kernel-based detec-
tors exploit the nonlinear correlations between the spec-
tral bands that are ignored by the conventional detectors.
Several well-known matched detectors, such as matched
subspace detector, orthogonal subspace detector, spectral
matched filter and adaptive subspace detector (adaptive co-
sine estimator) are extended to their corresponding kernel
versions by using the idea of kernel-based learning theory.
In kernel-based detection algorithms the data is implicitly
mapped into a high dimensional kernel feature space by a
nonlinear mapping which is associated with a kernel func-
tion. The detection algorithm is then derived in the feature
space which is kernelized in terms of the kernel functions in
order to avoid explicit computation in the high dimensional
feature space. Experimental results based on simulated toy-
examples and real hyperspectral imagery show that the ker-
nel versions of these detectors outperform the conventional

linear detectors.
1 Introduction
Detecting signals of interest, particularly with wide signal
variability, in noisy environments has long been a challeng-
ing issue in various fields of signal processing. Among a
number of previously developed detectors, the well-known
matched subspace detector (MSD) [1], orthogonal subspace
detector (OSD) [1, 2], spectral matched filter (SMF) [3, 4],
and adaptive subspace detectors (ASD) also known as adap-
tive cosine estimator (ACE) [5, 6] have been widely used to
detect a desired signal (target).
Matched signal detectors, such as spectral matched fil-
ter and matched subspace detectors (whether adaptive or
non-adaptive), only exploit second order correlations, thus
completely ignoring nonlinear (higher order) spectral inter-
band correlations that could be crucial to discriminate be-
tween target and background. In this paper, our aim is to
introduce nonlinear versions of MSD, OSD, SMF and ASD
detectors which effectively exploits the higher order spec-
tral inter-band correlations in a high (possibly infinite) di-
mensional feature space associated with a certain nonlinear
mapping via kernel-based learning methods [7]. A nonlin-
ear mapping of the input data into a high dimensional fea-
ture space is often expected to increase the data separability
and reduce the complexity of the corresponding data struc-
ture. The nonlinear versions of a number of signal process-
ing techniques such as principal component analysis (PCA)
[8], Fisher discriminant analysis [9], linear classifiers [10],
and kernel-based anomaly detection [11] have already been
defined in a kernel space.
This paper is organized as follows. Section 2 provides

the background to the kernel-based learning methods and
kernel trick. Section 3 introduces a linear matched subspace
and its kernel version. The orthogonal subspace detector is
defined in Section 4 as well as its kernel version. In Section
5 we describe the conventional spectral matched filter ad its
kernel version in the feature space and reformulate the the
expression in terms of the kernel function using the kernel
trick. Finally, in Section 6 the adaptive subspace detector
and its kernel version are introduced. Performance com-
parison between the conventional and the kernel versions of
these algorithms is provided in Section 7 and conclusions
are given in Section 8.
2 Kernel-based Learning and Kernel
Trick
Suppose that the input hyperspectral data is represented by
the data space (
� � � �
) and � is a feature space associated
with
�
by a nonlinear mapping function �
� � � � � �
� � � � � (1)
where is an input vector in � which is mapped into a
potentially much higher – (could be infinite) – dimensional
feature space. Due to the high dimensionality of the feature
space � , it is computationally not feasible to implement any
algorithm directly in feature space. However, kernel-based
learning algorithms use an effective kernel trick given by

Eq. (2) to implement dot products in feature space by em-
ploying kernel functions [7]. The idea in kernel-based tech-
Proceedings of the 2005 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR’05)
1063-6919/05 $20.00 © 2005 IEEE
niques is to obtain a nonlinear version of an algorithm de-
fined in the input space by implicitly redefining it in the
feature space and then converting it in terms of dot prod-
ucts. The kernel trick is then used to implicitly compute the
dot products in � without mapping the input vectors into� ;
therefore, in the kernel methods, the mapping � does
not need to be identified.
The kernel representation for the dot products in � is
expressed as � � � � � � � � � � � � �
� � � � � � (2)
where
�
is a kernel function in terms of the original data.
There are a large number of Mercer kernels that have the
kernel trick property, see [7] for detailed information about
the properties of different kernels and kernel-based learn-
ing. Our choice of kernel in this paper is the Gaussian RBF
kernel and the associated nonlinear function � with this ker-
nel generates a feature space of infinite dimensionality.
3 Linear MSD and Kernel MSD
3.1 Linear MSD

In this model the target pixel vectors are expressed as a lin-
ear combination of target spectral signature and background
spectral signature, which are represented by subspace target
spectra and subspace background spectra, respectively. The
hyperspectral target detection problem in a � -dimensional
input space is expressed as two competing hypotheses �
and � �
� � � � � � � � � (3)
� � � � � � � � � � � � � � � � � � � � � �
where � and � represent orthogonal matrices whose � -
dimensional column vectors span the target and background
subspaces, respectively; � and � are unknown vectors whose
entries are coefficients that account for the abundances of
the corresponding column vectors of � and � , respectively;�
represents Gaussian random noise ( � � � � ) distributed
as � ! � " # $ � ; and � � � � is a concatenated matrix of�
and � . The numbers of the column vectors of � and� , % &
and % ' , respectively, are usually smaller than �
( % & � % ' ( � ).
The generalized likelihood ratio test (GLRT) for the
model (3) was derived in [1], given as
) # � � � � * � $ + , - � �� * � $ + , . - � � / 01/ 2 3
(4)
where , - � � � * � � 4 � � * � � * is a projection ma-
trix associated with the % ' -dimensional background sub-
space ( � 5 ; , . - is a projection matrix associated with
the ( % ' & % ' � % & )-dimensional target-and-background
subspace ( � � 5

, . - � � � � � 6 � � 7 * 6 � � 7 � 4 � � � � � * 8 (5)
3.2 Linear MSD in the Feature Space and its
Kernel Version
The hyperspectral detection problem based on the target and
background subspaces can be described in the feature space� as
� � 9 � � � � � � : � : � � : � (6)� � 9 � � � � � �
: � : � � : � : � � :
� � : � : � � � :� : � � � : �
where � : and � : represent full-rank matrices whose
column vectors span target and background subspaces (� : 5
and ( � : 5 in � , respectively; � : and � : are un-
known vectors whose entries are coefficients that account
for the abundances of the corresponding column vectors of� :
and � : , respectively; � : represents Gaussian random
noise; and � � : � : � is a concatenated matrix of � :
and � : . Using a similar reasoning as described in the pre-
vious subsection, the GLRT of the hyperspectral detection
problem depicted by the model in (6) is given by
) # � � � � � � � � � � * � , ; 9 + , - 9 � � � � ��
� � * � , ; 9 + , . 9 - 9 � � � � � � (7)
where , ; 9 represents an identity projection operator in � ;, - 9
� : � � *: � : � 4 � � *: � : � *: is a background
pro-
jection matrix; and , . 9 - 9 is a joint target-and-background
projection matrix in �
, . 9 - 9 � � : � : � � 6 � : � : 7 * 6 � : � : 7 � 4 � < (8)
� � : � : � *
� � : � : � � � *: � : � * : � :� *: � : � *: � : � 4
� � � * :� *: � 8

To kernelize (7) we will separately kernelize the numer-
ator and the denominator. First consider its numerator,
� � � � * � , ; 9 + , - 9 � � � � � � � � � * , ; 9 � �
� � + (9)� � � � * � : � *: � � � � 8
Each column of � : and � : can be written in terms of its
corresponding data space [7] as
� : � = > � = > # 8 8 8 = ? @> � � A B C � (10)
� : � = D � = D # 8 8 8 = ? ED � � A F G � (11)
1063-6919/05 $20.00 © 2005 IEEE
where � �� and � � � are the significant eigenvec-
tors of the target and background covariance
metrics � � � and � � � , respectively;
� �
� � � � � � � � � � � � � � � � � � � �
� � and
� �
� � � � � � � � � � � � � � � � ��
� � ; the column vectors of � and � represent
only the significant normalized eigenvectors ( � � , � � ,
. . . , � � � ) and ( � � , � � , . . . , � � ) of the
backgroundcentered kernel matrix ! � � � � � � � = � ! � �
� � " � � � � � � � �� and the target
centered kernel matrix! � � � � � � � = � ! � � � � " � �
� � � � � � � � � � � � � � , respec-
tively. Using (10) the projection of � � � onto # $ becomes#
%$ � � � � � % ! � � � � � � and, similarly, using (11)
the

projection onto & $ is & %$ � � � � � % ! � � � � � �
where! � � � � � � and ! � � � � � � , referred to as the
empirical
kernel maps in the machine learning literature [7], are
column vectors whose entries are
" � ' � � � � for ' � � � � and' � � � � , respectively. Now
we can write
� � � % # $ # %$ � � � � ! � � � � � � % � � % !
� � � � � � � (12)
The projection onto the identity operator � � � % ( ) �
� � �
also needs to be kernelized which is given by
� � � % ( ) � � � � � � � � %
� � * * % %
� � � � � (13)
� ! � � � � � � � % * * % ! � � � � � � � �
where
� � �
� +
� and * is a matrix
whose columns are the eigenvectors ( , � , , � , . . . , , � � )
of the centered kernel matrix ! � � � � � � � � � = � ! � �
�
=
" � � � � � � � � � � � � � � � � + � � with nonzero
eigen-
values, normalized by the square root of their associ-
ated eigenvalues and ! � � � � � � � is a concatenated vector
! � � � � � � % ! � � � � � � % � % . To complete the

kernel-
ization process the denominator of (7) is given by
� � � % ( � � � � � � � � � � � %
& $ # $ � - (14)
. & %$ & $ & % $ # $# %$ & $ # % $ # $ /
0 � . & % $# % $ / � � �
�
! � � � � � � % � ! � � � � � � % � � -
. � % ! � � � � � � � � � % ! � � � � � � � �� % ! �
� � � � � � � � % ! � � � � � � � � /
0 � -
. � % ! � � � � � �� % ! � � � � � � / �
Finally, substituting (12), (14), and (14) into (7) the kernel-
ized GLRT is given by
1 � 2 � � ! � � � � � � � % * * % ! � � � � � � � 3
(15)! � � � � � � % � � % ! � � � � � � � 4
� ! � � � � � � � % * * % ! � � � � � � � 3
! � � � � � � % � ! � � � � � � % � � 5 0 �� -
. � % ! � � � � � �� % ! � � � � � � / � �
where 5 � � . � % ! � � � � � � � � � % ! � � � � � �
� �� % ! � � � � � � � � � % ! � � � � � � � � / �
In the above derivation (15) we assumed that the mapped
input data was centered in the feature space by removing
the sample mean. However, the original data is usually not
centered and the estimated mean in the feature space can not

be explicitly computed, therefore, the kernel matrices have
to be properly centered. The resulting centered 6! is shown
in [7] to be given by
6! � � ! 3 7 � ! 3 ! 7 � 8 7 � ! 7 � � � (16)
where the 9 - 9 matrix � 7 � � � � � : 4 9 . The empirical
kernel maps ! � � � � � � , ! � � � � � � , and ! � � � �
� � � have
also to be centered by removing their corresponding em-
pirical kernel map mean. (e.g. ;! � � � � � � � ! � � � � �
� 3�� < �� = �
" � � � � � � � � � � � � .)
4 OSP and Kernel OSP Algorithms
4.1 Linear spectral mixture model
The OSP algorithm [2] is based on maximizing the SNR
(signal-to-noise ratio) in the subspace orthogonal to the
background subspace and only depends on the noise
second-order statistics. It also does not provide any esti-
mate of the abundance measure for the desired end member
in the mixed pixel. A linear mixture model for pixel � con-
sisting of > spectral bands is described by
� � ? @ 8 A � (17)
where the � > - B � matrix ? represent B endmembers spec-
tra, @ is a � B - : � column vector whose elements are the co-
efficients that account for the proportions (abundances) of
each endmember spectrum contributing to the mixed pixel,
and A is an � > - : � vector representing an additive zero-
mean Gaussian noise with covariance matrix C � D and D is
the � > - > � identity matrix.
Assuming now we want to identify one particular signa-
ture (e.g. a military target) with a given spectral signatureE

and a corresponding abundance measure @ F , we can rep-
resent ? and @ in partition form as ? � � G H E � and
1063-6919/05 $20.00 © 2005 IEEE
� � � �� then model (17) can be rewritten as� � � � �
� � �
� (18)
where the columns of represent the undesired spectral
signatures (background signatures or eigenvectors) and the
column vector � is the abundance measures for the unde-
sired spectral signatures. The OSP operator that maximizes
the signal to noise ratio is given by�
� � � � �
� � � # � (19)
which consists of a background signature rejecter followed
by a matched filter. The output of the OSP classifier is now
given by �
� � � � �
� � � � � �
� � � # � � � (20)
4.2 OSP in feature space and its kernel ver-
sion
The mixture model in the high dimensional feature space �
is given by
� � � � � � � � �
� � (21)
where � � is a matrix whose columns are the endmembers

spectra in the feature space and
� is an additive zero-mean
Gaussian noise with covariance matrix � � � � and � � is the
identity matrix in the feature space. The model (21) can
also be rewritten as
� � � � � � � � � � � � � � �
� � (22)
where � � � � represent the spectral signature of the desired
target in the feature space and the columns of � represent
the undesired background signatures in the feature space.
The output of the OSP classifier in the feature space is
given by�
� � � � � �
� � � � � � � � � �
� � � � � #� � � � � � � (23)
This output (23) is very similar to the numerator of (7). It
can easily be shown that the kernelized version of (23) is
given by� � � � � ! � " # $ � � �
% %
! � " # $ � � � � (24)! � " # � � �
& &
! � " # � � �
where " # � ' ( ) ( � � � � ( * + correspond to , input back-
ground spectral signatures and & � � - ) � - � � � � � � - *
. �
are the , / significant eigenvectors of the centered ker-
nel matrix (Gram matrix) ! � " # � " # � normalized by
the square root of their corresponding eigenvalues [8].
! � " # � � � and ! � " # � � � , are column vectors whose
en-
tries are

0 � ( 1 � � � and 0 � ( 1 � � � for ( 1 2 " # , respectively." #
$ � " # 3 � and % is a matrix whose columns are the , / $
eigenvectors ( 4 ) , 4 � , . . . , 4 * . 5 ) of the centered kernel
ma-
trix ! � " # $ � " # $ � = � ! � 1 6 = 0 � ( 1 � ( 6 � � ( 1 � (
6 2 " # $ with
nonzero eigenvalues, normalized by the square root of their
associated eigenvalues. Also ! � " # $ � � � is the concate-
nated vector 7 ! � " # � � �
! � � � � �
8
and ! � " # $ � � �
is the concatenated vector 7 ! � " # � � �
! � � � � �
8
.
In the above derivation (24) we assumed that the mapped
input data was centered in the feature space. The kernel
matrices and the empirical kernel maps have to be properly
centered as was shown in subsection 3.2
5 Linear SMF and Kernel Spectral
Matched Filter
5.1 Linear Spectral Matched Filter
In this section, we introduce the concept of linear SMF.
The constrained least squares approach is used to derive
the linear SMF. Let the input spectral signal ( be ( �' 9 � : � �
9 � ; � � � � � � 9 � < � +
consisting of < spectral bands. We
can model each spectral observation as a linear combination
of the target spectral signature and noise( � = > �
� (25)
where = is an attenuation constant (target abundance mea-
sure). When = � ? no target is present and when = @ ? tar-

get is present, vector > � ' A � : � � A � ; � � � � � � A �
< � +
contains
the spectral signature of the target and vector
contains the
added background clutter noise.
Let us define B to be a < C , matrix of the , mean-
removed background reference pixels (centered) obtained
from the input image. Let each centered observation spec-
tral pixel to be represented as a column in the sample matrixB B
� ' ( ) ( � � � � ( * + � (26)
We can design a linear matched filter such that the desired
target signal > is passed through while the average filter out-
put energy is minimized. The solution to this minimization
problem was shown in [12] and was called Constrained En-
ergy Minimization (CEM) filter. The output of the linear
matched filter for a test input � , given the estimated covari-
ance matrix is given by
D E � F
� � >
GH I ) �>
GH I ) > (27)
where GH is the estimated covariance matrix. In [4, 5] it
was shown that using the GLRT the same expression for the
linear matched filter (27) can be obtained.
1063-6919/05 $20.00 © 2005 IEEE
5.2 SMF in Feature Space and its Kernel Ver-

sion
Consider the linear model of the input data in a kernel fea-
ture space which is equivalent to a non-linear model in the
input space � � � � � � � � � � � �
� (28)
where � is the non-linear mapping that maps the input data
into a kernel feature space, �
is an attenuation constant
(abundance measure), the high dimensional vector � � � �
contains the spectral signature of the target in the feature
space, and vector
contains the added noise in the feature
space.
Using the constrained least squares approach it can eas-
ily be shown that the output of the desired matched filter for
the input � � � � is given by
� � � � � � � � � � ��
� � � ��
� � � � (29)
where ��
is the estimated covariance of pixels in the fea-
ture space.
We now show how to kernelize the matched filter ex-
pression (29) where the resulting non-linear matched filter
is called the kernel matched filter. The pseudoinverse (in-
verse) of the estimated background covariance matrix can
be written in terms of its eigenvector decomposition as [10]��
�
� �
� � � � � � � �
(30)
where �
= � � � � � � � � � � � � � � � � � � � � is a matrix

whose
columns are the mapped background reference data in the
feature space and � � � � � � � � � ! � are the nonzero
eigenvectors of the centered kernel matrix (Gram matrix)" � �
� � � normalized by the square root of their corre-
sponding eigenvalues.
Inserting Equation (30) into (29) it can be rewritten as
� � � � � � � � � � �
� � � � � � � �
� � � ��
� � � � � � � �
� � � � � (31)
Also using the properties of the Kernel PCA [7], we have
the relationship " � � � � � � � � . We denote " �" � � �
� � � � " � # $ an % & % Gram kernel matrix whose
entries are the dot products ' � � � # � � � � � $ � ( .
Finally, the
kernelized version of SMF is now given by
) � � " � � � � � � " � � " � � � � �" � � � � � � " �
� " � � � � � � " �* " � � " +" �* " � � " * (32)
where the empirical kernel maps " * � " � � � � � and " + �"
� � � � � . As in the previous section the kernel matrix "
as well as the empirical kernel maps need to be properly
centered.
6 Adaptive Subspace Detector and
Kernel Adaptive Subspace Detec-
tor
6.1 Linear ASD
In this section, the GLRT under the two competing hypothe-
ses ( , - and , � ) for a certain mixture model is described.
The subpixel detection model for a measurement � (a pixel
vector) is expressed as, - . � � � Target absent (33), � . �
� / 0 � 1 � Target present

where / represents an orthogonal matrix whose column
vectors are the eigenvectors that span the target subspace' / ( ; 0
is an unknown vector whose entries are coeffi-
cients that account for the abundances of the corresponding
column vectors of / ; represents Gaussian random noise
distributed as 2 � 3 � � � .
In the model, � is assumed to be a background noise un-
der , - and a linear combination of a target subspace signal
and a scaled background noise, distributed as 2 � / 0 � 1 � �
� ,
under , � . The background noise under the two hypothe-
ses is represented by the same covariance but different vari-
ances because of the existence of subpixel targets under , � .
The GLRT for the subpixel problem as described in [5] (so
called ASD) is given by4 5 6 7 � � � � � � �� / � / �
�� / � � � / � �� 8 9:8 ; <
5 6 7 �
(34)
where �� is the MLE (maximum likelihood estimate) of the
covariance � and <
5 6 7 represents a threshold. Expression
(34) has a constant false alarm rate (CFAR) property and
is also referred to as the adaptive cosine estimator because
(34) measures the angle between =� and ' =/ ( where =� ��
� � > � � and =/ � �� > � / .
6.2 ASD in the Feature Space and its Kernel
Version
We define a new subpixel model by assuming that the input
data has been implicitly mapped by a nonlinear function �
into a high dimensional feature space ? . The model in ? is
then given by, - � . � � � � �

� Target absent (35), � � . � � � � � /
0
� 1
� Target present
where /
represents a full-rank matrix whose @ � col-
umn vectors are the eigenvectors that span target subspace
1063-6919/05 $20.00 © 2005 IEEE
� � � � in � ; � � is unknown vectors whose entries are
coefficients that account for the abundances of the corre-
sponding column vectors of � � ; � � represents Gaussian
random noise distributed by � �
� � � ; and
� is the noise
variance under � � � . The GLRT for the model (35) in � is
now given by�
� � � � � � � � � � � � ��
��
��
(36)
where �� is the MLE of � � .
The kernelized expression of (36) is given by� � �
� � � � � � (37)� � � � � � �
! � � � �

� � � � �
! � � " � � � ��# � �
� � � �
� � � # � �
�
where � � � # � �
� � � �
� � � � �
! � �
background spectral signatures is denoted by � � � � � $ % %
% � & " , target spectral signa-
tures are denoted by ! � � ' � ' $ % % % ' ( " and� � � ) � ) $
% % % ) ( * "
+ � � +
is a matrix consistingof the + � eigenvectors of the kernel
matrix � � !
! � . As
in the previous section, all the kernel matrices as well as
the empirical kernel maps need to be properly centered [7].
7 Experimental Results
In this section, the kernel-based matched signal detectors,
such as the kernel MSD (KMSD), kernel ASD (KASD),
kernel OSP (KOSP) and kernel SMF (KSMF) as well as
the corresponding conventional detectors are implemented
based on two different types of data sets – illustrative
toy data sets and a real hyperspectral image that contains
military targets. The Gaussian RBF kernel,
, � �
' � �
exp � � - � � . - /0 �
was used to implement the kernel-based de-
tectors. � represents the width of the Gaussian distribution
and the value of c was chosen such that the overall data vari-

ations can be fully exploited by the Gaussian RBF function.
In this paper, the values of � were determined experimen-
tally.
A. Illustrative Toy Examples
Figs 1 and 2 show contour and surface plots of the con-
ventional detectors and the kernel-based detectors, on two
different types of two-dimensional toy data sets: a Gaus-
sian mixture in Fig. 1 and nonlinearly mapped data in Fig.
2. In the contour and surface plots, data points for the
desired target were represented by the star-shaped symbol
and the background points were represented by the circles.
In Fig. 2 the two-dimensional data points � � � 1
2 � for
each class were obtained by nonlinearly mapping the orig-
inal Gaussian mixture data points � 3 � � 1 4
2 4 � in Fig. 1.
All the data points in Fig. 2 were nonlinearly mapped by� � �
1
2 � � � 1 4
1 $4 5 2 4 � . In the new data set the second
component of each data point is nonlinearly related to its
first component.
For both data sets, the contours generated by the kernel-
based detectors are highly nonlinear and naturally following
the dispersion of the data and thus successfully separating
the two classes, as opposed to the linear contours obtained
by the conventional detectors. Therefore, the kernel-based
detectors clearly provided significantly improved discrimi-
nation over the conventional detectors for both the Gaussian
mixture and nonlinearly mapped data. Among the kernel-
based detectors, KMSD and KASD outperform KOSP and
KSMF mainly because targets in KMSD and KASD are bet-
ter represented by the associated target subspace than by a

single spectral signature used in KOSP and KSMF. Note
that the contour plots for MSD (Fig. 1(a) and Fig. 2 (a))
represent only the numerator of Eq. 4 because the denomi-
nator becomes unstable for the two-dimensional cases: i.e.,
for the two-dimensional data � 6 7 8 9 : � becomes zero.
B. Hyperspectral Images
In this section, a HYDICE (HYperspectral Digital Imagery
Collection Experiment) image from the Desert Radiance II
data collection (DR-II) was used to compare detection per-
formance between the kernel-based and conventional meth-
ods. The HYDICE imaging sensor generates 210 bands
across the whole spectral range (0.4 – 2.5 � � ) which in-
cludes the visible and short-wave infrared (SWIR) bands.
But we only use 150 bands by discarding water absorp-
tion and low signal to noise ratio (SNR) bands; the spectral
bands used are the 23rd–101st, 109th–136th, and 152nd–
194th for the HYDICE images. The DR-II image includes 6
military targets along the road, as shown in the sample band
images in Fig. 3. The detection performance of the DR-
II image was provided in both the qualitative and quantita-
tive – the receiver operating characteristics (ROC) curves –
forms. The spectral signatures of the desired target and un-
desired background signatures were directly collected from
the given hyperspectral data to implement both the kernel-
based and conventional detectors.
Figs. 4-5 show the detection results including the ROC
curves generated by applying the kernel-based and conven-
tional detectors to the DR-II image. In general, the detected
targets by the kernel-based detectors are much more evi-
dent than the ones detected by the conventional detectors,
as shown in Fig. 4. Fig. 5 shows the ROC curve plots
for the kernel-based and conventional detectors; the kernel-
based detectors clearly outperformed the conventional de-

1063-6919/05 $20.00 © 2005 IEEE
tectors. In particular, KMSD performed the best of all the
kernel-based detectors detecting all the targets and signif-
icantly suppressing the background. The performance su-
periority of KMSD is mainly attributed to the utilization of
both the target and background kernel subspaces represent-
ing the target and background signals in the feature space,
respectively.
8 Conclusions
In this paper, nonlinear versions of several matched signal
detectors, such as KMSD, KOSP, KSMF and KASD have
been implemented using the kernel-based learning theory.
Performance comparison between the matched signal de-
tectors and their corresponding nonlinear versions was con-
ducted based on two-dimensional toy-examples as well as a
real hyperspectral image. It is shown that the kernel-based
nonlinear versions of these detectors outperform the linear
versions. If enough target spectral samples are available to
build a target subspace the kernel matched subspace detec-
tors (KMSD and KASD) generally provide improved detec-
tion performance over KOSP and KSMF. If the target sub-
space cannot be properly estimated because only a small
number of target samples are available KOSP or KSMF can
be used instead which uses a single spectral signature as a
reference to a target of interest.
References
[l] L. L. Scharf and B. Friedlander, “Matched subspace detec-

tors,” IEEE Trans. Signal Process., vol. 42, no. 8, pp. 2146-
2157, Aug. 1994.
[2] J . C. Harsanyi and C.4. Chang, “Hyperspectral image clas-
sification and dimensionality reduction: An orthogonal sub-
space projection approach,” IEEE Trans. Geosci. Remote
Sensing, vol. 32, no. 4, pp. 779-785, July 1994.
[3] D. Manolakis, G. Shaw, and N. Keshava, “Comparative
anal-
ysis of hyperspectral adaptive matched filter detector,” in
Proc. SPIE, April 2000, vol. 4049, pp. 2-17.
[4] F. C. Robey, D. R. Fuhrmann, and E. J. Kelly, “A CFAR
adaptive matched filter detector,” IEEE Trans. on Aerospace
and Elect. Syst., vol. 28, no. 1, pp. 208-216, Jan. 1992.
[5] S Kraut and L. L. Scharf, ‘The CFAR adaptive subspace
detector is a scale-invariant GLRT,” IEEE Trans. Signal Pro-
cess., vol. 47, no. 9, pp. 2538-2541, Sep. 1999.
[6] S Kraut, L. L. Scharf, and T McWhorter, “Adaptive
subspace
detectors,” IEEE Trans. Signal Process., vol. 49, no. 1, pp.
[7] B Schokolpf and A. J. Smola, Learning with Kernels, The
MIT Press, 2002.
[8] B Schokolpf, A. J. Smola, and K.-R. Muller, “Kernel princi-
pal component analysis,” Neural Computation, vol. 10, pp.
129%1319, 1998.
1-16, Jan. 2001.
[9] G. Baudat and F Anouar, “Generalized discriminant analysis
using a kernel approach,” Neural Computation, vol. 12, no.

[lo] A. Ruiz and E. Lopez-de Teruel, “Nonlinear kernel-based
statistical patten analysis,” ZEEE Trans. on Neural Networks,
vol. 12, pp. 16-32, 2001.
[ll] H. Kwon and N. M. Nasrabadi, “Kernel RX-algorithm :
A nonlinear anomaly detector for hyperspectral imagery,”
IEEE Trans. Geosci. Remote Sensing, vol. 43, no. 2, pp. 388-
391, Feb. 2005.
Detection and Class&cation of Sub-
pixel Spectral Signatures in Hyperspectral Image Sequences,
Ph.D. dissertation, Dept. Elect. Eng., Univ. of Maryland,
Baltimore County, 1993.
12, pp. 2385-2404,2000.
[12] J. C. Harsanyi,
(c) ASD (d) KASD
I
(e) OSP (f) KOSP
7
(€9 SMF (h) KSMF
Figure 1: Contour and surface plots of the conventional
matched signal detectors and their corresponding kernel
versions on a toy dataset (a mixture of Gaussian).
1063-6919/05 $20.00 © 2005 IEEE

’ ”
(a) MSD
(c) ASD
I
(e) OSP
(b) KMSD
(d) KASD
(f) KOSP
(a)MSD
I
(b) KMSD
(c) ASD (d) KASD
(e) OSP (f) KOSP
(h) KSMF
Figure 4: Detection results for the DR-I1 image using the
conventional detectors and the corresponding kernel ver-
sions.
(g>SMF (h) KSMF
Figure 2 : Contour and surface plots of the conventional

matched signal detectors and their corresponding kernel
versions on a toy dataset: in this toy example, the Gaus-
sian mixture data shown in Fig. 1 was modified to generate
nonlinearly mixed data.
0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05
0.1
False alarm rale
Figure 3: A sample band image from the DR-I1 data
Figure 5: ROC curves obtained by conventional detectors
and the corresponding kernel versions for the DR-I1 image.
1063-6919/05 $20.00 © 2005 IEEE

Kernel spectral matched filter improves hyperspectral target detection

Recommended

Recommended

More Related Content

Similar to Kernel spectral matched filter improves hyperspectral target detection

Similar to Kernel spectral matched filter improves hyperspectral target detection (20)

More from vrickens

More from vrickens (20)

Recently uploaded

Recently uploaded (20)

Kernel spectral matched filter improves hyperspectral target detection