Paper 5094-43_5

Performance assessment of frequency plane filters applied to track
association and sensor registration
Clay J. Stanek a
, Bahram Javidi b
, and P. Yanni a
a
ANZUS, 9747 Businesspark Ave, San Diego CA 92131
b
Department of E. E., University of Connecticut, 206 Glenbrook Road, Storrs CT 06269-2157
ABSTRACT
The current generation of correlation systems attempting to provide a Single Integrated Picture (SIP) have concentrated
on improving quality from the situational awareness (SA) and tracking perspective with limited success, while having
not addressed the combat identification (CID) issue at all. Furthermore, decision time has lengthened, not decreased,
[1] as more and more sensor data are made available to the commanders --much of which is video in origin. Many
efforts are underway to build a network of sensors including the Army’s Future Combat System (FCS), Air Force
Multi-mission Command and Control Aircraft (MC2A), Network-Centric Collaborative Targeting (NCCT), and the
follow-on to the Navy’s Cooperative Engagement Capability (CEC). Each of these (and still other) programs has the
potential to increase precision of the targeting data with successful correlation algorithms while eliminating dual track
reports, but almost none have combined or will combine disparate sensor data into a cohesive target with a high
confidence of identification. In this paper, we address an architecture that solves the track correlation problem using
frequency plane pattern recognition techniques that also can provide CID capability. Also, we discuss statistical
considerations and performance issues.
Keywords: track correlation, sensor registration, phase invariance, generalized noise, fusion
INTRODUCTION
Correlation engines have been evolving since the implementation of radar. Here, correlation is a process typically referred
to in the literature as track association/correlation. Association is often taken to mean a preliminary list of tracks for
comparison (hypotheses) and correlation is reflects the track association hypotheses that pass a statistical test. Track
correlation and sensor registration are required to produce common, continuous, and unambiguous tracks of all objects in
the surveillance area [2]. The objective is to provide a unified picture of the theatre or area of interest to battlefield
decision makers. This unified picture has many names, but is most commonly referred to as a Single Integrated Picture
(SIP). A related process, known as sensor registration or gridlock filtering (gridlocking), refers to the reduction in
navigation errors and sensor misalignment errors so that one sensor’s track data can be accurately transformed into another
sensor’s coordinate system. As platforms gain multiple sensors, the correlation and gridlocking of tracks become
significantly more difficult. In this paper, we concentrate on correlation algorithms and follow work introduced in
previous papers [3, 4].
From a technical viewpoint, legacy track correlation systems rely on a few key mathematical concepts born out of research
in the early 60s through early 80s. These are: statistical hypothesis testing applied to the track association/correlation
problem, an assignment algorithm to further eliminate ambiguities, and the Kalman filter for positional estimation of the
targets. Several limitations arise. Digital computer performance lead researchers to develop methods of simplifying the
association/correlation and estimation process either through model assumptions, data reduction, or both. In application,
modeling assumptions tend to be a convenient exception rather than the rule, adversely affecting the association/correlation
and Kalman filtering process in many real-world situations. For example, the Kalman filter assumes all data is
independent of prior data and no cross-correlation of errors with other system data occurs; this can yield poor tracking
performance if left unaccounted [5].
A related assumption is used for hypothesis testing as information is accumulated over time in the
association/correlation process: the Naïve Bayesian approximation is employed across parameters and multiple samples

in time. Well known as the “curse of dimensionality”, when the observed feature vector has a high dimension, finding
the marginal amd joint densities is very difficult.. Thus, modeling assumptions and data reduction in tandem can
significantly reduce the processing burden, but can severely impact performance in terms of the quality of the delivered
SIP [6]. In summary, statistical hypothesis testing assigns a probability conditioned on the priors and is only consistent
with what we tell it: Did we enumerate the possible states of nature, discrete or continuous? Are the assigned priors
consistent with our state of knowledge? Is there any additional evidence we can digest via Bayes Rule? Have we
described the sampling probabilities (conditional density functions) honestly given our state of knowledge? Problems
in any and all of these steps all can lead to a dubious posterior probability calculation. If this weren’t discouraging
enough, legacy systems are extremely sensitive to time-alignment and sensor gridlock, which ultimately inject
additional sources of systematic error and random noise into the process.
Furthermore, to combine classification data with tracking data while also performing track correlation, a better approach
using modern technology is required. The approach presented here avoids these limitations by using pattern recognition
techniques that can be implemented on a digital, FPGA, or optical architecture. This architecture provides a native
ability to accept classification data along with standard track information to provide track correlation with CID. The
pattern recognition algorithms are based on concepts developed for real-time facial and target recognition systems. This
can all but eliminate the need to throw away information in the process.
TRACK CORRELATION & FUSION FORMULATION
Consider the problem of determining if a track reported by one sensor or link, ix , and a second track, jx , correspond to
the same target. In terms of a hypothesis test, we have two cases of interest and describe the binary case in this way: Let
0ω = the event that track ix and track jx correspond to the same target
1ω = the event that track ix and track jx do not correspond to the same target.
The typical track state contains parameters such as position, velocity, and course and are often reported with respect to a
wide variety of reference frames. The most common of these is with respect to the WGS-84 datum, which defines an
ellipsoidal reference surface where the origin is taken as the center of the Earth. Sensors can use a spherical coordinate
system in their internal processing. In the context of data links for track reports, it is often the case that a local Cartesian
frame is used with the origin being a surveyed ground point (improves track accuracy), or a more arbitrarily chosen
reference. Through simple transformations, we can take the WGS-84 coordinate and transform it into a Cartesian
coordinate relative to a tangent plane of specific orientation. Such transformations occur between tracks reported on Link-
16 and those on Cooperative Engagement Capability (CEC) and vice-versa. The xy plane is taken as tangent to the Earth
surface at the origin of the grid, with the y axis oriented in the direction of North, and z axis normal to the plane. Let us
take the position of the track in ellipsoidal coordinates where λ is the azimuth, φ is the reduced latitude, h is the altitude
(referenced as a geometric height above the reference ellipsoid), and often speed and heading information is also available
(it is vector in nature and can be resolved into component velocities). Thus, it is perfectly valid to refer to the description
of a track sample as [ ]φ λ=x h v in ellipsoidal coordinates or [ ]=x x y z v in Cartesian coordinates where x
is a vector representing the track state. The state of the target at a particular t can be denoted as [ ]φ λ=t t
x h v
or [ ]=t t
x x y z v . By accumulating several track state samples, we can construct a matrix where each row represents
a field and each column a specific sample time; this track state matrix is referred to as iX and represents all the
information of the track reported over all time as shown in Figure 1.

Figure 1 (left) 52 tracks represented by line plots over time; (right) Track image generated from one case. See
approximately 29.5N, 86.2W in the image for red, circular track.
HYPOTHESIS TESTING
We will outline in some detail the standard hypothesis testing approach to track correlation and describe its foundation in
Bayesian theory. The Bayesian decision rule is simple: choose the class that maximizes the posteriori probability. For a
two class problem, the decision rule is written as
( ) ( )0 1| |ω ω<>q x q x (1.1)
where ( )|ωiq x is the posteriori probability for class ωi . This says if the probability of class 0ω given x is greater
than the probability of 1ω , choose 0ω and vice-versa. x is a vector of attributes, which includes everything used in the
decision process: measurements such as target position, velocity, radar cross-section, and ESM parameters as well as
taxonomic clues such as IFF modes, PPLI (Precise Position Location Indicator on Link-16), and others such that
ν =  t x y z t
x x y z v v v m T … and so on.
The posteriori probability is one of the most misunderstood concepts in scientific inference. E.T. Jaynes offers perhaps
what the author feels is the best description of probability, which is often simply interpreted as the actual frequency with
which we would observe events to be true in a real experiment. He states,
“In pedantic, but scholastically correct terminology, a probability p is an abstract concept, a quantity
that we assign theoretically, for the purpose of representing a state of knowledge, or that we calculate
from previously assigned probabilities using the rules of inference. A frequency, f , is, in situations
where it makes sense to speak of repetitions, a factual property of the real world, that we measure or
estimate. Instead of committing the error of saying that the probability is the frequency, we ought to
calculate the probability ( )p f df that the frequency lies in various intervals df .” [7]
Using Bayes rule, ( )
( ) ( )
( )
|
|
ω ω
ω = i i
i
p x p
q x
p x
, with prior class probabilities ( )ωip , conditional class densities,
( )|ωip x , and writing in ratio form:
( )
( )
( )
( )
( )
( ) ( )
( )
( )
0 0 0
0 0 0
1 1 1
|
| ,
|
ω ω ω
ω ω ω
ω ω ω
≡ = =
q p x p
O x O O
q p x p
(1.2)

In (1.2), we have used the Jaynes [8] description by referring to ( )0 |ωO x as the “odds on hypothesis 0ω ”. Thus, the
posteriori odds are equal to the prior odds ( )0ωO multiplied by a dimensionless factor called the likelihood ratio. Using
the Fukanaga representation [9], define the likelihood ratio ( )l x [10] and decision rule as
( )
( )
( )
( )
( )
0 1
1 0
|
,
|
ω ω
τ τ
ω ω
= > =
<
p x p
l x
p x p
(1.3)
Either (1.2) or (1.3) is an equivalent formulation using Bayes rule. By using base 10 and putting a factor of 10 in front,
we can measure evidence, ( ) ( )( )0 0| 10log |ω ω=e x O x , in decibels (db) in (1.2) rather than odds. Clearly, the
evidence equals the prior evidence plus the number of db of evidence provided by calculating the likelihood ratio.
While there is nothing in this calculation per se that tells us where to make the decision 0ω is accepted, there are several
techniques for measuring false alarm as a function of the threshold, which help us guide the decision portion of the
calculation. The use of a loss function helps us rescale what the Bayesian calculation provides into something which
becomes an actionable calculation.
THE NAÏVE BAYESIAN APPROXIMATION
Suppose that the new information consists of several pieces or propositions such that 1 2, , ,= nx x x x… . Then we could
expand (1.2) by using the product rule for probabilities
( )
( )
( )
( )
( )
( )
( )
( )1 2 0 2 0 0
0 0
1 2 1 2 1 1
| , , | , , |
| 10log 10log 10log
| , , | , , |
ω ω ω
ω ω
ω ω ω
= + + +n n n
n n n
p x x x p x x p x
e x e
p x x x p x x p x
… …
… …
(1.4)
If we suppose that each ix is independent of the outcome of the other jx given the class, then we have the “Naïve
Bayesian approximation”.
( )
( )
( )
( )0
0 0
1 1
|
| 10log
|
ω
ω ω
ω=
= +∑
n
i
i i
p x
e x e
p x
(1.5)
Notice that each piece of information contributes additively to the evidence and makes for a clear way to introduce
newly learned information into the inference process. The Bayesian normally draws distinction between logical and
causal independence. Typically this is taken to mean an honest representation of the state of knowledge about the
situation, not merely known or measured cause and effect. Thus, perception is an important part of the formulation to
the Bayesian. Two events may be causally dependent in the sense one influences the other, but to a person attempting
to infer from available data that has not yet discovered this, the probabilities representing ones state of knowledge might
be independent. There are other advantages to (1.5) that many have written about [11]. Probably the most important of
these is the rate of learning from a performance standpoint: given n training examples over l attributes, the time
required to learn a boosted naive Bayesian classifier is ( )Ο ⋅n l , i.e. linear.
We will comment on the non-extensibility of (1.5) to the multi-hypothesis case. In the CID problem, our hypothesis is
the platform (an F-15 friendly, MIG-21 hostile, a SCUD, etc.) and intent, and thus the naïve Bayesian approximation is
really saying that everything is conditioned on the target type alone, e.g. a model-based approach. We could interpret
this from the following scenario. Imagine characteristics for a platform such as radar information (pulse width,
frequency range, pulse repetition frequency, waveform), engine type, length, radar cross-section, positional and
kinematic information. Our simplifying assumptions would mean that given the platform was an F15, the radar is
independent of the engine is independent of the length, is independent of the speed, and so on. The main advantage in
this type of formulation is the avoidance of joint density estimation where the features are correlated. Think of this in
only three dimensions. Rather than needing to estimate from data covering the whole volume in three dimensions, one
can intersect that volume with three orthogonal planes representing a feature per plane.

However, when the set of hypotheses is [ ]0, , 2ω ∈ ≥i n n , then (1.5) will lead to trivial conclusions because 0ω
contains many alternative hypotheses, not just 1ω . Thus, even if our assumption of each ix independent of the other jx
honestly represents our state of knowledge, application of (1.5) is a serious error. Instead we must express (1.2) as
( )
( )
( )
( ) ( )
( ) ( )
( )
1 2
11 2 0
0 0 1 2 0
1 2 0
1
, , |
, , |
| , , , |
, , |
ω ω
ω
ω ω ω
ω
ω
=
=
= =
∑
∑
n
n j j
jn
n n
n
j
j
p x x x p
p x x x
O x O p x x x
p x x x
p
…
…
…
…
(1.6)
so that the denominator of the likelihood ratio is a weighted sum and is the average probability for a feature of all
alternatives. One could of course take two hypotheses at a time and use the simpler form, but then would have to rank
them. The Quicksort algorithm accomplishes this in ( )logΟ n n operations for a full ranking procedure.
SINGLE AND SEQUENTIAL HYPOTHESIS TESTING
Classification based on kinematic data is compounded by the difficulty in describing the class distributions for the case
of uncorrelated track pairs. One well-known alternative is to perform a single hypothesis test, the most common of
which is the distance test. The distance from the feature vector to the class mean is computed (weighted by the
covariance matrix) and compared to a threshold to determine if the feature vector supports the hypothesis. If the
original 0ω class distribution is Gaussian, then the distance classification represents the evaluation of ( )( )0log |ωp x to
within a constant factor, which is referred to as the Mahalanobis distance:
( ) ( ) ( )2 1 2
1
,µ µ µ−
=
= − Σ − = = = −∑
nT
T T
i
i
d x x z z z z A x (1.7)
and thus is half of the calculation for a binary hypothesis test. Its relation to quadratic classifiers will not be further
discussed here. Note that z is a whitened feature vector of zero mean, 0µ = , and unit covariance, Σ = I . From the
Bayes standpoint, if all we know about our data set is the first two moments of the distribution, then (1.7) is the most
honest representation of our state of knowledge according to the Principle of Maximum Entropy. Taking it one step
further, the most widespread use of the Gaussian sampling distribution is not that the error frequencies are known or
believed to be Gaussian, but rather because they are unknown. However, the issue comes from the unknown
distribution of the second class and the use of the threshold in lieu of a complete binary hypothesis test.
Fukunaga [12] has outlined the issues with this approach that warrant mention here. He shows that the expectation and
variance of the distance are given by { } { } { }2 2 4
1 1| , | , 1ω ω γ γ= = = −iE d n Var d n E z when the 'iz s are independent.
For normal distributions, 2γ = and the density function for 2
d is
( ) ( )
( )
2
2 2
/ 2
1
2 / 2
n
n
p d u
n
ζ ζ
−
=
Γ
(1.8)
which is the gamma density function.
For the location of the second class, we can safely assume that under a whitening transformation, the mean vector is
non-zero and the covariance matrix non-white. Thus, we take { }2
ˆ|ω µ=E x and ( ){ }2
2
ˆ |µ ω− = ΛE x and can easily
calculate that
{ } { }2 2 2 2
2 2
1 1 1 1
ˆ ˆ| , | 2 4ω λ µ µ ω λ λ µ
= = = =
= + = +∑ ∑ ∑ ∑
n n n n
T
i i i i
i i i i
E d Var d (1.9)
where λi is an eigenvalue of Λ . The result of this type of analysis is that the Bayes error in the one-dimensional 2
d
space is considerably higher than that in the n -dimensional space of x when n is moderate to large in number. The
mapping from the original space to the one-dimensional space destroys classification information which existed in the
original feature space.

SEQUENTIAL HYPOTHESIS TESTING
Previously, we have discussed a general hypothesis testing scheme for situations such as
[ ] [ ]1 2, ,φ λ= =t nt
x h v x x x… and noted that the naïve Bayesian approximation refers to the independence
between features, which here are distinct parameters. One distinguishing characteristic of tracks is their updating with
time and thus we often have many samples available from which to draw an inference. A basic approach is the
averaging of observation vectors; this can theoretically achieve zero error when the hypotheses have different
expectations.
Let 1 2 3 4, , , , , nx x x x x… be a series of observed feature vectors over time. If these are assumed to be independent and
identically distributed, then (1.5) applies directly. Thus, we examine the distribution of the evidence function and seek
to accept or reject our hypothesis 0ω depending on the sign of ( )0 |ωe x . If the log-likelihood ratio and its expectation
are defined as
( )
( )
( )
( ){ } ( ){ }0 2
1
|
10log , | , |
|
ω
ω η ω σ
ω
 
= = =  
 
i
i i j j i j j
i
p x
h x E h x Var h x
p x
(1.10)
then ( ){ } ( ){ } 2
| , |ω η ω σ= =j j j jE e x n Var e x n . This demonstrates a well-known scaling with sample number in the
sequential testing case: as the number of samples scales with n , the expectation of the evidence increases n faster
than the variance of the evidence. This implies that the evidence density functions for the two classes become more
separable and more normal as n increases. If the expectation of the log-likelihood function for each class is identical,
then a linear classifier can not improve our separability in the sequential test. However, if the two classes have different
covariance matrices for ( )|ωi jh x , then a quadratic classifier will provide increased separability with sample number.
When there is correlation among the time series, the error mitigation is significantly affected since a cross correlation
term arises in the computation of the variance of the evidence.
( ){ } ( ){ } ( )( ) ( )( ){ }1 1 1
| | | | |
n n n
j i j i j j k j j j
i k i
Var e x Var h x E h x h xω ω ω η ω η ω
= = =
= + − −∑ ∑∑ (1.11)
In the limit of perfect linear correlation across the samples ( )( ) ( )( ){ }| | | 1ω η ω η ω− − →i j j k j j jE h x h x , then variance
of the evidence function and the expectation both change in proportion to the number of samples, making it tantamount
to a single hypothesis test. In Figure 2, we provide an example of a non-zero second term in (1.11) for several
components of a feature vector normally constructed in testing hypothesis 0ω . We outline examples of the
autocorrelation over multiple samples and demonstrate that it can be significant. For each autocorrelation sequence,
there are 3 track pair examples. For example, we provide autocorrelation sequences for a difference in latitude for three
track pairs in the top-left plot. There are several possible explanations for this. The two most prominent are 1) the
interpolation scheme used to provide time alignment of the data, and 2) the cross-correlation of the errors in the track
states as reported from two different links or sensors.
Nevertheless, the sequential hypothesis test receives continual treatment in the literature. For example, modern papers
last year treated the topic of the Mean-Field Bayesian Data Reduction Algorithm (BDRA) for adaptive sequential
classification utilizing Page’s test. This method has application in detecting a permanent change in the distribution or to
classify as quickly as possible with an acceptable Mean Time Between False Alarms (MTBF) [13]. We would view
(1.5) as the accumulation of evidence over time (samples) and denote this as
( )( ) ( )
( )
( )1
| ,
, 10log ,
| ,=
= =∑
t
k
t
n l
f y x H
S e y n e y
f y x H
(1.12)
then the decision rule becomes min
<
− <>t m
m t
S S h with the threshold h set by the false alert rate desired.

Figure 2 Correlation of feature vectors impacts of independent, identically distributed assumption of samples in
time
CONSTRUCTING THE FEATURE VECTOR FROM KINEMATIC INFORMATION
We outline one model-based approach proposed in the early development of this problem [14] to hypothesis testing here
to enlighten the reader on many of the issues with classification in the track correlation problem. Let us suppose the
most recent track report on a target from one sensor provides a position of ( ),=i i iP x y and a second sensor provides a
track report of ( ),=j j jP x y . One sensible feature vector is the difference in positions reported by the two sensors
( ) ( )
( ) ( )
, , or
, ,φ φ λ λ φ λ
= − − ≡ ∆ ∆
= − − ≡ ∆ ∆
i j i j
i j i j
x x x y y x y
x
(1.13)
for either a Cartesian or elliptical coordinate system description. Let us take the Cartesian formulation explicitly for
further evaluation. As a model of the conditional class density function, we take the distribution of x to be normal, or
( ) ( )0| 0,ω = Σp x N in the case two tracks are the same target. Recalling our previous definitions, the system is in state
0ω when the two tracks correspond to the same target and in state 1ω when the two tracks correspond to different
targets. One simple model is that the measurement differences will be uniform when in state 1ω and that this
assumption holds over a window size on the order of the average target separation, which goes like the track density. If
the window size is denoted by having length scale ξ , then we take ( ) ( )( )1| U 0, ,ω α= ∈p x D x D where ( )α D is the
area in domain 2
ξ∝D depending on the exact shape of the window. It further implies that ( )1| 0ω =p x when outside
of the area D .
Applying the Bayesian classifier in (1.4), we can express this simplified association problem with two-element feature
vector ( ),= ∆ ∆x x y as:
( )
2
21
22
10 log
1
ξ
πρ σ σ ρ
∆ ∆
∆ ∆ ∆ ∆ ∆ ∆
  
  = −
   −  
x y
x y x y x y
l d x (1.14)
with ( ) ( ) ( )2 1
0 0µ µ−
∆ ∆
 = − Σ −
  
T
x yd x x x as given by (1.7), the covariance matrix and correlation coefficient
2
2
σ ρ σ σ
ρ σ σ σ
∆ ∆ ∆ ∆ ∆
∆ ∆
∆ ∆ ∆ ∆ ∆
 
Σ =  
  
x x y x y
x y
x y x y y
, and ρ∆ ∆x y . We can further extend the feature vector with additional information.
If we assume Gaussian distributions for speed and heading differences for both the correlated and uncorrelated case,
then the likelihood ratio is written as

( ) ( ){ }
( ) ( ){ }
1 1
1 1
11
0 02
2 11
1 12
exp
10log
1 exp
µ µσ σ
σ σ ρ µ µ
−
∆ ∆
∆ ∆
−
∆ ∆ ∆ ∆
  − − Σ −  
=   
−  − − Σ −    
T
s c
s cs c
T T
s c s c s c
x x
l
x x
(1.15)
with ( ),= ∆ ∆x s c , covariance matrix ( )
2
2
,
σ ρ
ρ σ
∆ ∆ ∆
∆ ∆
∆ ∆ ∆
 
Σ = Σ ∆ ∆ =  
 
s s c
s c
s c c
s c , and correlation coefficient ρ∆ ∆s c . Also, note
that ( ){ } ( ){ }0 0 1 1, | , , |ω µ ω µ= ∆ ∆ = = ∆ ∆ =E x s c E x s c . If the class means are equal, then all the separability will fall
to covariance differences and would dominate the Bhattacharyya distance. Finally for altitude or tangent plane height,
we take
( ) ( ){ }
( ) ( ){ }
1
1
1
112 2
0 02
11
1 12
exp
10log
exp
µ µσ σ
σ µ µ
−
∆
∆∆
−
+∆
  − − Σ −+  
=   
  − − Σ −    
T
z
z zz
T T
z
z z
x x
l
x x
(1.16)
with ( ){ } ( ){ } 2
0 0 1 1| , | ,ω µ ω µ σ∆ ∆= ∆ = = ∆ = Σ =z zE x z E x z .
In this model, we have really expressed likelihood ratios for separate components of the complete kinematic feature vector.
So far, the key assumption is that of logical independence of the likelihood ratios for the horizontal position, the
speed/heading, and the altitude. We have allowed for correlation between the horizontal position information, as well as
correlation between the course and speed. The independence of position and velocity is not necessarily a good one, but is a
simplifying assumption, while the independence of altitude is usually a good one except at short range. Also, we have not
demonstrated that the probability density functions themselves are well-modeled as Gaussian for both classes. Normality
tests are usually accomplished with conventional Chi-squares tests, Beta distribution test on the Mahalanobis distance with
estimated mean and covariance from the training data (also called the Kolmogorov-Smirnov test).
Using the 5-dimensional feature vector
 = ∆ ∆ ∆ ∆ ∆ t ij ij ij ij ij t
x x y z s c (1.17),
from our formula in (1.5), the instantaneous evidence in favor of hypothesis 0ω is
( ) ( )0 0|ω ω∆ ∆ ∆ ∆ ∆
= + + +x y s c z
e x l l l e (1.18)
Furthermore, in the sequential case under the model assumptions, we envision the accumulation of evidence according
to
( ) ( )0 0
1
|ω ω∆ ∆ ∆ ∆ ∆
=
= + + +∑
n
x y s c z
t t t
t
e x l l l e (1.19)
We mention that the assertion in (1.19), while convenient, is subject to the reality presented by (1.11), and Figure 2
implies that some basic assumptions (i.i.d.) begin to break down when compared to real data.
While this model was developed in a Cartesian reference frame, we should take notice of another difficulty that often
arises in hypothesis testing. If we were to now formulate the above line of reasoning with an elliptical coordinate frame
such as WGS-84, we would have to transform our distributions that reflect our knowledge of the feature vector. The
approach is as follows. Since the same event ( 0ω ) has two simultaneous expressions (as a probability in terms of
 = ∆ ∆ ∆ ∆ ∆ t ij ij ij ij ij t
x x y z s c or φ λ = ∆ ∆ ∆ ∆ ∆ t ij ij ij ij ij t
x h s c ), the volume in probability density space is
conserved. To eliminate confusion, let us momentarily refer to the latter feature vector as ty , the actual probability
should be independent of our method for describing it. For instance, say we have a description for the joint probability
density in several variables, but now we want it in terms of other variables. Such as:
( ) ( ) ( ) ( )known, want with ,= =p x q y x g y y f x (1.20)
By the argument above, ( ) ( )=p x dx q y dy and

( ) ( )( )= ∑ i
i
q y p f x J (1.21)
where the sum is over all ( )ix leading to the outcome ( )q y . Note the use of the Jacobian in (1.21). This quantity
relates the change in differential volume elements when transforming coordinates and is defined by
1 2
1 1 1
1 2
2 2
1
∂∂ ∂
∂ ∂ ∂
∂ ∂
∂ ∂
∂∂
∂ ∂
 
= 
 
 
n
n
n n
xx x
y y y
x x
y y
xx
y y
x
J
y
…
(1.22)
In Figure 3, we show some examples of actual data representing the distribution of the feature vectors, along with
Gaussian fits to those distributions. Parzen estimation is also relevant in the estimation of density functions with
( ) ( )
1
1
ˆ κ
=
= −∑
n
i
i
p x x x
n
(1.23)
( )ˆp x is the estimated probability and κ is the kernel function, which is typically normal or uniform.
Figure 3 Histogram representing feature distributions and Gaussian pdf model fit for , , andφ λ∆ ∆ ∆v
While Figure 3, demonstrates at least some superficial indication that normality isn’t the worst assumption one could
make for these distributions, we have not described the conditional class density for the second class because it is so
difficult to infer. Thus, a single hypothesis test is often attempted (a distance classifier as in (1.7)) where we accept the
increase in Bayes error due to dimensional folding. Also, even in 5 dimensions, estimating the conditional density
functions is difficult from the training data. The simplifications of (1.19) help, but still leaves us with two correlation
coefficients to estimate. The reader should take care to distinguish between feature component correlations, and time
series correlation of a particular feature. Each presents unique challenges to the classification problem.
Even having overcome the density estimation issue, a potentially more problematic issue arises. Angular misalignment
is a dominant source of error. As such, the distribution of changes in latitude and longitude, etc, are a function of the
absolute distance between the measuring sensor and the target. Thus, we really should be gathering information on the
change of the distributions based on the spatial relationships between the sensors and targets. Setting decision
thresholds at .02 degrees for example, will not necessarily lead to the same location on the ROC curve for all targets.
CONNECTING HYPOTHESIS TESTING TO MATCHED FILTERING
Under Gaussian conditions the Bayes classifier for the two-class problem becomes a linear classifier when the class
covariance matrices are the same and a quadratic classifier when the covariance matrices are different. If we use a
classification scheme of based on (1.5) of

( )0sgn |ω  e x (1.24)
with evidence, ( )0 |ωe x , viewed as an argument to a discriminant function, then we can seek to optimize our classifier
subject to some criterion. Dropping 0ω , we take ( )y x as the generalization of ( )e x . For the linear case, the general
solution has the form
( )
( )( )
1 0
0
0 1
,
1
µ µ−
= + =
Σ + − Σ
T
y x V x p V
s s
(1.25)
where 0p and s are constant.
Returning to our earlier discussion, the covariance matrix Σ can always be made an identity matrix through a suitable
whitening transformation and the decision rule of (1.24) in the Gaussian, equal covariance case becomes
( ) ( ) ( ) ( )1
0 1 0 1 1 0 0 02
sgn | sgn
T
T T
e x x eω µ µ µ µ µ µ ω = − − − − −     
(1.26)
which is directly derivable from (1.15) when 1 1∆ ∆Σ = Σ =s c s c I .
Since ( ) ( )1
1 1 0 0 0 02
µ µ µ µ ω− − − =T T
e p is a constant term independent of x , we can view (1.26) as proportional to the
difference of two correlation operations:
( )1 0 1 0µ µρ ρ µ µ− = −
T
xx x (1.27)
where the correlation operation is defined as
( ) ( ) ( )
1
µρ κ µ κ
=
= +∑i
N
x i
j
j x j (1.28)
and in (1.27), 0κ = . This is nothing more than expression as the inner product of x and iµ with given lag κ . The
decision rule for equation (1.25) then becomes ( ) ( )0 1 0sgn ,
T
T
y x V x p V xµ µ = + = −  . Clearly, the connection
between (1.26) and (1.25) is the correlation operation. The decision rule compares the difference in correlation scores to a
threshold and to a class accordingly. The threshold is determined by the mean class separability and the prior probabilities
for each class.
We can explain this in terms of basic, linear filtering theory. Given an input x and filter h , the output of a linear
system is = ∗y h x where ∗ is a convolution operation:
( ) ( ) ( )
1
κ κ
=
= −∑
N
j
y h j x j (1.29)
If ( ) ( )µ κ κ+ = −i j h j , then (1.28) and (1.29) are equivalent and convolution and cross-correlation can be seen as one in
the same and the discriminant function is nothing more than the ( )sgn i function applied to the difference in outputs of a
matched filters One of the filters has impulse response ( )0µ κ and the other ( )1µ κ .
CORRELATION FILTERS: PHASE, NONOVERLAPPING DISJOINT NOISE
Finally, we state the key relation in discrete form as discussed in [15] and countless others by returning to
( ) ( ) ( ) ( ) ( ) ( )
1
ˆ ˆ ˆ
N
j j i i j j j
i
y t h t x Y H Xτ τ ν ν ν
=
= − ⇔ =∑ (1.30)

The output of a linear system is the convolution of the input with the filter. In the frequency domain, this is just
multiplication of their respective transforms. For a track image as demonstrated in Figure 1 and applying the 2-D
convolution theorem, we can state the two-dimensional version of (1.30) as
( ) ( ) ( )1 2 1 2 1 2
ˆ ˆ ˆ, , ,ν ν ν ν ν ν=Y H X
(1.31)
where ˆ ˆ, ,X H and ˆY are the 2-D Fourier transforms of the input track image, the filter, and the output response. For a
matched filter,
( ) ( ) ( ) ( )*
1 2 1 2
ˆ ˆ, , , ,j j j i j iH X h s t x s tν ν ν ν σ τ= ⇔ = − − (1.32)
with *ˆX the complex conjugate of ˆX . For a phase-only filter
( )
( )( )
( )( )
*
1 2
1 2 *
1 2
Im ,
ˆ , exp atan
Re ,
ν ν
ν ν
ν ν
   
   =
   
   
X
H i
X
(1.33)
Notice that for a matched filter,
( ) ( ) ( ) ( ) ( )
2
1 2 1 2 1 2 1 2 1 2
ˆ ˆ ˆ ˆ, , , , ,ν ν ν ν ν ν ν ν ν ν∗
= = = ΦY X X X (1.34)
From a frequency-plane correlation viewpoint, we can introduce a Fourier-plane nonlinearity with the hopes of improving
the correlation performance. The effect of this is to allow more complicated decision surfaces to better partition the class
regions. One such simple mapping is
( ) ( )
( )( )
( )( )
*
1 2
1 2 1 2 X *
1 2
ˆIm ,
ˆ ˆ, , , 0 1, = atan
ˆRe ,
γ
φ
ν ν
ν ν ν ν γ φ
ν ν
∗
 
 = ≤ ≤
 
 
Xi
X
H X e
X
(1.35)
When 1γ = , we have a classic matched filter if ( ) ( )1 2 1 2
ˆ ˆ, ,H Xν ν ν ν∗
= and a phase-only filter if 0γ = . In optical
architectures, 0 1γ< < nonlinearities are achievable.
Figure 4 Track image tiles and correlation output
Phase Invariance for Alignment Errors
Time alignment refers to the need to interpolate or extrapolate data so that comparisons can be made across data from
different data links and sensors as the samples from these sources almost never occur at the same time. One of the main
ways this becomes an issue is when the kinematics of the target do not follow a linear velocity model. For example, the
six-state or constant velocity tracker is the Kalman filter of choice for constant speed targets, while a nine-state, or
constant acceleration tracker is the Kalman filter of choice for maneuvering targets. When models do not correspond to
actual target dynamics, at best this results in larger uncertainty estimates from the filter and at worst a lost track. The
correlation algorithm has many of the same issues.
For current track association algorithms, data alignment is an important component in the algorithm behavior. Some
algorithms are so sensitive to these time errors that even with constant velocity targets, if the reported time is in error

from the actual time by as much as 50 msec, it will result in a de-correlation. However, when more information is used,
we can recognize like patterns easier even when there are gaps in the data or other distortions.
Figure 5 (Left) Latitude, longitude and phase plots for 3 correlated track pairs. Even time alignment problems will
tend to preserve phase. (Right) Plot of difference vectors for same information
In practice, one is typically not interested in exact interpolation. First, real samples are usually noisy and an
interpolating function passing through every data point will lead to overfitting and thereby poor generalization. Better
generalization will be achieved using a smoother fit that averages out noise. This is the working premise of the modern
area of radial basis functions and the generalized theory of kernel machines; classifiers must have useful generalization
properties to capture system behaviour and not just model the input data.
Consider the left and middle plots of Figure 5. The latitude and longitude of three correlated tracks are shown and the
associated phases for the track time series. These latitude reports are of the same target, but due to sensor registration
issues and temporal sampling differences, the latitudes and longitudes do not correspond perfectly. However, when we
look at the phase of this information as plotted on the far right, we maintain good correspondence across most of the
phase. On the far right, we see a plot of the time series for a feature vector comprised of φ λ = ∆ ∆ t ij ij t
x . The
latitude difference vectors are on top and the longitude difference vectors on the bottom. There is a reasonably wide
variation in the quantities for certain track combinations and we should not expect a constant threshold to work well for
all correlated track pair combinations because of rotational misalignment introducing a bias depending on the physical
location of the track with respect to the measuring radar.
Figure 6 Two track images that should correlate. One has a significant translational and rotational bias, and is
missing a feature as well.
However, phase is nearly invariant to small angular misalignment and certainly invariant to translation, which is really an
additive constant to the feature. This is further demonstrated in Figure 6. A 5 degree in-plane rotational misalignment and
translation offset is introduced. Furthermore, we have removed a feature from one image that is present in the other.
Attaining a correlation peak is relatively easy as demonstrated.

Generalized Distortion Modeling
In several previous papers, we have outlined the extension of least-squares filter performance for more general and
applicable conditions than that of overlapping white noise. Basic to the approach was the construction of a window
function, which allowed us to differentiate between the target and background, and allow us to specify specific noise
processes for a region of the input, rather than uniformly imposed on the whole input. The general approach is to let the
input signal be represented by ijx for the feature vector constructed from track i and track j and further subdivide the
windowing function on the track image to contain the ability to have different noises for different features.
The model has considerable application in the track correlation problem by allowing us to describe regions of noise
specific to certain parameters. Due to space considerations, we refer the reader to [16], [17], [18].
COMBAT ID / REGISTRATION
The approach until this point has focused on the use of image topology to make track association decisions. Central to this
approach is that tracks have attributes in common that can be compared. In the track image construction process, we can
always leave an attribute ‘blank’ in the image by setting that field to zero, but there must be some common information.
This section focuses on the ability to merge disparate information. Such a situation can arise when various input sources
provide a specific piece of information, but none of it alone is enough to make a classification. The Polynomial correlation
filter (PCF) is designed to address this situation.
The objective is to find filters, ( ),ih m n such that the filter can respond to different transformations of the true class, and
do so in a simultaneous manner. Furthermore, positive true class detections can be due to the filter response to individual,
some, or all of the input data about the object. The typical performance criterion is
( )
2
ˆˆ
ˆ
ˆ ˆˆ
m h
J h
h Bh
+
 
 = +
(1.36)
where ˆB is a diagonal matrix related to the spectral properties of the training images. Notice that this criterion is
analogous to the MACH filter where ˆ ˆˆB S C= + . (1.36) was extended to multiple sensors by A. Mahalanobis [19].
To explain the idea, imagine that we have several sources of information about an object, χ . This might be in the form of
imagery, intelligence, kinematic, or other types of information. We can describe the information as some transformation of
the original object, however complicated that transformation might be. Furthermore, let us assume that we can describe
this information in some two-dimensional format.
( ) ( ), χ=i ix m n f (1.37)
if is the transformation applied to object χ by source i and ( ),ix m n is the information described in a two-dimensional
format. We could then design a filter bank such that the correlation output plane is expressed as
( ) ( ) ( )
1
, ,χ χ
=
= ⊗∑
T
i i
i
y m n h m n f (1.38)
where ⊗ is the correlation operation (see equation (1.28)) and there are T sources of information on χ .
Imagine we had three sources of information on χ and we wanted to design a filter to recognize χ based on the output of
3 sensors. Let ( ) ( )1 1, χ=x m n f be the output of a radar tracking the object with kinematic values supplied at various
times. That is, a row, m , would be one parameter such as course, speed, latitude, altitude, or longitude and the column, n ,
would represent a particular sample at a particular time. Let ( ) ( )2 2, χ=x m n f be the output of an LWIR FLIR imaging
the target and let ( ) ( )3 3, χ=x m n f be the hyperspectral information such as plume signature data, and
1
1
ˆ ˆ
iT
i i
j
ji
m x
T =
= ∑ the

average training image in frequency space for source input i . The average filter response in the frequency plane to the
source input i is ( ) ˆˆ î i
iy m h
+
< > = . For this example, the Average Correlation Height (ACH) and Average Similarity
Metric (ASM) are [ ] ( )
23
2
1
ˆÂCH ,i
i
i
m h
+
=
 
=  
 
∑ and
3 3
1 1
ˆ ˆÂSM i ij j
i j
h h+
= =
= Σ∑∑ with ˆ
iijjΣ a frequency-plane term that resembles
the covariance in training image sets between sensor i and j . Mahalanobis showed that an optimal filter for this
problem can be written as
ˆ ˆ îjh mΣ -1
= (1.39)
and we have an effective algorithm for a CID architecture.
Figure 7 A track image with additional information useful to CID process
A simple example is given in Figure 7. The tiles have latitude, longitude, altitude, course, and speed on the vertical axis
and time on the horizontal axis. Some tracks have mode-II / mode-III codes and frequency information from an
Electronic Intelligence (ELINT) system. These mode codes are used to identify friendly platforms from hostile ones,
and frequency information can be a non-cooperative way of identifying the platform.
CONCLUSION
We have discussed the number one problem facing battlefield information systems today: track association/correlation for
improved situational awareness and CID. We outlined in some detail how statistical hypothesis testing is used today to
address this problem and described these techniques with a specific example. The statistical viewpoint is quite useful for
understanding performance issues and theoretical bounds. As a novel approach to track association, we have introduced
the concepts of image topology with regards to tracking and how this topology can be used to develop a complete track
association system with ID capability. We looked at several frequency plane nonlinear and composite filters and discussed
how sensor registration covariance can be used to define the region associated tracks are to lay with high probability. Then
we sample this region of uncertainty in such a way as to generate training images to be used in composite filters.
Furthermore, the majority of the filters discussed lend well to optical processing and we discuss a commercial system that
can be used to implement simplified versions of nonlinear and/or composite filters.
APPENDIX
First, take the variances of the location of the target in two reported positions: a local and remote report. The covariance
matrix, Σ , for ,l lx y is expressed as
( ) ( )
2 2
2 2
, , ,
σ ρ σ σ σ ρ σ σ
ρ σ σ σ ρ σ σ σ
   
Σ = Σ = Σ = Σ =   
     
l l l l l r r r r r
l l l l l r r r r r
x x y x y x x y x y
l l l r r r
x y x y y x y x y y
x y x y (1.40)
With radars, the measurements are typically performed in ( ),θR and these variables are assumed to be independent. For
most filters, this is a good assumption. If ( ),θi iR is the target range and bearing from one sensor and ( ),θj jR the second
target range and bearing, then the first sensor measurements and the second sensor variances are

( ) ( )
2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2
2 2 22 2 2
sin cos sin cos
cos sin cos sin
sin 2sin 2
2 2
θ θ
θ θ
θθ
σ σ θ σ θ σ σ θ σ θ
σ σ θ σ θ σ σ θ σ θ
σ σ θσ σ θ
ρ ρ
σ σ σ σ
= + = +
= + = +
−−
= =
i i i j j j
i i i j j j
j ji i
i i j j
x R i i i x R j j j
y R i i i y R j j j
R j jR i i
i j
y x y x
R R
R R
RR
(1.41)
with σ iR being the first sensor range error and lθσ the sensor azimuth error. These are fundamental system performance
parameters. It is easy to show that the variances and correlation coefficient of the differences are given by
2 2 2 2 2 2
, ,
ρ σ σ ρ σ σ
σ σ σ σ σ σ ρ
σ σ∆ ∆ ∆ ∆
∆ ∆
+
= + = + = i i j j
i j i j
i x y j x y
x x x y y y x y
x y
(1.42)
In a similar fashion, the speed-heading differences can be expressed as
∆    
= −    ∆     
ji
ji
sss
ccc
(1.43)
Under convolution of probability density functions due to subtraction of two random variables in creating the feature
elements, we find the entries in the covariance matrix:
2 2 2 2 2 2
, ,
ρ σ σ ρ σ σ
σ σ σ σ σ σ ρ
σ σ
∆ ∆ ∆ ∆
∆ ∆
+
= + = + = i i j j
i j i j
i s c r s c
s s s c c c s c
s c
(1.44)
1 R. Reynolds, Colonel (Ret.) USAF, private communication
2 RADM M. Mathis, Col H. Dutchyshyn, CAPT J. Wilson, Single Integrated Air Picture, Network Centric Warfare
Conference, American Institute of Engineers, 23 October 2001
3 C. Stanek, B. Javidi, and P. Yanni, Imaged-based Topology for Sensor Gridlocking and Association, SPIE
Proceedings in Automatic Target Recognition, Vol 4726 April 2002
4 C. Stanek, B. Javidi, P. Yanni, Filter Construction for Topological Track Association and Sensor Registration, SPIE
Annual Meeting Proceedings Vol. 4789, 2002
5 O. Drummond, Track and Tracklet Fusion Filtering Using Data from Distributed Sensors, Proceedings from the
Workshop on Tracking, Estimation, and Fusion: A Tribute to Bar Shalom, May 2001.
6 Sensor gridlock is often called sensor registration: the process of registering the sensor’s frame of reference to a
common frame of reference or datum. The accurate registration of multiple sensors is required before any gains in
precision can be made.
7 E.T. Jaynes, Bayesian Methods: General Background, An Introductory Tutorial, p. 8, 1996
8 E.T. Jaynes, Probability Theory as Logic: Hypothesis Testing, Chapter 4, 1994
9 K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd
Edition, Ch 3, 1990
10 G. Toussaint, Course Notes in Pattern Recognition 308-644B, University of McGill
11 C. Elkan, Naïve Bayesian Learning, Dept. of Computer Science, Harvard University
12 K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd
Edition, pp 67-73, 1990
13 R. Lynch, Peter Willett, Adaptive Sequential Classification using Page’s Test, Proceedings of SPIE, Vol. 4731,
2002
14 Applied Physics Laboratory, John Hopkins, Gridlock Analysis Report, Vol. III, Issue 1, July 1982
15 X.Rong Li, Probability, Random Signals, and Statistics, CRC Press, 1999
16 B. Javidi, J. Wang, Design of filters to detect a noisy target in nonoverlapping background noise, J. Opt Soc Am,
Vol. 11 No., 10 Oct 1994
17 B. Javidi, F. Parchekani, and G.Zhang, Minimum-mean-square error filters for detecting a target in background
noise, Applied Optics, Vol 35, No. 35, December 1996
18 B. Javidi, J. Wang, Optimum distortion-invariant filter for detecting a noisy target in nonoverlapping background
noise, J Opt Soc of Am, Vol. 12, No. 12, December 1995
19 .B Javidi, ed., Image Recognition and Classification, Ch. 10, Marcel Dekker, 2002

Paper 5094-43_5

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Viewers also liked

Viewers also liked (15)

Similar to Paper 5094-43_5

Similar to Paper 5094-43_5 (20)

Paper 5094-43_5