Detection Theory

Detection theory involves making a decision based on a set of measurements: given a set of observations, a decision has to be made regarding the source of the observations.

A hypothesis is a statement about the possible source of the observations. The simplest case, binary hypothesis testing, chooses one of two hypotheses:

$H_0$: the null hypothesis, the statement usually assumed true.
$H_1$: the alternative hypothesis.

In a radar application, these two hypotheses denote

$H_0$: target is absent.
$H_1$: target is present.

M-ary hypothesis testing chooses one of $M$ alternatives $H_0, H_1, \ldots, H_{M-1}$.

A set of observations is denoted by the vector $\mathbf{x} = [x_1\; x_2\; \ldots\; x_n]'$. We can associate an a priori probability with each hypothesis: $P(H_0), P(H_1), \ldots, P(H_{M-1})$. Given hypothesis $H_i$, the observations are determined by the conditional PDF $f_{\mathbf{X}|H_i}(\mathbf{x})$.

The hypothesis may be about some parameter $\theta$ that determines $f_{\mathbf{X}|H_i}(\mathbf{x})$; $\theta$ is chosen from a parameter space $\Theta$.

Simple and composite hypotheses:

For a simple hypothesis, the parameter is a distinct point in $\Theta$, while for a composite hypothesis the parameter lies in a region of $\Theta$. For example, $H_0: \theta = 0$ is a simple hypothesis, while $H_1: \theta > 0$ is a composite one.
Bayesian Decision Theory for Simple Binary Hypothesis Testing

The decision rule $D(\mathbf{X})$ partitions the observation space $\mathbb{R}^n$ into two regions $Z_0$ and $Z_1$, such that $D(\mathbf{x}) = H_0$ if $\mathbf{x} \in Z_0$ and $D(\mathbf{x}) = H_1$ otherwise.

A cost $C_{ij} = C(H_j, D(\mathbf{x}) = H_i)$ is assigned to each pair $(H_j, D(\mathbf{x}) = H_i)$. Thus $C_{00} = C(H_0, D(\mathbf{x}) = H_0)$, and so on up to $C_{10} = C(H_0, D(\mathbf{x}) = H_1)$. The objective is to minimize the average risk

$$R(D) = E\,C(H, D(\mathbf{X})) = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\, P(D(\mathbf{X}) = H_i \mid H_j)\, P(H_j).$$

Conditioning on the observation,

$$R(D) = E_{\mathbf{X}}\, E\big[C(H, D(\mathbf{X})) \mid \mathbf{X} = \mathbf{x}\big] = \int E\big[C(H, D(\mathbf{x})) \mid \mathbf{X} = \mathbf{x}\big]\, f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}.$$

The Bayesian decision minimizes $R(D)$ over $D$. Equivalently, for each $\mathbf{x}$ we minimize the conditional expected cost $E\big[C(H, D(\mathbf{X}) = H_j) \mid \mathbf{X} = \mathbf{x}\big]$ over $j = 0, 1$.

We can assign $C_{00} = C_{11} = 0$ and $C_{10} = C_{01} = 1$, but the cost function need not be symmetric.
Likelihood Ratio Test

Suppose $D(\mathbf{x}) = H_0$. Then the conditional expected cost is

$$E\big[C(H, D) \mid \mathbf{X}=\mathbf{x},\, D(\mathbf{x}) = H_0\big] = C_{00}\, P(H_0 \mid \mathbf{x}) + C_{01}\, P(H_1 \mid \mathbf{x}).$$

Similarly, if $D(\mathbf{x}) = H_1$,

$$E\big[C(H, D) \mid \mathbf{X}=\mathbf{x},\, D(\mathbf{x}) = H_1\big] = C_{10}\, P(H_0 \mid \mathbf{x}) + C_{11}\, P(H_1 \mid \mathbf{x}).$$

The decision rule will be: decide $H_1$ if

$$C_{10}\, P(H_0 \mid \mathbf{x}) + C_{11}\, P(H_1 \mid \mathbf{x}) \;\le\; C_{00}\, P(H_0 \mid \mathbf{x}) + C_{01}\, P(H_1 \mid \mathbf{x}),$$

that is,

$$(C_{01} - C_{11})\, P(H_1 \mid \mathbf{x}) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; (C_{10} - C_{00})\, P(H_0 \mid \mathbf{x}).$$

Since $C_{01} > C_{11}$ and $C_{10} > C_{00}$ (a wrong decision costs more than a correct one), we can simplify. Note that by Bayes' rule

$$P(H_j \mid \mathbf{x}) = \frac{P(H_j)\, f_{\mathbf{X}|H_j}(\mathbf{x})}{f_{\mathbf{X}}(\mathbf{x})}, \qquad j = 0, 1,$$

so we can write the rule in terms of the likelihood ratio:

$$L(\mathbf{x}) = \frac{f_{\mathbf{X}|H_1}(\mathbf{x})}{f_{\mathbf{X}|H_0}(\mathbf{x})} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta = \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)}.$$

This decision rule is known as the likelihood ratio test (LRT).
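As a concrete illustration, the LRT with a Bayesian threshold can be sketched in a few lines of Python (a minimal sketch; the function names and the uniform-cost numbers are my illustrative choices, not from the notes):

```python
def bayes_lrt_threshold(p0, p1, c00, c01, c10, c11):
    """eta = (C10 - C00) P(H0) / ((C01 - C11) P(H1))."""
    return (c10 - c00) * p0 / ((c01 - c11) * p1)

def lrt_decide(L, eta):
    """Decide H1 (return 1) when the likelihood ratio L(x) meets the threshold."""
    return 1 if L >= eta else 0

# Uniform costs and equal priors give eta = 1: the maximum-likelihood rule.
eta = bayes_lrt_threshold(p0=0.5, p1=0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0)
```

With unequal priors or asymmetric costs, the same two functions apply; only the threshold changes.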
Errors in Decision

The decision rule partitions the observation space into two regions: in $Z_0$, $H_0$ is decided to be true, and in $Z_1$, $H_1$ is decided to be true. If the decisions are wrong, two types of errors are committed.

Type I error (probability of false alarm):
$$P_{FA} = P(D(\mathbf{X}) = H_1 \mid H_0) = \int_{Z_1} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x}.$$

Type II error (probability of miss):
$$P_M = P(D(\mathbf{X}) = H_0 \mid H_1) = \int_{Z_0} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x}.$$
Error Analysis

The two error probabilities can also be expressed in terms of the density of the likelihood ratio $L(\mathbf{X})$.

False alarm (Type I) probability:
$$P_F = P(D(\mathbf{X}) = H_1 \mid H_0) = \int_{Z_1} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} = \int_{\eta}^{\infty} f_{L|H_0}(l)\, dl.$$

Miss detection (Type II) probability:
$$P_M = P(D(\mathbf{X}) = H_0 \mid H_1) = \int_{Z_0} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x} = \int_{0}^{\eta} f_{L|H_1}(l)\, dl.$$

The probability of error is
$$P_e = P(H_0)\, P_F + P(H_1)\, P_M.$$

Now the Bayes risk is given by
$$
\begin{aligned}
R(D) &= E\,C(H, D(\mathbf{X})) \\
&= C_{00}\, P(H_0)\, P(D = H_0 \mid H_0) + C_{10}\, P(H_0)\, P(D = H_1 \mid H_0) \\
&\quad + C_{01}\, P(H_1)\, P(D = H_0 \mid H_1) + C_{11}\, P(H_1)\, P(D = H_1 \mid H_1) \\
&= C_{00}\, P(H_0)(1 - P_F) + C_{10}\, P(H_0)\, P_F + C_{01}\, P(H_1)\, P_M + C_{11}\, P(H_1)(1 - P_M).
\end{aligned}
$$

If we substitute $P(H_0) = 1 - P(H_1)$, $R(D)$ can be written as
$$R(D) = C_{00}(1 - P_F) + C_{10} P_F + P(H_1)\big[(C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F\big],$$
which is a function of $P(H_1)$ and the threshold $\eta$.
Minimum Probability of Error Criterion

If $C_{00} = C_{11} = 0$ and $C_{10} = C_{01} = 1$, then $R(D)$ is given by
$$R(D) = P(H_0)\, P_F + P(H_1)\, P_M = P_e,$$
so minimizing the Bayes risk minimizes the probability of error. The corresponding likelihood-ratio threshold is
$$\eta_{LR} = \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)} = \frac{P(H_0)}{P(H_1)}.$$
MinMax Decision Rule

Recall that the Bayes risk function is given by
$$R(D) = C_{00}(1 - P_F) + C_{10} P_F + P(H_1)\big[(C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F\big],$$
which is a function of $P(H_1)$. For a given $P(H_1)$ we can determine the other parameters in the above expression using the minimum Bayes risk criterion.

[Figure: Bayes risk $R(D)$ as a function of $P(H_1)$, with the tangent line at the design point.]

Suppose the parameters are designed using the minimum Bayes risk at $P(H_1) = p$. If $P(H_1)$ is now varied, the modified risk curve will be a straight line tangential to the Bayes risk curve at $(p, R(D, p))$, and the decision will no longer be optimal. To overcome this difficulty, the Bayes minimax criterion is used. According to this criterion, we decide by

$$\min_{D}\; \max_{P(H_1)}\; R\big(D, P(H_1)\big).$$

Under mild conditions, we can write
$$\min_{D} \max_{P(H_1)} R(D) = \max_{P(H_1)} \min_{D} R(D).$$

Assuming differentiability, we get
$$\frac{d}{dP(H_1)}\, R(D) = (C_{11} - C_{00}) + (C_{01} - C_{11})\, P_M - (C_{10} - C_{00})\, P_F = 0.$$

The above equation is known as the minimax equation and can be solved to find the threshold.
Example: Suppose
$$H_0: X \sim \mathrm{Exp}(1), \qquad H_1: X \sim \mathrm{Exp}(2),$$
with costs $C_{00} = C_{11} = 0$, $C_{01} = 2$, $C_{10} = 1$. Then
$$f_{X|H_0}(x) = e^{-x}\, u(x), \qquad f_{X|H_1}(x) = 2 e^{-2x}\, u(x),$$
and the likelihood ratio is
$$L(x) = \frac{2 e^{-2x}}{e^{-x}} = 2 e^{-x} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta.$$
Since $L(x)$ is decreasing in $x$, the test decides $H_1$ when $x \le \tau$ for some threshold $\tau$. Then
$$P_{FA} = \int_{0}^{\tau} e^{-x}\, dx = 1 - e^{-\tau}, \qquad P_M = \int_{\tau}^{\infty} 2 e^{-2x}\, dx = e^{-2\tau}.$$
We have to solve the minimax equation
$$(C_{11} - C_{00}) + (C_{01} - C_{11})\, P_M - (C_{10} - C_{00})\, P_{FA} = 0 \;\Rightarrow\; 2 P_M - P_{FA} = 0.$$
Substituting $P_{FA}$ and $P_M$ in the minimax equation, we can find $\tau$.
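Carrying the exponential example through numerically: with $P_{FA} = 1 - e^{-\tau}$ and $P_M = e^{-2\tau}$, the minimax equation $2P_M = P_{FA}$ has a closed-form solution (substituting $y = e^{-\tau}$ gives $2y^2 + y - 1 = 0$, so $y = 1/2$ and $\tau = \ln 2$). This sketch simply confirms it with a root finder:

```python
import numpy as np
from scipy.optimize import brentq

# Minimax equation 2*P_M - P_FA = 0 for the exponential example:
# P_FA = 1 - exp(-tau), P_M = exp(-2*tau)
def g(tau):
    return 2.0 * np.exp(-2.0 * tau) - (1.0 - np.exp(-tau))

tau = brentq(g, 1e-6, 10.0)   # g changes sign on this bracket
# Analytically: with y = exp(-tau), 2y^2 + y - 1 = 0 => y = 1/2, tau = ln 2
```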
Receiver Operating Characteristics

The performance of a test is analyzed in terms of a graph showing $P_D$ versus $P_{FA}$. Note that
$$P_D = \int_{Z_1(\eta)} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x} \qquad \text{and} \qquad P_{FA} = \int_{Z_1(\eta)} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x},$$
where $Z_1(\eta)$ is the region corresponding to the decision of $H_1$ for the likelihood-ratio threshold $\eta$.

In general, we would like to select a $P_{FA}$ that results in a $P_D$ near the knee of the ROC: if we increase $P_{FA}$ beyond that value, there is only a very small increase in $P_D$. For a continuous $P_D$, the ROC has the following properties.

1. The ROC is a non-decreasing function of $P_{FA}$. This is because to increase $P_{FA}$ we have to expand $Z_1$, and hence $P_D$ will increase.
2. The ROC is on or above the chance line $P_D = P_{FA}$.
3. For the likelihood ratio test, the slope of the ROC gives the threshold $\eta$.

[Figure: typical ROC curve, $P_D$ versus $P_{FA}$, both axes from 0 to 1.]
To verify property 3, write $P_D$ and $P_{FA}$ as integrals over the density of the likelihood ratio:
$$P_D = \int_{\eta}^{\infty} f_{L|H_1}(l)\, dl \;\Rightarrow\; \frac{dP_D}{d\eta} = -f_{L|H_1}(\eta),$$
$$P_{FA} = \int_{\eta}^{\infty} f_{L|H_0}(l)\, dl \;\Rightarrow\; \frac{dP_{FA}}{d\eta} = -f_{L|H_0}(\eta).$$
Also,
$$P_D = \int_{Z_1} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x} = \int_{Z_1} L(\mathbf{x})\, f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} = \int_{\eta}^{\infty} l\, f_{L|H_0}(l)\, dl,$$
so that
$$f_{L|H_1}(\eta) = \eta\, f_{L|H_0}(\eta) \;\Rightarrow\; \frac{dP_D}{d\eta} = -\eta\, f_{L|H_0}(\eta).$$
Therefore
$$\frac{dP_D}{dP_{FA}} = \frac{dP_D/d\eta}{dP_{FA}/d\eta} = \frac{-\eta\, f_{L|H_0}(\eta)}{-f_{L|H_0}(\eta)} = \eta.$$
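The slope property can be checked numerically for a hypothetical Gaussian pair, $N(0,1)$ versus $N(1,1)$ (my example, not from the notes). Deciding $H_1$ when $x > t$ gives $P_D = Q(t-1)$ and $P_{FA} = Q(t)$, so the ROC slope is $\varphi(t-1)/\varphi(t)$, which should equal the likelihood ratio $L(t) = e^{t - 1/2}$ at the boundary:

```python
import numpy as np
from scipy.stats import norm

t = 1.3                                   # decision boundary on x
slope = norm.pdf(t - 1.0) / norm.pdf(t)   # dP_D/dP_FA by the chain rule
eta = np.exp(t - 0.5)                     # likelihood ratio L(t) at the boundary
# slope equals eta: the ROC slope at an operating point is the LRT threshold
```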
Neyman-Pearson (NP) Hypothesis Testing

The Bayesian approach requires knowledge of the a priori probabilities $P(H_0)$ and $P(H_1)$. Finding these is a problem in many cases. In such cases, NP hypothesis testing can be applied.

The NP method maximizes the detection probability while keeping the probability of false alarm within a limit. The problem can be written mathematically as
$$\text{maximize } P_D \quad \text{subject to } P_{FA} \le \alpha.$$
In statistical parlance, $\alpha$ is called the size of the test.

We observe that the ROC curve is non-decreasing: decreasing $P_{FA}$ will decrease $P_D$ also. Therefore, for optimal performance $P_{FA}$ should be kept fixed at $\alpha$. Hence the modified optimization problem is
$$\text{maximize } P_D \quad \text{subject to } P_{FA} = \alpha.$$
We can solve the problem by the Lagrange multiplier method.
We form the objective
$$J = P_D - \lambda (P_{FA} - \alpha) = \int_{Z_1} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x} - \lambda \left( \int_{Z_1} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} - \alpha \right) = \lambda \alpha + \int_{Z_1} \big( f_{\mathbf{X}|H_1}(\mathbf{x}) - \lambda f_{\mathbf{X}|H_0}(\mathbf{x}) \big)\, d\mathbf{x}.$$

To maximize $J$, we should select $Z_1$ to contain exactly those $\mathbf{x}$ for which
$$f_{\mathbf{X}|H_1}(\mathbf{x}) - \lambda f_{\mathbf{X}|H_0}(\mathbf{x}) > 0,$$
that is,
$$\frac{f_{\mathbf{X}|H_1}(\mathbf{x})}{f_{\mathbf{X}|H_0}(\mathbf{x})} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \lambda.$$
This gives the threshold in terms of $\lambda$; $\lambda$ can be found from the constraint
$$\int_{Z_1} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} = \alpha.$$
Example: Suppose
$$H_0: X \sim N(0, 1), \qquad H_1: X \sim N(1, 1), \qquad P_{FA} = 0.25.$$
The likelihood ratio is
$$L(x) = \frac{f_{X|H_1}(x)}{f_{X|H_0}(x)} = \frac{e^{-(x-1)^2/2}}{e^{-x^2/2}} = e^{x - 1/2} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \lambda.$$
Taking the logarithm,
$$x - \frac{1}{2} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \ln \lambda \;\Rightarrow\; x \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta = \frac{1}{2}\,(1 + 2 \ln \lambda).$$
The threshold $\eta$ on $x$ is found from the false-alarm constraint:
$$P_{FA} = \int_{\eta}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 0.25 \;\Rightarrow\; \eta = 0.675.$$
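The threshold in this example is just the upper 25% point of the standard normal; scipy confirms it (a small check, not part of the original notes):

```python
from scipy.stats import norm

alpha = 0.25
eta = norm.isf(alpha)        # solves Q(eta) = alpha; eta ≈ 0.6745
p_d = norm.sf(eta - 1.0)     # resulting detection probability under H1: N(1,1)
```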
Composite Hypothesis Testing

Here there is uncertainty about the parameter under each hypothesis. Suppose
$$H_0: X \sim f_{X|H_0}(x; \theta_0), \;\; \theta_0 \in \Theta_0, \qquad H_1: X \sim f_{X|H_1}(x; \theta_1), \;\; \theta_1 \in \Theta_1.$$
If $\Theta_0$ (or $\Theta_1$) contains a single element, the corresponding hypothesis is called simple; otherwise it is called composite.

Example: Suppose
$$H_0: X \sim N(0, 1), \qquad H_1: X \sim N(\mu, 1), \;\; \mu > 0.$$
These two hypotheses may represent the absence and presence of a DC signal in zero-mean Gaussian noise of known variance 1. Clearly $H_0$ is a simple hypothesis and $H_1$ is a composite one. We will consider how to deal with such decision problems.
Uniformly Most Powerful (UMP) Test

Consider the example
$$H_0: x_i,\; i = 0, 1, \ldots, N-1, \quad X_i \text{ iid } N(0, 1),$$
$$H_1: x_i,\; i = 0, 1, \ldots, N-1, \quad X_i \text{ iid } N(\mu, 1), \;\; \mu > 0.$$
The likelihood ratio is
$$L(\mathbf{x}) = \frac{\prod_{i=0}^{N-1} \frac{1}{\sqrt{2\pi}}\, e^{-(x_i - \mu)^2/2}}{\prod_{i=0}^{N-1} \frac{1}{\sqrt{2\pi}}\, e^{-x_i^2/2}}, \qquad \ln L(\mathbf{x}) = \mu \sum_{i=0}^{N-1} x_i - \frac{N\mu^2}{2}.$$
If $\mu > 0$, the LRT is equivalent to a threshold test on the statistic
$$T(\mathbf{x}) = \sum_{i=0}^{N-1} x_i \;\overset{H_1}{\underset{H_0}{\gtrless}}\; v.$$
Now we have the modified hypotheses as follows:
$$H_0: T \sim N(0, N), \qquad H_1: T \sim N(N\mu, N).$$
The false-alarm probability is
$$P_{FA} = \int_{v}^{\infty} \frac{1}{\sqrt{2\pi N}}\, e^{-t^2/2N}\, dt = Q\!\left(\frac{v}{\sqrt{N}}\right).$$
Since the threshold $v$ obtained from a given $P_{FA}$ does not depend on the unknown $\mu$, the same test is optimal for every $\mu > 0$: it is uniformly most powerful (UMP).
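For a concrete design (the numbers $N = 25$, $\alpha = 0.05$, $\mu = 0.5$ are my illustrative assumptions): the threshold on $T$ is $v = \sqrt{N}\, Q^{-1}(\alpha)$, and it does not depend on $\mu$, which is exactly why the test is UMP:

```python
import numpy as np
from scipy.stats import norm

N, alpha = 25, 0.05
v = np.sqrt(N) * norm.isf(alpha)          # solves P_FA = Q(v/sqrt(N)) = alpha
# The threshold above is free of mu; the power, however, depends on mu:
mu = 0.5
p_d = norm.sf((v - N * mu) / np.sqrt(N))  # under H1, T ~ N(N*mu, N)
```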
Example: Suppose
$$H_0: X \sim N(0, \sigma^2), \qquad H_1: X \sim N(1, \sigma^2), \;\; \sigma^2 > 0 \text{ unknown}.$$
Then
$$\ln L(x) = \frac{2x - 1}{2\sigma^2} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \ln \eta \;\Rightarrow\; x \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{1}{2} + \sigma^2 \ln \eta,$$
which depends on the unknown $\sigma^2$. If we take $\ln \eta = 0$, then the test reduces to
$$x \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \frac{1}{2}.$$
With this threshold we can determine the probability of detection $P_D$ and the probability of false alarm $P_{FA}$; however, the detection is not optimal in any sense. The UMP test may not exist in general. The following result is particularly useful for the UMP test.
Karlin-Rubin Theorem

Suppose $H_0: \theta \le \theta_0$ and $H_1: \theta > \theta_0$. Let $T = T(\mathbf{x})$ be a test statistic. If the likelihood ratio
$$L(t) = \frac{f_{T|H_1}(t)}{f_{T|H_0}(t)}$$
is a non-decreasing function of $t$, then the test
$$T \;\overset{H_1}{\underset{H_0}{\gtrless}}\; v$$
maximizes the detection probability $P_D$ for a given $P_{FA}$. Thus the test is UMP for a fixed $P_{FA}$.

Example:
$$H_0: X \sim \mathrm{Poi}(\lambda_0), \qquad H_1: X \sim \mathrm{Poi}(\lambda_1), \;\; \lambda_1 > \lambda_0.$$
Then
$$L(x) = e^{\lambda_0 - \lambda_1} \left( \frac{\lambda_1}{\lambda_0} \right)^{x}$$
is a non-decreasing function of $x$ (since $\lambda_1/\lambda_0 > 1$). Therefore the threshold test on $x$ for a given $P_{FA}$ is UMP.
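By the theorem, the UMP test here rejects $H_0$ for large observed counts. A sketch of choosing the smallest integer threshold that respects a given $P_{FA}$ ($\lambda_0 = 2$ and $\alpha = 0.05$ are my illustrative numbers; because $X$ is discrete, the achieved size falls below $\alpha$ unless the test is randomized):

```python
from scipy.stats import poisson

lam0, alpha = 2.0, 0.05
k = 0
while poisson.sf(k, lam0) > alpha:   # sf(k) = P(X > k | H0)
    k += 1
p_fa = poisson.sf(k, lam0)           # achieved false-alarm probability
# Decide H1 when the observed count x exceeds k
```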
Generalized Likelihood Ratio Test (GLRT)

In this approach to composite hypothesis testing, the non-random unknown parameters are replaced by their maximum-likelihood estimates (MLEs) in the decision rule. Suppose
$$H_0: X \sim f_{X|H_0}(x; \theta_0), \qquad H_1: X \sim f_{X|H_1}(x; \theta_1).$$
According to the GLRT, the best-fitting models under the two hypotheses are compared. Thus the decision rule is
$$L_G(\mathbf{x}) = \frac{\max_{\theta_1} f_{\mathbf{X}|H_1}(\mathbf{x}; \theta_1)}{\max_{\theta_0} f_{\mathbf{X}|H_0}(\mathbf{x}; \theta_0)} \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta.$$
This approach also provides an estimate of the unknown parameter. It is not optimal, but it works well in practical situations.
Example: Suppose $X_i,\; i = 1, \ldots, n$, are iid $N(\mu, \sigma^2)$ with $\sigma^2$ known, and
$$H_0: \mu = 0, \qquad H_1: \mu \ne 0.$$
The likelihoods are
$$f_{\mathbf{X}|H_1}(\mathbf{x}; \mu) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}, \qquad f_{\mathbf{X}|H_0}(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2}.$$
The MLE of $\mu$ is given by
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}.$$
Under the GLRT, the likelihood ratio becomes
$$L_G(\mathbf{x}) = \frac{e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x})^2}}{e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2}} = e^{\frac{n \bar{x}^2}{2\sigma^2}},$$
using $\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n\bar{x}^2$. Hence
$$\ln L_G(\mathbf{x}) = \frac{n \bar{x}^2}{2\sigma^2}, \qquad 2 \ln L_G(\mathbf{x}) = \frac{n \bar{x}^2}{\sigma^2}.$$
Using the GLRT, we decide $H_1$ when $2 \ln L_G(\mathbf{x})$ exceeds a threshold. Particularly, under $H_0$ we have $\bar{X} \sim N(0, \sigma^2/n)$, so $\sqrt{n}\,\bar{X}/\sigma \sim N(0,1)$ and
$$2 \ln L_G(\mathbf{X}) = \frac{n \bar{X}^2}{\sigma^2} \sim \chi_1^2,$$
a chi-square random variable with one degree of freedom. From this distribution we can find the threshold for a given $P_{FA}$.
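A sketch of the GLRT statistic in code (the data here are simulated under $H_0$ with my own assumed numbers; the threshold is the upper-$\alpha$ point of $\chi^2_1$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
sigma, n, alpha = 1.0, 50, 0.05

x = rng.normal(0.0, sigma, n)              # data simulated under H0: mu = 0
glrt = n * np.mean(x) ** 2 / sigma ** 2    # 2 ln L_G = n * xbar^2 / sigma^2
threshold = chi2.isf(alpha, df=1)          # ≈ 3.84
reject_h0 = glrt > threshold               # GLRT decision
```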
Multiple Hypothesis Testing

We decide one of $H_0, H_1, \ldots, H_{M-1}$ on the basis of the observed data. The priors $P(H_i),\; i = 0, 1, \ldots, M-1$, are assumed known, and a cost $C_{ij}$ is associated with deciding $H_i$ when $H_j$ is true:
$$C_{ij} = C(H_j, D(\mathbf{X}) = H_i).$$
The average cost is then given by
$$\bar{C} = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} C_{ij}\, P(D = H_i \mid H_j)\, P(H_j).$$

The decision process partitions the observation space $\mathbb{R}^n$ (or a subset of it) into $M$ regions $Z_0, Z_1, \ldots, Z_{M-1}$, with
$$P(D = H_i \mid H_j) = \int_{Z_i} f_{\mathbf{X}|H_j}(\mathbf{x})\, d\mathbf{x}, \qquad \int_{Z_i} f_{\mathbf{X}|H_i}(\mathbf{x})\, d\mathbf{x} = 1 - \sum_{\substack{j=0 \\ j \ne i}}^{M-1} \int_{Z_j} f_{\mathbf{X}|H_i}(\mathbf{x})\, d\mathbf{x}.$$
We can write
$$\bar{C} = \sum_{i=0}^{M-1} C_{ii}\, P(H_i) + \sum_{i=0}^{M-1} \int_{Z_i} \sum_{\substack{j=0 \\ j \ne i}}^{M-1} P(H_j)\, (C_{ij} - C_{jj})\, f_{\mathbf{X}|H_j}(\mathbf{x})\, d\mathbf{x}.$$
The first sum is fixed, so minimization is achieved by placing each $\mathbf{x}$ in the region $Z_i$ for which the integrand is minimum. That is, we choose the $i$ corresponding to the minimum value of
$$C_i(\mathbf{x}) = \sum_{\substack{j=0 \\ j \ne i}}^{M-1} P(H_j)\, (C_{ij} - C_{jj})\, f_{\mathbf{X}|H_j}(\mathbf{x}).$$
Thus the decision rule is based on minimizing over the $H_i$. Dividing by $f_{\mathbf{X}|H_0}(\mathbf{x})$, the rule can also be written in terms of the likelihood ratios $L_j(\mathbf{x}) = f_{\mathbf{X}|H_j}(\mathbf{x})/f_{\mathbf{X}|H_0}(\mathbf{x})$:
$$J_i(\mathbf{x}) = \sum_{\substack{j=0 \\ j \ne i}}^{M-1} P(H_j)\, (C_{ij} - C_{jj})\, L_j(\mathbf{x}), \qquad i = 0, 1, \ldots, M-1.$$
If $C_{ij} = 1$ for $i \ne j$ and $C_{jj} = 0$, then
$$J_i(\mathbf{x}) = \sum_{\substack{j=0 \\ j \ne i}}^{M-1} P(H_j)\, f_{\mathbf{X}|H_j}(\mathbf{x}),$$
and the above minimization corresponds to the minimization of the probability of error.

[Figure: partition of the observation space into the decision regions $Z_0, Z_1, \ldots, Z_{M-1}$.]
Rewriting $J_i(\mathbf{x})$ using $P(H_j)\, f_{\mathbf{X}|H_j}(\mathbf{x}) = P(H_j \mid \mathbf{x})\, f_{\mathbf{X}}(\mathbf{x})$, we get
$$J_i(\mathbf{x}) = \sum_{\substack{j=0 \\ j \ne i}}^{M-1} P(H_j \mid \mathbf{x})\, f_{\mathbf{X}}(\mathbf{x}) = \big(1 - P(H_i \mid \mathbf{x})\big)\, f_{\mathbf{X}}(\mathbf{x}).$$
Minimizing $J_i(\mathbf{x})$ is therefore equivalent to maximizing the posterior probability $P(H_i \mid \mathbf{x})$, and we arrive at the MAP criterion.

If the hypotheses are equally likely, i.e. $P(H_0) = P(H_1) = \cdots = P(H_{M-1}) = P$, then we can write
$$J_i(\mathbf{x}) = \sum_{\substack{j=0 \\ j \ne i}}^{M-1} P\, f_{\mathbf{X}|H_j}(\mathbf{x}) = P \sum_{j=0}^{M-1} f_{\mathbf{X}|H_j}(\mathbf{x}) - P\, f_{\mathbf{X}|H_i}(\mathbf{x}).$$
Therefore the minimization is equivalent to the maximization of the likelihood $f_{\mathbf{X}|H_i}(\mathbf{x})$.
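The MAP criterion is straightforward to code. A minimal sketch with three hypothetical unit-variance Gaussian likelihoods (the means 0, 1, 2 are my illustrative choice; with equal priors this reduces to the ML rule):

```python
import numpy as np

def map_decide(x, priors, likelihoods):
    """Pick the index i maximizing P(H_i) * f_{X|H_i}(x)."""
    scores = [p * f(x) for p, f in zip(priors, likelihoods)]
    return int(np.argmax(scores))

def gaussian_pdf(mean):
    """Unit-variance Gaussian density as a function of x."""
    return lambda x: np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

liks = [gaussian_pdf(0.0), gaussian_pdf(1.0), gaussian_pdf(2.0)]
d = map_decide(1.8, [1 / 3, 1 / 3, 1 / 3], liks)   # nearest mean wins
```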
Example: Suppose $X_i,\; i = 1, \ldots, n$, are iid Gaussian with unit variance and mean $\mu$, and the three equally likely hypotheses are
$$H_0: \mu = -1, \qquad H_1: \mu = 1, \qquad H_2: \mu = 2.$$
The likelihood under mean $\mu$ is
$$f_{\mathbf{X}|\mu}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} \sum_{i=1}^{n} x_i^2}\; e^{\mu T(\mathbf{x}) - \frac{n \mu^2}{2}},$$
so we have to decide on the basis of the statistic $T(\mathbf{x}) = \sum_{i=1}^{n} x_i$. Maximizing the likelihood is equivalent to maximizing $2\mu T(\mathbf{x}) - n\mu^2$:
$$H_0: -2T(\mathbf{x}) - n, \qquad H_1: 2T(\mathbf{x}) - n, \qquad H_2: 4T(\mathbf{x}) - 4n.$$
Comparing these pairwise gives the decision regions
$$H_0: T(\mathbf{x}) < 0, \qquad H_1: 0 \le T(\mathbf{x}) \le \frac{3n}{2}, \qquad H_2: T(\mathbf{x}) > \frac{3n}{2}.$$
Sequential Detection and Wald's Test

In many applications of decision making, observations are sequential in nature and the decision can be made sequentially on the basis of the available data. We discuss the simple case of sequential binary hypothesis testing, obtained by modifying the NP test. This test is called the sequential probability ratio test (SPRT), or Wald's test.

In the NP test, we had only one threshold $\eta$ for the likelihood ratio:
$$L(\mathbf{x}) \;\overset{H_1}{\underset{H_0}{\gtrless}}\; \eta,$$
where the threshold is determined from the given level of significance.

In the SPRT, $L(\mathbf{x})$ is computed recursively and two thresholds $\eta_0 < \eta_1$ are used. The simple decision rule is:

If $L(\mathbf{x}) \ge \eta_1$, decide $H_1$.
If $L(\mathbf{x}) \le \eta_0$, decide $H_0$.
If $\eta_0 < L(\mathbf{x}) < \eta_1$, wait for the next sample to decide.

The algorithm stops when we get $L(\mathbf{x}) \ge \eta_1$ or $L(\mathbf{x}) \le \eta_0$.
Consider the $X_i$ to be iid. Then for $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$,
$$L_n = L(\mathbf{x}_n) = \frac{f_{\mathbf{X}|H_1}(x_1, x_2, \ldots, x_n)}{f_{\mathbf{X}|H_0}(x_1, x_2, \ldots, x_n)} = \frac{\prod_{i=1}^{n} f_{X|H_1}(x_i)}{\prod_{i=1}^{n} f_{X|H_0}(x_i)} = L_{n-1}\, \frac{f_{X|H_1}(x_n)}{f_{X|H_0}(x_n)} = L_{n-1}\, L(x_n).$$
In terms of logarithms,
$$\ln L_n = \ln L_{n-1} + \ln L(x_n).$$
Suppose the test requirement is
$$P_M \le \beta \quad (\text{equivalently } P_D \ge 1 - \beta) \qquad \text{and} \qquad P_{FA} \le \alpha.$$
We have to fix $\eta_0$ and $\eta_1$ on the basis of $\alpha$ and $\beta$.

Relation between $(\eta_0, \eta_1)$ and $(\alpha, \beta)$: We have
$$P_D = \int_{Z_1} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x} = \int_{Z_1} L(\mathbf{x})\, f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} \ge \eta_1 \int_{Z_1} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} = \eta_1\, P_{FA},$$
since $L(\mathbf{x}) \ge \eta_1$ on the region $Z_1$ where $H_1$ is decided. With $P_D = 1 - \beta$ and $P_{FA} = \alpha$,
$$1 - \beta \ge \eta_1\, \alpha \;\Rightarrow\; \eta_1 \le \frac{1 - \beta}{\alpha}.$$
Similarly, $L(\mathbf{x}) \le \eta_0$ on $Z_0$, so
$$P_M = \int_{Z_0} f_{\mathbf{X}|H_1}(\mathbf{x})\, d\mathbf{x} = \int_{Z_0} L(\mathbf{x})\, f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} \le \eta_0 \int_{Z_0} f_{\mathbf{X}|H_0}(\mathbf{x})\, d\mathbf{x} = \eta_0\, (1 - P_{FA}),$$
giving
$$\beta \le \eta_0\, (1 - \alpha) \;\Rightarrow\; \eta_0 \ge \frac{\beta}{1 - \alpha}.$$
We may take the conservative values
$$\eta_0 = \frac{\beta}{1 - \alpha}, \qquad \eta_1 = \frac{1 - \beta}{\alpha}.$$

The average stopping time of the SPRT is optimal in the sense that, for given error levels, no test can perform better than the SPRT: the average number of samples it requires is less than that of any other test meeting the same error levels.
Example: Suppose
$$H_0: X_i \sim N(0, \sigma^2), \qquad H_1: X_i \sim N(\mu, \sigma^2),$$
with requirements $P_{FA} \le \alpha$ and $P_M \le \beta$. Then
$$L(\mathbf{x}_n) = \frac{\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i - \mu)^2/2\sigma^2}}{\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x_i^2/2\sigma^2}} = \exp\!\left( \frac{1}{\sigma^2} \sum_{i=1}^{n} \left( \mu x_i - \frac{\mu^2}{2} \right) \right).$$
We can compute
$$\eta_0 = \frac{\beta}{1 - \alpha}, \qquad \eta_1 = \frac{1 - \beta}{\alpha},$$
and design the SPRT.
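Putting the pieces together, here is a compact SPRT for this Gaussian pair (the specific values $\alpha = \beta = 0.05$, $\mu = \sigma = 1$, and the data-generation step are my illustrative assumptions):

```python
import numpy as np

def sprt(samples, mu, sigma, eta0, eta1):
    """Wald's SPRT for H0: N(0, sigma^2) vs H1: N(mu, sigma^2).
    Returns (decision, samples_used); decision is None if the data run out."""
    log_l = 0.0
    for k, x in enumerate(samples, start=1):
        log_l += (mu * x - mu ** 2 / 2.0) / sigma ** 2   # log-LR increment
        if log_l >= np.log(eta1):
            return 1, k                                  # decide H1
        if log_l <= np.log(eta0):
            return 0, k                                  # decide H0
    return None, len(samples)

alpha, beta = 0.05, 0.05
eta0, eta1 = beta / (1 - alpha), (1 - beta) / alpha      # conservative thresholds

rng = np.random.default_rng(2)
decision, used = sprt(rng.normal(1.0, 1.0, 1000), 1.0, 1.0, eta0, eta1)
```

Typically only a handful of samples are needed here, since each observation from $H_1$ contributes an average drift of $\mu^2/2\sigma^2$ to the log-likelihood ratio.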