PRENTICE HALL SIGNAL PROCESSING SERIES
Alan V. Oppenheim, Series Editor
ANDREWS AND HUNT  Digital Image Restoration
BRIGHAM  The Fast Fourier Transform
BRIGHAM  The Fast Fourier Transform and Its Applications
BURDIC  Underwater Acoustic System Analysis, 2/E
CASTLEMAN  Digital Image Processing
COWAN AND GRANT  Adaptive Filters
CROCHIERE AND RABINER  Multirate Digital Signal Processing
DUDGEON AND MERSEREAU  Multidimensional Digital Signal Processing
HAMMING  Digital Filters, 3/E
HAYKIN, ED.  Advances in Spectrum Analysis and Array Processing, Vols. I & II
HAYKIN, ED.  Array Signal Processing
JAYANT AND NOLL  Digital Coding of Waveforms
JOHNSON AND DUDGEON  Array Signal Processing: Concepts and Techniques
KAY  Fundamentals of Statistical Signal Processing: Estimation Theory
KAY  Modern Spectral Estimation
KINO  Acoustic Waves: Devices, Imaging, and Analog Signal Processing
LEA, ED.  Trends in Speech Recognition
LIM  Two-Dimensional Signal and Image Processing
LIM, ED.  Speech Enhancement
LIM AND OPPENHEIM, EDS.  Advanced Topics in Signal Processing
MARPLE  Digital Spectral Analysis with Applications
MCCLELLAN AND RADER  Number Theory in Digital Signal Processing
MENDEL  Lessons in Digital Estimation Theory
OPPENHEIM, ED.  Applications of Digital Signal Processing
OPPENHEIM AND NAWAB, EDS.  Symbolic and Knowledge-Based Signal Processing
OPPENHEIM, WILLSKY, WITH YOUNG  Signals and Systems
OPPENHEIM AND SCHAFER  Digital Signal Processing
OPPENHEIM AND SCHAFER  Discrete-Time Signal Processing
QUACKENBUSH ET AL.  Objective Measures of Speech Quality
RABINER AND GOLD  Theory and Applications of Digital Signal Processing
RABINER AND SCHAFER  Digital Processing of Speech Signals
ROBINSON AND TREITEL  Geophysical Signal Analysis
STEARNS AND DAVID  Signal Processing Algorithms
STEARNS AND HUSH  Digital Signal Analysis, 2/E
TRIBOLET  Seismic Applications of Homomorphic Signal Processing
VAIDYANATHAN  Multirate Systems and Filter Banks
WIDROW AND STEARNS  Adaptive Signal Processing
Fundamentals of Statistical Signal Processing: Estimation Theory
Steven M. Kay
University of Rhode Island
For book and bookstore information
http://www.prenhall.com
gopher to gopher.prenhall.com

Upper Saddle River, NJ 07458
Contents
Preface xi
1 Introduction 1
1.1 Estimation in Signal Processing . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Mathematical Estimation Problem . . . . . . . . . . . . . . . . . . 7
1.3 Assessing Estimator Performance . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Some Notes to the Reader . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Minimum Variance Unbiased Estimation 15
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Unbiased Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Minimum Variance Criterion . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Existence of the Minimum Variance Unbiased Estimator . . . . . . . . . 20
2.6 Finding the Minimum Variance Unbiased Estimator . . . . . . . . . . . 21
2.7 Extension to a Vector Parameter . . . . . . . . . . . . . . . . . . . . . . 22
3 Cramer-Rao Lower Bound 27
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Estimator Accuracy Considerations . . . . . . . . . . . . . . . . . . . . . 28
3.4 Cramer-Rao Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5 General CRLB for Signals in White Gaussian Noise . . . . . . . . . . . . 35
3.6 Transformation of Parameters . . . . . . . . . . . . . . . . . . . . . . . . 37
3.7 Extension to a Vector Parameter . . . . . . . . . . . . . . . . . . . . . . 39
3.8 Vector Parameter CRLB for Transformations . . . . . . . . . . . . . . . 45
3.9 CRLB for the General Gaussian Case . . . . . . . . . . . . . . . . . . . 47
3.10 Asymptotic CRLB for WSS Gaussian Random Processes . . . . . . . . . 50
3.11 Signal Processing Examples . . . . . . . . . . . . . . . . . . . . . . . . . 53
3A Derivation of Scalar Parameter CRLB . . . . . . . . . . . . . . . . . . . 67
3B Derivation of Vector Parameter CRLB . . . . . . . . . . . . . . . . . . . 70
3C Derivation of General Gaussian CRLB . . . . . . . . . . . . . . . . . . . 73
3D Derivation of Asymptotic CRLB . . . . . . . . . . . . . . . . . . . . . . 77
4 Linear Models  83
4.1 Introduction  83
4.2 Summary  83
4.3 Definition and Properties  83
4.4 Linear Model Examples  86
4.5 Extension to the Linear Model  94
5 General Minimum Variance Unbiased Estimation  101
5.1 Introduction  101
5.2 Summary  101
5.3 Sufficient Statistics  102
5.4 Finding Sufficient Statistics  104
5.5 Using Sufficiency to Find the MVU Estimator  107
5.6 Extension to a Vector Parameter  116
5A Proof of Neyman-Fisher Factorization Theorem (Scalar Parameter)  127
5B Proof of Rao-Blackwell-Lehmann-Scheffe Theorem (Scalar Parameter)  130
6 Best Linear Unbiased Estimators  133
6.1 Introduction  133
6.2 Summary  133
6.3 Definition of the BLUE  134
6.4 Finding the BLUE  136
6.5 Extension to a Vector Parameter  139
6.6 Signal Processing Example  141
6A Derivation of Scalar BLUE  151
6B Derivation of Vector BLUE  153
7 Maximum Likelihood Estimation  157
7.1 Introduction  157
7.2 Summary  157
7.3 An Example  158
7.4 Finding the MLE  162
7.5 Properties of the MLE  164
7.6 MLE for Transformed Parameters  173
7.7 Numerical Determination of the MLE  177
7.8 Extension to a Vector Parameter  182
7.9 Asymptotic MLE  190
7.10 Signal Processing Examples  191
7A Monte Carlo Methods  205
7B Asymptotic PDF of MLE for a Scalar Parameter  211
7C Derivation of Conditional Log-Likelihood for EM Algorithm Example  214
8 Least Squares  219
8.1 Introduction  219
8.2 Summary  219
8.3 The Least Squares Approach  220
8.4 Linear Least Squares  223
8.5 Geometrical Interpretations  226
8.6 Order-Recursive Least Squares  232
8.7 Sequential Least Squares  242
8.8 Constrained Least Squares  251
8.9 Nonlinear Least Squares  254
8.10 Signal Processing Examples  260
8A Derivation of Order-Recursive Least Squares  282
8B Derivation of Recursive Projection Matrix  285
8C Derivation of Sequential Least Squares  286
9 Method of Moments  289
9.1 Introduction  289
9.2 Summary  289
9.3 Method of Moments  289
9.4 Extension to a Vector Parameter  292
9.5 Statistical Evaluation of Estimators  294
9.6 Signal Processing Example  299
10 The Bayesian Philosophy  309
10.1 Introduction  309
10.2 Summary  309
10.3 Prior Knowledge and Estimation  310
10.4 Choosing a Prior PDF  316
10.5 Properties of the Gaussian PDF  321
10.6 Bayesian Linear Model  325
10.7 Nuisance Parameters  328
10.8 Bayesian Estimation for Deterministic Parameters  330
10A Derivation of Conditional Gaussian PDF  337
11 General Bayesian Estimators  341
11.1 Introduction  341
11.2 Summary  341
11.3 Risk Functions  342
11.4 Minimum Mean Square Error Estimators  344
11.5 Maximum A Posteriori Estimators  350
11.6 Performance Description  359
11.7 Signal Processing Example  365
11A Conversion of Continuous-Time System to Discrete-Time System  375
12 Linear Bayesian Estimators  379
12.1 Introduction  379
12.2 Summary  379
12.3 Linear MMSE Estimation  380
12.4 Geometrical Interpretations  384
12.5 The Vector LMMSE Estimator  389
12.6 Sequential LMMSE Estimation  392
12.7 Signal Processing Examples - Wiener Filtering  400
12A Derivation of Sequential LMMSE Estimator  415
13 Kalman Filters  419
13.1 Introduction  419
13.2 Summary  419
13.3 Dynamical Signal Models  420
13.4 Scalar Kalman Filter  431
13.5 Kalman Versus Wiener Filters  442
13.6 Vector Kalman Filter  446
13.7 Extended Kalman Filter  449
13.8 Signal Processing Examples  452
13A Vector Kalman Filter Derivation  471
13B Extended Kalman Filter Derivation  476
14 Summary of Estimators  479
14.1 Introduction  479
14.2 Estimation Approaches  479
14.3 Linear Model  486
14.4 Choosing an Estimator  489
15 Extensions for Complex Data and Parameters  493
15.1 Introduction  493
15.2 Summary  493
15.3 Complex Data and Parameters  494
15.4 Complex Random Variables and PDFs  500
15.5 Complex WSS Random Processes  513
15.6 Derivatives, Gradients, and Optimization  517
15.7 Classical Estimation with Complex Data  524
15.8 Bayesian Estimation  532
15.9 Asymptotic Complex Gaussian PDF  535
15.10 Signal Processing Examples  539
15A Derivation of Properties of Complex Covariance Matrices  555
15B Derivation of Properties of Complex Gaussian PDF  558
15C Derivation of CRLB and MLE Formulas  563
A1 Review of Important Concepts  567
A1.1 Linear and Matrix Algebra  567
A1.2 Probability, Random Processes, and Time Series Models  574
A2 Glossary of Symbols and Abbreviations  583
INDEX  589
Preface
Parameter estimation is a subject that is standard fare in the many books available
on statistics. These books range from the highly theoretical expositions written by
statisticians to the more practical treatments contributed by the many users of applied
statistics. This text is an attempt to strike a balance between these two extremes.
The particular audience we have in mind is the community involved in the design
and implementation of signal processing algorithms. As such, the primary focus is
on obtaining optimal estimation algorithms that may be implemented on a digital
computer. The data sets are therefore assumed. to be sa~ples of a continuous-t.ime
waveform or a sequence of data points. The chOice of tOpiCS reflects what we believe
to be the important approaches to obtaining an optimal estimator and analyzing its
performance. As a consequence, some of the deeper theoretical issues have been omitted
with references given instead.
It is the author's opinion that the best way to assimilate the material on parameter
estimation is by exposure to and working with good examples. Consequently, there are
numerous examples that illustrate the theory and others that apply the theory to actual
signal processing problems of current interest. Additionally, an abundance of homework
problems have been included. They range from simple applications of the theory to
extensions of the basic concepts. A solutions manual is available from the publisher.
To aid the reader, summary sections have been provided at the beginning of each
chapter. Also, an overview of all the principal estimation approaches and the rationale
for choosing a particular estimator can be found in Chapter 14. Classical estimation
is first discussed in Chapters 2-9, followed by Bayesian estimation in Chapters 10-13.
This delineation will, hopefully, help to clarify the basic differences between these two
principal approaches. Finally, again in the interest of clarity, we present the estimation
principles for scalar parameters first, followed by their vector extensions. This is because
the matrix algebra required for the vector estimators can sometimes obscure the main
concepts.
This book is an outgrowth of a one-semester graduate level course on estimation
theory given at the University of Rhode Island. It includes somewhat more material
than can actually be covered in one semester. We typically cover most of Chapters
1-12, leaving the subjects of Kalman filtering and complex data/parameter extensions
to the student. The necessary background that has been assumed is an exposure to the
basic theory of digital signal processing, probability and random processes, and linear
and matrix algebra. This book can also be used for self-study and so should be useful
to the practicing engineer as well as the student.
The author would like to acknowledge the contributions of the many people who
over the years have provided stimulating discussions of research problems, opportuni-
ties to apply the results of that research, and support for conducting research. Thanks
are due to my colleagues L. Jackson, R. Kumaresan, L. Pakula, and D. Tufts of the
University of Rhode Island, and L. Scharf of the University of Colorado. Exposure to
practical problems, leading to new research directions, has been provided by H. Wood-
sum of Sonetech, Bedford, New Hampshire, and by D. Mook, S. Lang, C. Myers, and
D. Morgan of Lockheed-Sanders, Nashua, New Hampshire. The opportunity to apply
estimation theory to sonar and the research support of J. Kelly of the Naval Under-
sea Warfare Center, Newport, Rhode Island, J. Salisbury of Analysis and Technology,
Middletown, Rhode Island (formerly of the Naval Undersea Warfare Center), and D.
Sheldon of the Naval Undersea Warfare Center, New London, Connecticut, are also
greatly appreciated. Thanks are due to J. Sjogren of the Air Force Office of Scientific
Research, whose continued support has allowed the author to investigate the field of
statistical estimation. A debt of gratitude is owed to all my current and former grad-
uate students. They have contributed to the final manuscript through many hours of
pedagogical and research discussions as well as by their specific comments and ques-
tions. In particular, P. Djuric of the State University of New York proofread much
of the manuscript, and V. Nagesha of the University of Rhode Island proofread the
manuscript and helped with the problem solutions.
Steven M. Kay
University of Rhode Island
Kingston, RI 02881
Chapter 1
Introduction
1.1 Estimation in Signal Processing
Modern estimation theory can be found at the heart of many electronic signal processing
systems designed to extract information. These systems include
1. Radar
2. Sonar
3. Speech
4. Image analysis
5. Biomedicine
6. Communications
7. Control
8. Seismology,
and all share the common problem of needing to estimate the values of a group of pa-
rameters. We briefly describe the first three of these systems. In radar we are interested
in determining the position of an aircraft, as for example, in airport surveillance radar
[Skolnik 1980]. To determine the range R we transmit an electromagnetic pulse that is reflected by the aircraft, causing an echo to be received by the antenna τ₀ seconds later, as shown in Figure 1.1a. The range is determined by the equation τ₀ = 2R/c, where c is the speed of electromagnetic propagation. Clearly, if the round trip delay τ₀ can
be measured, then so can the range. A typical transmit pulse and received waveform
are shown in Figure 1.1b. The received echo is decreased in amplitude due to propagation losses and hence may be obscured by environmental noise. Its onset may also be
perturbed by time delays introduced by the electronics of the receiver. Determination
of the round trip delay can therefore require more than just a means of detecting a
jump in the power level at the receiver. It is important to note that a typical modern
[Figure 1.1 Radar system: (a) radar, showing the transmit/receive antenna, transmit pulse, and radar processing system; (b) transmit and received waveforms versus time, with the received echo arriving τ₀ seconds after the transmit pulse.]
radar system will input the received continuous-time waveform into a digital computer by taking samples via an analog-to-digital convertor. Once the waveform has been
sampled, the data compose a time series. (See also Examples 3.13 and 7.15 for a more
detailed description of this problem and optimal estimation procedures.)
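As a quick numerical illustration of the range equation above, the following Python sketch (hypothetical numbers only; a real system must first estimate the delay from the received waveform) converts a measured round trip delay into a range:

    # Range from round trip delay: tau0 = 2R/c, so R = c*tau0/2.
    c = 3e8                  # speed of electromagnetic propagation (m/s)
    tau0_measured = 6.7e-5   # hypothetical measured round trip delay (s)
    R_hat = c * tau0_measured / 2.0
    print(f"estimated range: {R_hat / 1000:.2f} km")   # 10.05 km for this delay

Any error in the measured delay maps directly into a range error through the same relation, which is why estimating τ₀ accurately is the central problem.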
Another common application is in sonar, in which we are also interested in the
position of a target, such as a submarine [Knight et al. 1981, Burdic 1984]. A typical passive sonar is shown in Figure 1.2a. The target radiates noise due to machinery
on board, propellor action, etc. This noise, which is actually the signal of interest,
propagates through the water and is received by an array of sensors. The sensor outputs
[Figure 1.2 Passive sonar system: (a) passive sonar, showing a towed array between the sea surface and sea bottom; (b) received signals at the array sensors versus time.]
are then transmitted to a tow ship for input to a digital computer. Because of the
positions of the sensors relative to the arrival angle of the target signal, we receive
the signals shown in Figure 1.2b. By measuring τ₀, the delay between sensors, we can determine the bearing β from the expression

β = arccos(cτ₀/d)    (1.1)
where c is the speed of sound in water and d is the distance between sensors (see
Examples 3.15 and 7.17 for a more detailed description). Again, however, the received
[Figure 1.3 Examples of speech sounds: waveforms of the vowels /a/ and /e/ versus time in ms.]
waveforms are not "clean" as shown in Figure 1.2b but are embedded in noise, making the determination of τ₀ more difficult. The value of β obtained from (1.1) is then only an estimate.
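Once the inter-sensor delay has been estimated, the bearing computation in (1.1) is a single arccosine. A minimal Python sketch (numpy assumed; the sound speed, sensor spacing, and delay are hypothetical values chosen only for illustration):

    import numpy as np

    # Bearing from inter-sensor delay, equation (1.1): beta = arccos(c*tau0/d)
    c = 1500.0      # nominal speed of sound in water (m/s)
    d = 1.0         # distance between sensors (m), hypothetical
    tau0 = 4.0e-4   # estimated delay between sensors (s), hypothetical
    beta = np.arccos(c * tau0 / d)   # bearing in radians
    print(f"bearing estimate: {np.degrees(beta):.1f} degrees")

Because τ₀ is itself estimated from noisy waveforms, β inherits that uncertainty, which is precisely the point made above.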
Another application is in speech processing systems [Rabiner and Schafer 1978].
A particularly important problem is speech recognition, which is the recognition of
speech by a machine (digital computer). The simplest example of this is in recognizing
individual speech sounds or phonemes. Phonemes are the vowels, consonants, etc., or
the fundamental sounds of speech. As an example, the vowels /a/ and /e/ are shown in Figure 1.3. Note that they are periodic waveforms whose period is called the pitch. To recognize whether a sound is an /a/ or an /e/ the following simple strategy might
be employed. Have the person whose speech is to be recognized say each vowel three
times and store the waveforms. To recognize the spoken vowel, compare it to the stored vowels and choose the one that is closest to the spoken vowel or the one that
[Figure 1.4 LPC spectral modeling: periodogram and estimated LPC spectral envelope in dB versus frequency in Hz for the two speech sounds of Figure 1.3.]
minimizes some distance measure. Difficulties arise if the pitch of the speaker's voice changes from the time he or she records the sounds (the training session) to the time
when the speech recognizer is used. This is a natural variability due to the nature of
human speech. In practice, attributes, other than the waveforms themselves, are used
to measure distance. Attributes are chosen that are less susceptible to variation. For
example, the spectral envelope will not change with pitch since the Fourier transform
of a periodic signal is a sampled version of the Fourier transform of one period of the
signal. The period affects only the spacing between frequency samples, not the values.
To extract the spectral envelope we employ a model of speech called linear predictive coding (LPC). The parameters of the model determine the spectral envelope. For the speech sounds in Figure 1.3 the power spectrum (magnitude-squared Fourier transform
divided by the number of time samples) or periodogram and the estimated LPC spectral
envelope are shown in Figure 1.4. (See Examples 3.16 and 7.18 for a description of how
the parameters of the model are estimated and used to find the spectral envelope.) It
is interesting that in this example a human interpreter can easily discern the spoken
vowel. The real problem then is to design a machine that is able to do the same. In
the radar/sonar problem a human interpreter would be unable to determine the target
position from the received waveforms, so that the machine acts as an indispensable
tool.
In all these systems we are faced with the problem of extracting values of parameters based on continuous-time waveforms. Due to the use of digital computers to sample and store the continuous-time waveform, we have the equivalent problem of extracting parameter values from a discrete-time waveform or a data set. Mathematically, we have the N-point data set {x[0], x[1], ..., x[N-1]} which depends on an unknown parameter θ. We wish to determine θ based on the data or to define an estimator

θ̂ = g(x[0], x[1], ..., x[N-1])    (1.2)

where g is some function. This is the problem of parameter estimation, which is the
subject of this book. Although electrical engineers at one time designed systems based
on analog signals and analog circuits, the current and future trend is based on discrete-
time signals or sequences and digital circuitry. With this transition the estimation
problem has evolved into one of estimating a parameter based on a time series, which
is just a discrete-time process. Furthermore, because the amount of data is necessarily
finite, we are faced with the determination of g as in (1.2). Therefore, our problem has now evolved into one which has a long and glorious history, dating back to Gauss who in 1795 used least squares data analysis to predict planetary movements [Gauss 1963 (English translation)]. All the theory and techniques of statistical estimation are at
our disposal [Cox and Hinkley 1974, Kendall and Stuart 1976-1979, Rao 1973, Zacks
1981].
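In computational terms, an estimator in the sense of (1.2) is nothing more than a function applied to the N-point data record. A minimal Python sketch (numpy assumed; the sample mean is used here only as one possible choice of g, and the data are synthetic):

    import numpy as np

    def g(x):
        # An estimator per (1.2): a function of the data record only
        return np.mean(x)

    rng = np.random.default_rng(0)
    N, theta = 50, 1.0
    x = theta + rng.standard_normal(N)   # hypothetical data: theta plus unit-variance noise
    theta_hat = g(x)                     # the estimate for this particular realization
    print(theta_hat)

Different realizations of the data give different values of θ̂, a distinction between an estimator and an estimate that is taken up below.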
Before concluding our discussion of application areas we complete the previous list.
4. Image analysis - estimate the position and orientation of an object from a camera image, necessary when using a robot to pick up an object [Jain 1989]
5. Biomedicine - estimate the heart rate of a fetus [Widrow and Stearns 1985]
6. Communications - estimate the carrier frequency of a signal so that the signal can be demodulated to baseband [Proakis 1983]
7. Control - estimate the position of a powerboat so that corrective navigational action can be taken, as in a LORAN system [Dabbous 1988]
8. Seismology - estimate the underground distance of an oil deposit based on sound reflections due to the different densities of oil and rock layers [Justice 1985].
Finally, the multitude of applications stemming from analysis of data from physical
experiments, economics, etc., should also be mentioned [Box and Jenkins 1970, Holm
and Hovem 1979, Schuster 1898, Taylor 1986].
[Figure 1.5 Dependence of PDF on unknown parameter: p(x[0]; θ) versus x[0] for two values of θ.]
1.2 The Mathematical Estimation Problem
In determining good estimators the first step is to mathematically model the data. Because the data are inherently random, we describe them by their probability density function (PDF) or p(x[0], x[1], ..., x[N-1]; θ). The PDF is parameterized by the unknown parameter θ, i.e., we have a class of PDFs where each one is different due to a different value of θ. We will use a semicolon to denote this dependence. As an example, if N = 1 and θ denotes the mean, then the PDF of the data might be

p(x[0]; θ) = (1/√(2πσ²)) exp[-(1/(2σ²))(x[0] - θ)²]
which is shown in Figure 1.5 for various values of θ. It should be intuitively clear that because the value of θ affects the probability of x[0], we should be able to infer the value of θ from the observed value of x[0]. For example, if the value of x[0] is negative, it is doubtful that θ = θ₂. The value θ = θ₁ might be more reasonable. This specification of the PDF is critical in determining a good estimator. In an actual problem we are not given a PDF but must choose one that is not only consistent with the problem constraints and any prior knowledge, but one that is also mathematically tractable. To illustrate the approach consider the hypothetical Dow-Jones industrial average shown in Figure 1.6. It might be conjectured that this data, although appearing to fluctuate wildly, actually is "on the average" increasing. To determine if this is true we could
assume that the data actually consist of a straight line embedded in random noise or
x[n] = A + Bn + w[n]    n = 0, 1, ..., N-1.
A reasonable model for the noise is that w[n] is white Gaussian noise (WGN) or each sample of w[n] has the PDF N(0, σ²) (denoting a Gaussian distribution with a mean of 0 and a variance of σ²) and is uncorrelated with all the other samples. Then, the unknown parameters are A and B, which arranged as a vector become the vector parameter θ = [A B]ᵀ. Letting x = [x[0] x[1] ... x[N-1]]ᵀ, the PDF is

p(x; θ) = (1/(2πσ²)^(N/2)) exp[-(1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - A - Bn)²].    (1.3)
The choice of a straight line for the signal component is consistent with the knowledge
that the Dow-Jones average is hovering around 3000 (A models this) and the conjecture
[Figure 1.6 Hypothetical Dow-Jones average: index value versus day number.]
that it is increasing (B > 0 models this). The assumption of WGN is justified by the
need to formulate a mathematically tractable model so that closed form estimators can
be found. Also, it is reasonable unless there is strong evidence to the contrary, such as
highly correlated noise. Of course, the performance of any estimator obtained will be
critically dependent on the PDF assumptions. We can only hope the estimator obtained is robust, in that slight changes in the PDF do not severely affect the performance of the estimator. More conservative approaches utilize robust statistical procedures [Huber 1981].
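To make the Dow-Jones model concrete, the following Python sketch (numpy assumed; all numbers are hypothetical) generates a straight line in WGN as in the signal model above and evaluates the logarithm of the PDF (1.3), first at the true parameter values and then at a mismatched pair; any estimator of θ = [A B]ᵀ must exploit exactly this dependence of the PDF on the parameters:

    import numpy as np

    rng = np.random.default_rng(1)
    N, A, B, sigma2 = 100, 3000.0, 1.0, 50.0**2   # hypothetical values
    n = np.arange(N)
    x = A + B * n + np.sqrt(sigma2) * rng.standard_normal(N)   # x[n] = A + B*n + w[n]

    def log_pdf(x, A, B, sigma2):
        # Logarithm of (1.3) for the straight-line-in-WGN model
        N = len(x)
        resid = x - A - B * np.arange(N)
        return -N / 2 * np.log(2 * np.pi * sigma2) - np.sum(resid**2) / (2 * sigma2)

    print(log_pdf(x, A, B, sigma2))          # evaluated at the true (A, B)
    print(log_pdf(x, 2900.0, 0.0, sigma2))   # smaller at a mismatched (A, B)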
Estimation based on PDFs such as (1.3) is termed classical estimation in that the
parameters of interest are assumed to be deterministic but unknown. In the Dow-Jones
average example we know a priori that the mean is somewhere around 3000. It seems
inconsistent with reality, then, to choose an estimator of A that can result in values as
low as 2000 or as high as 4000. We might be more willing to constrain the estimator
to produce values of A in the range [2800, 3200]. To incorporate this prior knowledge we can assume that A is no longer deterministic but a random variable and assign it a PDF, possibly uniform over the [2800, 3200] interval. Then, any subsequent estimator will yield values in this range. Such an approach is termed Bayesian estimation. The parameter we are attempting to estimate is then viewed as a realization of the random variable θ. As such, the data are described by the joint PDF

p(x, θ) = p(x|θ)p(θ).
[Figure 1.7 Realization of DC level in noise: x[n] versus n.]
Once the PDF has been specified, the problem becomes one of determining an optimal estimator or function of the data, as in (1.2). Note that an estimator may depend on other parameters, but only if they are known. An estimator may be thought of as a rule that assigns a value to θ for each realization of x. The estimate of θ is the value of θ̂ obtained for a given realization of x. This distinction is analogous to a random variable (which is a function defined on the sample space) and the value it takes on. Although some authors distinguish between the two by using capital and lowercase letters, we will not do so. The meaning will, hopefully, be clear from the context.
1.3 Assessing Estimator Performance
Consider the data set shown in Figure 1.7. From a cursory inspection it appears that x[n] consists of a DC level A in noise. (The use of the term DC is in reference to direct current, which is equivalent to the constant function.) We could model the data as

x[n] = A + w[n]

where w[n] denotes some zero mean noise process. Based on the data set {x[0], x[1], ..., x[N-1]}, we would like to estimate A. Intuitively, since A is the average level of x[n] (w[n] is zero mean), it would be reasonable to estimate A as

Â = (1/N) Σ_{n=0}^{N-1} x[n]
or by the sample mean of the data. Several questions come to mind:
1. How close will Â be to A?
2. Are there better estimators than the sample mean?
For the data set in Figure 1.7 it turns out that Â = 0.9, which is close to the true value of A = 1. Another estimator might be

Ǎ = x[0].
Intuitively, we would not expect this estimator to perform as well since it does not make use of all the data. There is no averaging to reduce the noise effects. However, for the data set in Figure 1.7, Ǎ = 0.95, which is closer to the true value of A than the sample mean estimate. Can we conclude that Ǎ is a better estimator than Â? The answer is of course no. Because an estimator is a function of the data, which are random variables, it too is a random variable, subject to many possible outcomes. The fact that Ǎ is closer to the true value only means that for the given realization of data, as shown in Figure 1.7, the estimate Ǎ = 0.95 (or realization of Ǎ) is closer to the true value than the estimate Â = 0.9 (or realization of Â). To assess performance
we must do so statistically. One possibility would be to repeat the experiment that
generated the data and apply each estimator to every data set. Then, we could ask
which estimator produces a better estimate in the majority of the cases. Suppose we
repeat the experiment by fixing A = 1 and adding different noise realizations of w[n] to
generate an ensemble of realizations of x[n]. Then, we determine the values of the two
estimators for each data set and finally plot the histograms. (A histogram describes the
number of times the estimator produces a given range of values and is an approximation
to the PDF.) For 100 realizations the histograms are shown in Figure 1.8. It should
now be evident that Â is a better estimator than Ǎ because the values obtained are more concentrated about the true value of A = 1. Hence, Â will usually produce a value closer to the true one than Ǎ. The skeptic, however, might argue that if we repeat the experiment 1000 times instead, then the histogram of Ǎ will be more concentrated. To dispel this notion, we cannot repeat the experiment 1000 times, for surely the skeptic would then reassert his or her conjecture for 10,000 experiments. To prove that Â is better we could establish that the variance is less. The modeling assumptions that we must employ are that the w[n]'s, in addition to being zero mean, are uncorrelated and have equal variance σ². Then, we first show that the mean of each estimator is the true
value or
E(Â) = E[(1/N) Σ_{n=0}^{N-1} x[n]] = (1/N) Σ_{n=0}^{N-1} E(x[n]) = A

E(Ǎ) = E(x[0]) = A

so that on the average the estimators produce the true value. Second, the variances are

var(Â) = var[(1/N) Σ_{n=0}^{N-1} x[n]] = (1/N²) Σ_{n=0}^{N-1} var(x[n]) = (1/N²)·Nσ² = σ²/N

since the w[n]'s are uncorrelated, and thus

var(Ǎ) = var(x[0]) = σ² > var(Â).

[Figure 1.8 Histograms of the sample mean estimator Â and the first sample estimator Ǎ for 100 realizations with A = 1.]
Furthermore, if we could assume that w[n] is Gaussian, we could also conclude that the probability of a given magnitude error is less for Â than for Ǎ (see Problem 2.7).
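The experiment just described is easy to reproduce. A minimal Monte Carlo sketch (Python with numpy assumed; 100 realizations of length N = 100 with A = 1 and unit-variance WGN, mirroring Figure 1.8) computes both estimators for every realization and compares their sample means and variances with the expressions derived above:

    import numpy as np

    rng = np.random.default_rng(2)
    A, N, trials = 1.0, 100, 100
    x = A + rng.standard_normal((trials, N))   # each row is one realization of x[n]

    A_hat = x.mean(axis=1)    # sample mean estimator, one value per realization
    A_check = x[:, 0]         # first-sample estimator

    print("A_hat  : mean", A_hat.mean(), "variance", A_hat.var(ddof=1))
    print("A_check: mean", A_check.mean(), "variance", A_check.var(ddof=1))

Both sample means come out near 1, while the sample variances come out near 1/100 and 1, respectively, consistent with var(Â) = σ²/N and var(Ǎ) = σ².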
Several important points are illustrated by the previous example, which should always be kept in mind.

1. An estimator is a random variable. As such, its performance can only be completely described statistically or by its PDF.
2. The use of computer simulations for assessing estimation performance, although quite valuable for gaining insight and motivating conjectures, is never conclusive.
At best, the true performance may be obtained to the desired degree of accuracy.
At worst, for an insufficient number of experiments and/or errors in the simulation
techniques employed, erroneous results may be obtained (see Appendix 7A for a
further discussion of Monte Carlo computer techniques).
Another theme that we will repeatedly encounter is the tradeoff between performance and computational complexity. As in the previous example, even though Â has better performance, it also requires more computation. We will see that optimal estimators can sometimes be difficult to implement, requiring a multidimensional optimization or integration. In these situations, alternative estimators that are suboptimal, but which can be implemented on a digital computer, may be preferred. For any par-
ticular application, the user must determine whether the loss in performance is offset
by the reduced computational complexity of a suboptimal estimator.
1.4 Some Notes to the Reader
Our philosophy in presenting a theory of estimation is to provide the user with the
main ideas necessary for determining optimal estimators. We have included results
that we deem to be most useful in practice, omitting some important theoretical issues.
The latter can be found in many books on statistical estimation theory which have
been written from a more theoretical viewpoint [Cox and Hinkley 1974, Kendall and
Stuart 1976-1979, Rao 1973, Zacks 1981]. As mentioned previously, our goal is to obtain an optimal estimator, and we resort to a suboptimal one if the former cannot be found or is not implementable. The sequence of chapters in this book follows this
approach, so that optimal estimators are discussed first, followed by approximately
optimal estimators, and finally suboptimal estimators. In Chapter 14 a "road map" for
finding a good estimator is presented along with a summary of the various estimators
and their properties. The reader may wish to read this chapter first to obtain an
overview.
We have tried to maximize insight by including many examples and minimizing
long mathematical expositions, although much of the tedious algebra and proofs have
been included as appendices. The DC level in noise described earlier will serve as a
standard example in introducing almost all the estimation approaches. It is hoped
that in doing so the reader will be able to develop his or her own intuition by building
upon previously assimilated concepts. Also, where possible, the scalar estimator is
presented first followed by the vector estimator. This approach reduces the tendency
of vector/matrix algebra to obscure the main ideas. Finally, classical estimation is
described first, followed by Bayesian estimation, again in the interest of not obscuring
the main issues. The estimators obtained using the two approaches, although similar
in appearance, are fundamentally different.
The mathematical notation for all common symbols is summarized in Appendix 2.
The distinction between a continuous-time waveform and a discrete-time waveform or
sequence is made through the symbolism x(t) for continuous-time and x[n] for discrete-
time. Plots of x[n], however, appear continuous in time, the points having been con-
nected by straight lines for easier viewing. All vectors and matrices are boldface with
all vectors being column vectors. All other symbolism is defined within the context of
the discussion.
References
Box, G.E.P., G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San
Francisco, 1970.
Burdic, W.S., Underwater Acoustic System Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1984.
Cox, D.R., D.V. Hinkley, Theoretical Statistics, Chapman and Hall, New York, 1974.
Dabbous, T.E., N.U. Ahmed. J.C. McMillan, D.F. Liang, "Filtering of Discontinuous Processes
Arising in Marine Integrated Navigation," IEEE Trans. Aerosp. Electron. Syst., Vol. 24,
pp. 85-100, 1988.
Gauss, K.G., Theory of Motion of Heavenly Bodies, Dover, New York, 1963.
Holm, S., J.M. Hovem, "Estimation of Scalar Ocean Wave Spectra by the Maximum Entropy
Method," IEEE J. Ocean Eng., Vol. 4, pp. 76-83, 1979.
Huber, P.J., Robust Statistics, J. Wiley, New York, 1981.
Jain, A.K., Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, N.J., 1989.
Justice, J.H., "Array Processing in Exploration Seismology," in Array Signal Processing, S. Haykin, ed., Prentice-Hall, Englewood Cliffs, N.J., 1985.
Kendall, Sir M., A. Stuart, The Advanced Theory of Statistics, Vols. 1-3, Macmillan, New York,
1976--1979.
Knight, W.S., R.G. Pridham, S.M. Kay, "Digital Signal Processing for Sonar," Proc. IEEE, Vol. 69, pp. 1451-1506, Nov. 1981.
Proakis, J.G., Digital Communications, McGraw-Hill, New York, 1983.
Rabiner, L.R., R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs,
N.J., 1978.
Rao, C.R., Linear Statistical Inference and Its Applications, J. Wiley, New York, 1973.
Schuster, A., "On the Investigation of Hidden Periodicities with Application to a Supposed 26 Day Period of Meteorological Phenomena," Terrestrial Magnetism, Vol. 3, pp. 13-41, March 1898.
Skolnik, M.I., Introduction to Radar Systems, McGraw-Hill, New York, 1980.
Taylor, S., Modeling Financial Time Series, J. Wiley, New York, 1986.
Widrow, B., Stearns, S.D., Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1985.
Zacks, S., Parametric Statistical Inference, Pergamon, New York, 1981.
Problems
1.1 In a radar system an estimator of round trip delay τ₀ has the PDF τ̂₀ ~ N(τ₀, σ_τ̂₀²), where τ₀ is the true value. If the range is to be estimated, propose an estimator R̂ and find its PDF. Next determine the standard deviation σ_τ̂₀ so that 99% of the time the range estimate will be within 100 m of the true value. Use c = 3 × 10⁸ m/s for the speed of electromagnetic propagation.
1.2 An unknown parameter θ influences the outcome of an experiment which is modeled by the random variable x. The PDF of x is

p(x; θ) = (1/√(2π)) exp[-(1/2)(x - θ)²].

A series of experiments is performed, and x is found to always be in the interval [97, 103]. As a result, the investigator concludes that θ must have been 100. Is this assertion correct?
1.3 Let x = θ + w, where w is a random variable with PDF p_w(w). If θ is a deterministic parameter, find the PDF of x in terms of p_w and denote it by p(x; θ). Next assume that θ is a random variable independent of w and find the conditional PDF p(x|θ). Finally, do not assume that θ and w are independent and determine p(x|θ). What can you say about p(x; θ) versus p(x|θ)?
1.4 It is desired to estimate the value of a DC level A in WGN or

x[n] = A + w[n]    n = 0, 1, ..., N-1

where w[n] is zero mean and uncorrelated, and each sample has variance σ² = 1. Consider the two estimators

Â = (1/N) Σ_{n=0}^{N-1} x[n]
Ǎ = (1/(N+2)) (2x[0] + Σ_{n=1}^{N-2} x[n] + 2x[N-1]).

Which one is better? Does it depend on the value of A?
1.5 For the same data set as in Problem 1.4 the following estimator is proposed:

Â = x[0]                          if A²/σ² ≥ 1000
Â = (1/N) Σ_{n=0}^{N-1} x[n]      if A²/σ² < 1000.

The rationale for this estimator is that for a high enough signal-to-noise ratio (SNR) or A²/σ², we do not need to reduce the effect of noise by averaging and hence can avoid the added computation. Comment on this approach.
Chapter 2
Minimum Variance Unbiased
Estimation
2.1 Introduction
In this chapter we will begin our search for good estimators of unknown deterministic parameters. We will restrict our attention to estimators which on the average yield
the true parameter value. Then, within this class of estimators the goal will be to find
the one that exhibits the least variability. Hopefully, the estimator thus obtained will
produce values close to the true value most of the time. The notion of a minimum
variance unbiased estimator is examined within this chapter, but the means to find it
will require some more theory. Succeeding chapters will provide that theory as well as
apply it to many of the typical problems encountered in signal processing.
2.2 Summary
An unbiased estimator is defined by (2.1), with the important proviso that this holds for
all possible values of the unknown parameter. Within this class of estimators the one
with the minimum variance is sought. The unbiased constraint is shown by example
to be desirable from a practical viewpoint since the more natural error criterion, the
minimum mean square error, defined in (2.5), generally leads to unrealizable estimators.
Minimum variance unbiased estimators do not, in general, exist. When they do, several methods can be used to find them. The methods rely on the Cramer-Rao lower bound and the concept of a sufficient statistic. If a minimum variance unbiased estimator does not exist or if both of the previous two approaches fail, a further constraint on the estimator, to being linear in the data, leads to an easily implemented, but suboptimal, estimator.
2.3 Unbiased Estimators
For an estimator to be unbiased we mean that on the average the estimator will yield
the true value of the unknown parameter. Since the parameter value may in general be
anywhere in the interval a < θ < b, unbiasedness asserts that no matter what the true value of θ, our estimator will yield it on the average. Mathematically, an estimator is unbiased if

E(θ̂) = θ,    a < θ < b    (2.1)

where (a, b) denotes the range of possible values of θ.
Example 2.1 - Unbiased Estimator for DC Level in White Gaussian Noise
Consider the observations

x[n] = A + w[n]    n = 0, 1, ..., N-1
where A is the parameter to be estimated and w[n] is WGN. The parameter A can
take on any value in the interval -∞ < A < ∞. Then, a reasonable estimator for the average value of x[n] is

Â = (1/N) Σ_{n=0}^{N-1} x[n]

or the sample mean. Due to the linearity properties of the expectation operator
E(Â) = E[(1/N) Σ_{n=0}^{N-1} x[n]] = (1/N) Σ_{n=0}^{N-1} E(x[n]) = (1/N) Σ_{n=0}^{N-1} A = A    (2.2)

for all A. The sample mean estimator is therefore unbiased.    ◇
In this example A can take on any value, although in general the values of an unknown
parameter may be restricted by physical considerations. Estimating the resistance R
of an unknown resistor, for example, would necessitate an interval 0 < R < ∞.
Unbiased estimators tend to have symmetric PDFs centered about the true value of θ, although this is not necessary (see Problem 2.5). For Example 2.1 the PDF is shown in Figure 2.1 and is easily shown to be N(A, σ²/N) (see Problem 2.3).
The restriction that E(θ̂) = θ for all θ is an important one. Letting θ̂ = g(x), where x = [x[0] x[1] ... x[N-1]]ᵀ, it asserts that

E(θ̂) = ∫ g(x)p(x; θ) dx = θ    for all θ.    (2.3)
[Figure 2.1 Probability density function for sample mean estimator.]
It is possible, however, that (2.3) may hold for some values of θ and not others, as the next example illustrates.
Example 2.2 - Biased Estimator for DC Level in White Noise
Consider again Example 2.1 but with the modified sample mean estimator

Ǎ = (1/(2N)) Σ_{n=0}^{N-1} x[n].

Then,

E(Ǎ) = A/2
     = A    if A = 0
     ≠ A    if A ≠ 0.

It is seen that (2.3) holds for the modified estimator only for A = 0. Clearly, Ǎ is a biased estimator.    ◇
That an estimator is unbiased does not necessarily mean that it is a good estimator.
It only guarantees that on the average it will attain the true value. On the other hand
biased estimators are ones that are characterized by a systematic error, which presumably should not be present. A persistent bias will always result in a poor estimator.
As an example, the unbiased property has an important implication when several es-
timators are combined (see Problem 2.4). It sometimes occurs that multiple estimates of the same parameter are available, i.e., {θ̂₁, θ̂₂, ..., θ̂ₙ}. A reasonable procedure is to combine these estimates into, hopefully, a better one by averaging them to form

θ̂ = (1/n) Σ_{i=1}^{n} θ̂ᵢ.    (2.4)
Assuming the estimators are unbiased, with the same variance, and uncorrelated with
each other,
E(θ̂) = θ
[Figure 2.2 Effect of combining estimators: (a) unbiased estimator, whose PDF concentrates about θ as n increases; (b) biased estimator, whose PDF concentrates about E(θ̂) ≠ θ.]
and

var(θ̂) = var(θ̂₁)/n

so that as more estimates are averaged, the variance will decrease. Ultimately, as n → ∞, θ̂ → θ. However, if the estimators are biased or E(θ̂ᵢ) = θ + b(θ), then

E(θ̂) = θ + b(θ)

and no matter how many estimators are averaged, θ̂ will not converge to the true value. This is depicted in Figure 2.2. Note that, in general,

b(θ) = E(θ̂) - θ
is defined as the bias of the estimator.
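A small simulation (Python with numpy assumed; the numbers are hypothetical) illustrates the point of Figure 2.2: averaging n independent estimates drives the variance down as var(θ̂₁)/n, but if each estimate carries the same bias b(θ), the average settles on θ + b(θ) rather than θ:

    import numpy as np

    rng = np.random.default_rng(3)
    theta, bias, n = 5.0, 0.5, 10000

    unbiased = theta + rng.standard_normal(n)         # theta_i ~ N(theta, 1)
    biased = theta + bias + rng.standard_normal(n)    # theta_i ~ N(theta + b(theta), 1)

    print(unbiased.mean())   # close to 5.0, and closer still as n grows
    print(biased.mean())     # close to 5.5 no matter how large n is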
2.4 Minimum Variance Criterion
In searching for optimal estimators we need to adopt some optimality criterion. A natural one is the mean square error (MSE), defined as

mse(θ̂) = E[(θ̂ - θ)²].    (2.5)
This measures the average mean squared deviation of the estimator from the true value. Unfortunately, adoption of this natural criterion leads to unrealizable estimators, ones that cannot be written solely as a function of the data. To understand the problem which arises we first rewrite the MSE as

mse(θ̂) = E{[(θ̂ - E(θ̂)) + (E(θ̂) - θ)]²} = var(θ̂) + [E(θ̂) - θ]² = var(θ̂) + b²(θ)    (2.6)
which shows that the MSE is composed of errors due to the variance of the estimator as
well as the bias. As an example, for the problem in Example 2.1 consider the modified
estimator
Ǎ = a (1/N) Σ_{n=0}^{N-1} x[n]

for some constant a. We will attempt to find the a which results in the minimum MSE. Since E(Ǎ) = aA and var(Ǎ) = a²σ²/N, we have, from (2.6),

mse(Ǎ) = a²σ²/N + (a - 1)²A².
Differentiating the MSE with respect to a yields
d mse(Ǎ)/da = 2aσ²/N + 2(a - 1)A²

which upon setting to zero and solving yields the optimum value

a_opt = A²/(A² + σ²/N).
It is seen that, unfortunately, the optimal value of a depends upon the unknown parameter A. The estimator is therefore not realizable. In retrospect the estimator depends upon A since the bias term in (2.6) is a function of A. It would seem that any criterion which depends on the bias will lead to an unrealizable estimator. Although this is generally true, on occasion realizable minimum MSE estimators can be found [Bibby and Toutenburg 1977, Rao 1973, Stoica and Moses 1990].
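The dependence of the optimal scaling on the unknown parameter is easy to see numerically. A brief Python sketch (numpy assumed) evaluates mse(Ǎ) = a²σ²/N + (a - 1)²A² on a grid of a for two different values of A; the minimizing a shifts with A, which is exactly why the minimum MSE estimator is unrealizable here:

    import numpy as np

    N, sigma2 = 10, 1.0
    a = np.linspace(0.0, 1.5, 301)

    def mse(a, A):
        # mse(A_check) = a^2 sigma^2 / N + (a - 1)^2 A^2, from (2.6)
        return a**2 * sigma2 / N + (a - 1.0)**2 * A**2

    for A in (0.5, 2.0):
        a_opt_grid = a[np.argmin(mse(a, A))]
        a_opt_formula = A**2 / (A**2 + sigma2 / N)
        print(A, a_opt_grid, a_opt_formula)   # the optimal a changes with A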
[Figure 2.3 Possible dependence of estimator variance with θ: (a) the variance of θ̂₃ is uniformly smallest, so θ̂₃ is the MVU estimator; (b) no single estimator has uniformly minimum variance, so no MVU estimator exists.]
From a practical viewpoint the minimum MSE estimator needs to be abandoned. An alternative approach is to constrain the bias to be zero and find the estimator which minimizes the variance. Such an estimator is termed the minimum variance unbiased (MVU) estimator. Note from (2.6) that the MSE of an unbiased estimator is just the variance.
Minimizing the variance of an unbiased estimator also has the effect of concentrating the PDF of the estimation error, θ̂ - θ, about zero (see Problem 2.7). The estimation error will therefore be less likely to be large.
2.5 Existence of the Minimum Variance Unbiased
Estimator
The question arises as to whether a MVU estimator exists, i.e., an unbiased estimator with minimum variance for all θ. Two possible situations are described in Figure 2.3. If there are three unbiased estimators that exist and whose variances are shown in Figure 2.3a, then clearly θ̂₃ is the MVU estimator. If the situation in Figure 2.3b exists, however, then there is no MVU estimator since for θ < θ₀, θ̂₂ is better, while for θ > θ₀, θ̂₃ is better. In the former case θ̂₃ is sometimes referred to as the uniformly minimum variance unbiased estimator to emphasize that the variance is smallest for all θ. In general, the MVU estimator does not always exist, as the following example illustrates.
Example 2.3 - Counterexample to Existence of MVU Estimator
If the form of the PDF changes with θ, then it would be expected that the best estimator would also change with θ. Assume that we have two independent observations x[0] and x[1] with PDF

x[0] ~ N(θ, 1)
x[1] ~ N(θ, 1)    if θ ≥ 0
x[1] ~ N(θ, 2)    if θ < 0.
[Figure 2.4 Illustration of nonexistence of minimum variance unbiased estimator: var(θ̂₁) and var(θ̂₂) plotted versus θ.]
The two estimators

θ̂₁ = (1/2)(x[0] + x[1])
θ̂₂ = (2/3)x[0] + (1/3)x[1]

can easily be shown to be unbiased. To compute the variances we have that

var(θ̂₁) = (1/4)(var(x[0]) + var(x[1]))
var(θ̂₂) = (4/9)var(x[0]) + (1/9)var(x[1])

so that

var(θ̂₁) = 18/36 and var(θ̂₂) = 20/36    if θ ≥ 0
var(θ̂₁) = 27/36 and var(θ̂₂) = 24/36    if θ < 0.
The variances are shown in Figure 2.4. Clearly, between these two estimators no MVU estimator exists. It is shown in Problem 3.6 that for θ ≥ 0 the minimum possible variance of an unbiased estimator is 18/36, while that for θ < 0 is 24/36. Hence, no single estimator can have a variance uniformly less than or equal to the minima shown in Figure 2.4.    ◇
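The four variance values behind Figure 2.4 follow directly from var(x[0]) and var(x[1]) in the two regimes; the short check below (plain Python with exact rational arithmetic) simply evaluates the two combinations:

    from fractions import Fraction

    # var(x[0]) = 1 always; var(x[1]) = 1 for theta >= 0 and 2 for theta < 0
    for var_x1, regime in [(Fraction(1), "theta >= 0"), (Fraction(2), "theta < 0")]:
        var1 = Fraction(1, 4) * (1 + var_x1)                # var(theta_hat_1)
        var2 = Fraction(4, 9) + Fraction(1, 9) * var_x1     # var(theta_hat_2)
        print(regime, var1, var2)

    # prints 1/2 (= 18/36) and 5/9 (= 20/36) for theta >= 0,
    # and 3/4 (= 27/36) and 2/3 (= 24/36) for theta < 0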
To conclude our discussion of existence we should note that it is also possible that there may not exist even a single unbiased estimator (see Problem 2.11). In this case any search for a MVU estimator is fruitless.
2.6 Finding the Minimum Variance
Unbiased Estimator
Even if a MVU estimator exists, we may not be able to find it. There is no known "turn-the-crank" procedure which will always produce the estimator. In the next few chapters we shall discuss several possible approaches. They are:

1. Determine the Cramer-Rao lower bound (CRLB) and check to see if some estimator satisfies it (Chapters 3 and 4).
2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem (Chapter 5).
3. Further restrict the class of estimators to be not only unbiased but also linear. Then, find the minimum variance estimator within this restricted class (Chapter 6).

Approaches 1 and 2 may produce the MVU estimator, while 3 will yield it only if the MVU estimator is linear in the data.

[Figure 2.5 Cramer-Rao lower bound on variance of unbiased estimator: var(θ̂) versus θ lies on or above the CRLB.]
The CRLB allows us to determine that for any unbiased estimator the variance must be greater than or equal to a given value, as shown in Figure 2.5. If an estimator exists whose variance equals the CRLB for each value of θ, then it must be the MVU estimator. In this case, the theory of the CRLB immediately yields the estimator. It may happen that no estimator exists whose variance equals the bound. Yet, a MVU estimator may still exist, as for instance in the case of θ̂₁ in Figure 2.5. Then, we must resort to the Rao-Blackwell-Lehmann-Scheffe theorem. This procedure first finds a sufficient statistic, one which uses all the data efficiently, and then finds a function of the sufficient statistic which is an unbiased estimator of θ. With a slight restriction of the PDF of the data this procedure will then be guaranteed to produce the MVU estimator. The third approach requires the estimator to be linear, a sometimes severe restriction, and chooses the best linear estimator. Of course, only for particular data sets can this approach produce the MVU estimator.
2.7 Extension to a Vector Parameter
If θ = [θ₁ θ₂ ... θ_p]ᵀ is a vector of unknown parameters, then we say that an estimator θ̂ = [θ̂₁ θ̂₂ ... θ̂_p]ᵀ is unbiased if

E(θ̂ᵢ) = θᵢ,    aᵢ < θᵢ < bᵢ    (2.7)

for i = 1, 2, ..., p. By defining

E(θ̂) = [E(θ̂₁) E(θ̂₂) ... E(θ̂_p)]ᵀ
we can equivalently define an unbiased estimator to have the property

E(θ̂) = θ

for every θ contained within the space defined in (2.7). A MVU estimator has the additional property that var(θ̂ᵢ) for i = 1, 2, ..., p is minimum among all unbiased estimators.
References
Bibby, J., H. Toutenburg, Prediction and Improved Estimation in Linear Models, J. Wiley, New York, 1977.
Rao, C.R., Linear Statistical Inference and Its Applications, J. Wiley, New York, 1973.
Stoica, P., R. Moses, "On Biased Estimators and the Unbiased Cramer-Rao Lower Bound," Signal
Process., Vol. 21, pp. 349-350, 1990.
Problems
2.1 The data {x[0], x[1], ..., x[N-1]} are observed where the x[n]'s are independent and identically distributed (IID) as N(0, σ²). We wish to estimate the variance σ² as

σ̂² = (1/N) Σ_{n=0}^{N-1} x²[n].

Is this an unbiased estimator? Find the variance of σ̂² and examine what happens as N → ∞.
2.2 Consider the data {x[0], x[1], ..., x[N-1]}, where each sample is distributed as U[0, θ] and the samples are IID. Can you find an unbiased estimator for θ? The range of θ is 0 < θ < ∞.
2.3 Prove that the PDF of Â given in Example 2.1 is N(A, σ²/N).
2.4 The heart rate h of a patient is automatically recorded by a computer every 100 ms. In 1 s the measurements {h₁, h₂, ..., h₁₀} are averaged to obtain ĥ. If E(hᵢ) = ah for some constant a and var(hᵢ) = 1 for each i, determine whether averaging improves the estimator if a = 1 and a = 1/2. Assume each measurement is uncorrelated.
2.5 Two samples {x[0], x[1]} are independently observed from a N(0, σ²) distribution. The estimator

σ̂² = (1/2)(x²[0] + x²[1])

is unbiased. Find the PDF of σ̂² to determine if it is symmetric about σ².
2.6 For the problem described in Example 2.1 the more general estimator

Â = Σ_{n=0}^{N-1} aₙ x[n]

is proposed. Find the aₙ's so that the estimator is unbiased and the variance is minimized. Hint: Use Lagrangian multipliers with unbiasedness as the constraint equation.
2.7 Two unbiased estimators are proposed whose variances satisfy var(θ̂) < var(θ̌). If both estimators are Gaussian, prove that

Pr{|θ̂ - θ| > ε} < Pr{|θ̌ - θ| > ε}

for any ε > 0. This says that the estimator with less variance is to be preferred since its PDF is more concentrated about the true value.
2.8 For the problem described in Example 2.1 show that as N → ∞, Â → A by using the results of Problem 2.3. To do so prove that

lim_{N→∞} Pr{|Â - A| > ε} = 0

for any ε > 0. In this case the estimator Â is said to be consistent. Investigate what happens if the alternative estimator Ǎ = (1/(2N)) Σ_{n=0}^{N-1} x[n] is used instead.
2.9 This problem illustrates what happens to an unbiased estimator when it undergoes a nonlinear transformation. In Example 2.1, if we choose to estimate the unknown parameter θ = A² by

θ̂ = ((1/N) Σ_{n=0}^{N-1} x[n])²,

can we say that the estimator is unbiased? What happens as N → ∞?
2.10 In Example 2.1 assume now that in addition to A, the value of σ² is also unknown. We wish to estimate the vector parameter

θ = [A σ²]ᵀ.

Is the estimator

θ̂ = [Â σ̂²]ᵀ = [(1/N) Σ_{n=0}^{N-1} x[n],  (1/(N-1)) Σ_{n=0}^{N-1} (x[n] - Â)²]ᵀ

unbiased?
PROBLEMS 25
. I bservation x[O] from the distribution Ufo, 1/(}], it is desired to
2.11 Given a sm
g
r e. 0 d that () > O. Show that for an estimator 0= g(x[O]) to
estimate (). t IS assume
be unbiased we must have
1#g(u)du = l.
. that a function 9 cannot be found to satisfy this condition for all () > O.
Next prove
Chapter 3
Cramer-Rao Lower Bound
3.1 Introduction
Being able to place a lower bound on the variance of any unbiased estimator proves
to be extremely useful in practice. At best, it allows us to assert that an estimator is
the MVU estimator. This will be the case if the estimator attains the bound for all
values of the unknown parameter. At worst, it provides a benchmark against which we
can compare the performance of any unbiased estimator. Furthermore, it alerts us to
the physical impossibility of finding an unbiased estimator whose variance is less than
the bound. The latter is often useful in signal processing feasibility studies. Although
many such variance bounds exist [McAulay and Hofstetter 1971, Kendall and Stuart
1979, Seidman 1970, Ziv and Zakai 1969], the Cramer-Rao lower bound (CRLB) is by
far the easiest to determine. Also, the theory allows us to immediately determine if
an estimator exists that attains the bound. If no such estimator exists, then all is not
lost since estimators can be found that attain the bound in an approximate sense, as
described in Chapter 7. For these reasons we restrict our discussion to the CRLB.
3.2 Summary
The CRLB for a scalar parameter is given by (3.6). If the condition (3.7) is satisfied,
then the bound will be attained and the estimator that attains it is readily found.
An alternative means of determining the CRLB is given by (3.12). For a signal with
an unknown parameter in WGN, (3.14) provides a convenient means to evaluate the
bound. When a function of a parameter is to be estimated, the CRLB is given by
(3.16). Even though an efficient estimator may exist for (), in general there will not be
one for a function of () (unless the function is linear). For a vector parameter the CRLB
is determined using (3.20) and (3.21). As in the scalar parameter case, if condition
(3.25) holds, then the bound is attained and the estimator that attains the bound is
easily found. For a function of a vector parameter (3.30) provides the bound. A general
formula for the Fisher information matrix (used to determine the vector CRLB) for a
multivariate Gaussian PDF is given by (3.31). Finally, if the data set comes from a WSS Gaussian random process, then an approximate CRLB, that depends on the PSD, is given by (3.34). It is valid asymptotically or as the data record length becomes large.

Figure 3.1 PDF dependence on unknown parameter: p₁(x[0] = 3; A) with σ₁ = 1/3 and p₂(x[0] = 3; A) with σ₂ = 1, plotted versus A.
3.3 Estimator Accuracy Considerations
Before stating the CRLB theorem, it is worthwhile to expose the hidden factors that
determine how well we can estimate a parameter. Since all our information is embodied
in the observed data and the underlying PDF for that data, it is not surprising that the estimation accuracy depends directly on the PDF. For instance, we should not expect to be able to estimate a parameter with any degree of accuracy if the PDF depends only weakly upon that parameter, or in the extreme case, if the PDF does not depend on it at all. In general, the more the PDF is influenced by the unknown parameter, the better we should be able to estimate it.
Example 3.1 - PDF Dependence on Unknown Parameter
If a single sample is observed as
    x[0] = A + w[0]
where w[0] ~ N(0, σ²), and it is desired to estimate A, then we expect a better estimate if σ² is small. Indeed, a good unbiased estimator is Â = x[0]. The variance is, of course, just σ², so that the estimator accuracy improves as σ² decreases. An alternative way of viewing this is shown in Figure 3.1, where the PDFs for two different variances are shown. They are
    pᵢ(x[0]; A) = (1/√(2πσᵢ²)) exp[−(1/(2σᵢ²))(x[0] − A)²]
for i = 1, 2. The PDF has been plotted versus the unknown parameter A for a given value of x[0]. If σ₁² < σ₂², then we should be able to estimate A more accurately based on p₁(x[0]; A). We may interpret this result by referring to Figure 3.1. If x[0] = 3 and σ₁ = 1/3, then as shown in Figure 3.1a, the values of A > 4 are highly unlikely. To see
this we determine the probability of observing x[0] in the interval [x[0] − δ/2, x[0] + δ/2] = [3 − δ/2, 3 + δ/2] when A takes on a given value, or
    Pr{3 − δ/2 ≤ x[0] ≤ 3 + δ/2} = ∫_{3−δ/2}^{3+δ/2} pᵢ(u; A) du
which for δ small is pᵢ(x[0] = 3; A)δ. But p₁(x[0] = 3; A = 4)δ ≈ 0.01δ, while p₁(x[0] = 3; A = 3)δ = 1.20δ. The probability of observing x[0] in a small interval centered about x[0] = 3 when A = 4 is small with respect to that when A = 3. Hence, the values of A > 4 can be eliminated from consideration. It might be argued that values of A in the interval 3 ± 3σ₁ = [2, 4] are viable candidates. In Figure 3.1b there is a much weaker dependence on A. Here our viable candidates are in the much wider interval 3 ± 3σ₂ = [0, 6].
o
When the PDF is viewed as a function of the unknown parameter (with x fixed), it is termed the likelihood function. Two examples of likelihood functions were shown in Figure 3.1. Intuitively, the "sharpness" of the likelihood function determines how accurately we can estimate the unknown parameter. To quantify this notion observe that the sharpness is effectively measured by the negative of the second derivative of the logarithm of the likelihood function at its peak. This is the curvature of the log-likelihood function. In Example 3.1, if we consider the natural logarithm of the PDF
    ln p(x[0]; A) = −ln √(2πσ²) − (1/(2σ²))(x[0] − A)²
then the first derivative is
    ∂ln p(x[0]; A)/∂A = (1/σ²)(x[0] − A)                                    (3.2)
and the negative of the second derivative becomes
    −∂²ln p(x[0]; A)/∂A² = 1/σ².                                            (3.3)
The curvature increases as σ² decreases. Since we already know that the estimator Â = x[0] has variance σ², then for this example
    var(Â) = 1 / (−∂²ln p(x[0]; A)/∂A²)                                     (3.4)
and the variance decreases as the curvature increases. Although in this example the second derivative does not depend on x[0], in general it will. Thus, a more appropriate measure of curvature is
    −E[∂²ln p(x[0]; A)/∂A²]                                                 (3.5)
which measures the average curvature of the log-likelihood function. The expectation is taken with respect to p(x[0]; A), resulting in a function of A only. The expectation acknowledges the fact that the log-likelihood function, which depends on x[0], is itself a random variable. The larger the quantity in (3.5), the smaller the variance of the estimator.
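To make the notion of average curvature concrete, the quantity in (3.5) can be approximated numerically. The following minimal sketch (Python with NumPy; the parameter values are illustrative assumptions, not from the text) estimates the curvature of the log-likelihood for the single observation x[0] = A + w[0] by a finite difference and averages it over many realizations; the result should be close to 1/σ².

    import numpy as np

    # Numerical check of the average curvature (3.5) for x[0] = A + w[0],
    # w[0] ~ N(0, sigma2).  The curvature is estimated by a second-order
    # finite difference of ln p(x[0]; A) and averaged over many realizations;
    # the answer should be close to 1/sigma2.
    rng = np.random.default_rng(0)
    A, sigma2, dA = 2.0, 0.5, 1e-3

    def log_p(x0, a):
        return -0.5 * np.log(2 * np.pi * sigma2) - (x0 - a) ** 2 / (2 * sigma2)

    x0 = A + np.sqrt(sigma2) * rng.standard_normal(100_000)
    curv = -(log_p(x0, A + dA) - 2 * log_p(x0, A) + log_p(x0, A - dA)) / dA ** 2
    print(curv.mean(), 1 / sigma2)   # approximately equal (here 1/sigma2 = 2)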
3.4 Cramer-Rao Lower Bound
We are now ready to state the CRLB theorem.
Theorem 3.1 (Cramer-Rao Lower Bound - Scalar Parameter) It is assumed
that the PDF p(x; θ) satisfies the "regularity" condition
    E[∂ln p(x; θ)/∂θ] = 0   for all θ
where the expectation is taken with respect to p(x; θ). Then, the variance of any unbiased estimator θ̂ must satisfy
    var(θ̂) ≥ 1 / (−E[∂²ln p(x; θ)/∂θ²])                                     (3.6)
where the derivative is evaluated at the true value of θ and the expectation is taken with respect to p(x; θ). Furthermore, an unbiased estimator may be found that attains the bound for all θ if and only if
    ∂ln p(x; θ)/∂θ = I(θ)(g(x) − θ)                                          (3.7)
for some functions g and I. That estimator, which is the MVU estimator, is θ̂ = g(x), and the minimum variance is 1/I(θ).
The expectation in (3.6) is explicitly given by
    E[∂²ln p(x; θ)/∂θ²] = ∫ [∂²ln p(x; θ)/∂θ²] p(x; θ) dx
since the second derivative is a random variable dependent on x. Also, the bound will
depend on θ in general, so that it is displayed as in Figure 2.5 (dashed curve). An
example of a PDF that does not satisfy the regularity condition is given in Problem
3.1. For a proof of the theorem see Appendix 3A.
Some examples are now given to illustrate the evaluation of the CRLB.
Example 3.2 - CRLB for Example 3.1
For Example 3.1 we see that from (3.3) and (3.6)
    var(Â) ≥ σ²   for all A.
Thus, no unbiased estimator can exist whose variance is lower than σ² for even a single value of A. But in fact we know that if Â = x[0], then var(Â) = σ² for all A. Since x[0] is unbiased and attains the CRLB, it must therefore be the MVU estimator. Had we been unable to guess that x[0] would be a good estimator, we could have used (3.7). From (3.2) and (3.7) we make the identification
    θ = A
    I(θ) = 1/σ²
    g(x[0]) = x[0]
so that (3.7) is satisfied. Hence, Â = g(x[0]) = x[0] is the MVU estimator. Also, note that var(Â) = σ² = 1/I(θ), so that according to (3.6) we must have
    var(Â) = 1 / (−E[∂²ln p(x[0]; A)/∂A²]) = 1/I(A).
We will return to this after the next example. See also Problem 3.2 for a generalization
to the non-Gaussian case. <>
Example 3.3 - DC Level in White Gaussian Noise
Generalizing Example 3.1, consider the multiple observations
    x[n] = A + w[n]   n = 0, 1, ..., N − 1
where w[n] is WGN with variance σ². To determine the CRLB for A,
    p(x; A) = ∏_{n=0}^{N−1} (1/√(2πσ²)) exp[−(1/(2σ²))(x[n] − A)²]
            = (1/(2πσ²)^{N/2}) exp[−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²].
Taking the first derivative
    ∂ln p(x; A)/∂A = ∂/∂A [−ln(2πσ²)^{N/2} − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)²]
                  = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A)
                  = (N/σ²)(x̄ − A)                                           (3.8)
where x̄ is the sample mean. Differentiating again,
    ∂²ln p(x; A)/∂A² = −N/σ²
and noting that the second derivative is a constant, we have from (3.6)
    var(Â) ≥ σ²/N                                                           (3.9)
as the CRLB. Also, by comparing (3.7) and (3.8) we see that the sample mean estimator attains the bound and must therefore be the MVU estimator. Also, once again the minimum variance is given by the reciprocal of the constant N/σ² in (3.8). (See also Problems 3.3-3.5 for variations on this example.) ⋄
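A quick Monte Carlo sketch (Python/NumPy, with assumed illustrative values for A, σ², and N) can be used to check Example 3.3: the empirical variance of the sample mean should agree with the CRLB σ²/N of (3.9).

    import numpy as np

    # Monte Carlo check of Example 3.3: the sample mean of x[n] = A + w[n],
    # with w[n] WGN of variance sigma2, should have variance equal to the
    # CRLB sigma2/N in (3.9).
    rng = np.random.default_rng(1)
    A, sigma2, N, trials = 1.0, 0.8, 20, 200_000

    x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
    A_hat = x.mean(axis=1)

    print("sample variance of A_hat:", A_hat.var())
    print("CRLB sigma2/N           :", sigma2 / N)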
We now prove that when the CRLB is attained,
    var(θ̂) = 1/I(θ)
where
    I(θ) = −E[∂²ln p(x; θ)/∂θ²].
From (3.6) and (3.7)
    var(θ̂) = 1 / (−E[∂²ln p(x; θ)/∂θ²])
and
    ∂ln p(x; θ)/∂θ = I(θ)(θ̂ − θ).
Differentiating the latter produces
    ∂²ln p(x; θ)/∂θ² = (∂I(θ)/∂θ)(θ̂ − θ) − I(θ)
and taking the negative expected value yields
    −E[∂²ln p(x; θ)/∂θ²] = −(∂I(θ)/∂θ)[E(θ̂) − θ] + I(θ) = I(θ)
and therefore
    var(θ̂) = 1/I(θ).                                                        (3.10)
In the next example we will see that the CRLB is not always satisfied.
Example 3.4 - Phase Estimation
Assume that we wish to estimate the phase φ of a sinusoid embedded in WGN or
    x[n] = A cos(2πf₀n + φ) + w[n]   n = 0, 1, ..., N − 1.
The amplitude A and frequency f₀ are assumed known (see Example 3.14 for the case when they are unknown). The PDF is
    p(x; φ) = (1/(2πσ²)^{N/2}) exp{−(1/(2σ²)) Σ_{n=0}^{N−1} [x[n] − A cos(2πf₀n + φ)]²}.
Differentiating the log-likelihood function produces
    ∂ln p(x; φ)/∂φ = −(1/σ²) Σ_{n=0}^{N−1} [x[n] − A cos(2πf₀n + φ)] A sin(2πf₀n + φ)
                  = −(A/σ²) Σ_{n=0}^{N−1} [x[n] sin(2πf₀n + φ) − (A/2) sin(4πf₀n + 2φ)]
and
    ∂²ln p(x; φ)/∂φ² = −(A/σ²) Σ_{n=0}^{N−1} [x[n] cos(2πf₀n + φ) − A cos(4πf₀n + 2φ)].
Upon taking the negative expected value we have
    −E[∂²ln p(x; φ)/∂φ²] = (A/σ²) Σ_{n=0}^{N−1} [A cos²(2πf₀n + φ) − A cos(4πf₀n + 2φ)]
                        = (A²/σ²) Σ_{n=0}^{N−1} [1/2 + (1/2)cos(4πf₀n + 2φ) − cos(4πf₀n + 2φ)]
                        ≈ NA²/(2σ²)
since
    (1/N) Σ_{n=0}^{N−1} cos(4πf₀n + 2φ) ≈ 0
for f₀ not near 0 or 1/2 (see Problem 3.7). Therefore,
    var(φ̂) ≥ 2σ²/(NA²).
In this example the condition for the bound to hold is not satisfied. Hence, a phase
estimator does not exist which is unbiased and attains the CRLB. It is still possible,
however, that an MVU estimator may exist. At this point we do not know how to
Figure 3.2 Efficiency vs. minimum variance: variances of the unbiased estimators θ̂₁, θ̂₂, θ̂₃ and the CRLB plotted versus θ; (a) θ̂₁ efficient and MVU, (b) θ̂₁ MVU but not efficient.
determine whether an MVU estimator exists, and if it does, how to find it. The theory
of sufficient statistics presented in Chapter 5 will allow us to answer these questions.
o
An estimator which is unbiased and attains the CRLB, as the sample mean estimator in Example 3.3 does, is said to be efficient in that it efficiently uses the data. An MVU estimator may or may not be efficient. For instance, in Figure 3.2 the variances of all possible estimators (for purposes of illustration there are three unbiased estimators) are displayed. In Figure 3.2a, θ̂₁ is efficient in that it attains the CRLB. Therefore, it is also the MVU estimator. On the other hand, in Figure 3.2b, θ̂₁ does not attain the CRLB, and hence it is not efficient. However, since its variance is uniformly less than that of all other unbiased estimators, it is the MVU estimator.
The CRLB given by (3.6) may also be expressed in a slightly different form. Al-
though (3.6) is usually more convenient for evaluation, the alternative form is sometimes
useful for theoretical work. It follows from the identity (see Appendix 3A)
    E[∂²ln p(x; θ)/∂θ²] = −E[(∂ln p(x; θ)/∂θ)²]                              (3.11)
so that
    var(θ̂) ≥ 1 / E[(∂ln p(x; θ)/∂θ)²]                                        (3.12)
(see Problem 3.8).
The denominator in (3.6) is referred to as the Fisher information I(θ) for the data x or
    I(θ) = −E[∂²ln p(x; θ)/∂θ²].                                             (3.13)
As we saw previously, when the CRLB is attained, the variance is the reciprocal of the Fisher information. Intuitively, the more information, the lower the bound. It has the essential properties of an information measure in that it is
1. nonnegative due to (3.11)
2. additive for independent observations.
The latter property leads to the result that the CRLB for N IID observations is 1/N times that for one observation. To verify this, note that for independent observations
    ln p(x; θ) = Σ_{n=0}^{N−1} ln p(x[n]; θ).
This results in
    −E[∂²ln p(x; θ)/∂θ²] = −Σ_{n=0}^{N−1} E[∂²ln p(x[n]; θ)/∂θ²]
and finally for identically distributed observations
    I(θ) = N i(θ)
where
    i(θ) = −E[∂²ln p(x[n]; θ)/∂θ²]
is the Fisher information for one sample. For nonindependent samples we might expect that the information will be less than Ni(θ), as Problem 3.9 illustrates. For completely dependent samples, as for example, x[0] = x[1] = ... = x[N−1], we will have I(θ) = i(θ)
the CRLB will not decrease with increasing data record length.
3.5 General CRLB for Signals
in White Gaussian Noise
Since it is common to assume white Gaussian noise, it is worthwhile to derive the
CRLB for this case. Later, we will extend this to nonwhite Gaussian noise and a vector
parameter as given by (3.31). Assume that a deterministic signal with an unknown parameter θ is observed in WGN as
    x[n] = s[n; θ] + w[n]   n = 0, 1, ..., N − 1.
The dependence of the signal on θ is explicitly noted. The likelihood function is
    p(x; θ) = (1/(2πσ²)^{N/2}) exp[−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − s[n; θ])²].
Differentiating once produces
    ∂ln p(x; θ)/∂θ = (1/σ²) Σ_{n=0}^{N−1} (x[n] − s[n; θ]) ∂s[n; θ]/∂θ
and a second differentiation results in
    ∂²ln p(x; θ)/∂θ² = (1/σ²) Σ_{n=0}^{N−1} {(x[n] − s[n; θ]) ∂²s[n; θ]/∂θ² − (∂s[n; θ]/∂θ)²}.
Taking the expected value yields
    E[∂²ln p(x; θ)/∂θ²] = −(1/σ²) Σ_{n=0}^{N−1} (∂s[n; θ]/∂θ)²
so that finally
    var(θ̂) ≥ σ² / Σ_{n=0}^{N−1} (∂s[n; θ]/∂θ)².                              (3.14)
The form of the bound demonstrates the importance of the signal dependence on θ. Signals that change rapidly as the unknown parameter changes result in accurate estimators. A simple application of (3.14) to Example 3.3, in which s[n; A] = A, produces a CRLB of σ²/N. The reader should also verify the results of Example 3.4. As a final
example we examine the problem of frequency estimation.
Example 3.5 - Sinusoidal Frequency Estimation
We assume that the signal is sinusoidal and is represented as
    s[n; f₀] = A cos(2πf₀n + φ)   0 < f₀ < 1/2
where the amplitude and phase are known (see Example 3.14 for the case when they are unknown). From (3.14) the CRLB becomes
    var(f̂₀) ≥ σ² / {A² Σ_{n=0}^{N−1} [2πn sin(2πf₀n + φ)]²}.                 (3.15)
The CRLB is plotted in Figure 3.3 versus frequency for an SNR of A²/σ² = 1, a data record length of N = 10, and a phase of φ = 0. It is interesting to note that there appear to be preferred frequencies (see also Example 3.14 for an approximation to (3.15)). Also, as f₀ → 0, the CRLB goes to infinity. This is because for f₀ close to zero a slight change in frequency will not alter the signal significantly. ⋄
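The bound (3.15) is easily evaluated numerically. The sketch below (Python/NumPy; the values A²/σ² = 1, N = 10, φ = 0 follow the figure, while the frequency grid is an assumption) computes the CRLB over a grid of frequencies via (3.14) and exhibits the preferred frequencies and the blowup near f₀ = 0.

    import numpy as np

    # Evaluate the frequency CRLB of Example 3.5 on a grid of f0 values
    # (SNR A^2/sigma2 = 1, N = 10, phi = 0, as in Figure 3.3).
    A, sigma2, N, phi = 1.0, 1.0, 10, 0.0
    n = np.arange(N)

    def crlb_freq(f0):
        dsdf = -2 * np.pi * n * A * np.sin(2 * np.pi * f0 * n + phi)  # ds[n;f0]/df0
        return sigma2 / np.sum(dsdf ** 2)                             # (3.14)

    freqs = np.linspace(0.01, 0.49, 97)
    bounds = np.array([crlb_freq(f) for f in freqs])
    print("minimum bound:", bounds.min(), "at f0 =", freqs[np.argmin(bounds)])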
Figure 3.3 Cramer-Rao lower bound for sinusoidal frequency estimation (plotted versus frequency over 0 < f₀ < 0.5).
3.6 Transformation of Parameters
It frequently occurs in practice that the parameter we wish to estimate is a function of some more fundamental parameter. For instance, in Example 3.3 we may not be interested in the sign of A but instead may wish to estimate A² or the power of the signal. Knowing the CRLB for A, we can easily obtain it for A² or in general for any function of A. As shown in Appendix 3A, if it is desired to estimate α = g(θ), then the CRLB is
    var(α̂) ≥ (∂g/∂θ)² / (−E[∂²ln p(x; θ)/∂θ²]).                              (3.16)
For the present example this becomes α = g(A) = A² and
    var(Â²) ≥ (2A)² / (N/σ²) = 4A²σ²/N.                                      (3.17)
Note that in using (3.16) the CRLB is expressed in terms of θ.
We saw in Example 3.3 that the sample mean estimator was efficient for A. It might be supposed that x̄² is efficient for A². To quickly dispel this notion we first show that x̄² is not even an unbiased estimator. Since x̄ ~ N(A, σ²/N),
    E(x̄²) = E²(x̄) + var(x̄) = A² + σ²/N ≠ A².                                 (3.18)
Hence, we immediately conclude that the efficiency of an estimator is destroyed by a nonlinear transformation. That it is maintained for linear (actually affine) transformations is easily verified. Assume that an efficient estimator for θ exists and is given
by θ̂. It is desired to estimate g(θ) = aθ + b. As our estimator of g(θ), we choose ĝ(θ) = g(θ̂) = aθ̂ + b. Then,
    E(aθ̂ + b) = aE(θ̂) + b = aθ + b = g(θ)
so that g(θ̂) is unbiased. The CRLB for g(θ) is, from (3.16),
    var(ĝ(θ)) ≥ (∂g/∂θ)² / I(θ) = (∂g(θ)/∂θ)² var(θ̂) = a² var(θ̂).
But var(g(θ̂)) = var(aθ̂ + b) = a² var(θ̂), so that the CRLB is achieved.
Although efficiency is preserved only over linear transformations, it is approximately maintained over nonlinear transformations if the data record is large enough. This has great practical significance in that we are frequently interested in estimating functions of parameters. To see why this property holds, we return to the previous example of estimating A² by x̄². Although x̄² is biased, we note from (3.18) that x̄² is asymptotically unbiased or unbiased as N → ∞. Furthermore, since x̄ ~ N(A, σ²/N), we can evaluate the variance
    var(x̄²) = E(x̄⁴) − E²(x̄²)
by using the result that if ξ ~ N(μ, σ²), then
    E(ξ²) = μ² + σ²
    E(ξ⁴) = μ⁴ + 6μ²σ² + 3σ⁴
and therefore
    var(ξ²) = E(ξ⁴) − E²(ξ²) = 4μ²σ² + 2σ⁴.
For our problem we have then
    var(x̄²) = 4A²σ²/N + 2σ⁴/N².                                              (3.19)
Hence, as N → ∞, the variance approaches 4A²σ²/N, the last term in (3.19) converging to zero faster than the first. But this is just the CRLB as given by (3.17). Our assertion that x̄² is an asymptotically efficient estimator of A² is verified. Intuitively, this situation occurs due to the statistical linearity of the transformation, as illustrated in Figure 3.4.
Figure 3.4 Statistical linearity of nonlinear transformations: (a) small N, (b) large N (the PDF of x̄ concentrated in the interval A ± 3σ/√N).
As N increases, the PDF of x̄ becomes more concentrated about the mean A. Therefore,
the values of x that are observed lie in a small interval about x = A (the ±3 standard
deviation interval is displayed). Over this small interval the nonlinear transformation
is approximately linear. Therefore, the transformation may be replaced by a linear one
since a value of x in the nonlinear region rarely occurs. In fact, if we linearize g about
A, we have the approximation
    g(x̄) ≈ g(A) + (dg(A)/dA)(x̄ − A).
It follows that, to within this approximation,
    E[g(x̄)] = g(A) = A²
or the estimator is unbiased (asymptotically). Also,
    var[g(x̄)] ≈ [dg(A)/dA]² var(x̄) = (2A)²σ²/N = 4A²σ²/N
so that the estimator achieves the CRLB (asymptotically). Therefore, it is asymp-
totically efficient. This result also yields insight into the form of the CRLB given by
(3.16).
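The asymptotic efficiency of x̄² can also be seen by simulation. The following sketch (Python/NumPy, illustrative values assumed) compares the Monte Carlo variance of x̄² with the exact expression (3.19) and with the CRLB (3.17) as N grows.

    import numpy as np

    # For large N the variance of x_bar^2 as an estimator of A^2 approaches
    # the CRLB 4*A^2*sigma2/N of (3.17), even though x_bar^2 is biased for
    # finite N (see (3.18)-(3.19)).
    rng = np.random.default_rng(2)
    A, sigma2, trials = 1.0, 1.0, 20_000

    for N in (5, 50, 500):
        x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
        est = x.mean(axis=1) ** 2
        print(N, "var:", est.var(),
              "theory (3.19):", 4 * A**2 * sigma2 / N + 2 * sigma2**2 / N**2,
              "CRLB (3.17):", 4 * A**2 * sigma2 / N)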
3.7 Extension to a Vector Parameter
We now extend the results of the previous sections to the case where we wish to estimate
a vector parameter θ = [θ₁ θ₂ ... θ_p]ᵀ. We will assume that the estimator θ̂ is unbiased as defined in Section 2.7. The vector parameter CRLB will allow us to place a bound on the variance of each element. As derived in Appendix 3B, the CRLB is found as the [i, i] element of the inverse of a matrix or
    var(θ̂ᵢ) ≥ [I⁻¹(θ)]ᵢᵢ                                                      (3.20)
where I(θ) is the p × p Fisher information matrix. The latter is defined by
    [I(θ)]ᵢⱼ = −E[∂²ln p(x; θ)/∂θᵢ∂θⱼ]                                         (3.21)
for i = 1, 2, ..., p; j = 1, 2, ..., p. In evaluating (3.21) the true value of θ is used. Note that in the scalar case (p = 1), I(θ) reduces to the scalar Fisher information and we have the scalar CRLB. Some examples follow.
Example 3.6 - DC Level in White Gaussian Noise (Revisited)
"e now extend Example 3.3 to the case where in addition to A the noise variance a 2
:~ also unknown. The parameter vector is 8 = [Aa2
f, and hence p = 2. The 2 x 2
Fisher information matrix is
_E[82
ln
p
(x;8)] 1
8A8a2
-E [82
lnp(X; 8)] .
8 22
a ..
It is clear from (3.21) that the matrix is symmetric since the order of partial differentiation may be interchanged and can also be shown to be positive definite (see Problem 3.10). The log-likelihood function is, from Example 3.3,
    ln p(x; θ) = −(N/2) ln 2π − (N/2) ln σ² − (1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A)².
The derivatives are easily found as
    ∂ln p(x; θ)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A)
    ∂ln p(x; θ)/∂σ² = −N/(2σ²) + (1/(2σ⁴)) Σ_{n=0}^{N−1} (x[n] − A)²
    ∂²ln p(x; θ)/∂A² = −N/σ²
    ∂²ln p(x; θ)/∂A∂σ² = −(1/σ⁴) Σ_{n=0}^{N−1} (x[n] − A)
    ∂²ln p(x; θ)/∂(σ²)² = N/(2σ⁴) − (1/σ⁶) Σ_{n=0}^{N−1} (x[n] − A)².
Upon taking the negative expectations, the Fisher information matrix becomes
    I(θ) = [ N/σ²    0
             0       N/(2σ⁴) ].
Although not true in general, for this example the Fisher information matrix is diagonal and hence easily inverted to yield
    var(Â) ≥ σ²/N
    var(σ̂²) ≥ 2σ⁴/N.
Note that the CRLB for Â is the same as for the case when σ² is known due to the diagonal nature of the matrix. Again this is not true in general, as the next example illustrates. ⋄
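As a small numerical sketch of Example 3.6 (Python/NumPy, with assumed values for N and σ²), the Fisher information matrix can be formed and inverted directly, reproducing the diagonal CRLBs above.

    import numpy as np

    # Build the 2x2 Fisher information matrix for theta = [A, sigma2], invert
    # it, and read off the diagonal CRLBs sigma2/N and 2*sigma2^2/N via (3.20).
    N, sigma2 = 25, 0.5
    I = np.array([[N / sigma2, 0.0],
                  [0.0, N / (2 * sigma2 ** 2)]])
    crlb = np.linalg.inv(I)
    print("CRLB for A      :", crlb[0, 0], "=", sigma2 / N)
    print("CRLB for sigma2 :", crlb[1, 1], "=", 2 * sigma2 ** 2 / N)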
Example 3.7 - Line Fitting
Consider the problem of line fitting or given the observations
x[n] = A + Bn + w[n] n = 0, 1, ... , N - 1
where w[n] is WGN, determine the CRLB for the slope B and the intercept A. The parameter vector in this case is θ = [A B]ᵀ. We need to first compute the 2 × 2 Fisher information matrix,
    I(θ) = [ −E[∂²ln p(x; θ)/∂A²]     −E[∂²ln p(x; θ)/∂A∂B]
             −E[∂²ln p(x; θ)/∂B∂A]    −E[∂²ln p(x; θ)/∂B²] ].
The likelihood function is
    p(x; θ) = (1/(2πσ²)^{N/2}) exp[−(1/(2σ²)) Σ_{n=0}^{N−1} (x[n] − A − Bn)²]
from which the derivatives follow as
    ∂ln p(x; θ)/∂A = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A − Bn)
    ∂ln p(x; θ)/∂B = (1/σ²) Σ_{n=0}^{N−1} (x[n] − A − Bn) n
and
    ∂²ln p(x; θ)/∂A² = −N/σ²
    ∂²ln p(x; θ)/∂A∂B = −(1/σ²) Σ_{n=0}^{N−1} n
    ∂²ln p(x; θ)/∂B² = −(1/σ²) Σ_{n=0}^{N−1} n².
Since the second-order derivatives do not depend on x, we have immediately that
    I(θ) = (1/σ²) [ N                  Σ_{n=0}^{N−1} n
                    Σ_{n=0}^{N−1} n    Σ_{n=0}^{N−1} n² ]
         = (1/σ²) [ N             N(N−1)/2
                    N(N−1)/2      N(N−1)(2N−1)/6 ]
where we have used the identities
    Σ_{n=0}^{N−1} n = N(N−1)/2
    Σ_{n=0}^{N−1} n² = N(N−1)(2N−1)/6.
Inverting the matrix yields
    I⁻¹(θ) = σ² [ 2(2N−1)/(N(N+1))    −6/(N(N+1))
                  −6/(N(N+1))         12/(N(N²−1)) ].
It follows from (3.20) that the CRLB is
    var(Â) ≥ 2(2N−1)σ²/(N(N+1))
    var(B̂) ≥ 12σ²/(N(N²−1)).                                                 (3.22)
Figure 3.5 Sensitivity of observations to parameter changes (no noise): (a) A = 0, B = 0 to A = 1, B = 0; (b) A = 0, B = 0 to A = 0, B = 1.
Some interesting observations follow from examination of the CRLB. Note first that the CRLB for Â has increased over that obtained when B is known, for in the latter case we have
    var(Â) ≥ 1 / (−E[∂²ln p(x; A)/∂A²]) = σ²/N
and for N ≥ 2, 2(2N−1)/(N+1) > 1. This is a quite general result that asserts that the CRLB always increases as we estimate more parameters (see Problems 3.11 and 3.12). A second point is that
    CRLB(A)/CRLB(B) = (2N−1)(N−1)/6 > 1
for N ≥ 3. Hence, B is easier to estimate, its CRLB decreasing as 1/N³ as opposed to the 1/N dependence for the CRLB of A. These differing dependences indicate that x[n] is more sensitive to changes in B than to changes in A. A simple calculation reveals
    Δx[n] ≈ (∂x[n]/∂A) ΔA = ΔA
    Δx[n] ≈ (∂x[n]/∂B) ΔB = n ΔB.
Changes in B are magnified by n, as illustrated in Figure 3.5. This effect is reminiscent
of (3.14), and indeed a similar type of relationship is obtained in the vector parameter
case (see (3.33)). See Problem 3.13 for a generalization of this example. 0
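The line fitting CRLBs are easy to verify numerically. The sketch below (Python/NumPy, assumed σ² and N) builds I(θ) from (3.33), inverts it, and compares the diagonal with the closed forms in (3.22).

    import numpy as np

    # Form the Fisher information matrix for theta = [A, B] from (3.33)
    # (here ds/dA = 1 and ds/dB = n) and compare the diagonal of its inverse
    # with the closed-form CRLBs in (3.22).
    sigma2, N = 1.0, 50
    n = np.arange(N)
    H = np.column_stack([np.ones(N), n])        # columns: ds[n]/dA, ds[n]/dB
    I = H.T @ H / sigma2                        # (3.33) for a signal linear in theta
    crlb = np.diag(np.linalg.inv(I))
    print("CRLB(A):", crlb[0], "=", 2 * (2 * N - 1) * sigma2 / (N * (N + 1)))
    print("CRLB(B):", crlb[1], "=", 12 * sigma2 / (N * (N ** 2 - 1)))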
As an alternative means of computing the CRLB we can use the identity
    E[(∂ln p(x; θ)/∂θᵢ)(∂ln p(x; θ)/∂θⱼ)] = −E[∂²ln p(x; θ)/∂θᵢ∂θⱼ]           (3.23)
as shown in Appendix 3B. The form given on the right-hand side is usually easier to
evaluate, however.
We now formally state the CRLB theorem for a vector parameter. Included in the
theorem are conditions for equality. The bound is stated in terms of the covariance
matrix of θ̂, denoted by C_θ̂, from which (3.20) follows.
Theorem 3.2 (Cramer-Rao Lower Bound - Vector Parameter) It is assumed
that the PDF p(x; θ) satisfies the "regularity" conditions
    E[∂ln p(x; θ)/∂θ] = 0   for all θ
where the expectation is taken with respect to p(x; θ). Then, the covariance matrix of any unbiased estimator θ̂ satisfies
    C_θ̂ − I⁻¹(θ) ≥ 0                                                          (3.24)
where ≥ 0 is interpreted as meaning that the matrix is positive semidefinite. The Fisher information matrix is given as
    [I(θ)]ᵢⱼ = −E[∂²ln p(x; θ)/∂θᵢ∂θⱼ]
where the derivatives are evaluated at the true value of θ and the expectation is taken with respect to p(x; θ). Furthermore, an unbiased estimator may be found that attains the bound in that C_θ̂ = I⁻¹(θ) if and only if
    ∂ln p(x; θ)/∂θ = I(θ)(g(x) − θ)                                            (3.25)
for some p-dimensional function g and some p × p matrix I. That estimator, which is the MVU estimator, is θ̂ = g(x), and its covariance matrix is I⁻¹(θ).
The proof is given in Appendix 3B. That (3.20) follows from (3.24) is shown by noting
that for a positive semidefinite matrix the diagonal elements are nonnegative. Hence,
and therefore
    var(θ̂ᵢ) = [C_θ̂]ᵢᵢ ≥ [I⁻¹(θ)]ᵢᵢ.
Returning to the line fitting problem of Example 3.7, we can check whether the equality condition (3.25) is satisfied there. The vector of partial derivatives of the log-likelihood is
    ∂ln p(x; θ)/∂θ = [ ∂ln p(x; θ)/∂A
                       ∂ln p(x; θ)/∂B ]
                    = [ (1/σ²) Σ_{n=0}^{N−1} (x[n] − A − Bn)
                        (1/σ²) Σ_{n=0}^{N−1} (x[n] − A − Bn) n ].
Although not obvious, this may be rewritten as
    ∂ln p(x; θ)/∂θ = [ N/σ²            N(N−1)/(2σ²)
                       N(N−1)/(2σ²)    N(N−1)(2N−1)/(6σ²) ] [ Â − A
                                                              B̂ − B ]        (3.28)
where
    Â = (2(2N−1)/(N(N+1))) Σ_{n=0}^{N−1} x[n] − (6/(N(N+1))) Σ_{n=0}^{N−1} n x[n]
    B̂ = −(6/(N(N+1))) Σ_{n=0}^{N−1} x[n] + (12/(N(N²−1))) Σ_{n=0}^{N−1} n x[n].   (3.29)
Hence, the conditions for equality are satisfied and [Â B̂]ᵀ is an efficient and therefore MVU estimator. Furthermore, the matrix in (3.28) is the inverse of the covariance matrix.
If the equality conditions hold, the reader may ask whether we can be assured that θ̂ is unbiased. Because the regularity conditions
    E[∂ln p(x; θ)/∂θ] = 0
are always assumed to hold, we can apply them to (3.25). This then yields E[g(x)] = E(θ̂) = θ.
In finding MVU estimators for a vector parameter the CRLB theorem provides a powerful tool. In particular, it allows us to find the MVU estimator for an important class of data models. This class is the linear model and is described in detail in Chapter 4. The line fitting example just discussed is a special case. Suffice it to say that if we can model our data in the linear model form, then the MVU estimator and its performance are easily found.
3.8 Vector Parameter CRLB for Transformations
The discussion in Section 3.6 extends readily to the vector case. Assume that it is desired to estimate α = g(θ) for g, an r-dimensional function. Then, as shown in Appendix 3B,
    C_α̂ − (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ ≥ 0                                   (3.30)
where, as before, ≥ 0 is to be interpreted as positive semidefinite. In (3.30) ∂g(θ)/∂θ is the r × p Jacobian matrix defined as
    ∂g(θ)/∂θ = [ ∂g₁(θ)/∂θ₁   ∂g₁(θ)/∂θ₂   ...   ∂g₁(θ)/∂θ_p
                 ∂g₂(θ)/∂θ₁   ∂g₂(θ)/∂θ₂   ...   ∂g₂(θ)/∂θ_p
                 ...
                 ∂g_r(θ)/∂θ₁  ∂g_r(θ)/∂θ₂  ...   ∂g_r(θ)/∂θ_p ].
Example 3.8 - CRLB for Signal-to-Noise Ratio
Consider a DC level in WGN with A and σ² unknown. We wish to estimate
    α = A²/σ²
which can be considered to be the SNR for a single sample. Here θ = [A σ²]ᵀ and g(θ) = θ₁²/θ₂ = A²/σ². Then, as shown in Example 3.6,
    I(θ) = [ N/σ²    0
             0       N/(2σ⁴) ].
The Jacobian is
    ∂g(θ)/∂θ = [ 2A/σ²   −A²/σ⁴ ]
so that
    (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ = [2A/σ²  −A²/σ⁴] [ σ²/N   0
                                                      0      2σ⁴/N ] [ 2A/σ²
                                                                       −A²/σ⁴ ]
                                  = 4A²/(Nσ²) + 2A⁴/(Nσ⁴)
                                  = (4α + 2α²)/N.
Finally, since α is a scalar,
    var(α̂) ≥ (4α + 2α²)/N.
o
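A short numerical sketch of Example 3.8 (Python/NumPy, with assumed A, σ², N) applies the transformation formula (3.30) directly and checks it against the closed form (4α + 2α²)/N.

    import numpy as np

    # Apply the transformation formula (3.30) with alpha = A^2/sigma2 and
    # check against the closed form (4*alpha + 2*alpha^2)/N.
    A, sigma2, N = 1.5, 0.5, 40
    alpha = A ** 2 / sigma2
    I_inv = np.diag([sigma2 / N, 2 * sigma2 ** 2 / N])          # from Example 3.6
    J = np.array([[2 * A / sigma2, -A ** 2 / sigma2 ** 2]])     # dg/dtheta, 1 x 2
    crlb_alpha = (J @ I_inv @ J.T).item()
    print(crlb_alpha, (4 * alpha + 2 * alpha ** 2) / N)         # equal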
As discussed in Section 3.6, efficiency is maintained over linear transformations
    α = g(θ) = Aθ + b
where A is an r × p matrix and b is an r × 1 vector. If α̂ = Aθ̂ + b, and θ̂ is efficient or C_θ̂ = I⁻¹(θ), then
    E(α̂) = Aθ + b = α
so that α̂ is unbiased and
    C_α̂ = A C_θ̂ Aᵀ = A I⁻¹(θ) Aᵀ = (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ
the latter being the CRLB. For nonlinear transformations efficiency is maintained only as N → ∞. (This assumes that the PDF of θ̂ becomes concentrated about the true value of θ as N → ∞ or that θ̂ is consistent.) Again this is due to the statistical linearity of g(θ) about the true value of θ.
3.9 CRLB for the General Gaussian Case·
It is quite convenient at times to have a general expression for the CRLB. In the case
of Gaussian observations we can derive the CRLB that generalizes (3.14). Assume that
x '" N (1'(0), C(0))
so that both the mean and covariance may depend on O. Then, as shown in Appendix
3C, the Fisher information matrix is given by -
where
8JL(0)
80i
8[JL(O)h
80i
8[JL(O)h
80i
(3.31)
and
    ∂C(θ)/∂θᵢ = [ ∂[C(θ)]₁₁/∂θᵢ   ∂[C(θ)]₁₂/∂θᵢ   ...   ∂[C(θ)]₁N/∂θᵢ
                  ∂[C(θ)]₂₁/∂θᵢ   ∂[C(θ)]₂₂/∂θᵢ   ...   ∂[C(θ)]₂N/∂θᵢ
                  ...
                  ∂[C(θ)]_N1/∂θᵢ  ∂[C(θ)]_N2/∂θᵢ  ...   ∂[C(θ)]_NN/∂θᵢ ].
For the scalar parameter case in which
    x ~ N(μ(θ), C(θ))
this reduces to
    I(θ) = [∂μ(θ)/∂θ]ᵀ C⁻¹(θ) [∂μ(θ)/∂θ] + (1/2) tr[(C⁻¹(θ) ∂C(θ)/∂θ)²]        (3.32)
which generalizes (3.14). We now illustrate its application with some examples.
Example 3.9 - Parameters of a Signal in White Gaussian Noise
Assume that we wish to estimate a scalar signal parameter θ for the data set
    x[n] = s[n; θ] + w[n]   n = 0, 1, ..., N − 1
where w[n] is WGN. The covariance matrix is C = σ²I and does not depend on θ. The second term in (3.32) is therefore zero. The first term yields
    I(θ) = (1/σ²) Σ_{n=0}^{N−1} (∂s[n; θ]/∂θ)²
which agrees with (3.14).
o
Generalizing to a vector signal parameter estimated in the presence of WGN, we have from (3.31)
    [I(θ)]ᵢⱼ = [∂s(θ)/∂θᵢ]ᵀ (1/σ²) [∂s(θ)/∂θⱼ]
which yields
    [I(θ)]ᵢⱼ = (1/σ²) Σ_{n=0}^{N−1} (∂s[n; θ]/∂θᵢ)(∂s[n; θ]/∂θⱼ)               (3.33)
as the elements of the Fisher information matrix.
Example 3.10 - Parameter of Noise
Assume that we observe
x[n] = w[n] n = 0, 1, ... ,N - 1
where w[n] is WGN with unknown variance θ = σ². Then, according to (3.32), since C(σ²) = σ²I, we have
    I(σ²) = (1/2) tr[(C⁻¹(σ²) ∂C(σ²)/∂σ²)²]
          = (1/2) tr[((1/σ²) I)²]
          = (1/2) tr[(1/σ⁴) I]
          = N/(2σ⁴)
which agrees with the results in Example 3.6. A slightly more complicated example
follows. 0
Example 3.11 - Random DC Level in WGN
Consider the data
x[n] = A +w[n) n = 0,1, ... ,N - 1
where w[n) is WGN and A, the DC level, is a Gaussian random variable with zero mean
and variance O"~. Also, A is independent of w[n]. The power of the signal or variance
O"~ is the unknown parameter. Then, x = [x[O) x[I] ... x[N - I)f is Gaussian with zero
mean and an N x N covariance matrix whose [i,j] element is
Therefore,
E [xli - I)x[j - 1))
E [(A +w[i - I])(A +w[j - 1))]
O"i +0"2Jij .
where 1 = [1 1 ... 1]ᵀ. Using Woodbury's identity (see Appendix 1), we have
    C⁻¹(σ_A²) = (1/σ²) [I − (σ_A²/(σ² + Nσ_A²)) 11ᵀ].
Also, since
    ∂C(σ_A²)/∂σ_A² = 11ᵀ
we have that
    C⁻¹(σ_A²) ∂C(σ_A²)/∂σ_A² = (1/(σ² + Nσ_A²)) 11ᵀ.
Substituting this in (3.32) produces
    I(σ_A²) = (1/2) tr{[(1/(σ² + Nσ_A²)) 11ᵀ]²} = (1/2) (N/(σ² + Nσ_A²))²
so that the CRLB is
    var(σ̂_A²) ≥ 2 (σ_A² + σ²/N)².
Note that even as N → ∞, the CRLB does not decrease below 2σ_A⁴. This is because each additional data sample yields the same value of A (see Problem 3.14). ⋄
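The following sketch (Python/NumPy, assumed values) evaluates the bound of Example 3.11 for increasing N, showing that it saturates at 2σ_A⁴.

    import numpy as np

    # The CRLB 2*(sigma_A^2 + sigma2/N)^2 for the power sigma_A^2 of a random
    # DC level never drops below 2*sigma_A^4, no matter how large N is.
    sigma_A2, sigma2 = 1.0, 1.0
    for N in (1, 10, 100, 10_000):
        print(N, 2 * (sigma_A2 + sigma2 / N) ** 2)
    print("floor:", 2 * sigma_A2 ** 2)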
3.10 Asymptotic CRLB for WSS Gaussian Random Processes
Figure 3.6 Signal PSD for center frequency estimation: Q(f) centered at f = 0, and P_ss(f; f_c) centered at f = f_c, with f_c,min = f₁ and f_c,max = 1/2 − f₂.
As shown in Appendix 3D, the elements of the Fisher information matrix are approximately (as N → ∞)
    [I(θ)]ᵢⱼ = (N/2) ∫_{−1/2}^{1/2} (∂ln P_xx(f; θ)/∂θᵢ)(∂ln P_xx(f; θ)/∂θⱼ) df.   (3.34)
A typical problem is to estimate the center frequency f_c of a PSD which otherwise is known. Given
    P_xx(f; f_c) = Q(f − f_c) + Q(−f − f_c) + σ²
we wish to determine the CRLB for f_c assuming that Q(f) and σ² are known. We view the process as consisting of a random signal embedded in WGN. The center frequency of the signal PSD is to be estimated. The real function Q(f) and the signal PSD P_ss(f; f_c) are shown in Figure 3.6. Note that the possible center frequencies are constrained to be in the interval [f₁, 1/2 − f₂]. For these center frequencies the signal PSD for f ≥ 0 will be contained in the [0, 1/2] interval. Then, since θ = f_c is a scalar,
we have from (3.34)
    var(f̂_c) ≥ 1 / [(N/2) ∫_{−1/2}^{1/2} (∂ln P_xx(f; f_c)/∂f_c)² df].
But
    ∂ln P_xx(f; f_c)/∂f_c = ∂ln[Q(f − f_c) + Q(−f − f_c) + σ²]/∂f_c
                         = [∂Q(f − f_c)/∂f_c + ∂Q(−f − f_c)/∂f_c] / [Q(f − f_c) + Q(−f − f_c) + σ²].
This is an odd function of f, so that
    ∫_{−1/2}^{1/2} (∂ln P_xx(f; f_c)/∂f_c)² df = 2 ∫_0^{1/2} (∂ln P_xx(f; f_c)/∂f_c)² df.
Also, for f ≥ 0 we have that Q(−f − f_c) ≈ 0, and thus its derivative is zero due to the assumption illustrated in Figure 3.6. It follows that
    var(f̂_c) ≥ 1 / [N ∫_0^{1/2} (∂Q(f − f_c)/∂f_c / (Q(f − f_c) + σ²))² df]
             = 1 / [N ∫_{−f_c}^{1/2−f_c} (dQ(f′)/df′ / (Q(f′) + σ²))² df′]
where we have let f′ = f − f_c. But 1/2 − f_c ≥ 1/2 − f_c,max = f₂ and −f_c ≤ −f_c,min = −f₁, so that we may change the limits of integration to the interval [−1/2, 1/2]. Thus,
    var(f̂_c) ≥ 1 / [N ∫_{−1/2}^{1/2} (dQ(f)/df / (Q(f) + σ²))² df].
As an example, consider
    Q(f) = exp[−(1/2) f²/σ_f²]
where σ_f ≪ 1/2, so that Q(f) is bandlimited as shown in Figure 3.6. Then, if Q(f) ≫ σ², the bound can be evaluated approximately. Narrower bandwidth (smaller σ_f) spectra yield lower bounds for the center frequency since the PSD changes more rapidly as f_c changes. See also Problem 3.16 for another example. ⋄
3.11 Signal Processing Examples
We now apply the theory of the CRLB to several signal processing problems of interest.
The problems to be considered and some of their areas of application are:
1. Range estimation - sonar, radar, robotics
2. Frequency estimation - sonar, radar, econometrics, spectrometry
3. Bearing estimation - sonar, radar
4. Autoregressive parameter estimation - speech, econometrics.
These examples will be revisited in Chapter 7, in which actual estimators that asymp-
totically attain the CRLB will be studied.
Example 3.13 - Range Estimation
In radar or active sonar a signal pulse is transmitted. The round trip delay τ₀ from the transmitter to the target and back is related to the range R as τ₀ = 2R/c, where c is the speed of propagation. Estimation of range is therefore equivalent to estimation of the time delay, assuming that c is known. If s(t) is the transmitted signal, a simple model for the received continuous waveform is
    x(t) = s(t − τ₀) + w(t)   0 ≤ t ≤ T.
The transmitted signal pulse is assumed to be nonzero over the interval [0, T_s]. Additionally, the signal is assumed to be essentially bandlimited to B Hz. If the maximum time delay is τ₀max, then the observation interval is chosen to include the entire signal by letting T = T_s + τ₀max. The noise is modeled as Gaussian with PSD and ACF as
Figure 3.7 Properties of Gaussian observation noise: PSD of w(t) equal to N₀/2 for |F| ≤ B Hz and zero otherwise, and ACF r_ww(τ) = N₀B sin(2πτB)/(2πτB).
shown in Figure 3.7. The bandlimited nature of the noise results from filtering the continuous waveform to the signal bandwidth of B Hz. The continuous received waveform is sampled at the Nyquist rate or samples are taken every Δ = 1/(2B) seconds to produce the observed data
    x(nΔ) = s(nΔ − τ₀) + w(nΔ)   n = 0, 1, ..., N − 1.
Letting x[n] and w[n] be the sampled sequences, we have our discrete data model
    x[n] = s(nΔ − τ₀) + w[n]                                                  (3.35)
or
    x[n] = { w[n]                     0 ≤ n ≤ n₀ − 1
             s(nΔ − τ₀) + w[n]        n₀ ≤ n ≤ n₀ + M − 1                     (3.36)
             w[n]                     n₀ + M ≤ n ≤ N − 1
where M is the length of the sampled signal and n₀ = τ₀/Δ is the delay in samples. (For simplicity we assume that Δ is so small that τ₀/Δ can be approximated by an integer.) With this formulation we can apply (3.14) in evaluating the CRLB.
    var(τ̂₀) ≥ σ² / Σ_{n=n₀}^{n₀+M−1} (∂s(nΔ − τ₀)/∂τ₀)²
            = σ² / Σ_{n=n₀}^{n₀+M−1} (ds(t)/dt |_{t=nΔ−τ₀})²
            = σ² / Σ_{n=0}^{M−1} (ds(t)/dt |_{t=nΔ})²
since τ₀ = n₀Δ. Assuming that Δ is small enough to approximate the sum by an integral, we have
    var(τ̂₀) ≥ σ²Δ / ∫_0^{T_s} (ds(t)/dt)² dt.
Finally, noting that Δ = 1/(2B) and σ² = N₀B, we have
    var(τ̂₀) ≥ (N₀/2) / ∫_0^{T_s} (ds(t)/dt)² dt.
An alternative form observes that the energy ℰ is
    ℰ = ∫_0^{T_s} s²(t) dt
which results in
    var(τ̂₀) ≥ 1 / [(ℰ/(N₀/2)) F̄²]                                             (3.37)
where
    F̄² = ∫_0^{T_s} (ds(t)/dt)² dt / ∫_0^{T_s} s²(t) dt.                        (3.38)
It can be shown that ℰ/(N₀/2) is an SNR [Van Trees 1968]. Also, F̄² is a measure of the bandwidth of the signal since, using standard Fourier transform properties,
    F̄² = ∫_{−∞}^{∞} (2πF)² |S(F)|² dF / ∫_{−∞}^{∞} |S(F)|² dF                   (3.39)
where F denotes continuous-time frequency, and S(F) is the Fourier transform of s(t). In this form it becomes clear that F̄² is the mean square bandwidth of the signal. From (3.38) and (3.39), the larger the mean square bandwidth, the lower the CRLB. For instance, assume that the signal is a Gaussian pulse given by s(t) = exp[−(1/2)σ_F²(t − T_s/2)²] and that s(t) is essentially nonzero over the interval [0, T_s]. Then |S(F)| = (√(2π)/σ_F) exp(−2π²F²/σ_F²) and F̄² = σ_F²/2. As the mean square bandwidth increases, the signal pulse becomes narrower and it becomes easier to estimate the time delay.
Finally, by noting that R = cτ₀/2 and using (3.16), the CRLB for range is
    var(R̂) ≥ (c²/4) / [(ℰ/(N₀/2)) F̄²].                                          (3.40)
⋄
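For a concrete feel for (3.37)-(3.40), the sketch below (Python/NumPy) evaluates the bounds for an assumed Gaussian pulse; the sound speed, noise level, pulse length, and pulse parameter are illustrative assumptions only.

    import numpy as np

    # Evaluate the mean square bandwidth of (3.38) numerically for an assumed
    # Gaussian pulse, then the delay and range CRLBs (3.37) and (3.40).
    c, N0 = 1500.0, 1e-4          # propagation speed (m/s) and noise level, assumed
    Ts, a = 1e-2, 1250.0          # pulse length (s) and pulse parameter (1/s), assumed
    t = np.linspace(0.0, Ts, 20001)
    dt = t[1] - t[0]
    s = np.exp(-0.5 * a ** 2 * (t - Ts / 2) ** 2)
    ds = np.gradient(s, t)
    energy = np.sum(s ** 2) * dt
    F2bar = np.sum(ds ** 2) * dt / energy                    # (3.38)
    var_tau = 1.0 / (energy / (N0 / 2) * F2bar)              # (3.37)
    var_R = (c ** 2 / 4) * var_tau                           # (3.40)
    print("F2bar:", F2bar, "var(tau0):", var_tau, "std(R) in m:", np.sqrt(var_R))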
Example 3.14 - Sinusoidal Parameter Estimation
In many fields we are confronted with the problem of estimating the parameters of a sinusoidal signal. Economic data which are cyclical in nature may naturally fit such a model, while in sonar and radar physical mechanisms cause the observed signal to be sinusoidal. Hence, we examine the determination of the CRLB for the amplitude A, frequency f₀, and phase φ of a sinusoid embedded in WGN. This example generalizes Examples 3.4 and 3.5. The data are assumed to be
    x[n] = A cos(2πf₀n + φ) + w[n]   n = 0, 1, ..., N − 1
where A > 0 and 0 < f₀ < 1/2 (otherwise the parameters are not identifiable, as is verified by considering A = 1, φ = 0 versus A = −1, φ = π or f₀ = 0 with A = 1/2, φ = 0 versus A = 1/√2, φ = π/4). Since multiple parameters are unknown, we use (3.33)
    [I(θ)]ᵢⱼ = (1/σ²) Σ_{n=0}^{N−1} (∂s[n; θ]/∂θᵢ)(∂s[n; θ]/∂θⱼ)
for θ = [A f₀ φ]ᵀ. In evaluating the CRLB it is assumed that f₀ is not near 0 or 1/2, which allows us to make certain simplifications based on the approximations [Stoica 1989] (see also Problem 3.7):
    (1/N^{i+1}) Σ_{n=0}^{N−1} nⁱ sin(4πf₀n + 2φ) ≈ 0
    (1/N^{i+1}) Σ_{n=0}^{N−1} nⁱ cos(4πf₀n + 2φ) ≈ 0
for i = 0, 1, 2. Using these approximations and letting α = 2πf₀n + φ, we have
    [I(θ)]₁₁ = (1/σ²) Σ_{n=0}^{N−1} cos²α = (1/σ²) Σ_{n=0}^{N−1} (1/2 + (1/2)cos 2α) ≈ N/(2σ²)
    [I(θ)]₁₂ = −(1/σ²) Σ_{n=0}^{N−1} A 2πn cos α sin α = −(πA/σ²) Σ_{n=0}^{N−1} n sin 2α ≈ 0
    [I(θ)]₁₃ = −(1/σ²) Σ_{n=0}^{N−1} A cos α sin α = −(A/(2σ²)) Σ_{n=0}^{N−1} sin 2α ≈ 0
    [I(θ)]₂₂ = (1/σ²) Σ_{n=0}^{N−1} (A 2πn sin α)² ≈ (2π²A²/σ²) Σ_{n=0}^{N−1} n²
    [I(θ)]₂₃ = (1/σ²) Σ_{n=0}^{N−1} A² 2πn sin²α ≈ (πA²/σ²) Σ_{n=0}^{N−1} n
    [I(θ)]₃₃ = (1/σ²) Σ_{n=0}^{N−1} A² sin²α ≈ NA²/(2σ²).
The Fisher information matrix becomes
    I(θ) = (1/σ²) [ N/2    0              0
                    0      2π²A² Σ n²     πA² Σ n
                    0      πA² Σ n        NA²/2 ]
where the sums run over n = 0 to N − 1. Using the identities for Σ n and Σ n² given in Example 3.7, we have upon inversion
    var(Â) ≥ 2σ²/N
    var(f̂₀) ≥ 12 / [(2π)² η N(N² − 1)]
    var(φ̂) ≥ 2(2N − 1) / [η N(N + 1)]                                            (3.41)
where η = A²/(2σ²) is the SNR. Frequency estimation of a sinusoid is of considerable interest. Note that the CRLB for the frequency decreases as the SNR increases and that the bound decreases as 1/N³, making it quite sensitive to data record length. See also Problem 3.17 for a variation of this example. ⋄
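The sketch below (Python/NumPy, assumed SNR and record lengths) evaluates the three bounds in (3.41), making the 1/N, 1/N³, and roughly 1/N dependences of the amplitude, frequency, and phase CRLBs visible.

    import numpy as np

    # Evaluate the three CRLBs in (3.41) for assumed values of SNR and N.
    A, sigma2 = 1.0, 1.0
    eta = A ** 2 / (2 * sigma2)              # SNR
    for N in (10, 100, 1000):
        var_A   = 2 * sigma2 / N
        var_f0  = 12 / ((2 * np.pi) ** 2 * eta * N * (N ** 2 - 1))
        var_phi = 2 * (2 * N - 1) / (eta * N * (N + 1))
        print(N, var_A, var_f0, var_phi)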
Example 3.15 - Bearing Estimation
In sonar it is of interest to estimate the bearing β to a target, as shown in Figure 3.8. To do so the acoustic pressure field is observed by a line array of M sensors spaced a distance d apart.

Figure 3.8 Geometry of array for bearing estimation: sensors 0, 1, ..., M − 1 spaced d apart along the x axis, with planar wavefronts arriving at angle β.
Assuming that the target radiates a sinusoidal signal A cos(2πF₀t + φ), then the received signal at the nth sensor is A cos[2πF₀(t − tₙ) + φ], where tₙ is the propagation time to the nth sensor. If the array is located far from the target, then the circular wavefronts can be considered to be planar at the array. As shown in Figure 3.8, the wavefront at the (n + 1)st sensor lags that at the nth sensor by d cos β/c due to the extra propagation distance. Thus, the propagation time to the nth sensor is
    tₙ = t₀ − n(d/c) cos β   n = 0, 1, ..., M − 1
where t₀ is the propagation time to the zeroth sensor, and the observed signal at the nth sensor is
    sₙ(t) = A cos[2πF₀(t − t₀ + n(d/c) cos β) + φ].
If a single "snapshot" of data is taken or the array element outputs are sampled at a
given time t., then -
d
sn(t.) =Acos[21r(Fo- cos{3)n +c/J'] (3.42)
c
wher~ 4J' ~ c/J+2~Fo(t. -to). In this form it bec0Il:l,es clear that the spatial observations
~usOldal w1th frequency I. = Fo(d/c) cos (3. To complete the description of the
data we assume that the sensor outputs are corrupted by Gaussian noise with zero mean
and variance (]2 which is independent from sensor to sensor. The data are modeled as
x[n] = sn(t.) +w[n] n = 0, 1, ... ,M - 1
where w[n] is WGN. Since typicall A, are unknown, as well as {3 we have the roblem
of estimating {A, I., c/J' based on (3.42) as in Example 3.14. Onc~ the CRLB for these
parameters is determined, we can use the transformation of parameters formula. The
transformation is for 8 - [A I. c/J't -
-
Q ~0(9) ~ [ lJ~[~"o'11,~) ].
The Jacobian is
    ∂g(θ)/∂θ = [ 1    0                    0
                 0    −c/(F₀ d sin β)      0
                 0    0                    1 ]
so that from (3.30)
    [C_α̂ − (∂g(θ)/∂θ) I⁻¹(θ) (∂g(θ)/∂θ)ᵀ]₂₂ ≥ 0.
Because of the diagonal Jacobian this yields
    var(β̂) ≥ [∂g(θ)/∂θ]₂₂² [I⁻¹(θ)]₂₂.
But from (3.41) we have
    [I⁻¹(θ)]₂₂ = 12 / [(2π)² η M(M² − 1)]
A 12 c2
var«(3) :::: (21r)2TJM(M2 -1) FgtPsin2 {3
or finally
• 12
var({3) ~ 2 M
+1 (L)2 • 2 ,
(21r) MTJ M _ 1 >: sm (3,
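As a final sketch (Python/NumPy; the array size, spacing, and SNR are assumed for illustration), (3.43) can be evaluated to see how the bearing bound degrades toward endfire, where sin β is small.

    import numpy as np

    # Evaluate the bearing bound (3.43) for an assumed half-wavelength spaced
    # line array.  The bound grows as the target moves toward endfire (beta
    # near 0 or 180 degrees) because sin(beta) shrinks.
    M, eta = 20, 1.0                       # sensors and per-sample SNR, assumed
    d_over_lambda = 0.5                    # d = lambda/2 spacing, assumed
    L_over_lambda = (M - 1) * d_over_lambda
    for beta_deg in (90, 45, 10):
        beta = np.deg2rad(beta_deg)
        var_beta = 12 / ((2 * np.pi) ** 2 * M * eta * (M + 1) / (M - 1)
                         * L_over_lambda ** 2 * np.sin(beta) ** 2)
        print(beta_deg, var_beta)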
Example 3.16 - Autoregressive Parameter Estimation
In speech processing an important model for speech production is the autoregressive
(AR) process. As shown in Figure 3.9, the data are modeled as the output of a causal
all-pole discrete filter excited at the input by WGN u[n]. The excitation noise u[n] is an inherent part of the model, necessary to ensure that x[n] is a WSS random process. The all-pole filter acts to model the vocal tract, while the excitation noise models the forcing of air through a constriction in the throat necessary to produce an unvoiced sound such as an "s." The effect of the filter is to color the white noise so as to model PSDs with several resonances. This model is also referred to as a linear predictive coding (LPC) model [Makhoul 1975]. Since the AR model is capable of producing a variety of PSDs, depending on the choice of the AR filter parameters {a[1], a[2], ..., a[p]} and
Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf
Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf
Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf
Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf
Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf
Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf

Fundamentals Of Statistical Signal Processing--Estimation Theory-Kay.pdf

Contents (continued)

3.5 General CRLB for Signals in White Gaussian Noise . . . 35
3.6 Transformation of Parameters . . . 37
3.7 Extension to a Vector Parameter . . . 39
3.8 Vector Parameter CRLB for Transformations . . . 45
3.9 CRLB for the General Gaussian Case . . . 47
3.10 Asymptotic CRLB for WSS Gaussian Random Processes . . . 50
3.11 Signal Processing Examples . . . 53
3A Derivation of Scalar Parameter CRLB . . . 67
3B Derivation of Vector Parameter CRLB . . . 70
3C Derivation of General Gaussian CRLB . . . 73
3D Derivation of Asymptotic CRLB . . . 77
4 Linear Models 83
4.1 Introduction . . . 83
4.2 Summary . . . 83
4.3 Definition and Properties . . . 83
4.4 Linear Model Examples . . . 86
4.5 Extension to the Linear Model . . . 94

5 General Minimum Variance Unbiased Estimation 101
5.1 Introduction . . . 101
5.2 Summary . . . 101
5.3 Sufficient Statistics . . . 102
5.4 Finding Sufficient Statistics . . . 104
5.5 Using Sufficiency to Find the MVU Estimator . . . 107
5.6 Extension to a Vector Parameter . . . 116
5A Proof of Neyman-Fisher Factorization Theorem (Scalar Parameter) . . . 127
5B Proof of Rao-Blackwell-Lehmann-Scheffe Theorem (Scalar Parameter) . . . 130

6 Best Linear Unbiased Estimators 133
6.1 Introduction . . . 133
6.2 Summary . . . 133
6.3 Definition of the BLUE . . . 134
6.4 Finding the BLUE . . . 136
6.5 Extension to a Vector Parameter . . . 139
6.6 Signal Processing Example . . . 141
6A Derivation of Scalar BLUE . . . 151
6B Derivation of Vector BLUE . . . 153

7 Maximum Likelihood Estimation 157
7.1 Introduction . . . 157
7.2 Summary . . . 157
7.3 An Example . . . 158
7.4 Finding the MLE . . . 162
7.5 Properties of the MLE . . . 164
7.6 MLE for Transformed Parameters . . . 173
7.7 Numerical Determination of the MLE . . . 177
7.8 Extension to a Vector Parameter . . . 182
7.9 Asymptotic MLE . . . 190
7.10 Signal Processing Examples . . . 191
7A Monte Carlo Methods . . . 205
7B Asymptotic PDF of MLE for a Scalar Parameter . . . 211
7C Derivation of Conditional Log-Likelihood for EM Algorithm Example . . . 214

8 Least Squares 219
8.1 Introduction . . . 219
8.2 Summary . . . 219
8.3 The Least Squares Approach . . . 220
8.4 Linear Least Squares . . . 223
8.5 Geometrical Interpretations . . . 226
8.6 Order-Recursive Least Squares . . . 232
8.7 Sequential Least Squares . . . 242
8.8 Constrained Least Squares . . . 251
8.9 Nonlinear Least Squares . . . 254
8.10 Signal Processing Examples . . . 260
8A Derivation of Order-Recursive Least Squares . . . 282
8B Derivation of Recursive Projection Matrix . . . 285
8C Derivation of Sequential Least Squares . . . 286

9 Method of Moments 289
9.1 Introduction . . . 289
9.2 Summary . . . 289
9.3 Method of Moments . . . 289
9.4 Extension to a Vector Parameter . . . 292
9.5 Statistical Evaluation of Estimators . . . 294
9.6 Signal Processing Example . . . 299

10 The Bayesian Philosophy 309
10.1 Introduction . . . 309
10.2 Summary . . . 309
10.3 Prior Knowledge and Estimation . . . 310
10.4 Choosing a Prior PDF . . . 316
10.5 Properties of the Gaussian PDF . . . 321
10.6 Bayesian Linear Model . . . 325
10.7 Nuisance Parameters . . . 328
10.8 Bayesian Estimation for Deterministic Parameters . . . 330
10A Derivation of Conditional Gaussian PDF . . . 337

11 General Bayesian Estimators 341
11.1 Introduction . . . 341
11.2 Summary . . . 341
11.3 Risk Functions . . . 342
11.4 Minimum Mean Square Error Estimators . . . 344
11.5 Maximum A Posteriori Estimators . . . 350
11.6 Performance Description . . . 359
11.7 Signal Processing Example . . . 365
11A Conversion of Continuous-Time System to Discrete-Time System . . . 375

12 Linear Bayesian Estimators 379
12.1 Introduction . . . 379
12.2 Summary . . . 379
12.3 Linear MMSE Estimation . . . 380
12.4 Geometrical Interpretations . . . 384
12.5 The Vector LMMSE Estimator . . . 389
12.6 Sequential LMMSE Estimation . . . 392
12.7 Signal Processing Examples - Wiener Filtering . . . 400
12A Derivation of Sequential LMMSE Estimator . . . 415

13 Kalman Filters 419
13.1 Introduction . . . 419
13.2 Summary . . . 419
13.3 Dynamical Signal Models . . . 420
13.4 Scalar Kalman Filter . . . 431
13.5 Kalman Versus Wiener Filters . . . 442
13.6 Vector Kalman Filter . . . 446
13.7 Extended Kalman Filter . . . 449
13.8 Signal Processing Examples . . . 452
13A Vector Kalman Filter Derivation . . . 471
13B Extended Kalman Filter Derivation . . . 476

14 Summary of Estimators 479
14.1 Introduction . . . 479
14.2 Estimation Approaches . . . 479
14.3 Linear Model . . . 486
14.4 Choosing an Estimator . . . 489

15 Extensions for Complex Data and Parameters 493
15.1 Introduction . . . 493
15.2 Summary . . . 493
15.3 Complex Data and Parameters . . . 494
15.4 Complex Random Variables and PDFs . . . 500
15.5 Complex WSS Random Processes . . . 513
15.6 Derivatives, Gradients, and Optimization . . . 517
15.7 Classical Estimation with Complex Data . . . 524
15.8 Bayesian Estimation . . . 532
15.9 Asymptotic Complex Gaussian PDF . . . 535
15.10 Signal Processing Examples . . . 539
15A Derivation of Properties of Complex Covariance Matrices . . . 555
15B Derivation of Properties of Complex Gaussian PDF . . . 558
15C Derivation of CRLB and MLE Formulas . . . 563

A1 Review of Important Concepts 567
A1.1 Linear and Matrix Algebra . . . 567
A1.2 Probability, Random Processes, and Time Series Models . . . 574
A2 Glossary of Symbols and Abbreviations . . . 583

Index 589

Preface

Parameter estimation is a subject that is standard fare in the many books available on statistics. These books range from the highly theoretical expositions written by statisticians to the more practical treatments contributed by the many users of applied statistics. This text is an attempt to strike a balance between these two extremes. The particular audience we have in mind is the community involved in the design and implementation of signal processing algorithms. As such, the primary focus is on obtaining optimal estimation algorithms that may be implemented on a digital computer. The data sets are therefore assumed to be samples of a continuous-time waveform or a sequence of data points. The choice of topics reflects what we believe to be the important approaches to obtaining an optimal estimator and analyzing its performance. As a consequence, some of the deeper theoretical issues have been omitted with references given instead.

It is the author's opinion that the best way to assimilate the material on parameter estimation is by exposure to and working with good examples. Consequently, there are numerous examples that illustrate the theory and others that apply the theory to actual signal processing problems of current interest. Additionally, an abundance of homework problems have been included. They range from simple applications of the theory to extensions of the basic concepts. A solutions manual is available from the publisher.

To aid the reader, summary sections have been provided at the beginning of each chapter. Also, an overview of all the principal estimation approaches and the rationale for choosing a particular estimator can be found in Chapter 14. Classical estimation is first discussed in Chapters 2-9, followed by Bayesian estimation in Chapters 10-13.
This delineation will, hopefully, help to clarify the basic differences between these two principal approaches. Finally, again in the interest of clarity, we present the estimation principles for scalar parameters first, followed by their vector extensions. This is because the matrix algebra required for the vector estimators can sometimes obscure the main concepts. This book is an outgrowth of a one-semester graduate level course on estimation theory given at the University of Rhode Island. It includes somewhat more material than can actually be covered in one semester. We typically cover most of Chapters 1-12, leaving the subjects of Kalman filtering and complex data/parameter extensions to the student. The necessary background that has been assumed is an exposure to the basic theory of digital signal processing, probability and random processes, and linear and matrix algebra.
This book can also be used for self-study and so should be useful to the practicing engineer as well as the student.

The author would like to acknowledge the contributions of the many people who over the years have provided stimulating discussions of research problems, opportunities to apply the results of that research, and support for conducting research. Thanks are due to my colleagues L. Jackson, R. Kumaresan, L. Pakula, and D. Tufts of the University of Rhode Island, and L. Scharf of the University of Colorado. Exposure to practical problems, leading to new research directions, has been provided by H. Woodsum of Sonetech, Bedford, New Hampshire, and by D. Mook, S. Lang, C. Myers, and D. Morgan of Lockheed-Sanders, Nashua, New Hampshire. The opportunity to apply estimation theory to sonar and the research support of J. Kelly of the Naval Undersea Warfare Center, Newport, Rhode Island, J. Salisbury of Analysis and Technology, Middletown, Rhode Island (formerly of the Naval Undersea Warfare Center), and D. Sheldon of the Naval Undersea Warfare Center, New London, Connecticut, are also greatly appreciated. Thanks are due to J. Sjogren of the Air Force Office of Scientific Research, whose continued support has allowed the author to investigate the field of statistical estimation. A debt of gratitude is owed to all my current and former graduate students. They have contributed to the final manuscript through many hours of pedagogical and research discussions as well as by their specific comments and questions. In particular, P. Djuric of the State University of New York proofread much of the manuscript, and V. Nagesha of the University of Rhode Island proofread the manuscript and helped with the problem solutions.

Steven M. Kay
University of Rhode Island
Kingston, RI 02881

Chapter 1
Introduction

1.1 Estimation in Signal Processing

Modern estimation theory can be found at the heart of many electronic signal processing systems designed to extract information. These systems include

1. Radar
2. Sonar
3. Speech
4. Image analysis
5. Biomedicine
6. Communications
7. Control
8. Seismology,

and all share the common problem of needing to estimate the values of a group of parameters. We briefly describe the first three of these systems.

In radar we are interested in determining the position of an aircraft, as for example, in airport surveillance radar [Skolnik 1980]. To determine the range R we transmit an electromagnetic pulse that is reflected by the aircraft, causing an echo to be received by the antenna τ0 seconds later, as shown in Figure 1.1a. The range is determined by the equation τ0 = 2R/c, where c is the speed of electromagnetic propagation. Clearly, if the round trip delay τ0 can be measured, then so can the range. A typical transmit pulse and received waveform are shown in Figure 1.1b. The received echo is decreased in amplitude due to propagation losses and hence may be obscured by environmental noise. Its onset may also be perturbed by time delays introduced by the electronics of the receiver. Determination of the round trip delay can therefore require more than just a means of detecting a jump in the power level at the receiver. It is important to note that a typical modern radar system will input the received continuous-time waveform into a digital computer by taking samples via an analog-to-digital convertor. Once the waveform has been sampled, the data compose a time series. (See also Examples 3.13 and 7.15 for a more detailed description of this problem and optimal estimation procedures.)
Figure 1.1 Radar system: (a) transmit/receive antenna and radar processing system; (b) transmit pulse and received waveform.

Another common application is in sonar, in which we are also interested in the position of a target, such as a submarine [Knight et al. 1981, Burdic 1984]. A typical passive sonar is shown in Figure 1.2a. The target radiates noise due to machinery on board, propellor action, etc. This noise, which is actually the signal of interest, propagates through the water and is received by an array of sensors. The sensor outputs are then transmitted to a tow ship for input to a digital computer.

Figure 1.2 Passive sonar system: (a) towed array beneath the sea surface; (b) received signals at the array sensors.

Because of the positions of the sensors relative to the arrival angle of the target signal, we receive the signals shown in Figure 1.2b. By measuring τ0, the delay between sensors, we can determine the bearing β from the expression

\beta = \arccos\left(\frac{c\tau_0}{d}\right)  \qquad (1.1)

where c is the speed of sound in water and d is the distance between sensors (see Examples 3.15 and 7.17 for a more detailed description).
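As a quick numerical illustration of the two delay relationships just described, the following minimal Python sketch converts a measured round-trip delay into a range estimate via τ0 = 2R/c and an inter-sensor delay into a bearing estimate via (1.1). The delay values, sensor spacing, and sound speed used here are illustrative assumptions, not values taken from the text.

    import numpy as np

    c_em = 3e8           # speed of electromagnetic propagation (m/s)
    tau0_radar = 2e-4    # hypothetical measured round-trip delay (s)
    R_hat = c_em * tau0_radar / 2.0          # from tau0 = 2R/c
    print(f"radar range estimate: {R_hat / 1e3:.1f} km")

    c_water = 1500.0     # nominal speed of sound in water (m/s), assumed
    d = 5.0              # sensor spacing (m), assumed
    tau0_sonar = 2e-3    # hypothetical measured inter-sensor delay (s)
    beta_hat = np.degrees(np.arccos(c_water * tau0_sonar / d))   # equation (1.1)
    print(f"sonar bearing estimate: {beta_hat:.1f} degrees")

In practice the delays themselves must first be estimated from noisy waveforms, which is exactly the estimation problem the chapter goes on to formalize.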
Again, however, the received waveforms are not "clean" as shown in Figure 1.2b but are embedded in noise, making the determination of τ0 more difficult. The value of β obtained from (1.1) is then only an estimate.

Another application is in speech processing systems [Rabiner and Schafer 1978]. A particularly important problem is speech recognition, which is the recognition of speech by a machine (digital computer). The simplest example of this is in recognizing individual speech sounds or phonemes. Phonemes are the vowels, consonants, etc., or the fundamental sounds of speech. As an example, the vowels /a/ and /e/ are shown in Figure 1.3. Note that they are periodic waveforms whose period is called the pitch.

Figure 1.3 Examples of speech sounds: the vowels /a/ and /e/.

To recognize whether a sound is an /a/ or an /e/ the following simple strategy might be employed. Have the person whose speech is to be recognized say each vowel three times and store the waveforms. To recognize the spoken vowel, compare it to the stored vowels and choose the one that is closest to the spoken vowel or the one that minimizes some distance measure. Difficulties arise if the pitch of the speaker's voice changes from the time he or she records the sounds (the training session) to the time when the speech recognizer is used. This is a natural variability due to the nature of human speech. In practice, attributes, other than the waveforms themselves, are used to measure distance. Attributes are chosen that are less susceptible to variation. For example, the spectral envelope will not change with pitch since the Fourier transform of a periodic signal is a sampled version of the Fourier transform of one period of the signal. The period affects only the spacing between frequency samples, not the values. To extract the spectral envelope we employ a model of speech called linear predictive coding (LPC). The parameters of the model determine the spectral envelope. For the speech sounds in Figure 1.3 the power spectrum (magnitude-squared Fourier transform divided by the number of time samples) or periodogram and the estimated LPC spectral envelope are shown in Figure 1.4. (See Examples 3.16 and 7.18 for a description of how the parameters of the model are estimated and used to find the spectral envelope.)

Figure 1.4 LPC spectral modeling: periodogram and estimated LPC spectral envelope for the speech sounds of Figure 1.3.
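The periodogram mentioned above (the magnitude-squared Fourier transform divided by the number of time samples) is easy to compute. The sketch below applies it to a synthetic periodic "vowel-like" signal rather than real speech; the sampling rate, pitch frequency, and harmonic amplitudes are made-up values chosen only for illustration.

    import numpy as np

    fs = 8000.0                      # sampling rate (Hz), assumed
    pitch = 120.0                    # fundamental (pitch) frequency (Hz), assumed
    n = np.arange(1024)
    # Synthetic periodic "vowel-like" signal: a few harmonics of the pitch plus noise.
    x = sum(a * np.cos(2 * np.pi * k * pitch * n / fs)
            for k, a in enumerate([1.0, 0.8, 0.5, 0.3], start=1))
    x = x + 0.05 * np.random.randn(n.size)

    N = x.size
    X = np.fft.rfft(x)
    periodogram = np.abs(X) ** 2 / N          # magnitude-squared FFT divided by N
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    peak_hz = freqs[1:][np.argmax(periodogram[1:])]   # skip the DC bin
    print(f"strongest spectral line near {peak_hz:.0f} Hz")

The spectral lines fall at multiples of the pitch, while the envelope connecting their heights is what an LPC model would estimate.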
It is interesting that in this example a human interpreter can easily discern the spoken vowel. The real problem then is to design a machine that is able to do the same. In the radar/sonar problem a human interpreter would be unable to determine the target position from the received waveforms, so that the machine acts as an indispensable tool.

In all these systems we are faced with the problem of extracting values of parameters based on continuous-time waveforms. Due to the use of digital computers to sample and store the continuous-time waveform, we have the equivalent problem of extracting parameter values from a discrete-time waveform or a data set. Mathematically, we have the N-point data set {x[0], x[1], ..., x[N-1]} which depends on an unknown parameter θ. We wish to determine θ based on the data or to define an estimator

\hat{\theta} = g(x[0], x[1], \ldots, x[N-1])  \qquad (1.2)

where g is some function. This is the problem of parameter estimation, which is the subject of this book. Although electrical engineers at one time designed systems based on analog signals and analog circuits, the current and future trend is based on discrete-time signals or sequences and digital circuitry. With this transition the estimation problem has evolved into one of estimating a parameter based on a time series, which is just a discrete-time process. Furthermore, because the amount of data is necessarily finite, we are faced with the determination of g as in (1.2). Therefore, our problem has now evolved into one which has a long and glorious history, dating back to Gauss who in 1795 used least squares data analysis to predict planetary movements [Gauss 1963 (English translation)]. All the theory and techniques of statistical estimation are at our disposal [Cox and Hinkley 1974, Kendall and Stuart 1976-1979, Rao 1973, Zacks 1981].

Before concluding our discussion of application areas we complete the previous list.

4. Image analysis - estimate the position and orientation of an object from a camera image, necessary when using a robot to pick up an object [Jain 1989]
5. Biomedicine - estimate the heart rate of a fetus [Widrow and Stearns 1985]
6. Communications - estimate the carrier frequency of a signal so that the signal can be demodulated to baseband [Proakis 1983]
7. Control - estimate the position of a powerboat so that corrective navigational action can be taken, as in a LORAN system [Dabbous 1988]
8. Seismology - estimate the underground distance of an oil deposit based on sound reflections due to the different densities of oil and rock layers [Justice 1985].

Finally, the multitude of applications stemming from analysis of data from physical experiments, economics, etc., should also be mentioned [Box and Jenkins 1970, Holm and Hovem 1979, Schuster 1898, Taylor 1986].

1.2 The Mathematical Estimation Problem

Figure 1.5 Dependence of the PDF on the unknown parameter θ.

In determining good estimators the first step is to mathematically model the data. Because the data are inherently random, we describe it by its probability density function (PDF) or p(x[0], x[1], ..., x[N-1]; θ). The PDF is parameterized by the unknown parameter θ, i.e., we have a class of PDFs where each one is different due to a different value of θ. We will use a semicolon to denote this dependence.
As an example, if N = 1 and θ denotes the mean, then the PDF of the data might be

p(x[0]; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}\left(x[0]-\theta\right)^2\right]

which is shown in Figure 1.5 for various values of θ. It should be intuitively clear that because the value of θ affects the probability of x[0], we should be able to infer the value of θ from the observed value of x[0]. For example, if the value of x[0] is negative, it is doubtful that θ = θ2. The value θ = θ1 might be more reasonable. This specification of the PDF is critical in determining a good estimator. In an actual problem we are not given a PDF but must choose one that is not only consistent with the problem constraints and any prior knowledge, but one that is also mathematically tractable. To illustrate the approach consider the hypothetical Dow-Jones industrial average shown in Figure 1.6. It might be conjectured that this data, although appearing to fluctuate wildly, actually is "on the average" increasing. To determine if this is true we could assume that the data actually consist of a straight line embedded in random noise or

x[n] = A + Bn + w[n] \qquad n = 0, 1, \ldots, N-1.

A reasonable model for the noise is that w[n] is white Gaussian noise (WGN) or each sample of w[n] has the PDF N(0, σ²) (denotes a Gaussian distribution with a mean of 0 and a variance of σ²) and is uncorrelated with all the other samples. Then, the unknown parameters are A and B, which arranged as a vector become the vector parameter θ = [A B]^T. Letting x = [x[0] x[1] ... x[N-1]]^T, the PDF is

p(\mathbf{x}; \boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2} \sum_{n=0}^{N-1}\left(x[n] - A - Bn\right)^2\right]. \qquad (1.3)

The choice of a straight line for the signal component is consistent with the knowledge that the Dow-Jones average is hovering around 3000 (A models this) and the conjecture that it is increasing (B > 0 models this).
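As a concrete illustration of the model in (1.3), the following sketch generates a straight line embedded in WGN and evaluates the log of (1.3) for two candidate parameter choices. The values of A, B, σ², and N are illustrative assumptions only, loosely inspired by the Dow-Jones discussion above.

    import numpy as np

    rng = np.random.default_rng(0)
    N, A_true, B_true, sigma2 = 100, 3000.0, 2.0, 50.0 ** 2   # illustrative values
    n = np.arange(N)
    x = A_true + B_true * n + rng.normal(scale=np.sqrt(sigma2), size=N)

    def log_pdf(x, A, B, sigma2):
        # Log of (1.3): straight line A + B*n embedded in white Gaussian noise.
        N = x.size
        resid = x - A - B * np.arange(N)
        return -0.5 * N * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)

    print(log_pdf(x, 3000.0, 2.0, sigma2))   # near the true parameters
    print(log_pdf(x, 3000.0, 0.0, sigma2))   # a "no trend" alternative scores much lower

The data favor parameter values that make the observed samples probable, which is the intuition later formalized by maximum likelihood estimation.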
Figure 1.6 Hypothetical Dow-Jones industrial average.

The assumption of WGN is justified by the need to formulate a mathematically tractable model so that closed form estimators can be found. Also, it is reasonable unless there is strong evidence to the contrary, such as highly correlated noise. Of course, the performance of any estimator obtained will be critically dependent on the PDF assumptions. We can only hope the estimator obtained is robust, in that slight changes in the PDF do not severely affect the performance of the estimator. More conservative approaches utilize robust statistical procedures [Huber 1981].

Estimation based on PDFs such as (1.3) is termed classical estimation in that the parameters of interest are assumed to be deterministic but unknown. In the Dow-Jones average example we know a priori that the mean is somewhere around 3000. It seems inconsistent with reality, then, to choose an estimator of A that can result in values as low as 2000 or as high as 4000. We might be more willing to constrain the estimator to produce values of A in the range [2800, 3200]. To incorporate this prior knowledge we can assume that A is no longer deterministic but a random variable and assign it a PDF, possibly uniform over the [2800, 3200] interval. Then, any subsequent estimator will yield values in this range. Such an approach is termed Bayesian estimation. The parameter we are attempting to estimate is then viewed as a realization of a random variable, and the data are described by the joint PDF

p(\mathbf{x}, \theta) = p(\mathbf{x}|\theta)\, p(\theta).

Once the PDF has been specified, the problem becomes one of determining an optimal estimator or function of the data, as in (1.2). Note that an estimator may depend on other parameters, but only if they are known. An estimator may be thought of as a rule that assigns a value to θ for each realization of x. The estimate of θ is the value of θ̂ obtained for a given realization of x. This distinction is analogous to a random variable (which is a function defined on the sample space) and the value it takes on. Although some authors distinguish between the two by using capital and lowercase letters, we will not do so. The meaning will, hopefully, be clear from the context.

1.3 Assessing Estimator Performance

Figure 1.7 Realization of a DC level in noise.

Consider the data set shown in Figure 1.7. From a cursory inspection it appears that x[n] consists of a DC level A in noise. (The use of the term DC is in reference to direct current, which is equivalent to the constant function.) We could model the data as

x[n] = A + w[n]

where w[n] denotes some zero mean noise process. Based on the data set {x[0], x[1], ..., x[N-1]}, we would like to estimate A. Intuitively, since A is the average level of x[n] (w[n] is zero mean), it would be reasonable to estimate A as

\hat{A} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]

or by the sample mean of the data. Several questions come to mind:

1. How close will Â be to A?
2. Are there better estimators than the sample mean?
For the data set in Figure 1.7 it turns out that Â = 0.9, which is close to the true value of A = 1. Another estimator might be

\check{A} = x[0].

Intuitively, we would not expect this estimator to perform as well since it does not make use of all the data. There is no averaging to reduce the noise effects. However, for the data set in Figure 1.7, Ǎ = 0.95, which is closer to the true value of A than the sample mean estimate. Can we conclude that Ǎ is a better estimator than Â? The answer is of course no. Because an estimator is a function of the data, which are random variables, it too is a random variable, subject to many possible outcomes. The fact that Ǎ is closer to the true value only means that for the given realization of data, as shown in Figure 1.7, the estimate Ǎ = 0.95 (or realization of Ǎ) is closer to the true value than the estimate Â = 0.9 (or realization of Â). To assess performance we must do so statistically. One possibility would be to repeat the experiment that generated the data and apply each estimator to every data set. Then, we could ask which estimator produces a better estimate in the majority of the cases. Suppose we repeat the experiment by fixing A = 1 and adding different noise realizations of w[n] to generate an ensemble of realizations of x[n]. Then, we determine the values of the two estimators for each data set and finally plot the histograms. (A histogram describes the number of times the estimator produces a given range of values and is an approximation to the PDF.) For 100 realizations the histograms are shown in Figure 1.8.

Figure 1.8 Histograms for the sample mean and first sample estimators.

It should now be evident that Â is a better estimator than Ǎ because the values obtained are more concentrated about the true value of A = 1. Hence, Â will usually produce a value closer to the true one than Ǎ. The skeptic, however, might argue that if we repeat the experiment 1000 times instead, then the histogram of Ǎ will be more concentrated. To dispel this notion, we cannot repeat the experiment 1000 times, for surely the skeptic would then reassert his or her conjecture for 10,000 experiments. To prove that Â is better we could establish that the variance is less. The modeling assumptions that we must employ are that the w[n]'s, in addition to being zero mean, are uncorrelated and have equal variance σ². Then, we first show that the mean of each estimator is the true value or

E(\hat{A}) = E\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right) = \frac{1}{N}\sum_{n=0}^{N-1} E(x[n]) = A

E(\check{A}) = E(x[0]) = A

so that on the average the estimators produce the true value. Second, the variances are

\mathrm{var}(\hat{A}) = \mathrm{var}\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right) = \frac{1}{N^2}\sum_{n=0}^{N-1}\mathrm{var}(x[n]) = \frac{1}{N^2}\, N\sigma^2 = \frac{\sigma^2}{N}

since the w[n]'s are uncorrelated and thus

\mathrm{var}(\check{A}) = \mathrm{var}(x[0]) = \sigma^2 > \mathrm{var}(\hat{A}).
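The histogram experiment of Figure 1.8 and the variance results above are easy to reproduce numerically. The sketch below repeats the experiment many times with A = 1 and compares the empirical means and variances of the two estimators; the number of trials, N, and σ² are illustrative choices, not values prescribed by the text.

    import numpy as np

    rng = np.random.default_rng(1)
    A, N, sigma2, trials = 1.0, 100, 1.0, 10000   # illustrative choices

    x = A + rng.normal(scale=np.sqrt(sigma2), size=(trials, N))
    A_hat = x.mean(axis=1)      # sample mean estimator, one value per realization
    A_check = x[:, 0]           # first-sample estimator

    print("mean of A_hat:  ", A_hat.mean())     # both near A = 1 (unbiased)
    print("mean of A_check:", A_check.mean())
    print("var of A_hat:   ", A_hat.var())      # near sigma2 / N = 0.01
    print("var of A_check: ", A_check.var())    # near sigma2 = 1

Both estimators are centered on the true value, but the sample mean's values are far more concentrated, exactly as the analytical variances predict.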
Furthermore, if we could assume that w[n] is Gaussian, we could also conclude that the probability of a given magnitude error is less for Â than for Ǎ (see Problem 2.7).

Several important points are illustrated by the previous example, which should always be kept in mind.

1. An estimator is a random variable. As such, its performance can only be completely described statistically or by its PDF.
2. The use of computer simulations for assessing estimation performance, although quite valuable for gaining insight and motivating conjectures, is never conclusive. At best, the true performance may be obtained to the desired degree of accuracy. At worst, for an insufficient number of experiments and/or errors in the simulation techniques employed, erroneous results may be obtained (see Appendix 7A for a further discussion of Monte Carlo computer techniques).

Another theme that we will repeatedly encounter is the tradeoff between performance and computational complexity. As in the previous example, even though Â has better performance, it also requires more computation. We will see that optimal estimators can sometimes be difficult to implement, requiring a multidimensional optimization or integration. In these situations, alternative estimators that are suboptimal, but which can be implemented on a digital computer, may be preferred. For any particular application, the user must determine whether the loss in performance is offset by the reduced computational complexity of a suboptimal estimator.

1.4 Some Notes to the Reader

Our philosophy in presenting a theory of estimation is to provide the user with the main ideas necessary for determining optimal estimators. We have included results that we deem to be most useful in practice, omitting some important theoretical issues. The latter can be found in many books on statistical estimation theory which have been written from a more theoretical viewpoint [Cox and Hinkley 1974, Kendall and Stuart 1976-1979, Rao 1973, Zacks 1981]. As mentioned previously, our goal is to obtain an optimal estimator, and we resort to a suboptimal one if the former cannot be found or is not implementable. The sequence of chapters in this book follows this approach, so that optimal estimators are discussed first, followed by approximately optimal estimators, and finally suboptimal estimators. In Chapter 14 a "road map" for finding a good estimator is presented along with a summary of the various estimators and their properties. The reader may wish to read this chapter first to obtain an overview. We have tried to maximize insight by including many examples and minimizing long mathematical expositions, although much of the tedious algebra and proofs have been included as appendices. The DC level in noise described earlier will serve as a standard example in introducing almost all the estimation approaches. It is hoped that in doing so the reader will be able to develop his or her own intuition by building upon previously assimilated concepts. Also, where possible, the scalar estimator is presented first followed by the vector estimator. This approach reduces the tendency of vector/matrix algebra to obscure the main ideas. Finally, classical estimation is described first, followed by Bayesian estimation, again in the interest of not obscuring the main issues. The estimators obtained using the two approaches, although similar in appearance, are fundamentally different.

The mathematical notation for all common symbols is summarized in Appendix 2. The distinction between a continuous-time waveform and a discrete-time waveform or sequence is made through the symbolism x(t) for continuous-time and x[n] for discrete-time. Plots of x[n], however, appear continuous in time, the points having been connected by straight lines for easier viewing. All vectors and matrices are boldface with all vectors being column vectors. All other symbolism is defined within the context of the discussion.

References

Box, G.E.P., G.M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, 1970.
Burdic, W.S., Underwater Acoustic System Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1984.
Cox, D.R., D.V. Hinkley, Theoretical Statistics, Chapman and Hall, New York, 1974.
Dabbous, T.E., N.U. Ahmed, J.C. McMillan, D.F. Liang, "Filtering of Discontinuous Processes Arising in Marine Integrated Navigation," IEEE Trans. Aerosp. Electron. Syst., Vol. 24, pp. 85-100, 1988.
Gauss, K.G., Theory of Motion of Heavenly Bodies, Dover, New York, 1963.
Holm, S., J.M. Hovem, "Estimation of Scalar Ocean Wave Spectra by the Maximum Entropy Method," IEEE J. Ocean Eng., Vol. 4, pp. 76-83, 1979.
Huber, P.J., Robust Statistics, J. Wiley, New York, 1981.
Jain, A.K., Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, N.J., 1989.
Justice, J.H., "Array Processing in Exploration Seismology," in Array Signal Processing, S. Haykin, ed., Prentice-Hall, Englewood Cliffs, N.J., 1985.
Kendall, Sir M., A. Stuart, The Advanced Theory of Statistics, Vols. 1-3, Macmillan, New York, 1976-1979.
Knight, W.S., R.G. Pridham, S.M. Kay, "Digital Signal Processing for Sonar," Proc. IEEE, Vol. 69, pp. 1451-1506, Nov. 1981.
Proakis, J.G., Digital Communications, McGraw-Hill, New York, 1983.
Rabiner, L.R., R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978.
Rao, C.R., Linear Statistical Inference and Its Applications, J. Wiley, New York, 1973.
Schuster, A., "On the Investigation of Hidden Periodicities with Application to a Supposed 26 Day Period of Meteorological Phenomena," Terrestrial Magnetism, Vol. 3, pp. 13-41, March 1898.
Skolnik, M.I., Introduction to Radar Systems, McGraw-Hill, New York, 1980.
Taylor, S., Modeling Financial Time Series, J. Wiley, New York, 1986.
Widrow, B., Stearns, S.D., Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1985.
Zacks, S., Parametric Statistical Inference, Pergamon, New York, 1981.
The mathematical notation for all common symbols is summarized in Appendix 2. The distinction between a continuous-time waveform and a discrete-time waveform or sequence is made through the symbolism x(t) for continuous-time and x[n] for discrete- time. Plots of x[n], however, appear continuous in time, the points having been con- nected by straight lines for easier viewing. All vectors and matrices are boldface with all vectors being column vectors. All other symbolism is defined within the context of the discussion. References Box, G.E.P., G.M. Jenkins, Time Series Analysis: Forecasting and Contro~ Holden-Day, San Francisco, 1970. Burdic, W.S., Underwater Acoustic System Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1984. Cox, D.R., D.V. Hinkley, Theoretical Statistics, Chapman and Hall, New York, 1974. Dabbous, T.E., N.U. Ahmed. J.C. McMillan, D.F. Liang, "Filtering of Discontinuous Processes Arising in Marine Integrated Navigation," IEEE Trans. Aerosp. Electron. Syst., Vol. 24, pp. 85-100, 1988. Gauss, K.G., Theory of Motion of Heavenly Bodies, Dover, New York, 1963. Holm, S., J.M. Hovem, "Estimation of Scalar Ocean Wave Spectra by the Maximum Entropy Method," IEEE J. Ocean Eng., Vol. 4, pp. 76-83, 1979. Huber, P.J., Robust Statistics, J. Wiley, ~ew York, 1981. Jain, A.K., Fundamentals of Digital Image ProceSSing, Prentice-Hall, Englewood Cliffs, N.J., 1989. Justice, J.H.. "Array Processing in Exploration Seismology," in Array Signal Processing, S. Haykin, ed., Prentice-HaU, Englewood Cliffs, N.J., 1985. Kendall, Sir M., A. Stuart, The Advanced Theory of Statistics, Vols. 1-3, Macmillan, New York, 1976--1979. Knight, W.S., RG. Pridham, S.M. Kay, "Digital Signal Processing for Sonar," Proc. IEEE, Vol. 69, pp. 1451-1506. Nov. 1981. Proakis, J.G., Digital Communications, McGraw-Hill, New York, 1983. Rabiner, L.R., RW. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978. Rao, C.R, Linear Statistical Inference and Its Applications, J. Wiley, New York, 1973. Schuster, !," "On the Investigation of Hidden Periodicities with Application to a Supposed 26 Day .PerIod of Meterological Phenomena," Terrestrial Magnetism, Vol. 3, pp. 13-41, March 1898. Skolmk, M.L, Introduction to Radar Systems, McGraw-Hill, ~ew York, 1980. Taylor, S., Modeling Financial Time Series, J. Wiley, New York, 1986. Widrow, B., Stearns, S.D., Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, N.J., 1985. Zacks, S., Parametric Statistical Inference, Pergamon, New York, 1981.
  • 12. 14 Problems CHAPTER 1. INTRODUCTION 1. In a radar system an estimator of round trip delay To has the PDF To ~ N(To, (J~a)" where 7< is the true value. If the range is to be estimated, propose an estimator R and find its PDF. Next determine the standard deviation (J-ra so that 99% of th~ time the range estimate will be within 100 m of the true value. Use c = 3 x 10 mls for the speed of electromagnetic propagation. 2. An unknown parameter fJ influences the outcome of an experiment which is mod- eled by the random variable x. The PDF of x is p(x; fJ) = vkexp [-~(X -fJ?) . A series of experiments is performed, and x is found to always be in the interval [97, 103]. As a result, the investigator concludes that fJ must have been 100. Is this assertion correct? 3. Let x = fJ +w, where w is a random variable with PDF Pw(w)..IfbfJ is (a dfJet)er~in~ istic parameter, find the PDF of x in terms of pw and denote It y P x; ... ex assume that fJ is a random variable independent of wand find the condltlO?al PDF p(xlfJ). Finally, do not assume that eand ware independent and determme p(xlfJ). What can you say about p(x; fJ) versus p(xlfJ)? 4. It is desired to estimate the value of a DC level A in WGN or x[n] = A +w[n] n = 0,1, ... , N - C where w[n] is zero mean and uncorrelated, and each sample has variance (J2 = l. Consider the two estimators 1 N-I N 2:: x[n] n=O A = A _1_ (2X[0] +~ x[n] +2x[N - 1]) . N + 2 n=1 Which one is better? Does it depend on the value of A? 5. For the same data set as in Problem 1.4 the following estimator is proposed: { x [0] A= ~'~x[n] A2 = A2 < 1000. .,.2 - The rationale for this estimator is that for a high enough signal-to-noise ratio (SNR) or A2/(J2, we do not need to reduce the effect of.noise by averaging and hence can avoid the added computation. Comment on thiS approach. Chapter 2 Minimum Variance Unbiased Estimation 2.1 Introduction In this chapter we will be in our search for good estimators of unknown deterministic parame ers. e will restrict our attention to estimators which on the average yield the true parameter value. Then, within this class of estimators the goal will be to find the one that exhibits the least variability. Hopefully, the estimator thus obtained will produce values close to the true value most of the time. The notion of a minimum variance unbiased estimator is examined within this chapter, but the means to find it will require some more theory. Succeeding chapters will provide that theory as well as apply it to many of the typical problems encountered in signal processing. 2.2 Summary An unbiased estimator is defined by (2.1), with the important proviso that this holds for all possible values of the unknown parameter. Within this class of estimators the one with the minimum variance is sought. The unbiased constraint is shown by example to be desirable from a practical viewpoint since the more natural error criterion, the minimum mean square error, defined in (2.5), generally leads to unrealizable estimators. Minimum variance unbiased estimators do not, in eneral, exist. When they do, several methods can be used to find them. The methods reI on the Cramer-Rao ower oun and the concept of a sufficient statistic. If a minimum variance unbiase estimator does not exist or if both of the previous two approaches fail, a further constraint on the estimator, to being linear in the data, leads to an easily implemented, but suboptimal, estimato!,; 15
  • 13. 16 CHAPTER 2. MINIMUM VARIANCE UNBIASED ESTIMATION 2.3 Unbiased Estimators For an estimator to be unbiased we mean that on the average the estimator will yield the true value of the unknown parameter. Since the parameter value may in general be anywhere in the interval a < 8 < b, unbiasedness asserts that no matter what the true value of 8, our estimator will yield it on the average. Mathematically, an estimator i~ ~~~il • E(iJ) = 8 (2.1) where (a,b) denotes the range of possible values of 8. Example 2.1 - Unbiased Estimator for DC Level in White Gaussian Noise Consider the observatioJ!s x[n) = A +w[n) n = 0, 1, ... ,N - 1 where A is the parameter to be estimated and w[n] is WGN. The parameter A can take on any value in the interval -00 < A < 00. Then, a reasonable estimator for the average value of x[n] is or the sample mean. Due to the linearity properties of the expectation operator [ 1 N-1 ] E(A.) = E N ~ x[n) 1 N-1 N L E(x[nJ) n=O N-1 ~LA n=O = A for all A. The sample mean estimator is therefore unbiased. (2.2) <> In this example A can take on any value, although in general the values of an unknown parameter may be restricted by physical considerations. Estimating the resistance R of an unknown resistor, for example, would necessitate an interval 0 < R < 00. Unbiased estimators tend to have symmetric PDFs centered about the true value of 8, although this is not necessary (see Problem 2.5). For Example 2.1 the PDF is shown in Figure 2.1 and is easily shown to be N(A, (72/N) (see Problem 2.3). The restriction that E(iJ) =8 for all 8 is an important one. Lettin iJ = x = [x 0 x , it asserts that E(iJ) = Jg(x)p(x; 8) dx = 8 for all 8. (2.3) 2.3. UNBIASED ESTIMATORS 17 A Figure 2.1 Probability density function for sample mean estimator It is possible, however, that (2.3) may hold for some values of 8 and not others as the next example illustrates. ' Example 2.2 - Biased Estimator for DC Level in White Noise Consider again Example 2.1 but with the modified sample mean estimator Then, _ 1 N-1 A= 2N Lx[n]. E(A) n=O ~A 2 A if A=O # A if A # o. It is seen that (2.3) holds for the modified estimator only for A = o. Clearly, A is a biased estimator. <> That an estimator is unbiased does not necessarily mean that it is a good estimator. It only guarantees that on the average it will attain the true value. On the other hand biased estimators are ones that are characterized by a systematic error, which presum~ ably should not be present. A persistent bias will always result in a poor estimator. As an example, the unbiased property has an important implication when several es- timators are combined (see Problem 2.4). ~t s?metimes occurs that multiple estimates ~th~ same paran:eter ar.e available, i.e., {81, 82 , •.. , 8n }. A reasonable procedure is to combme these estimates mto, hopefully, a better one by averaging them to form . 1 ~. 8=- ~8i. n i=l (2.4) Assuming the estimators are unbiased, with the same variance, and uncorrelated with each other, E(iJ) = 8
and
$$\operatorname{var}(\hat{\theta}) = \frac{\operatorname{var}(\hat{\theta}_1)}{n}$$
so that as more estimates are averaged, the variance will decrease. Ultimately, as $n \to \infty$, $\hat{\theta} \to \theta$. However, if the estimators are biased or $E(\hat{\theta}_i) = \theta + b(\theta)$, then
$$E(\hat{\theta}) = \frac{1}{n}\sum_{i=1}^{n} E(\hat{\theta}_i) = \theta + b(\theta)$$
and no matter how many estimators are averaged, $\hat{\theta}$ will not converge to the true value. This is depicted in Figure 2.2. Note that, in general, $b(\theta) = E(\hat{\theta}) - \theta$ is defined as the bias of the estimator.

[Figure 2.2 Effect of combining estimators: (a) unbiased estimator, where the PDF of $\hat{\theta}$ concentrates about $\theta$ as $n$ increases; (b) biased estimator, where it concentrates about $E(\hat{\theta}) \ne \theta$.]

2.4 Minimum Variance Criterion

In searching for optimal estimators we need to adopt some optimality criterion. A natural one is the mean square error (MSE), defined as
$$\operatorname{mse}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]. \qquad (2.5)$$
This measures the average mean squared deviation of the estimator from the true value. Unfortunately, adoption of this natural criterion leads to unrealizable estimators, ones that cannot be written solely as a function of the data. To understand the problem which arises we first rewrite the MSE as
$$\operatorname{mse}(\hat{\theta}) = E\left\{\left[\left(\hat{\theta} - E(\hat{\theta})\right) + \left(E(\hat{\theta}) - \theta\right)\right]^2\right\} = \operatorname{var}(\hat{\theta}) + \left[E(\hat{\theta}) - \theta\right]^2 = \operatorname{var}(\hat{\theta}) + b^2(\theta) \qquad (2.6)$$
which shows that the MSE is composed of errors due to the variance of the estimator as well as the bias. As an example, for the problem in Example 2.1 consider the modified estimator
$$\hat{A} = a\, \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
for some constant $a$. We will attempt to find the $a$ which results in the minimum MSE. Since $E(\hat{A}) = aA$ and $\operatorname{var}(\hat{A}) = a^2\sigma^2/N$, we have, from (2.6),
$$\operatorname{mse}(\hat{A}) = \frac{a^2\sigma^2}{N} + (a-1)^2 A^2.$$
Differentiating the MSE with respect to $a$ yields
$$\frac{d\operatorname{mse}(\hat{A})}{da} = \frac{2a\sigma^2}{N} + 2(a-1)A^2$$
which upon setting to zero and solving yields the optimum value
$$a_{\text{opt}} = \frac{A^2}{A^2 + \sigma^2/N}.$$
It is seen that, unfortunately, the optimal value of $a$ depends upon the unknown parameter $A$. The estimator is therefore not realizable. In retrospect the estimator depends upon $A$ since the bias term in (2.6) is a function of $A$. It would seem that any criterion which depends on the bias will lead to an unrealizable estimator. Although this is generally true, on occasion realizable minimum MSE estimators can be found [Bibby and Toutenburg 1977, Rao 1973, Stoica and Moses 1990].
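The dependence of $a_{\text{opt}}$ on the unknown $A$ can also be seen numerically. The sketch below evaluates the MSE expression from (2.6) over a grid of $a$ for two values of $A$; the values of $A$, $\sigma^2$, and $N$ are hypothetical and chosen only to show that the minimizing $a$ moves with the unknown parameter.

```python
import numpy as np

# MSE of the scaled sample mean A_hat = a * xbar from (2.6):
#   mse(a) = a^2 * sigma^2 / N + (a - 1)^2 * A^2
# The settings below (sigma^2 = 1, N = 10, A in {0.5, 2}) are assumed values
# used only for illustration.
sigma2, N = 1.0, 10
a = np.linspace(0.0, 1.5, 301)

for A in (0.5, 2.0):
    mse = a**2 * sigma2 / N + (a - 1.0) ** 2 * A**2
    a_opt = A**2 / (A**2 + sigma2 / N)      # analytical minimizer from the text
    print(f"A = {A}: argmin over grid = {a[np.argmin(mse)]:.3f}, "
          f"a_opt = {a_opt:.3f}")
# The optimal scaling changes with the unknown A, so the minimum-MSE estimator
# cannot be formed from the data alone -- the unrealizability discussed above.
```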
From a practical viewpoint the minimum MSE estimator needs to be abandoned. An alternative approach is to constrain the bias to be zero and find the estimator which minimizes the variance. Such an estimator is termed the minimum variance unbiased (MVU) estimator. Note from (2.6) that the MSE of an unbiased estimator is just the variance. Minimizing the variance of an unbiased estimator also has the effect of concentrating the PDF of the estimation error, $\hat{\theta} - \theta$, about zero (see Problem 2.7). The estimation error will therefore be less likely to be large.

[Figure 2.3 Possible dependence of estimator variance with $\theta$: (a) $\hat{\theta}_3$ is the MVU estimator, (b) no MVU estimator exists.]

2.5 Existence of the Minimum Variance Unbiased Estimator

The question arises as to whether a MVU estimator exists, i.e., an unbiased estimator with minimum variance for all $\theta$. Two possible situations are described in Figure 2.3. If there are three unbiased estimators whose variances are as shown in Figure 2.3a, then clearly $\hat{\theta}_3$ is the MVU estimator. If the situation in Figure 2.3b exists, however, then there is no MVU estimator since for $\theta < \theta_0$, $\hat{\theta}_2$ is better, while for $\theta > \theta_0$, $\hat{\theta}_3$ is better. In the former case $\hat{\theta}_3$ is sometimes referred to as the uniformly minimum variance unbiased estimator to emphasize that the variance is smallest for all $\theta$. In general, the MVU estimator does not always exist, as the following example illustrates.

Example 2.3 - Counterexample to Existence of MVU Estimator

If the form of the PDF changes with $\theta$, then it would be expected that the best estimator would also change with $\theta$. Assume that we have two independent observations $x[0]$ and $x[1]$ with PDF
$$x[0] \sim \mathcal{N}(\theta, 1)$$
$$x[1] \sim \begin{cases} \mathcal{N}(\theta, 1) & \text{if } \theta \ge 0 \\ \mathcal{N}(\theta, 2) & \text{if } \theta < 0. \end{cases}$$

[Figure 2.4 Illustration of nonexistence of a minimum variance unbiased estimator: the variance curves of $\hat{\theta}_1$ and $\hat{\theta}_2$ cross at $\theta = 0$, taking the values 18/36, 20/36 for $\theta \ge 0$ and 27/36, 24/36 for $\theta < 0$.]

The two estimators
$$\hat{\theta}_1 = \frac{1}{2}\left(x[0] + x[1]\right)$$
$$\hat{\theta}_2 = \frac{2}{3}x[0] + \frac{1}{3}x[1]$$
can easily be shown to be unbiased. To compute the variances we have that
$$\operatorname{var}(\hat{\theta}_1) = \frac{1}{4}\left(\operatorname{var}(x[0]) + \operatorname{var}(x[1])\right)$$
$$\operatorname{var}(\hat{\theta}_2) = \frac{4}{9}\operatorname{var}(x[0]) + \frac{1}{9}\operatorname{var}(x[1])$$
so that
$$\operatorname{var}(\hat{\theta}_1) = \begin{cases} \dfrac{18}{36} & \text{if } \theta \ge 0 \\[2mm] \dfrac{27}{36} & \text{if } \theta < 0 \end{cases} \qquad \operatorname{var}(\hat{\theta}_2) = \begin{cases} \dfrac{20}{36} & \text{if } \theta \ge 0 \\[2mm] \dfrac{24}{36} & \text{if } \theta < 0. \end{cases}$$
The variances are shown in Figure 2.4. Clearly, between these two estimators no MVU estimator exists. It is shown in Problem 3.6 that for $\theta \ge 0$ the minimum possible variance of an unbiased estimator is 18/36, while that for $\theta < 0$ is 24/36. Hence, no single estimator can have a variance uniformly less than or equal to the minima shown in Figure 2.4. ◇

To conclude our discussion of existence we should note that it is also possible that there may not exist even a single unbiased estimator (see Problem 2.11). In this case any search for a MVU estimator is fruitless.

2.6 Finding the Minimum Variance Unbiased Estimator

Even if a MVU estimator exists, we may not be able to find it. There is no known "turn-the-crank" procedure which will always produce the estimator. In the next few chapters we shall discuss several possible approaches. They are:
1. Determine the Cramer-Rao lower bound (CRLB) and check to see if some estimator satisfies it (Chapters 3 and 4).

2. Apply the Rao-Blackwell-Lehmann-Scheffe (RBLS) theorem (Chapter 5).

3. Further restrict the class of estimators to be not only unbiased but also linear. Then, find the minimum variance estimator within this restricted class (Chapter 6).

[Figure 2.5 Cramer-Rao lower bound on the variance of an unbiased estimator (dashed curve), with the variance of an estimator $\hat{\theta}_1$ lying above it.]

Approaches 1 and 2 may produce the MVU estimator, while 3 will yield it only if the MVU estimator is linear in the data.

The CRLB allows us to determine that for any unbiased estimator the variance must be greater than or equal to a given value, as shown in Figure 2.5. If an estimator exists whose variance equals the CRLB for each value of $\theta$, then it must be the MVU estimator. In this case, the theory of the CRLB immediately yields the estimator. It may happen that no estimator exists whose variance equals the bound. Yet, a MVU estimator may still exist, as for instance in the case of $\hat{\theta}_1$ in Figure 2.5. Then, we must resort to the Rao-Blackwell-Lehmann-Scheffe theorem. This procedure first finds a sufficient statistic, one which uses all the data efficiently, and then finds a function of the sufficient statistic which is an unbiased estimator of $\theta$. With a slight restriction of the PDF of the data this procedure will then be guaranteed to produce the MVU estimator. The third approach requires the estimator to be linear, a sometimes severe restriction, and chooses the best linear estimator. Of course, only for particular data sets can this approach produce the MVU estimator.

2.7 Extension to a Vector Parameter

If $\boldsymbol{\theta} = [\theta_1\ \theta_2\ \cdots\ \theta_p]^T$ is a vector of unknown parameters, then we say that an estimator $\hat{\boldsymbol{\theta}} = [\hat{\theta}_1\ \hat{\theta}_2\ \cdots\ \hat{\theta}_p]^T$ is unbiased if
$$E(\hat{\theta}_i) = \theta_i, \qquad a_i < \theta_i < b_i \qquad (2.7)$$
for $i = 1, 2, \ldots, p$. By defining
$$E(\hat{\boldsymbol{\theta}}) = \left[E(\hat{\theta}_1)\ E(\hat{\theta}_2)\ \cdots\ E(\hat{\theta}_p)\right]^T$$
we can equivalently define an unbiased estimator to have the property
$$E(\hat{\boldsymbol{\theta}}) = \boldsymbol{\theta}$$
for every $\boldsymbol{\theta}$ contained within the space defined in (2.7). A MVU estimator has the additional property that $\operatorname{var}(\hat{\theta}_i)$ for $i = 1, 2, \ldots, p$ is minimum among all unbiased estimators.

References

Bibby, J., H. Toutenburg, Prediction and Improved Estimation in Linear Models, J. Wiley, New York, 1977.

Rao, C.R., Linear Statistical Inference and Its Applications, J. Wiley, New York, 1973.

Stoica, P., R. Moses, "On Biased Estimators and the Unbiased Cramer-Rao Lower Bound," Signal Process., Vol. 21, pp. 349-350, 1990.

Problems

2.1 The data $\{x[0], x[1], \ldots, x[N-1]\}$ are observed where the $x[n]$'s are independent and identically distributed (IID) as $\mathcal{N}(0, \sigma^2)$. We wish to estimate the variance $\sigma^2$ as
$$\hat{\sigma}^2 = \frac{1}{N}\sum_{n=0}^{N-1} x^2[n].$$
Is this an unbiased estimator? Find the variance of $\hat{\sigma}^2$ and examine what happens as $N \to \infty$.

2.2 Consider the data $\{x[0], x[1], \ldots, x[N-1]\}$, where each sample is distributed as $\mathcal{U}[0, \theta]$ and the samples are IID. Can you find an unbiased estimator for $\theta$? The range of $\theta$ is $0 < \theta < \infty$.

2.3 Prove that the PDF of $\hat{A}$ given in Example 2.1 is $\mathcal{N}(A, \sigma^2/N)$.

2.4 The heart rate $h$ of a patient is automatically recorded by a computer every 100 ms. In 1 s the measurements $\{h_1, h_2, \ldots, h_{10}\}$ are averaged to obtain $\hat{h}$. If $E(h_i) = \alpha h$ for some constant $\alpha$ and $\operatorname{var}(h_i) = 1$ for each $i$, determine whether averaging improves the estimator if $\alpha = 1$ and $\alpha = 1/2$. Assume each measurement is uncorrelated with the others.

2.5 Two samples $\{x[0], x[1]\}$ are independently observed from a $\mathcal{N}(0, \sigma^2)$ distribution.
The estimator
$$\hat{\sigma}^2 = \frac{1}{2}\left(x^2[0] + x^2[1]\right)$$
is unbiased. Find the PDF of $\hat{\sigma}^2$ to determine if it is symmetric about $\sigma^2$.
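A quick numerical check of this problem can be instructive. The following is a minimal Monte Carlo sketch; the value $\sigma^2 = 1$ is an assumed choice for illustration only, and the chi-squared interpretation in the comments is a standard property of squared Gaussian variables rather than part of the problem statement.

```python
import numpy as np

# Monte Carlo check of Problem 2.5: sigma2_hat = (x[0]^2 + x[1]^2) / 2 with
# x[0], x[1] ~ N(0, sigma^2).  sigma^2 = 1 is an assumed value for illustration.
rng = np.random.default_rng(1)
sigma2, trials = 1.0, 200_000

x = np.sqrt(sigma2) * rng.standard_normal((trials, 2))
s2_hat = 0.5 * (x**2).sum(axis=1)

print("sample mean of estimator  :", s2_hat.mean())      # ~ sigma^2 (unbiased)
print("sample median of estimator:", np.median(s2_hat))  # well below the mean
# The gap between mean and median reflects a right-skewed PDF (sigma^2/2 times
# a chi-squared variable with 2 degrees of freedom), so the PDF of the
# estimator is not symmetric about sigma^2.
```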
2.6 For the problem described in Example 2.1 the more general estimator
$$\hat{A} = \sum_{n=0}^{N-1} a_n x[n]$$
is proposed. Find the $a_n$'s so that the estimator is unbiased and the variance is minimized. Hint: Use Lagrangian multipliers with unbiasedness as the constraint equation.

2.7 Two unbiased estimators are proposed whose variances satisfy $\operatorname{var}(\hat{\theta}) < \operatorname{var}(\check{\theta})$. If both estimators are Gaussian, prove that
$$\Pr\left\{|\hat{\theta} - \theta| > \epsilon\right\} < \Pr\left\{|\check{\theta} - \theta| > \epsilon\right\}$$
for any $\epsilon > 0$. This says that the estimator with less variance is to be preferred since its PDF is more concentrated about the true value.

2.8 For the problem described in Example 2.1 show that as $N \to \infty$, $\hat{A} \to A$ by using the results of Problem 2.3. To do so prove that
$$\lim_{N\to\infty} \Pr\left\{|\hat{A} - A| > \epsilon\right\} = 0$$
for any $\epsilon > 0$. In this case the estimator $\hat{A}$ is said to be consistent. Investigate what happens if the alternative estimator $\check{A} = \frac{1}{2N}\sum_{n=0}^{N-1} x[n]$ is used instead.

2.9 This problem illustrates what happens to an unbiased estimator when it undergoes a nonlinear transformation. In Example 2.1, if we choose to estimate the unknown parameter $\theta = A^2$ by
$$\hat{\theta} = \left(\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right)^2,$$
can we say that the estimator is unbiased? What happens as $N \to \infty$?

2.10 In Example 2.1 assume now that in addition to $A$, the value of $\sigma^2$ is also unknown. We wish to estimate the vector parameter
$$\boldsymbol{\theta} = \begin{bmatrix} A \\ \sigma^2 \end{bmatrix}.$$
Is the estimator
$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} \hat{A} \\ \hat{\sigma}^2 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{N}\displaystyle\sum_{n=0}^{N-1} x[n] \\[3mm] \dfrac{1}{N-1}\displaystyle\sum_{n=0}^{N-1} \left(x[n] - \hat{A}\right)^2 \end{bmatrix}$$
unbiased?

2.11 Given a single observation $x[0]$ from the distribution $\mathcal{U}[0, 1/\theta]$, it is desired to estimate $\theta$. It is assumed that $\theta > 0$. Show that for an estimator $\hat{\theta} = g(x[0])$ to be unbiased we must have
$$\int_0^{1/\theta} g(u)\, du = 1.$$
Next prove that a function $g$ cannot be found to satisfy this condition for all $\theta > 0$.
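Problem 2.9 is easy to explore numerically before the effect of nonlinear transformations is treated analytically in the next chapter. The sketch below is a minimal Monte Carlo illustration; the settings $A = 1$ and $\sigma^2 = 1$ are assumed values chosen purely for illustration.

```python
import numpy as np

# Numerical illustration of Problem 2.9: theta_hat = (sample mean)^2 as an
# estimator of theta = A^2.  A = 1 and sigma^2 = 1 are assumed for illustration.
rng = np.random.default_rng(2)
A, sigma2, trials = 1.0, 1.0, 100_000

for N in (2, 10, 100):
    x = A + np.sqrt(sigma2) * rng.standard_normal((trials, N))
    theta_hat = x.mean(axis=1) ** 2
    print(f"N = {N:3d}: E[theta_hat] ~ {theta_hat.mean():.4f}  "
          f"(A^2 = {A**2:.1f}, predicted bias sigma^2/N = {sigma2 / N:.4f})")
# Since E[xbar^2] = A^2 + sigma^2/N, the estimator is biased for finite N, but
# the bias vanishes as N grows (asymptotically unbiased).
```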
  • 19. Chapter 3 Cramer-Rao Lower Bound 3.1 Introduction Being able to place a lower bound on the variance of any unbiased estimator proves to be extremely useful in practice. At best, it allows us to assert that an estimator is the MVU estimator. This will be the case if the estimator attains the bound for all values of the unknown parameter. At worst, it provides a benchmark against which we can compare the performance of any unbiased estimator. Furthermore, it alerts us to the physical impossibility of finding an unbiased estimator whose variance is less than the bound. The latter is often useful in signal processing feasibility studies. Although many such variance bounds exist [McAulay and Hofstetter 1971, Kendall and Stuart 1979, Seidman 1970, Ziv and Zakai 1969], the Cramer-Rao lower bound (CRLB) is by far the easiest to determine. Also, the theory allows us to immediately determine if an estimator exists that attains the bound. If no such estimator exists, then all is not lost since estimators can be found that attain the bound in an approximate sense, as described in Chapter 7. For these reasons we restrict our discussion to the CRLB. 3.2 Summary The CRLB for a scalar parameter is given by (3.6). If the condition (3.7) is satisfied, then the bound will be attained and the estimator that attains it is readily found. An alternative means of determining the CRLB is given by (3.12). For a signal with an unknown parameter in WGN, (3.14) provides a convenient means to evaluate the bound. When a function of a parameter is to be estimated, the CRLB is given by (3.16). Even though an efficient estimator may exist for (), in general there will not be one for a function of () (unless the function is linear). For a vector parameter the CRLB is determined using (3.20) and (3.21). As in the scalar parameter case, if condition (3.25) holds, then the bound is attained and the estimator that attains the bound is easily found. For a function of a vector parameter (3.30) provides the bound. A general formula for the Fisher information matrix (used to determine the vector CRLB) for a multivariate Gaussian PDF is given by (3.31). Finally, if the data set comes from a 27
WSS Gaussian random process, then an approximate CRLB, which depends on the PSD, is given by (3.34). It is valid asymptotically, or as the data record length becomes large.

[Figure 3.1 PDF dependence on the unknown parameter $A$ for $x[0] = 3$: (a) $\sigma_1 = 1/3$, (b) $\sigma_2 = 1$.]

3.3 Estimator Accuracy Considerations

Before stating the CRLB theorem, it is worthwhile to expose the hidden factors that determine how well we can estimate a parameter. Since all our information is embodied in the observed data and the underlying PDF for that data, it is not surprising that the estimation accuracy depends directly on the PDF. For instance, we should not expect to be able to estimate a parameter with any degree of accuracy if the PDF depends only weakly upon that parameter, or in the extreme case, if the PDF does not depend on it at all. In general, the more the PDF is influenced by the unknown parameter, the better we should be able to estimate it.

Example 3.1 - PDF Dependence on Unknown Parameter

If a single sample is observed as
$$x[0] = A + w[0]$$
where $w[0] \sim \mathcal{N}(0, \sigma^2)$, and it is desired to estimate $A$, then we expect a better estimate if $\sigma^2$ is small. Indeed, a good unbiased estimator is $\hat{A} = x[0]$. The variance is, of course, just $\sigma^2$, so that the estimator accuracy improves as $\sigma^2$ decreases. An alternative way of viewing this is shown in Figure 3.1, where the PDFs for two different variances are shown. They are
$$p_i(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left[-\frac{1}{2\sigma_i^2}(x[0] - A)^2\right]$$
for $i = 1, 2$. The PDF has been plotted versus the unknown parameter $A$ for a given value of $x[0]$. If $\sigma_1^2 < \sigma_2^2$, then we should be able to estimate $A$ more accurately based on $p_1(x[0]; A)$. We may interpret this result by referring to Figure 3.1. If $x[0] = 3$ and $\sigma_1 = 1/3$, then as shown in Figure 3.1a, the values of $A > 4$ are highly unlikely. To see this we determine the probability of observing $x[0]$ in the interval $[x[0] - \delta/2, x[0] + \delta/2] = [3 - \delta/2, 3 + \delta/2]$ when $A$ takes on a given value, or
$$\Pr\left\{3 - \frac{\delta}{2} \le x[0] \le 3 + \frac{\delta}{2}\right\} = \int_{3-\delta/2}^{3+\delta/2} p_i(u; A)\, du$$
which for $\delta$ small is $p_i(x[0] = 3; A)\,\delta$. But $p_1(x[0] = 3; A = 4)\,\delta \approx 0.01\delta$, while $p_1(x[0] = 3; A = 3)\,\delta = 1.20\delta$. The probability of observing $x[0]$ in a small interval centered about $x[0] = 3$ when $A = 4$ is small with respect to that when $A = 3$. Hence, the values of $A > 4$ can be eliminated from consideration. It might be argued that values of $A$ in the interval $3 \pm 3\sigma_1 = [2, 4]$ are viable candidates for the true value. In Figure 3.1b there is a much weaker dependence on $A$. Here our viable candidates are in the much wider interval $3 \pm 3\sigma_2 = [0, 6]$. ◇

When the PDF is viewed as a function of the unknown parameter (with $\mathbf{x}$ fixed), it is termed the likelihood function. Two examples of likelihood functions were shown in Figure 3.1. Intuitively, the "sharpness" of the likelihood function determines how accurately we can estimate the unknown parameter. To quantify this notion observe that the sharpness is effectively measured by the negative of the second derivative of the logarithm of the likelihood function at its peak. This is the curvature of the log-likelihood function. In Example 3.1, if we consider the natural logarithm of the PDF
$$\ln p(x[0]; A) = -\ln\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2}(x[0] - A)^2$$
then the first derivative is
$$\frac{\partial \ln p(x[0]; A)}{\partial A} = \frac{1}{\sigma^2}(x[0] - A) \qquad (3.2)$$
and the negative of the second derivative becomes
$$-\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2} = \frac{1}{\sigma^2}. \qquad (3.3)$$
The curvature increases as $\sigma^2$ decreases. Since we already know that the estimator $\hat{A} = x[0]$ has variance $\sigma^2$, then for this example
$$\operatorname{var}(\hat{A}) = \frac{1}{-\dfrac{\partial^2 \ln p(x[0]; A)}{\partial A^2}} \qquad (3.4)$$
and the variance decreases as the curvature increases. Although in this example the second derivative does not depend on $x[0]$, in general it will. Thus, a more appropriate measure of curvature is
$$-E\left[\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2}\right] \qquad (3.5)$$
which measures the average curvature of the log-likelihood function. The expectation is taken with respect to $p(x[0]; A)$, resulting in a function of $A$ only. The expectation acknowledges the fact that the likelihood function, which depends on $x[0]$, is itself a random variable. The larger the quantity in (3.5), the smaller the variance of the estimator.

3.4 Cramer-Rao Lower Bound

We are now ready to state the CRLB theorem.

Theorem 3.1 (Cramer-Rao Lower Bound - Scalar Parameter) It is assumed that the PDF $p(\mathbf{x}; \theta)$ satisfies the "regularity" condition
$$E\left[\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta}\right] = 0 \quad \text{for all } \theta$$
where the expectation is taken with respect to $p(\mathbf{x}; \theta)$. Then, the variance of any unbiased estimator $\hat{\theta}$ must satisfy
$$\operatorname{var}(\hat{\theta}) \ge \frac{1}{-E\left[\dfrac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right]} \qquad (3.6)$$
where the derivative is evaluated at the true value of $\theta$ and the expectation is taken with respect to $p(\mathbf{x}; \theta)$. Furthermore, an unbiased estimator may be found that attains the bound for all $\theta$ if and only if
$$\frac{\partial \ln p(\mathbf{x}; \theta)}{\partial \theta} = I(\theta)\left(g(\mathbf{x}) - \theta\right) \qquad (3.7)$$
for some functions $g$ and $I$. That estimator, which is the MVU estimator, is $\hat{\theta} = g(\mathbf{x})$, and the minimum variance is $1/I(\theta)$.

The expectation in (3.6) is explicitly given by
$$E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right] = \int \frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\, p(\mathbf{x}; \theta)\, d\mathbf{x}$$
since the second derivative is a random variable dependent on $\mathbf{x}$. Also, the bound will depend on $\theta$ in general, so that it is displayed as in Figure 2.5 (dashed curve). An example of a PDF that does not satisfy the regularity condition is given in Problem 3.1. For a proof of the theorem see Appendix 3A. Some examples are now given to illustrate the evaluation of the CRLB.

Example 3.2 - CRLB for Example 3.1

For Example 3.1 we see from (3.3) and (3.6) that
$$\operatorname{var}(\hat{A}) \ge \sigma^2 \quad \text{for all } A.$$
Thus, no unbiased estimator can exist whose variance is lower than $\sigma^2$ for even a single value of $A$. But in fact we know that if $\hat{A} = x[0]$, then $\operatorname{var}(\hat{A}) = \sigma^2$ for all $A$. Since $x[0]$ is unbiased and attains the CRLB, it must therefore be the MVU estimator. Had we been unable to guess that $x[0]$ would be a good estimator, we could have used (3.7). From (3.2) and (3.7) we make the identification
$$\theta = A, \qquad I(\theta) = \frac{1}{\sigma^2}, \qquad g(x[0]) = x[0]$$
so that (3.7) is satisfied. Hence, $\hat{A} = g(x[0]) = x[0]$ is the MVU estimator. Also, note that $\operatorname{var}(\hat{A}) = \sigma^2 = 1/I(\theta)$, so that according to (3.6) we must have
$$I(\theta) = -E\left[\frac{\partial^2 \ln p(\mathbf{x}; \theta)}{\partial \theta^2}\right].$$
We will return to this after the next example. See also Problem 3.2 for a generalization to the non-Gaussian case. ◇

Example 3.3 - DC Level in White Gaussian Noise

Generalizing Example 3.1, consider the multiple observations
$$x[n] = A + w[n] \qquad n = 0, 1, \ldots, N-1$$
where $w[n]$ is WGN with variance $\sigma^2$. To determine the CRLB for $A$,
$$p(\mathbf{x}; A) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x[n] - A)^2\right] = \frac{1}{(2\pi\sigma^2)^{\frac{N}{2}}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2\right].$$
Taking the first derivative
$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = \frac{\partial}{\partial A}\left[-\ln\left((2\pi\sigma^2)^{\frac{N}{2}}\right) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2\right] = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n] - A) = \frac{N}{\sigma^2}(\bar{x} - A) \qquad (3.8)$$
  • 22. 32 CHAPTER 3. CRAMER-RAO LOWER BOUND where x is the sample mean. Differentiating again 82 Inp(x;A) N 8A2 =- q2 and noting that the second derivative is a constant, ~ from (3.6) (3.9) as the CRLB. Also, by comparing (3.7) and (3.8) we see that the sample mean estimator attains the bound and must therefore be the MVU estimator. Also, once again the minimum variance is given by the reciprocal of the constant N/q2 in (3.8). (See also Problems 3.3-3.5 for variations on this example.) <> We now prove that when the CRLB is attained where . 1 var(8) = /(8) From (3.6) and (3.7) and var(9) __-..".-."...,--1-,..---:-:-:- -_ [82Inp(X; 0)] E 802 8Inp(x; 0) = /(0)(9 _ 0). 88 Differentiating the latter produces 8 2 Inp(x; 0) = 8/(0) ({) _ 0) _ /(0) 802 80 and taking the negative expected value yields and therefore -E [82 In p(X; 0)] 802 - 8/(0) [E(9) - 0] + /(0) 80 /(0) A 1 var(O) = /(0)' In the next example we will see that the CRLB is not always satisfied. (3.10) 3.4. CRAMER-RAO LOWER BOUND Example 3.4 - Phase Estimation Assume that we wish to estimate the phase ¢ of a sinusoid embedded in WGN or x[n] =Acos(21lJon +¢) + wIn] n = 0, 1, ... , N - 1. 33 The ampiitude A and fre uenc 0 are assumed known (see Example 3.14 for the case when t ey are unknown). The PDF is 1 {I N-l } p(x; ¢) = Ii. exp --2 2 E [x[n]- Acos(21lJon +4»f . (27rq2) 2 q n=O Differentiating the log-likelihood function produces and - 8Inp(x; ¢) 84> 1 .'-1 -2 E [x[n]- Acos(27rfon + cP)]Asin(27rfon +¢) q n=O A N-l A - q2 E [x[n]sin(27rfon +4» - "2 sin(47rfon + 24»] n=O 821 (¢) A N-l n;2 X ; = -2 E [x[n] cos(27rfon +¢) - Acos(47rfon + 2¢)]. ¢ q n=O Upon taking the negative expected value we have A N-l 2 E [Acos2 (27rfon + ¢) - A cos(47rfon + 2¢)] q n=O A2N-l[11 ] 2" E -+ - cos(47rfon + 2¢) - cos(47rfon +2¢) q n=O 2 2 NA2 2q2 since 1 N-l N E cos(47rfon + 2¢) ~ 0 n=O for 10 not near 0 or 1/2 (see Problem 3.7). Therefore, In this example the condition for the bound to hold is not satisfied. Hence, a phase estimator does not eXIst whIch IS unbiased and attains the CRLB. It is still possible, however, that an MVU estimator may exist. At this point we do not know how to
  • 23. 34 ................ _........... 93 ................ •• 91 and CRLB -------4--------------------- 0 (a) (h efficient and MVU CHAPTER 3. CRAMER-RAO LOWER BOUND var(9) .............................. -------+-------------------- e (b) 81 MVU but not efficient Figure 3.2 Efficiency VB. minimum variance determine whether an MVU estimator exists, and if it does, how to find it. The theory of sufficient statistics presented in Chapter 5 will allow us to answer these questions. o An estimator which is unbiased and attains the CRLB, as the sample mean estimator in Example 3.3 does, IS said to be efficient in that it efficiently uses the data. An MVU estimator rna or may not be efficient. For instance, in Figure 3.2 the variances of all possible estimators or purposes of illustration there are three unbiased estimators) are displayed. In Figure 3.2a, 81 is efficient in that it attains the CRLB. Therefore, it is also the MVU estimator. On the other hand, in Figure 3.2b, 81 does not attain the CRLB, and hence it is not efficient. However, since its varianoe is uniformly less than that of all other unbiased estimators, it is the MVU estimator.- The CRLB given by (3.6) may also be expressed in a slightly different form. Al- though (3.6) is usually more convenient for evaluation, the alternative form is sometimes useful for theoretical work. It follows from the identity (see Appendix 3A) (3.11) so that (3.12) (see Problem 3.8). The denominator in (3.6) is referred to as the Fisher information /(8) for the data x or (3.13) As we saw previously, when the CRLB is attained, the variance is the reciprocal of the Fisher information. Int"iirtrvely, the more information, the lower the bound. It has the essentiaI properties of an information measure in that it is 3.5. GENERAL CRLB FOR SIGNALS IN WGN 35 1. nonnegative due to (3.11) 2. additive for independent observations. The latter property leads to the result that the CRLB for N lID observations is 1 N times t a, or one 0 servation. To verify this, note that for independent observations N-I lnp(x; 8) = L lnp(x[n]; 8). n==O This results in -E [fJ2 lnp(x; 8)] = _ .~I E [fJ2ln p(x[n];8)] [)82 L., [)82 n=O and finally for identically distributed observations /(8) = Ni(8) where i(8) = -E [[)2In~~[n];8)] is the Fisher information for one sam Ie. For nonindependent samples we might expect !!J.at the in ormation will be less than Ni(8), as Problem 3.9 illustrates. For completely dependent samples, as for example, x[O] = x[l] = ... = x[N-1], we will have /(8) = i(8) (see also Problem 3.9). Therefore, additional observations carry no information, and the CRLB will not decrease with increasing data record length. 3.5 General CRLB for Signals in White Gaussian Noise Since it is common to assume white Gaussian noise, it is worthwhile to derive the CRLB for this case. Later, we will extend this to nonwhite Gaussian noise and a vector parameter as given by (3.31). Assume that a deterministic signal with an unknown p'arameter 8 is observed in WGN as x[n] = s[nj 8] +w[n] n = 0, 1, ... ,N - 1. The dependence of the signal on 8 is explicitly noted. The likelihood function is p(x; 8) = N exp - - L (x[n] - s[nj 8])2 . 1 {I N-I } (211"172).. 2172 n=O Differentiating once produces [)lnp(xj8) = ~ ~I( [ ]_ [ . ll]) [)s[nj 8] [)8 172 L., X n s n, u [)8 n=O
  • 24. 36 CHAPTER 3. CRAMER-RAO LOWER BOUND and a second differentiation results in 02lnp(x;(J) =2- ~l{( []_ [.(J])8 2 s[n;(J]_ (8S[n;(J])2}. 8(J2 (12 L...J x n s n, 8(J2 8(J n=O Taking the expected value yields E (82 Inp(x; (J)) = _2- ~ (8s[n; 0]) 2 802 (12 L...J 80 n=O so that finally (3.14) The form of the bound demonstrates the importance of the si nal de endence on O. Signals that c ange rapidly as t e un nown parameter changes result in accurate esti- mators. A simple application of (3.14) to Example 3.3, in which s[n; 0] = 0, produces a CRLB of (12/N. The reader should also verify the results of Example 3.4. As a final example we examine the problem of frequency estimation. Example 3.5 - Sinusoidal Frequency Estimation We assume that the signal is sinusoidal and is represented as s[n; fo] =A cos(271Ion + cP) 1 0< fo < 2 where-the amplitude and phase are known (see Example 3.14 for the case when they are unknown). From (3.14) the CRLB becomes (12 var(jo) ~ -N---l------- (3.15) A2 L [21Tnsin(21Tfon + cP)]2 n=O The CRLB is plotted in Figure 3.3 Vf>TSUS frequency for an SNR of A2 /(l2 = 1, a data record length of N = 10, and a phase of cP = O. It is interesting to note that there appea.r to be preferred frequencies (see also Example 3.14) for an approximation to (3.15)). Also, as fo -+ 0, the CRLB goes to infinity. This is because for fo close to zero a slight change in frequency will not alter the signal significantly. 0 3.6. TRANSFORMATION OF PARAMETERS 5.0-:- -g 4.5-+ j 4.0~ ~ 3.5~ Ja 3.0~ ~ I ~ 2.5-+ ~ 2.0~ '" ... C) 1.5-+ I 1.o-l. 1 ~'>Lr----r---r---r-----t----''---r---r---r:>.L----t-- 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 Frequency Figure 3.3 Cramer-Rao lower bound for sinusoidal frequency estimation 3.6 Transformation of Parameters 37 It fre uentl occurs in practice that the parameter we wish to estimate is a function o some more fun amenta parameter. or mstance, in Example 3.3 we may not be interested in the sign of A but instead may wish to estimate A2 or the power of the signal. Knowing the CRLB for A, we can easily obtain it for A2 or in general for ~ function of A. As shown in Appendix 3A, if it is desired to estimate ex = g(0), then the CRLBi§ (3.16) For the present example this becomes ~ = g(A) = A2 and (A 2) > (2A)2 = 4A2(12 var - N/(l2 N' (3.17) Note that in using (3.16) the CRLB is expressed in terms of O. We saw in Example 3.3 that the sample mean estimator was efficient for A. It might be sUj)OSed that x2 is efficient for A2. To uickl dispel this notion we first show that is not even an unbiased estimator. Since x ""' (A, (I (12 = E2(:C) +var(x) = A2 + N 1= AZT' (3.18) Hence, we immediately conclude that the efficiency of an estimator is destroyed by a ftonlinear transformatioTil That it is maintained for linear (actually affine) transfor- mations is easily verified. Assume that an efficient estimator for 0 exists and is given
  • 25. 38 CHAPTER 3. CRAMER-RAO LOWER BOUND 'by O. It is desired to estimate g{O) = aO + b. As our estimator of g(O), we choose .#) = g(0) = aO + b. Then, E(a8 +b) = aE(8) +b= a8 +b> = g(8} , so that g(O) is unbiased. The CRLB for g(O), is from (3.16), ( 8g )2 ;:;;.j~ - I(O) var(g(8)) ( 8~~)) 2 var(0) a2 var(8)O But var{g(O)) = var(aO + b) = a2 var(O), so that the CRLB is achieved. Althou h efficienc is reserved onl over linear transformations, it is approximatel maintained over nonlinear transformations if the data record is large enoug. IS as great practical significance in that we are frequently interested in estimating functions of parameters. To see why this property holds, we return to the previous example of estimating A2 by x2. Although x2is biased, we note from (3.18) that x2is asymptotically unbiased or unbiased as N ~ 00. Furthermore, since x '" N(~0'2 IN), we can evaluate tIle variance var(x2 ) =E(x4 ) - E2(x2 }J by using the result that if ~ '" N(J,L, 0'2), then and therefore E(e) = p.2 +q2, J E(~4) = p.4 +6p.2q2 +3q~/ var(e) E(e) - E2 (e) 4J,L20'2 + 20'4. For our problem we have then - Hence, as N ~ 00, the variance approaches 4A2 0'2 IN, the last term in (3.19) converging to zero faster than the first, But this is just the CRLB as given by (3.17). Our assertion that x2is an asymptotically efficient estimator of A2 is verified. Intuitively, this situation occurs due to the statistical linearity of the transformation, as illustrated in Figure 3.4, As N increases, the PDF of xbecomes more concentrated about the mean A. Therefore, EXTENSION TO A VECTOR PARAMETER 3.7. 39 ~~----------~~--~------- x A-~ A -IN (a) Small N 30" A+- -IN A-~ A -IN (b) Large N Figure 3.4 Statistical linearity of nonlinear transformations 30" A+- -IN the values of x that are observed lie in a small interval about x = A (the ±3 standard deviation interval is displayed). Over this small interval the nonlinear transformation is approximately linear. Therefore, the transformation may be replaced by a linear one since a value of x in the nonlinear region rarely occurs. In fact, if we linearize g about A, we have the approximation dg(A) ,", g(x) ~ g(A) + d}i(x - A)." It follows that, to within this approximation, E[g(x)] = g(A) = A2 or the estimator is unbiased (asymptotically). Als<?!... var[g(x)] [ d9(A)]2 (-) dA var x (2A)20'2 N 4A20'2 N so that the estimator achieves the CRLB (asymptotically). Therefore, it is asymp- totically efficient. This result also yields insight into the form of the CRLB given by (3.16). 3.7 Extension to a Vector Parameter We now extend the results of the previous sections to the case where we wish to estimate a vector parameter (} = [01 02 " • 8p j1:: We will assume that the estimator (J is unbiased
  • 26. ·w CHAPTER 3. CRAMER-RAO LOWER BOUND :IS defined in Section 2.7. The vector parameter CRLB will allow us to place a bound "ll the variance of each element. As derived in Appendix 3B, the CRLB is found as the .i. i] element of the inverse of a matrix or var(6i) ~ [rl(9)]ii . (3.20) where 1(8) is the p x p Fisher information matrix. The latter is defined by [1(8)] .. = -E [82 lnp(X;8)] v " 88;l}8j it (3.21) f,)r 1= 1,2, ... ,p;j = 1,2, .... In evaluating (3.21) the true value of 8 is used. 'ote t at in t e scalar case (p = 1), = J(O) and we ave the scalar CRLB. Some ,'xamples follow. Example 3.6 - DC Level in White Gaussian Noise (Revisited) "e now extend Example 3.3 to the case where in addition to A the noise variance a 2 :~ also unknown. The parameter vector is 8 = [Aa2 f, and hence p = 2. The 2 x 2 Fisher information matrix is _E[82 ln p (x;8)] 1 8A8a2 -E [82 lnp(X; 8)] . 8 22 a .. It is clear from (3.21) that the matrix is symmetric since the order of partial differenti- .Ilion may be interchanged and can also be shown to be positive definite (see Problem ;UO). The log-likelihood function is, from Example 3.3, N N 1 N-l lnp(x' 8) = --ln271" - -lna2 - - L (x[n]- A)2. , 2 2 2a2 n=O l'he derivatives are easily found as 8 lnp(x; 8) 8A 8 lnp(x; 8) 8a2 82 lnp(x; 8) 8A2 82 lnp(x; 8) 8A8a2 82 lnp(x;8) 8a22 1 N-l "2 L(x[n]- A) a n=O N 1 N-l -- + - "(x[n]- A)2 2a2 2a4 L n=O N a2 1 N-l -- '" (x[n]- A) a4 L n=O N 1 N-l -- - - "(x[n]- A)2. 2a~ a6 L n=O 3.7. EXTE.VSION TO A VECTOR PARAMETER 41 Upon taking tne negative expectations, the Fisher information matrix becomes 1(8) = [~ ~] . 2a4 Although not true in general, for this example the Fisher information matrix is diagonal and hence easily inverted to yield 0 var(A) a- ~ N var(d2 ) 2a4 ~ N Note that the CRLB for A. is the same as for the case when a2 is known due to the diagonal nature of the matrix. Again this is not true in general, as the next example illustrates. 0 Example 3.7 - Line Fitting Consider the problem of line fitting or given the observations x[n] = A + Bn + w[n] n = 0, 1, ... , N - 1 where w[n] is WGN, determine the CRLB for the slope B and the intercept A. .The parameter vector in this case is 8 = [AB]T. We need to first compute the 2 x 2 FIsher information matrix, 1(8) = [ _ [82 ln p(X;8)] E 8A2 _ [82 ln p(x;8)] E .8B8A The likelihood function is p(x; 8) = N exp --2 L (x[n] - A - Bn)2 1 {I N-l } (271"a2 ) "2 2a n=O from which the derivatives follow as 8lnp(x;8) 8A 8lnp(x;8) 8B 1 N-l - "(x[n]- A - Bn) a2 L n=O 1 N-l - "(x[n]- A - Bn)n a2 L n=O
  • 27. 42 CHAPTER 3. CRAMER-RAO LOWER BOUND and o2Inp(x; 9) OA2 o2Inp(x; 9) oA8B o2Inp(x; 9) OB2 N 0'2 1 N-I - 0'2 L n n=O 1 N-I - 0'2 L n 2 . n=O Since the second-order derivatives do not depend on x, we have immediately that [ N~' 1 1(9) 1 N 2:> n-O 0'2 %n ~n2 ~ :'[ N N(N -1) 2 N(N -1) N(N - 1)(2N - 1) 2 6 where we have used the identities N-I Ln n=O Inverting the matrix yields N(N -1) 2 N(N - 1)(2N - 1) 6 [ 2(2N -1) 1-1(9) = 0'2 _N(N: 1) N(:+ 1) j. 12 N(N2 - 1) N(N + 1) It follows from (3.20) that the CRLB is var(A) 2 2(2N - 1)0'2 N(N + 1) var(.8) > N(N2 - 1)' 1 (3.22) 3.7. EXTENSION TO A VECTOR PARAMETER 43 x[nJ x[nJ 4- 3 3- 2 2- n 1 - 1 n 2 3 4 2 3 4 (a) A = 0, B = 0 to A = 1, B = 0 (b) A = 0, B = 0 to A = 0, B = 1 Figure 3.5 Sensitivity of observations to parameter changes-no noise Some interestine; observations follow from examination of the CRLB. Note first that the CRLB for A has increased over that obtained when B is known, for in the latter case we have , 1 0'2 var(A) 2 - E [02Inp(X; A)] = N OA2 and for N 2 2, 2(2N - l)/(N + 1) > 1. This is a quite general result that asserts that the CRLB always increases as we estimate more pammeters '(see Problems 3.11 and 3.12). A second point is that CRLB(A) (2N - l)(N - 1) ---,--;,,-'- = > 1 CRLB(B) 6 for N 2 3. Hence, B is easier to estimate, its CRLB decreasing as 1/N3 as opposed to the l/N dependence for the CRLB of A. ~differing dependences indicate that x[n] is more sensitive to changes in B than to changes in A. A simple calculation reveals ~x[n] ~x[n] ~ ox[n] ~A = ~A oA ~ o;~] ~B = n~B. Changes in B are magnified by n, as illustrated in Figure 3.5. This effect is reminiscent of (3.14), and indeed a similar type of relationship is obtained in the vector parameter case (see (3.33)). See Problem 3.13 for a generalization of this example. 0 As an alternative means of computing the CRLB we can use the identity E [olnp(x; 9) 81np(x; 8)] = -E [021np(X; 8)}.:., 8Bi 8Bj 8Bi 8Bj . (3.23)
  • 28. 44 CHAPTER 3. CRAMER-RAO LOWER BOUND as shown in Appendix 3B. The form given on the right-hand side is usually easier to evaluate, however. We now formally state the CRLB theorem for a vector parameter. Included in the theorem are conditions for equality. The bound is stated in terms of the covariance matrix of 8, denoted by Co, from which (3.20) follows. Theorem 3.2 (Cramer-Rao Lower Bound - Vector Parameter) It is assumed ~hat the PDF p(x; 0) satisfies the "regularity" conditions for alIO," where the expectation is taken with respect to p(x; 0). Then, the covariance matrix of any unbiased estimator IJ satisfies [I(8)li' = -E [EJ2inP(X; 0)] :7 3 8fM}(}j g where the derivatives are evaluated at the true value of 0 and the e ectation is taken with respect to X' 0). Furthermore, an unbiased estimator may be found that attains the bound in that Co = 1-1(0) if and only if 8lnp(x; 8) =I(8)(g(x) _ 0) 08 for some p-dimensional function g and some p x p matrix I. That estimator, which is the MVU estimator, is 0 = g(x), and its covariance matrix is 1-1 (0). The proof is given in Appendix 3B. That (3.20) follows from (3.24) is shown by noting that for a positive semidefinite matrix the diagonal elements are nonnegative. Hence, and therefore 8lnp(x;0) 80 [ 8ln;~;0) 1 8lnp(x;0) 8B 3.8. VECTOR PARAMETER CRLB FOR TRANSFORMATIONS 1 N-l - "(x[n]- A - Bn) (J"2 ~ n=O 1 N-l - "(x[n] - A - Bn)n (J"2 ~ n=O Although'not obvious, this may be rewritten as [ ~ N(~(J"~ 1) 8lnp(x;0) _ u 80 N(N - 1) N(N - 1)(2N - 1) 2(J"2 6(J"2 where 2(2N - 1) N-l 6 N-l N(N + 1) ~ x[n]- N(N + 1) ~ nx[n] 6 N-l 12 N-I N(N + 1) ~ x[n] + N(N2 _ 1) ~ nx[n]. 45 (3.28) (3.29) A A T h e Hence, the conditions for equality are satisfied and [A~] is .an efficiefnt hand t er.elore MVU estimator. Furthermore, the matrix in (3.29) IS the mverse 0 t e covanance matrix. If the equality conditions hold, the reader may ask whether we can be assured that (j is unbiased. Because the regularity conditions [ 8ln p(x;0)] = 0 E 80 are always assumed to hold, we can apply them to (3.25). This then yields E[g(x)] = E(8) =O. . In finding MVU estimators for a vector parameter the CI~.LB theorem ~rovldes a powerful tool. In particular, it allows us to find the MVU estlI~ator. for an.l~portant class of data models. This class is the linear mode'and is descnbed m detail m Chap- ter 4. The line fitting example just discussed is a special case. Suffice. it to say t~at if we can model our data m the linear model form, then the MVU estimator and Its performance are easily found. 3.8 Vector Parameter CRLB for Transformations The discussion in Section 3.6 extends readily to the vector case. Assume that it .is desired to estimate 0: = g(O) for g, an r-dimensional function. Then, as shown m Appendix 3B C - _ 8g(0)I-1(8)8g(0)T > On a 88 88-' (3.30)
  • 29. 46 CHAPTER 3. CRAMER-RAO LOWER BOUND where, as before > 0 is to b . t d . . . th J 'b' ..~ III erprete as posItIve semidefinite. In (3.30) 8g(O)/80 IS e r x p &co Ian matrut"defined as ' ~_..... _........_ 8g1 (0) 8g1 (0) 8g1 (0) an;- ao;- ao;- 8g(O) 8g2(0) 8g2(O) 8g2(O) 89= an;- ao;- ao;- 8gr (0) 8gr (0) 8gr (O) 801 ao;- ao;- Example 3.8 - CRLB for Signal-to-Noise Ratio Consider a DC level in WGN with A and 2 k lX'· . U un nown. He wIsh to estImate A2 a=- u2 w(~)c:~~n/Obe_co~sid;red to be the SN~ for a single sample. Here 0 = [A u 2JT and 9 1 2 - A /u . Then, as shown III Example 3.6, The Jacobian is so that 8g(O) fii} Finally, since a is a scalar [ IV 1(0) = ~2 o 1 IV . 2u4 [~~ -:: 1[: 2~'][-~f 1 4A2 2A4 = -+- IVu2 IVu4 4a + 2a2 = IV (') > 4a + 2a 2 var a - IV . o 3.9. CRLB FOR THE GENERAL GAUSSIAN CASE 47 As discussed in Section 3.6, efficiency is maintained over linear transformations a = g(O) = AO +tf where A is an r x p matrix and b is an r x 1 vector. If a = AiJ + b, and iJ is efficient or ~ - I 1(0), then so that a is unbiased and E(a) =AO +b =a: Co. ::1:) ACfjAT = Arl(O)AT = 8g(O)r1(0)8g(O)y ~.; 80 80 ,. the latter being the CRL,E. For nonlinear transformations efficiency is maintained only as IV -+ 00. (This assumes that the PDF of Ii becomes concentrated about the true value of 0 as IV -+ 00 or that 0 is consistent.) Again this is due to the statistical linearity of g(O) about the true value of o. 3.9 CRLB for the General Gaussian Case· It is quite convenient at times to have a general expression for the CRLB. In the case of Gaussian observations we can derive the CRLB that generalizes (3.14). Assume that x '" N (1'(0), C(0)) so that both the mean and covariance may depend on O. Then, as shown in Appendix 3C, the Fisher information matrix is given by - where 8JL(0) 80i 8[JL(O)h 80i 8[JL(O)h 80i (3.31)
  • 30. 48 ~--=-=:. 3. CRAMER-RAO LOWER BOUND and 8[C(8))l1 8[C f 80i 8C(8) 8[C(8)bI 8[C F an:- = 80i c-;r- 8[C(8))NI 8[C 6 -- - 8[C(8)hN 80i 8[C(8)bN 80; 80i CIt For the scalar parameter case in which X rv N(J.t1 F - this reduces to 1(0) = [8J.t(O)]T _ -.~IJIO)] 80 L _JB + ~tr [IL _ .:C'O)) 2] 2 .~e (3.32) which generalizes (3.14). We now illustrate tn~ ~:---c-ation 'th - _. WI some examples. Example 3.9 - Parameters of as· al' - . Ign m .... nJ;e GaussIan Noise Assume that we wish to estimate a scalar signa :~~eter ;for the data set x[n) = s[n; 0) +w[n) . = ~ ... N _ 1 where w[n) is WGN. The covariance matrix is C= -I ~ d second term in (3.32) is therefore zero. The £:-;:- _=:::: .~ld~oes not depend on O. The 1(0) = which agrees with (3.14). o Generalizing to a vector signal t · . from (3.31) parame er estIILa-;-=-..: ..:: :.Je presence of WGN, we have [1(8)b= [8~~~)r.~= _'~8)] • 3.9. CRLB FOR THE GENERAL GAUSSIAN CASE 49 which yields [1(8)) = 2. '~l 8s[n; 8) 8s[n; 8) IJ 0"2 L 80 80· n=O 1 J (3.33) as the elements of the Fisher information matrix. Example 3.10 - Parameter of Noise Assume that we observe x[n] = w[n] n = 0, 1, ... ,N - 1 where w[n] is WGN with unknown variance 0 = 0"2. Then, according to (3.32), since C(0"2) = 0"21, we have 1(0"2) ~tr [( C-I(0"2) 8~~2)r] ~tr [ ( (:2) (I)r] ~tr [2.1] 2 0"4 N 20"4 which agrees with the results in Example 3.6. A slightly more complicated example follows. 0 Example 3.11 - Random DC Level in WGN Consider the data x[n] = A +w[n) n = 0,1, ... ,N - 1 where w[n) is WGN and A, the DC level, is a Gaussian random variable with zero mean and variance O"~. Also, A is independent of w[n]. The power of the signal or variance O"~ is the unknown parameter. Then, x = [x[O) x[I] ... x[N - I)f is Gaussian with zero mean and an N x N covariance matrix whose [i,j] element is Therefore, E [xli - I)x[j - 1)) E [(A +w[i - I])(A +w[j - 1))] O"i +0"2Jij .
  • 31. .50 CHAPTER 3. CRAMER-RAO LOWER BOUND where 1 = [11 .. . 1]T. Using Woodbury's identity (see Appendix 1), we have Also, since we have that C-I(a2)8C(a~)= 1 T A"2 2 N211 uaA a + a.4 Substituting this in (3.32) produces 1 ( N )2 2 a 2 + Na~ so that the CRLB is var(a.~) ~ 2 (a.~ + ~) 2 Note tha~ .even as N --+ 00, the CRLB does not decrease below 2a~. This is I)P'""T""" each addItIOnal data sample yields the same value of A (see Proble~ 3.14). 3.10 Asymptotic CRLB for WSS Gaussian Random ' Processes' . - . - ------------ O ASYMPTOTIC CRLB 3.1 . -! G 1 -Ie -2 Q(f) -11 h ?ss(f; Ie) 4 Ie - 11 Ie Ie + h fernin = h 1 fernax = :2 - 12 Figure 3.6 Signal PSD for center frequency estimation 51 I 1 2' I 1 2' As shown in Appendix 3D, the elements of the Fisher information are approximately (as N--+ 00) [1(9)J .. = N j! 8lnP",,,,(jj9) 8InP",,,,(jj9) dl '1 2 _! 88; 88j ' (3.34) A typical problem is to estimate the center frequency Ie of a PSD which otherwise is known. Given we wish to determine the CRLB for Ie assuming that Q(f) and a2 are known. We view the process as consisting of a random signal embedded in WGN. The center .JlE~uen(:y of the signal PSD is to be estimated. The real function Q(f) and the signal ;Ie) are shown in Figure 3.6. Note that the possible center frequencies are ,'CO:nstlrailled to be in the interval [ir, 1/2 - h]. For these center frequencies the signal for I ~ 0 will be contained in the [0,1/2] interval. Then, since (J = Ie is a scalar,
  • 32. 52 CHAPTER 3. CRAMER-RAO LOWER BOUND we have from (3.34) But alnPxx(f; Ie) ale aln [Q(f - Ie) + Q(- I - Ie) + (j2] ale aQ(f - Ie) + aQ(- I - Ie) ale ale Q(f - Ie) + Q(- I - Ie) + (j2· This is an odd function of I, so that J! (alnPxx(f;le))2 dl = 2 {! (alnPxx(f;le))2 df. _! ale Jo ale Also, for I 2: 0 we have that Q(- I - Ie) = 0, and thus its derivative is zero due to the assumption illustrated in Figure 3.6. It follows that var(f~) 2: 1 1 ( aQ(f-Ie) )2 N 2 ale dI 1 Q(f - Ie) + (j2 - 1 N a(f Ie) dl 1 ! ( a Q (f-Ie_l) )2 o Q(f - Ie) + (j2 1 2 Ie ---aT' 1_ ( aQ(f') ) 2 N 1-fe Q(f') + (j2 dj' where we have let j' = I - Ie. But 1/2-Ie 2: 1/2-lem .. = hand - Ie:::; - lem'n = -11, SO that we may change the limits of integration to the interval [-1/2,1/2]. Thus, 1 3.11. SIGNAL PROCESSING EXAMPLES 53 As an example, consider where (jf « 1/2, so that Q(f) is bandlimited as shown in Figure 3.6. Then, if Q(f) » (j2, we have approximately Narrower bandwidth (smaller (jJ) spectra yield lower bounds for the center freguency since the PSD changes more rapidly as te changes. See also Problem 3.16 for another example. 0 3.11 Signal Processing Examples We now apply the theory of the CRLB to several signal processing problems of interest. The problems to be considered and some of their areas of application are: 1. Range estimation - sonar, radar, robotics 2. Frequency estimation - sonar, radar, econometrics, spectrometry 3. Bearing estimation - sonar, radar 4. Autoregressive parameter estimation - speech, econometrics. These examples will be revisited in Chapter 7, in which actual estimators that asymp- totically attain the CRLB will be studied. Example 3.13 - Range Estimation In radar or active sonar a si nal ulse is transmitted. The round tri dela -,; from the transmItter to t e target and back is related to the range R as TO - 2R/c, ~ is the speed of propagation. ~stimation of range is therefore equivalent to estimation of the time deley, assuming that c is known. If s(t) is the transmitted signal, a simp~ model for the received continuous waveform is x(t) =s(t - To) +w(t) 0:::; t:::; T. The transmitted signal pulse is assumed to be nonzero over the interval [0, T.l. Addi- tionally, the signal is assumed to be essentially bandlimited to B Hz. If the maximum time delay is Tomll , then the observation interval is chosen to include the entire signal by letting T = T. +TOma:. The noise is modeled as Gaussian with PSD and ACF as
  • 33. 54 -B PSD of wit) No/2 B F(Hz) CHAPTER 3. CRAMER-RAO LOWER BOUND 1 2B ACF ofw(t) sin 211"rB rww(r) = NoB--- 1 1 2B B 211"rB r Figure 3.7 Properties of Gaussian observation noise"':- shown in Figure 3.7. The bandlimited nature of the noise results from filtering the con- tinuous waveform to the signal bandwidth of B Hz. The continuous received waveform is sampled at the Nyquist rate or samples are taken every ~ =1/(2B1seconds ~ the observed data x(n~) = s(n~ - TO) +w(n~) n=O,I, ... ,N-L) Letting x[n] and wIn] be the sampled sequences. we have our discrete data model x[n] =s(n~ - TO) + wIn]. { wIn} x[n) = s(n~ - To) +wIn) wIn) O:::;n:::;no-l ,J no:::;n:::;no+M-l:; no+M:::;n:::;N-l''> (3.35) (3.36) where M is the len th of the sampled signal and no = T; ~ is the dela in sam les. For simplicity we assume that ~ is so small that To/~ can be approximated by an integer.) With this formulation we can apply (3.14) in evaluating the CRLB. - n:~O-1 (8s(n!o- TO) r (72 no+M-1 (dS(t) I )2 T~O ili t=nD.-To '" J SIGNAL PROCESSING EXAMPLES 3.11. = EI(dS(t) )2 n=O dt t=nD. 55 since To = no~. Assuming that ~ is small enough to approximate the sum by an integral, we have - var(fo) 2: 2. fT, (dS(t))2 dt' ~ 10 dt r-inally, noting that ~ = 1/(2B) and (72 = NoB, we have No A '--2." var(To) 2: 1To (dS(t).~2 . ",') _. dtr; o dt,·7 An alternative form observes that the energy £ is which results in where .....-- 1 var(fo) 2: £_ No/2F2 fT, (dS(t))2 dt F 2 _ 10 dt - T l's2(t)dt (3.37) (3.38) It can be shown that £/(No!2) is a SNR [Van Trees ~968]. Also, F2 is a ~easure of the bandwidth of the signal since, using standard FOUrier transform properties, _ JOO (27rF)2IS(FWdF F2 - 00 (3.39) - [ : IS(F)12dF where F denotes continuous-time frequency, and S(F) is the Fourier trans~orm of s(t). In this form it becomes clear that F2 is the mean square bandwidth of the signa!. From (3.38) and (3.39), the larger the mean square bandwi~th, the lower the CRr~{ ;~ instance, assume tfiat the SIgnal is a Gaussian pulse .glven by s(t) exp( 2 F( _ T./2)2) and that s(t) is essentially nonzero over the mterval [0, T.}. Then IS(F)I -
  • 34. 56 CHAPTER 3. CRAMER-RAO LOWER BOUND (aF(';27r)exp(-27r2F2/a~) and F2 = a~/2. As the mean square bandwidth increases, the signal pulse becomes narrower and it becomes easier to estimate the time delay. Finally, by noting that R =cTo/2 and using (3.16), the CRLB for range is , c2 /4 var(R) :::: . _c_F2 No/2 (3.40) (> Example 3.14 - Sinusoidal Parameter Estimation In many fields we are confronted with the roblem of estimatin the arameters of a sinusoidal signal. Economic data w ich are cyclical in nature may naturally fit such a model, while in sonar and radar physical mechanisms cause the observed signal to be sinusoidal. Hence, we examine the determination of the CRLB for the am litu fre uency fa, and phase 0 a smusoid embedded in WGN. This example generalizes xarnples . and 3.5. The data are assumed to be x[nJ = Acos(27rfon + </>} + w[n) n = 0,1, ... ,N-1 where A > 0 and 0 < fo < 1(2 (otherwise the parameters are not identifiable, as is verIfied by considering A = 1, </> = 0 versus A = -1, rP = 7r or fo = 0 with A = 1/2, rP = 0 versus A =1/,;2,</> =7r(4). Since ~ultiple parameters are un!nown, we use (3.33) (1(8)] .. = ~ f:os[n;8) os[n;8J.' '1 a2 n=O aOi oOj . for fJ = [A fo tV. In evaluating the CRLB it is assumed that 0 is not near 0 or 1 2, which allows us to ma e certam simp i cations based on the approximations [Stoica 1989J (see also Problem 3.7): 1 N-I . N,+I L n' sin(47rfon + 2</>} ~ 0 n==O 1 N-I Ni+l L ni cos(47rfon + 2rP) ~ 0 n=O for i = 0, 1,2. Using these approximations and letting 0 =27rfan + </>, we have [1(fJ)]u 1 N-I 1 N-I (1 1 ) N - '"' cos2 0 = - '"' - + - cos 20 ~- a2 L.. cr2 L.. 2 2 2cr2 n=O n=O [1(6)h2 1 N-I A N-I -2 L A27rncososino = -~ L nsin20 ~ 0 cr =0 cr n_ 3.11. SIGNAL PROCESSING EXAMPLES [1(8}b [1(8}b [1(fJ)]J3 = The Fisher information matrix becom:~_____---11 _ "F"'"iti 1 1(8) = 2" cr 2 o o Using (3.22), we have upon inversion var(~) ~ o o n=O n=O 12 (27r)21]N(N2 - 1) 2(2N -1) 1]N(N +1) 57 (3.41) . estimation of a sinusoid is of considerable where 1] = A2/(2a:a)is the SNR. Frequetcy decreases as the SNR increases and interest. Note that the CRLB/Nf~r th~. req~e~~ite sensitive to data record length. See that the bound decreases as 1 ,m. mg also Problem 3.17 for a variation of thIS example. (> Example 3.15 - Bearing Estimation In sonar it is of interest to estimate the in do so the acoustic pressure field is observed b
  • 35. 58 y o d 2 Planar wavefronts CHAPTER 3. CRAMER-RAO LOWER BOUND x M-l Figure 3.8 Geometry of array for bearing estimation ~'. Ass~ming that the target radiates a sinusoidal signal Acos(21TFot + c/J), then the r~ce1ved S1 nal at the nth sensor is A cos 21TFo t - tn)+c/J), where tn is the pro a ati;lli t1me to the nth sensor. If the array is located far rom the target, then the circular waveFronts can be considered to be planar at the array. As shown ill Figure 3.8, the wavefront at t~e (n. l)st sensor lags that at the nth sensor by dcos{3/c due to the extra propagatlOn d1stance. Thus, the propagation time to the nth sensor is d tn = to - n- cos{3 c n = 0, 1, ... ,M - 1 where to is the propagation time to the zeroth sensor, and the observed signal at the nth sensor is sn(t) = A cos [21rFo(t - to +n~ cos{3) +c/Jt. If a single "snapshot" of data is taken or the array element outputs are sampled at a given time t., then - d sn(t.) =Acos[21r(Fo- cos{3)n +c/J'] (3.42) c wher~ 4J' ~ c/J+2~Fo(t. -to). In this form it bec0Il:l,es clear that the spatial observations ~usOldal w1th frequency I. = Fo(d/c) cos (3. To complete the description of the data we assume that the sensor outputs are corrupted by Gaussian noise with zero mean and variance (]2 which is independent from sensor to sensor. The data are modeled as x[n] = sn(t.) +w[n] n = 0, 1, ... ,M - 1 where w[n] is WGN. Since typicall A, are unknown, as well as {3 we have the roblem of estimating {A, I., c/J' based on (3.42) as in Example 3.14. Onc~ the CRLB for these parameters is determined, we can use the transformation of parameters formula. The transformation is for 8 - [A I. c/J't - - Q ~0(9) ~ [ lJ~[~"o'11,~) ]. 3.11. SIGNAL PROCESSING EXAMPLES The Jacobian is so that from (3.30) o _ c 0 [ 1 0 0 1 Fodsin{3 0 0 1 [ C. - 8g(8)I_1 (8/g(8)T] > 0 a 88 88 22 - ' Because of the diagonal Jacobian this yields A [8g(8)]2[_1 1 var({3):::: {fiJ 22 I (8) 22' But from (3.41) we have [ -1 ( 1 12 I 8) 22 = (21T)2TJM(M2 _ 1) and therefore A 12 c2 var«(3) :::: (21r)2TJM(M2 -1) FgtPsin2 {3 or finally • 12 var({3) ~ 2 M +1 (L)2 • 2 , (21r) MTJ M _ 1 >: sm (3, Example 3.16 - Autoregressive Parameter Estimation 59 (3.43) In speech processing an important model for speech production is the autoregressive (AR) process. As shown in Figure 3.9, the data are modeled as the output of a causal all-pole discrete filter excited at the input by WGN un. The excitation noise urn] is an III erent part of the model, necessary to ensure that x n is a WSS random rocess. The all-pole filter acts to niode e vocal tract, while the excitation noise models the :Orclll of air throu h a constriction in the throat necessary to produce an unvoiced sound such as an "s." The effect of t e Iter is to color the white noise so as to model PSDs wjth several resonances. This model is also referred to as a linear predictive coding (LPC) modef'[Makhoul 1975]. Since the AR model is capable of producing a variety of PSDs, depending on the choice of the AR filter parameters {a[1], a[2]' ... ,a(p]) and