ESTIMATION THEORY
4.1. Introduction:
• When we fit random data by an AR model, we have to determine the process parameters from the observed data.
• In RADAR signal processing, we have to determine the location and the velocity of a target by observing the received noisy data.
• In communication, we have to infer the transmitted signal from the received noisy data.
Generally, estimation problems fall into two classes: parametric estimation and non-parametric estimation.
For example, consider the problem of estimating the probability density function $f_X(x)$ of a random variable $X$. We may assume a model for $X$, say the Gaussian, and find the mean $\mu$ and the variance $\sigma^2$ of the RV. Finding $\mu$ and $\sigma^2$ from the observed values of $X$ is a problem of parameter estimation. In particular, we may have to find each value of a signal from a noisy observation; this is known as the signal estimation problem. Alternatively, we may be interested in finding the true value of $f_X(x)$ directly from data for all values of $X$ without assuming any model for $f_X(x)$. This is the non-parametric method of estimation.
We will discuss the problem of parameter estimation here.
We have a sequence of observable random variables $X_1, X_2, \ldots, X_n$, represented by the vector
$$\mathbf{X} = [X_1 \; X_2 \; \cdots \; X_n]^\top.$$
$\mathbf{X}$ is governed by a joint density function which depends on some unobservable parameter $\theta$ and is given by
$$f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta) = f_{\mathbf{X}/\theta}(\mathbf{x}/\theta),$$
where $\theta$ may be deterministic or random. Our aim is to make an inference on $\theta$ from an observed sample of $X_1, X_2, \ldots, X_n$.
An estimator $\hat{\theta}(\mathbf{X})$ is a rule by which we guess the value of an unknown $\theta$ on the basis of $\mathbf{X}$.
$\hat{\theta}(\mathbf{X})$, being a function of random variables, is itself a random variable. For a particular observation $x_1, x_2, \ldots, x_n$, we get what is known as an estimate (not an estimator). An estimator of this kind is also called a point estimator.
Example 1:
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables with mean $\mu$ and variance $\sigma^2$. Then
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is an estimator for $\mu$, and
$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat{\mu})^2$$
is an estimator for $\sigma^2$.
An estimator is a function of the random sequence $X_1, X_2, \ldots, X_n$ that does not involve any unknown parameters. Such a function is generally called a statistic.
Example 2:
Suppose we have a DC voltage $X$ corrupted by noise $V_i$, and the observed data $Y_i,\; i = 1, 2, \ldots, n$, are given by
$$Y_i = X + V_i.$$
Then
$$\hat{X} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
is an estimator for $X$.
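As a quick illustration, the following sketch simulates this example with numpy; the true voltage $X = 5.0$, the noise level, and the sample size are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

X_true = 5.0          # unknown DC voltage (illustrative value)
sigma = 2.0           # noise standard deviation (illustrative)
n = 1000              # number of observations

# Observed data: Y_i = X + V_i, with V_i ~ N(0, sigma^2)
Y = X_true + sigma * rng.standard_normal(n)

# Estimator: sample mean of the observations
X_hat = Y.mean()
print(f"true X = {X_true}, estimate = {X_hat:.3f}")
```

With $n = 1000$ samples the estimate typically lands within a few hundredths of the true voltage, as the variance of the estimator is $\sigma^2/n$.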
Properties of Estimators
A good estimator should satisfy certain properties: $\hat{\theta}(\mathbf{X})$ should be as close to $\theta$ as possible. Some desirable properties of an estimator can be described in terms of its mean and variance.
(a) Unbiased Estimator
An estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if and only if $E\hat{\theta} = \theta$. The quantity $E\hat{\theta} - \theta$ is called the bias of the estimator. Unbiasedness is necessary but not sufficient to make an estimator a good one.
A random parameter $\theta$ is unbiasedly estimated if $E\hat{\theta} = E\theta$. We consider $\theta$ to be a deterministic parameter in this discussion.
Consider two estimators,
$$\hat{\sigma}_1^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})^2 \quad \text{and} \quad \hat{\sigma}_2^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat{\mu})^2,$$
for an i.i.d. sequence $X_1, X_2, \ldots, X_n$.
We can show that $\hat{\sigma}_2^2$ is an unbiased estimator.
$$E\sum_{i=1}^{n}(X_i - \hat{\mu})^2 = E\sum_{i=1}^{n}\left( X_i^2 - 2X_i\hat{\mu} + \hat{\mu}^2 \right) = \sum_{i=1}^{n} E X_i^2 - n E\hat{\mu}^2.$$
Now, $E X_i^2 = \sigma^2 + \mu^2$, and
$$E\hat{\mu}^2 = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 = \frac{1}{n^2}\left( \sum_{i=1}^{n} E X_i^2 + \sum_{i \ne j} E X_i X_j \right) = \frac{1}{n^2}\left( n(\sigma^2 + \mu^2) + n(n-1)\mu^2 \right) = \frac{\sigma^2}{n} + \mu^2,$$
where $E X_i X_j = E X_i \, E X_j = \mu^2$ for $i \ne j$ because of independence.
Therefore,
$$E\sum_{i=1}^{n}(X_i - \hat{\mu})^2 = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2,$$
so that
$$E\hat{\sigma}_2^2 = \frac{1}{n-1} E\sum_{i=1}^{n}(X_i - \hat{\mu})^2 = \sigma^2.$$
$\therefore \hat{\sigma}_2^2$ is an unbiased estimator of $\sigma^2$.
Similarly, the sample mean is an unbiased estimator:
$$E\hat{\mu} = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E X_i = \frac{n\mu}{n} = \mu.$$
Example 4: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. Poisson random variables with unknown parameter $\lambda$. Since the mean and the variance of a Poisson random variable both equal $\lambda$,
$$\hat{\lambda}_1 = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \hat{\lambda}_2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \hat{\lambda}_1)^2$$
are two unbiased estimators of $\lambda$.
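This claim is easy to check by simulation: across many trials, both estimators should average to $\lambda$. A minimal sketch ($\lambda$, $n$, and the trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, trials = 4.0, 50, 20000

x = rng.poisson(lam, (trials, n))
est1 = x.mean(axis=1)            # sample mean of each trial
est2 = x.var(axis=1, ddof=1)     # unbiased sample variance (divides by n-1)

# Both averages should be close to lambda = 4
print(est1.mean(), est2.mean())
```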
(b) Variance of the Estimator
The variance of the estimator $\hat{\theta}$ is given by
$$\operatorname{var}(\hat{\theta}) = E(\hat{\theta} - E\hat{\theta})^2.$$
For an unbiased estimator,
$$\operatorname{var}(\hat{\theta}) = E(\hat{\theta} - \theta)^2.$$
The variance of the estimator should be as low as possible.
An unbiased estimator $\hat{\theta}$ is called a minimum variance unbiased estimator (MVUE) if
$$E(\hat{\theta} - \theta)^2 \le E(\hat{\theta}' - \theta)^2,$$
where $\hat{\theta}'$ is any other unbiased estimator.
(c) Mean Square Error of the Estimator
$$\mathrm{MSE} = E(\hat{\theta} - \theta)^2.$$
The MSE should be as small as possible. Out of all unbiased estimators, the MVUE has the minimum mean square error.
The MSE is related to the bias and the variance as shown below:
$$\begin{aligned}
\mathrm{MSE} &= E(\hat{\theta} - \theta)^2 = E\left( \hat{\theta} - E\hat{\theta} + E\hat{\theta} - \theta \right)^2 \\
&= E(\hat{\theta} - E\hat{\theta})^2 + (E\hat{\theta} - \theta)^2 + 2E(\hat{\theta} - E\hat{\theta})(E\hat{\theta} - \theta) \\
&= \operatorname{var}(\hat{\theta}) + b^2(\hat{\theta}) + 0,
\end{aligned}$$
since $E(\hat{\theta} - E\hat{\theta}) = 0$. So
$$\mathrm{MSE} = \operatorname{var}(\hat{\theta}) + b^2(\hat{\theta}).$$
(d) Consistent Estimators
As we gather more data, the quality of the estimate should improve. This idea is used in defining the consistent estimator. An estimator $\hat{\theta}$ is called a consistent estimator of $\theta$ if $\hat{\theta}$ converges in probability to $\theta$:
$$\lim_{n \to \infty} P\left( |\hat{\theta} - \theta| > \epsilon \right) = 0 \quad \text{for any } \epsilon > 0.$$
A less rigorous test is obtained by applying the Markov inequality:
$$P\left( |\hat{\theta} - \theta|^2 \ge \epsilon^2 \right) \le \frac{E(\hat{\theta} - \theta)^2}{\epsilon^2}.$$
If $\hat{\theta}$ is an unbiased estimator ($b(\hat{\theta}) = 0$), then $\mathrm{MSE} = \operatorname{var}(\hat{\theta})$. Therefore, if $\lim_{n \to \infty} E(\hat{\theta} - \theta)^2 = 0$, then $\hat{\theta}$ will be a consistent estimator.
Also, note that $\mathrm{MSE} = \operatorname{var}(\hat{\theta}) + b^2(\hat{\theta})$. Therefore, if the estimator is asymptotically unbiased (i.e. $b(\hat{\theta}) \to 0$ as $n \to \infty$) and $\operatorname{var}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\mathrm{MSE} \to 0$. Thus, for an asymptotically unbiased estimator $\hat{\theta}$, if $\operatorname{var}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ will be a consistent estimator.
Example 3
Suppose $X_1, X_2, \ldots, X_n$ is an i.i.d. random sequence with unknown mean $\mu_x$ and known variance $\sigma_x^2$.
Let $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$ be an estimator for $\mu_x$. We have already shown that $\hat{\mu}$ is unbiased, and also that $\operatorname{var}(\hat{\mu}) = \frac{\sigma_x^2}{n}$. Is it a consistent estimator?
Clearly,
$$\lim_{n \to \infty} \operatorname{var}(\hat{\mu}) = \lim_{n \to \infty} \frac{\sigma_x^2}{n} = 0.$$
Therefore, $\hat{\mu}$ is a consistent estimator of $\mu_x$.
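A small simulation makes the consistency visible: the empirical variance of $\hat{\mu}$ tracks $\sigma^2/n$ as $n$ grows. A sketch (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, trials = 1.0, 2.0, 2000

for n in (10, 100, 1000):
    means = rng.normal(mu, sigma, (trials, n)).mean(axis=1)
    # Empirical variance of the estimator vs. the theoretical sigma^2 / n
    print(n, means.var(), sigma**2 / n)
```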
Efficient Estimator
Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of the parameter $\theta$. The relative efficiency of the estimator $\hat{\theta}_2$ with respect to the estimator $\hat{\theta}_1$ is defined by
$$\text{Relative Efficiency} = \frac{\operatorname{var}(\hat{\theta}_1)}{\operatorname{var}(\hat{\theta}_2)}.$$
In particular, if $\hat{\theta}_1$ is an MVUE, it is called an efficient estimator, and the absolute efficiency of any unbiased estimator is defined with respect to this estimator.
Example 5: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$. Then the sample mean $\hat{\mu}$ and the sample median $\hat{\mu}_m$ are two unbiased estimators of $\mu$.
We have shown that $\operatorname{var}(\hat{\mu}) = \frac{\sigma^2}{n}$, and it can be shown that $\operatorname{var}(\hat{\mu}_m) = \frac{\pi \sigma^2}{2n}$. Therefore,
$$\text{Efficiency of } \hat{\mu}_m = \frac{\sigma^2 / n}{\pi \sigma^2 / 2n} = \frac{2}{\pi} \approx 0.64.$$
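This $2/\pi \approx 0.64$ figure can be reproduced empirically; a minimal sketch (odd $n$ so the median is a sample value; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 0.0, 1.0, 101, 20000

samples = rng.normal(mu, sigma, (trials, n))
var_mean   = samples.mean(axis=1).var()
var_median = np.median(samples, axis=1).var()

# Relative efficiency of the median w.r.t. the mean; expect roughly 2/pi
print(var_mean / var_median, 2 / np.pi)
```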
Example 6: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. normal random variables with unknown mean $\mu$. Then
$$\hat{\mu}_1 = \frac{1}{n-1}\sum_{i=1}^{n} X_i$$
is a biased estimator of $\mu$, since $E\hat{\mu}_1 = \frac{n}{n-1}\mu \ne \mu$. Note that
$$\operatorname{var}(\hat{\mu}_1) = \frac{n^2}{(n-1)^2}\operatorname{var}(\hat{\mu}) = \frac{n\sigma^2}{(n-1)^2}.$$
Minimum Variance Unbiased Estimator
We described the minimum variance unbiased estimator (MVUE), which is a desirable estimator. $\hat{\theta}$ is an MVUE if
$$E\hat{\theta} = \theta \quad \text{and} \quad \operatorname{Var}(\hat{\theta}) \le \operatorname{Var}(\hat{\theta}'),$$
where $\hat{\theta}'$ is any other unbiased estimator of $\theta$.
Theorem: The MVUE is unique.
Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two MVUEs for the deterministic parameter $\theta$. Clearly,
$$E\hat{\theta}_1 = E\hat{\theta}_2 = \theta \quad \text{and} \quad \operatorname{var}(\hat{\theta}_1) = \operatorname{var}(\hat{\theta}_2) = \sigma^2 \text{ (say)}.$$
Consider another estimator
$$\hat{\theta}_3 = \frac{\hat{\theta}_1 + \hat{\theta}_2}{2},$$
which is clearly unbiased. Then
$$\begin{aligned}
\operatorname{var}(\hat{\theta}_3)
&= \frac{\operatorname{var}(\hat{\theta}_1) + \operatorname{var}(\hat{\theta}_2) + 2\operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2)}{4} \\
&\le \frac{\operatorname{var}(\hat{\theta}_1) + \operatorname{var}(\hat{\theta}_2) + 2\sqrt{\operatorname{var}(\hat{\theta}_1)\operatorname{var}(\hat{\theta}_2)}}{4} \quad \text{(using the Cauchy-Schwarz inequality)} \\
&= \sigma^2.
\end{aligned}$$
But $\operatorname{var}(\hat{\theta}_3)$ cannot be less than $\sigma^2$, because $\hat{\theta}_1$ and $\hat{\theta}_2$ are MVUEs. Hence $\operatorname{var}(\hat{\theta}_3) = \sigma^2$, which forces $\operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2) = \sigma^2$.
Now consider
$$\operatorname{var}(\hat{\theta}_1 - \hat{\theta}_2) = \operatorname{var}(\hat{\theta}_1) + \operatorname{var}(\hat{\theta}_2) - 2\operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2) = \sigma^2 + \sigma^2 - 2\sigma^2 = 0.$$
$\therefore \hat{\theta}_1 = \hat{\theta}_2$ with probability 1.
Cramer-Rao Theorem
Can we reduce the variance of an unbiased estimator indefinitely? The answer is given by the Cramer-Rao theorem.
Suppose $\hat{\theta}$ is an unbiased estimator based on the random sequence denoted by the vector
$$\mathbf{X} = [X_1 \; X_2 \; \cdots \; X_n]^\top.$$
Let $f_{\mathbf{X}/\theta}(x_1, \ldots, x_n/\theta)$ be the joint PDF which characterises $\mathbf{X}$. This function is called the likelihood function. Note that $\theta$ may also be random; in that case the likelihood function will represent a conditional joint density function.
The quantity $L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta)$ is called the log-likelihood function.
Statement of the Cramer-Rao theorem
Suppose $\hat{\theta}$ is an unbiased estimator of $\theta \in D$, where $D$ is an open interval, and $f_{\mathbf{X}/\theta}(x_1, \ldots, x_n/\theta)$ satisfies the following regularity conditions:
(i) The support $\{ \mathbf{x} \mid f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) > 0 \}$ does not depend on $\theta$. We may assume $\mathbb{R}^n$ to be the support.
(ii) For $\mathbf{x} \in \mathbb{R}^n$ and $\theta \in D$, $\frac{\partial L(\mathbf{x}/\theta)}{\partial \theta}$ exists and is finite.
Then
$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{I_n(\theta)}, \quad \text{where} \quad I_n(\theta) = E\left( \frac{\partial L}{\partial \theta} \right)^2,$$
and $I_n(\theta)$ is a measure of the average information in the random sequence, called the Fisher information statistic.
Equality in the CR bound holds if
$$\frac{\partial L}{\partial \theta} = c(\hat{\theta} - \theta),$$
where $c$ is a constant.
Proof: $\hat{\theta}$ is an unbiased estimator, so $E(\hat{\theta} - \theta) = 0$, i.e.
$$\int_{-\infty}^{\infty} (\hat{\theta} - \theta) f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0,$$
where the integration is an n-fold integration.
Differentiating with respect to $\theta$, we get
$$\frac{d}{d\theta} \int_{-\infty}^{\infty} (\hat{\theta} - \theta) f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0.$$
Note the regularity condition that the limits of integration are not functions of $\theta$. Therefore, the processes of integration and differentiation can be interchanged, and we get
$$\int_{-\infty}^{\infty} \frac{\partial}{\partial \theta}\left\{ (\hat{\theta} - \theta) f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \right\} d\mathbf{x} = 0$$
$$\Rightarrow \int_{-\infty}^{\infty} (\hat{\theta} - \theta) \frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial \theta}\, d\mathbf{x} - \int_{-\infty}^{\infty} f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0$$
$$\Rightarrow \int_{-\infty}^{\infty} (\hat{\theta} - \theta) \frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial \theta}\, d\mathbf{x} = 1. \qquad (1)$$
Note that
$$\frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial \theta} = \frac{\partial}{\partial \theta}\left\{ \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \right\} f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) = \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta).$$
Therefore, from (1),
$$\int_{-\infty}^{\infty} (\hat{\theta} - \theta) \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 1,$$
so that
$$\left[ \int_{-\infty}^{\infty} (\hat{\theta} - \theta) \sqrt{f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)} \; \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \sqrt{f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}\, d\mathbf{x} \right]^2 = 1, \qquad (2)$$
since $f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \ge 0$.
Recall that the Cauchy-Schwarz inequality is given by
$$|\langle \mathbf{a}, \mathbf{b} \rangle|^2 \le \|\mathbf{a}\|^2 \|\mathbf{b}\|^2,$$
where the equality holds when $\mathbf{a} = c\,\mathbf{b}$ (where $c$ is any scalar).
Applying this inequality to the L.H.S. of equation (2), we get
$$\left[ \int_{-\infty}^{\infty} (\hat{\theta} - \theta) \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} \right]^2 \le \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} \int_{-\infty}^{\infty} \left( \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right)^2 f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = \operatorname{var}(\hat{\theta})\, I_n(\theta).$$
So the L.H.S. is at most $\operatorname{var}(\hat{\theta})\, I_n(\theta)$, while the R.H.S. of (2) equals 1. Hence
$$\operatorname{var}(\hat{\theta})\, I_n(\theta) \ge 1 \quad \Rightarrow \quad \operatorname{var}(\hat{\theta}) \ge \frac{1}{I_n(\theta)},$$
which is the Cramer-Rao inequality. The right-hand side is the Cramer-Rao lower bound (CRLB) for $\operatorname{var}(\hat{\theta})$.
The equality will hold when
$$\frac{\partial L(\mathbf{x}/\theta)}{\partial \theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) = c(\theta)(\hat{\theta} - \theta)\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta), \quad \text{so that} \quad \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} = c(\theta)(\hat{\theta} - \theta),$$
where $c$ is independent of $\hat{\theta}$ and may be a function of $\theta$. Noting that in this case
$$\frac{1}{\operatorname{var}(\hat{\theta})} = I_n(\theta) = E\left( \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right)^2 = c^2 E(\hat{\theta} - \theta)^2 = c^2 \operatorname{var}(\hat{\theta}),$$
we get $c = I_n(\theta)$. Thus the CRLB is achieved if and only if
$$\frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} = I_n(\theta)(\hat{\theta} - \theta).$$
If $\hat{\theta}$ satisfies the CR bound with equality, then $\hat{\theta}$ is called an efficient estimator. Note that an efficient estimator is always an MVUE.
Also, from $\int_{-\infty}^{\infty} f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 1$, we get
$$\int_{-\infty}^{\infty} \frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial \theta}\, d\mathbf{x} = 0 \quad \Rightarrow \quad \int_{-\infty}^{\infty} \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\, d\mathbf{x} = 0.$$
Taking the partial derivative with respect to $\theta$ again, we get
$$\int_{-\infty}^{\infty} \left[ \frac{\partial^2 L(\mathbf{x}/\theta)}{\partial \theta^2}\, f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) + \left( \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right)^2 f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \right] d\mathbf{x} = 0,$$
so that
$$E\left( \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right)^2 = -E\,\frac{\partial^2 L(\mathbf{x}/\theta)}{\partial \theta^2}.$$
Thus the CR inequality may also be written as
$$\operatorname{var}(\hat{\theta}) \ge \frac{1}{-E\,\dfrac{\partial^2 L(\mathbf{x}/\theta)}{\partial \theta^2}}.$$
Remarks
(1) The larger the information $I_n(\theta)$, the smaller $\operatorname{var}(\hat{\theta})$ can be.
(2) Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. Then
$$I_n(\theta) = -E\,\frac{\partial^2}{\partial \theta^2} \ln f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta) = -E\,\frac{\partial^2}{\partial \theta^2} \sum_{i=1}^{n} \ln f_{X_i/\theta}(x_i/\theta) = \sum_{i=1}^{n} \left( -E\,\frac{\partial^2}{\partial \theta^2} \ln f_{X_i/\theta}(x_i/\theta) \right) = n I_1(\theta),$$
where $I_1(\theta)$ is the Fisher information in a single observation.
(3) If $\hat{\theta}$ satisfies the CR bound with equality, then $\hat{\theta}$ is called an efficient estimator.
Extension to Vector Parameters
Suppose $\theta_1, \theta_2, \ldots, \theta_k$ are $k$ parameters represented by the vector $\boldsymbol{\theta} = [\theta_1 \; \theta_2 \; \cdots \; \theta_k]^\top$.
Then the log-likelihood function is given by
$$L(\mathbf{x}/\boldsymbol{\theta}) = \ln f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta}).$$
We can represent the first-order partial derivatives of $L(\mathbf{x}/\boldsymbol{\theta})$ as
$$\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \left[ \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_1} \;\; \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_2} \;\; \cdots \;\; \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_k} \right]^\top.$$
The Fisher information matrix is given by
$$\mathbf{I}_n(\boldsymbol{\theta}) = E\left[ \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \left( \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right)^\top \right],$$
where the expectation is performed on each term of the matrix.
It can be shown that
$$\mathbf{I}_n(\boldsymbol{\theta}) = -E\,\frac{\partial^2 L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}^\top} = -E \begin{bmatrix}
\dfrac{\partial^2 L}{\partial \theta_1^2} & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_k} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 L}{\partial \theta_k \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_k \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_k^2}
\end{bmatrix}.$$
Assume that the pdf $f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta})$ satisfies the regularity condition
$$E\left( \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right) = \mathbf{0},$$
where the expectation is taken with respect to $f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta})$. Then the covariance matrix of any unbiased estimator $\hat{\boldsymbol{\theta}}$ satisfies
$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} - \mathbf{I}_n^{-1}(\boldsymbol{\theta}) \succeq \mathbf{0},$$
where the inequality with respect to the zero matrix means that the left-hand side is a positive semi-definite matrix.
The CR theorem for the individual variances is given by
$$\operatorname{Var}(\hat{\theta}_i) \ge \left[ \mathbf{I}_n^{-1}(\boldsymbol{\theta}) \right]_{(i,i)},$$
where $[\,\cdot\,]_{(i,i)}$ denotes the $i$-th diagonal element of the matrix. The equality will hold when
$$\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}_n(\boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}).$$
Example 3:
Let $X_1, X_2, \ldots, X_n$ be an i.i.d. Gaussian random sequence with known variance $\sigma^2$ and unknown mean $\mu$. Suppose
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
which is unbiased. Find the CR bound and hence show that $\hat{\mu}$ is an efficient estimator.
The likelihood function $f_{\mathbf{X}/\mu}(x_1, x_2, \ldots, x_n/\mu)$ is the product of the individual densities (since the data are i.i.d.):
$$f_{\mathbf{X}/\mu}(x_1, x_2, \ldots, x_n/\mu) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2},$$
so that
$$L(\mathbf{x}/\mu) = -n \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
Now
$$\frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu), \qquad \frac{\partial^2 L}{\partial \mu^2} = -\frac{n}{\sigma^2}, \qquad \text{so that} \quad -E\,\frac{\partial^2 L}{\partial \mu^2} = \frac{n}{\sigma^2}.$$
$$\therefore \text{CR bound} = \frac{1}{I_n(\mu)} = \frac{1}{-E\,\dfrac{\partial^2 L}{\partial \mu^2}} = \frac{\sigma^2}{n}.$$
Also,
$$\frac{\partial L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = \frac{n}{\sigma^2}\left( \frac{1}{n}\sum_{i=1}^{n} x_i - \mu \right) = \frac{n}{\sigma^2}(\hat{\mu} - \mu).$$
Hence $\frac{\partial L}{\partial \mu} = c(\hat{\mu} - \mu)$ with $c = \frac{n}{\sigma^2}$, and $\hat{\mu}$ is an efficient estimator.
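A numerical cross-check of this result: the empirical variance of $\hat{\mu}$ should sit at the CRLB $\sigma^2/n$. A sketch (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, trials = 2.0, 1.5, 25, 100000

mu_hat = rng.normal(mu, sigma, (trials, n)).mean(axis=1)

# Empirical variance of the sample mean vs. the CRLB sigma^2 / n
print(mu_hat.var(), sigma**2 / n)
```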
Example 4: Suppose $X_n = a + bn + V_n$, where $V_n \sim N(0, \sigma^2)$ and $a$ and $b$ are unknown constants. Here $\boldsymbol{\theta} = [a \; b]^\top$. The likelihood function is given by
$$f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n/\boldsymbol{\theta}) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - a - bi)^2},$$
so that
$$L(\mathbf{x}/\boldsymbol{\theta}) = -n \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - a - bi)^2,$$
$$\frac{\partial L}{\partial a} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - a - bi), \qquad \frac{\partial L}{\partial b} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - a - bi)\, i,$$
$$\frac{\partial^2 L}{\partial a^2} = -\frac{n}{\sigma^2}, \qquad \frac{\partial^2 L}{\partial a \partial b} = -\frac{1}{\sigma^2}\sum_{i=1}^{n} i = -\frac{n(n+1)}{2\sigma^2}, \qquad \frac{\partial^2 L}{\partial b^2} = -\frac{1}{\sigma^2}\sum_{i=1}^{n} i^2 = -\frac{n(n+1)(2n+1)}{6\sigma^2}.$$
Therefore,
$$\mathbf{I}_n(\boldsymbol{\theta}) = \frac{1}{\sigma^2}
\begin{bmatrix}
n & \dfrac{n(n+1)}{2} \\[2mm]
\dfrac{n(n+1)}{2} & \dfrac{n(n+1)(2n+1)}{6}
\end{bmatrix}.$$
Taking the inverse, we get
$$\mathbf{I}_n^{-1}(\boldsymbol{\theta}) = \sigma^2
\begin{bmatrix}
\dfrac{2(2n+1)}{n(n-1)} & -\dfrac{6}{n(n-1)} \\[2mm]
-\dfrac{6}{n(n-1)} & \dfrac{12}{n(n^2-1)}
\end{bmatrix},$$
so that
$$\operatorname{var}(\hat{a}) \ge \frac{2(2n+1)\sigma^2}{n(n-1)} \quad \text{and} \quad \operatorname{var}(\hat{b}) \ge \frac{12\sigma^2}{n(n^2-1)}.$$
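The closed-form inverse above can be cross-checked by building $\mathbf{I}_n$ numerically and inverting it with numpy; a sketch ($n$ and $\sigma^2$ are illustrative):

```python
import numpy as np

n, sigma2 = 20, 1.0
i = np.arange(1, n + 1)

# Fisher information matrix for X_i = a + b*i + V_i, V_i ~ N(0, sigma2)
I_n = (1 / sigma2) * np.array([[n,       i.sum()],
                               [i.sum(), (i**2).sum()]])
crlb = np.linalg.inv(I_n)

# Compare the diagonal with the closed-form bounds
var_a = 2 * (2*n + 1) * sigma2 / (n * (n - 1))
var_b = 12 * sigma2 / (n * (n**2 - 1))
print(crlb[0, 0], var_a)
print(crlb[1, 1], var_b)
```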
MVUE through Sufficient Statistic
We saw that an MVUE achieving the CRLB can be obtained through the factorization
$$\frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \mathbf{I}_n(\boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}).$$
However, the CRLB may not be achieved by the MVUE. The sufficient statistic can be used to find the MVUE under certain conditions.
The observations $X_1, X_2, \ldots, X_n$ contain information about the unknown parameter $\theta$. An estimator should carry the same information about $\theta$ as the observed data. The concept of a sufficient statistic is based on this idea.
A measurable function $T(X_1, X_2, \ldots, X_n)$ is called a sufficient statistic for $\theta$ if it contains the same information about $\theta$ as is contained in the random sequence $X_1, X_2, \ldots, X_n$. In other words, the joint conditional density $f_{X_1, X_2, \ldots, X_n | T(X_1, X_2, \ldots, X_n)}(x_1, x_2, \ldots, x_n)$ does not involve $\theta$.
There are a large number of sufficient statistics for a particular problem. One has to select a sufficient statistic which has good estimation properties.
Example 7: Suppose $x_1, x_2$ are i.i.d. $N(\mu, 1)$ and $T(x_1, x_2) = x_1 + x_2$. Then $T(x_1, x_2) \sim N(2\mu, 2)$, and
$$f_{x_1, x_2 | T(x_1, x_2)}(x_1, x_2) = \frac{f_{x_1, x_2}(x_1, x_2)}{f_{T(x_1, x_2)}(x_1 + x_2)} = \frac{\dfrac{1}{2\pi}\, e^{-\frac{1}{2}\left[ (x_1 - \mu)^2 + (x_2 - \mu)^2 \right]}}{\dfrac{1}{\sqrt{4\pi}}\, e^{-\frac{1}{4}(x_1 + x_2 - 2\mu)^2}}.$$
The exponent simplifies as
$$-\frac{1}{2}\left[ (x_1 - \mu)^2 + (x_2 - \mu)^2 \right] + \frac{1}{4}(x_1 + x_2 - 2\mu)^2 = -\frac{1}{2}(x_1^2 + x_2^2) + \frac{1}{4}(x_1 + x_2)^2 = -\frac{1}{4}(x_1 - x_2)^2,$$
since the terms in $\mu$ cancel. Therefore,
$$f_{x_1, x_2 | T(x_1, x_2)}(x_1, x_2) = \frac{1}{\sqrt{\pi}}\, e^{-\frac{1}{4}(x_1 - x_2)^2}.$$
Thus $f_{x_1, x_2 | T(x_1, x_2)}(x_1, x_2)$ does not involve the parameter $\mu$. Hence $T(x_1, x_2) = x_1 + x_2$ is a sufficient statistic.
Remark: If $T(x_1, x_2) = x_1 + 3x_2$, we can show in a similar way that $T(x_1, x_2)$ is not a sufficient statistic.
The above definition allows us to check whether a given statistic is sufficient or not. A way to determine a sufficient statistic is through the Neyman-Fisher factorization theorem.
Factorization theorem
For continuous RVs $X_1, X_2, \ldots, X_n$, the statistic $T(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if
$$f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n) = g(T(\mathbf{x}), \theta)\, h(x_1, x_2, \ldots, x_n),$$
where $g(T, \theta)$ is a non-constant and nonnegative function of $T$ and $\theta$, and $h(x_1, x_2, \ldots, x_n)$ does not involve $\theta$ and is a nonnegative function of $x_1, x_2, \ldots, x_n$.
For the discrete case, the factorization theorem states: $T(\mathbf{x})$ is sufficient if and only if
$$p_{\mathbf{X}}(\mathbf{x}) = g(T(\mathbf{x}), \theta)\, h(\mathbf{x}).$$
Proof: Denote the value $T(\mathbf{x})$ by $t$. Suppose $T(\mathbf{X})$ is a sufficient statistic. Then
$$\begin{aligned}
p_{\mathbf{X}}(\mathbf{x}) &= P(\mathbf{X} = \mathbf{x}) = P(\mathbf{X} = \mathbf{x},\, T(\mathbf{X}) = t) \\
&= P(T(\mathbf{X}) = t)\, P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t) \\
&= g(t, \theta)\, h(\mathbf{x}),
\end{aligned}$$
where $g(t, \theta) = P(T(\mathbf{X}) = t)$ and $h(\mathbf{x}) = P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t)$, which does not involve $\theta$ because $T(\mathbf{X})$ is a sufficient statistic.
Conversely, suppose $p_{\mathbf{X}}(\mathbf{x}) = g(t, \theta)\, h(\mathbf{x})$. Then
$$\begin{aligned}
P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t)
&= \frac{P(\mathbf{X} = \mathbf{x},\, T(\mathbf{X}) = t)}{P(T(\mathbf{X}) = t)} = \frac{P(\mathbf{X} = \mathbf{x})}{\sum_{\mathbf{x}' : T(\mathbf{x}') = t} P(\mathbf{X} = \mathbf{x}')} \\
&= \frac{g(t, \theta)\, h(\mathbf{x})}{\sum_{\mathbf{x}' : T(\mathbf{x}') = t} g(t, \theta)\, h(\mathbf{x}')} = \frac{h(\mathbf{x})}{\sum_{\mathbf{x}' : T(\mathbf{x}') = t} h(\mathbf{x}')},
\end{aligned}$$
which does not depend on $\theta$.
Example 8: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. Gaussian random variables with unknown mean $\mu$ and known variance 1. Then
$$T(\mathbf{X}) = \sum_{i=1}^{n} X_i$$
(equivalently, the sample mean $\frac{1}{n}\sum_{i=1}^{n} X_i$) is a sufficient statistic for $\mu$, because
$$f_{X_1, X_2, \ldots, X_n/\mu}(x_1, x_2, \ldots, x_n) = \frac{1}{(\sqrt{2\pi})^n}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2} = \frac{1}{(\sqrt{2\pi})^n}\, e^{-\frac{1}{2}\sum_{i=1}^{n} x_i^2}\, e^{\mu \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2}}.$$
The first exponential factor is a function of $x_1, x_2, \ldots, x_n$ alone, and the second is a function of $\mu$ and $T(\mathbf{x}) = \sum_{i=1}^{n} x_i$. Therefore, by the factorization theorem, $T(\mathbf{x}) = \sum_{i=1}^{n} x_i$ is a sufficient statistic for $\mu$.
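One way to see this sufficiency numerically: for two data sets with the same $\sum_i x_i$ (and the same $n$), the likelihood ratio is free of $\mu$, because the factor $g(T, \mu)$ cancels. A minimal sketch:

```python
import numpy as np

def log_lik(x, mu):
    # Log-likelihood of iid N(mu, 1) samples
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.0, 2.0, 4.0])   # same sum as x1, different values

# The log-likelihood difference is the same for every mu: only h(x) differs
for mu in (-1.0, 0.0, 2.5):
    print(mu, log_lik(x1, mu) - log_lik(x2, mu))
```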
Rao-Blackwell Theorem
Suppose $\hat{\theta}$ is an unbiased estimator of $\theta$ and $T(\mathbf{X})$ is a sufficient statistic for $\theta$. Then $\tilde{\theta} = E(\hat{\theta} \mid T(\mathbf{X}))$ is unbiased and $\operatorname{var}(\tilde{\theta}) \le \operatorname{var}(\hat{\theta})$.
Proof: Using the property of conditional expectation, we have
$$E\tilde{\theta} = E\left( E(\hat{\theta} \mid T(\mathbf{X})) \right) = E\hat{\theta} = \theta.$$
$\therefore \tilde{\theta}$ is an unbiased estimator of $\theta$. Now
$$\begin{aligned}
\operatorname{var}(\hat{\theta}) &= E(\hat{\theta} - \theta)^2 = E\left( E\left( (\hat{\theta} - \theta)^2 \mid T(\mathbf{X}) \right) \right) \\
&\ge E\left( \left( E(\hat{\theta} - \theta \mid T(\mathbf{X})) \right)^2 \right) \quad \text{(using Jensen's inequality for a convex function)} \\
&= E(\tilde{\theta} - \theta)^2 = \operatorname{var}(\tilde{\theta}).
\end{aligned}$$
Complete statistic
A statistic $\tau(\mathbf{X})$ is said to be complete if, for any bounded function $g(\tau(\mathbf{X}))$,
$$E\, g(\tau(\mathbf{X})) = 0 \quad \text{for all } \theta$$
implies that
$$P\left( g(\tau(\mathbf{X})) = 0 \right) = 1 \quad \text{for all } \theta.$$
Example: Suppose $X_1, X_2, X_3, \ldots, X_n$ are i.i.d. $B(1, \theta)$ (Bernoulli) random variables and
$$\tau(\mathbf{X}) = \sum_{i=1}^{n} X_i.$$
Clearly $\tau(\mathbf{X}) \sim B(n, \theta)$, and $\tau(\mathbf{X})$ takes values $t = 0, 1, \ldots, n$. Then
$$E\, g(\tau(\mathbf{X})) = \sum_{t=0}^{n} g(t) \binom{n}{t} \theta^t (1 - \theta)^{n-t} = 0 \quad \text{for all } \theta \in (0, 1)$$
$$\Rightarrow (1 - \theta)^n \sum_{t=0}^{n} g(t) \binom{n}{t} \left( \frac{\theta}{1 - \theta} \right)^t = 0 \quad \Rightarrow \quad \sum_{t=0}^{n} g(t) \binom{n}{t} \left( \frac{\theta}{1 - \theta} \right)^t = 0.$$
The left-hand side is a polynomial in $\frac{\theta}{1 - \theta}$ and can be identically zero if and only if all the coefficients vanish:
$$g(t) = 0 \quad \text{for } t = 0, 1, 2, \ldots, n.$$
Hence $\tau(\mathbf{X})$ is a complete statistic.
Remark: If $\tau(\mathbf{X})$ is a complete statistic, then there is only one function $g(\tau(\mathbf{X}))$ of it which is unbiased. Suppose there is another function $g_1(\tau(\mathbf{X}))$ which is unbiased. Then
$$E\left( g(\tau(\mathbf{X})) - g_1(\tau(\mathbf{X})) \right) = \theta - \theta = 0 \quad \Rightarrow \quad P\left( g(\tau(\mathbf{X})) - g_1(\tau(\mathbf{X})) = 0 \right) = 1,$$
i.e. $g(\tau(\mathbf{X})) = g_1(\tau(\mathbf{X}))$ with probability 1, by completeness.
Lehmann-Scheffé theorem
Suppose $\tau(\mathbf{X})$ is a complete sufficient statistic for $\theta$ and $g(\tau(\mathbf{X}))$ is an unbiased estimator based on $\tau(\mathbf{X})$. Then $g(\tau(\mathbf{X}))$ is the MVUE.
Proof:
Using the Rao-Blackwell theorem,
$$g(\tau(\mathbf{X})) = E\left( \hat{\theta}(\mathbf{X}) \mid \tau(\mathbf{X}) \right),$$
where $\hat{\theta}(\mathbf{X})$ is any unbiased estimator of $\theta$, is unbiased; $g(\tau(\mathbf{X}))$ is unique since $\tau(\mathbf{X})$ is a complete statistic, and
$$\operatorname{Var}\left( g(\tau(\mathbf{X})) \right) \le \operatorname{Var}\left( \hat{\theta}(\mathbf{X}) \right).$$
$\therefore g(\tau(\mathbf{X}))$ is an MVUE.
Exponential Family of Distribution
A family of distributions with probability density function (or probability mass function) of the form
$$f_{X/\theta}(x) = a(\theta)\, b(x) \exp\left( c(\theta)\, t(x) \right),$$
with $a(\theta) > 0$ and $c(\theta)$ real functions of $\theta$ and $b(x) \ge 0$, is called an exponential family of distributions.
Similarly, a family of distributions
$$f_{X/\boldsymbol{\theta}}(x) = a(\boldsymbol{\theta})\, b(x) \exp\left( \sum_{i=1}^{k} c_i(\boldsymbol{\theta})\, t_i(x) \right),$$
with $a(\boldsymbol{\theta})$, $b(x)$ and $c_i(\boldsymbol{\theta})$ as specified above, is called a k-parameter exponential family. An exponential family of discrete RVs will have the probability mass function in the above forms.
Example 9
Suppose $X \sim N(\mu, \sigma^2)$. Then
$$\begin{aligned}
f_{X/(\mu, \sigma^2)}(x)
&= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2}(x - \mu)^2 \right) \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2}\left( x^2 - 2\mu x + \mu^2 \right) \right) \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\mu^2}{2\sigma^2} \right) \exp\left( -\frac{1}{2\sigma^2}\, x^2 + \frac{\mu}{\sigma^2}\, x \right).
\end{aligned}$$
Thus $f_{X/(\mu, \sigma^2)}(x)$ belongs to a 2-parameter exponential family with
$$a(\boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\mu^2}{2\sigma^2} \right), \quad b(x) = 1, \quad c_1(\boldsymbol{\theta}) = -\frac{1}{2\sigma^2}, \quad c_2(\boldsymbol{\theta}) = \frac{\mu}{\sigma^2}, \quad t_1(x) = x^2, \quad t_2(x) = x.$$
If $X_1, X_2, \ldots, X_n$ are i.i.d. random variables, then
$$f_{\mathbf{X}/\boldsymbol{\theta}}(x_1, x_2, \ldots, x_n) = a^n(\boldsymbol{\theta}) \prod_{j=1}^{n} b(x_j) \exp\left( \sum_{i=1}^{k} c_i(\boldsymbol{\theta}) \sum_{j=1}^{n} t_i(x_j) \right) = a^n(\boldsymbol{\theta}) \prod_{j=1}^{n} b(x_j) \exp\left( \sum_{i=1}^{k} c_i(\boldsymbol{\theta})\, T_i(\mathbf{x}) \right).$$
Define
$$\mathbf{T}(\mathbf{x}) = \begin{bmatrix} T_1(\mathbf{x}) \\ T_2(\mathbf{x}) \\ \vdots \\ T_k(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} t_1(x_j) \\ \sum_{j=1}^{n} t_2(x_j) \\ \vdots \\ \sum_{j=1}^{n} t_k(x_j) \end{bmatrix}.$$
It is easy to show that $\mathbf{T}(\mathbf{x})$ is a sufficient and complete statistic.
Criteria for Estimation
The estimation of a parameter is based on several well-known criteria. Each criterion tries to optimize some function of the observed samples with respect to the unknown parameter to be estimated. Some of the most popular estimation criteria are:
• Maximum likelihood
• Minimum mean square error
• Bayes' method
Maximum Likelihood Estimator (MLE)
Suppose $X_1, X_2, \ldots, X_n$ are random samples with joint probability density function $f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n)$, which depends on an unknown nonrandom parameter $\theta$.
$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta)$ is called the likelihood function. If $X_1, X_2, \ldots, X_n$ are discrete, the likelihood function will be a joint probability mass function. We represent the concerned random variables and their values in vector notation by $\mathbf{X} = [X_1 \; X_2 \; \cdots \; X_n]^\top$ and $\mathbf{x} = [x_1 \; x_2 \; \cdots \; x_n]^\top$ respectively. Note that $L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$ is the log-likelihood function. As functions of the random variables, the likelihood and log-likelihood functions are themselves random variables.
The maximum likelihood estimator $\hat{\theta}_{\mathrm{MLE}}$ is such an estimator that
$$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\hat{\theta}_{\mathrm{MLE}}) \ge f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta) \quad \text{for all } \theta.$$
If the likelihood function is differentiable with respect to $\theta$, then $\hat{\theta}_{\mathrm{MLE}}$ is given by
$$\left. \frac{\partial f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_{\mathrm{MLE}}} = 0 \quad \text{or} \quad \left. \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_{\mathrm{MLE}}} = 0.$$
Thus the MLE is given by the solution of the likelihood equation above.
If we have $k$ unknown parameters given by $\boldsymbol{\theta} = [\theta_1 \; \theta_2 \; \cdots \; \theta_k]^\top$, then the MLE is given by the set of conditions
$$\left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_1} \right|_{\theta_1 = \hat{\theta}_{1,\mathrm{MLE}}} = \left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_2} \right|_{\theta_2 = \hat{\theta}_{2,\mathrm{MLE}}} = \cdots = \left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_k} \right|_{\theta_k = \hat{\theta}_{k,\mathrm{MLE}}} = 0.$$
Since $\ln(\cdot)$ is a monotonic function of its argument, maximizing the log-likelihood is equivalent to maximizing the likelihood, and it is convenient to express the MLE conditions in terms of the log-likelihood function, as done above.
Example 10:
Let $X_1, X_2, \ldots, X_n$ be an independent identically distributed sequence of $N(\mu, \sigma^2)$ random variables. Find the MLEs for $\mu$ and $\sigma^2$.
$$f_{\mathbf{X}/\mu, \sigma^2}(x_1, x_2, \ldots, x_n/\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left( \frac{x_i - \mu}{\sigma} \right)^2},$$
$$L(\mathbf{x}/\mu, \sigma^2) = \ln f_{\mathbf{X}/\mu, \sigma^2}(\mathbf{x}/\mu, \sigma^2) = -\frac{n}{2} \ln 2\pi - n \ln \sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
Setting the partial derivatives to zero,
$$\left. \frac{\partial L}{\partial \mu} \right|_{\hat{\mu}_{\mathrm{MLE}}} = 0 \quad \Rightarrow \quad \sum_{i=1}^{n}(x_i - \hat{\mu}_{\mathrm{MLE}}) = 0,$$
$$\left. \frac{\partial L}{\partial \sigma^2} \right|_{\hat{\sigma}^2_{\mathrm{MLE}}} = 0 \quad \Rightarrow \quad -\frac{n}{2\hat{\sigma}^2_{\mathrm{MLE}}} + \frac{\sum_{i=1}^{n}(x_i - \hat{\mu}_{\mathrm{MLE}})^2}{2\hat{\sigma}^4_{\mathrm{MLE}}} = 0.$$
Solving, we get
$$\hat{\mu}_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu}_{\mathrm{MLE}})^2.$$
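The closed-form MLEs can be cross-checked against a direct numerical maximization of the log-likelihood; a sketch using scipy.optimize.minimize (the data-generating values are illustrative, and $\sigma$ is parameterized through its logarithm to keep it positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.normal(3.0, 2.0, 500)

def neg_log_lik(p):
    mu, log_sigma = p              # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return 0.5 * len(x) * np.log(2 * np.pi) + len(x) * log_sigma \
           + 0.5 * np.sum((x - mu) ** 2) / sigma**2

res = minimize(neg_log_lik, x0=[0.0, 0.0])

# Closed-form MLEs: sample mean and (1/n) sum of squared deviations
print(res.x[0], x.mean())
print(np.exp(res.x[1])**2, x.var(ddof=0))
```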
Example 11:
Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random samples with
$$f_{X/\mu}(x) = \frac{1}{2}\, e^{-|x - \mu|}, \quad -\infty < x < \infty.$$
Show that $\mathrm{median}(X_1, X_2, \ldots, X_n)$ is the MLE for $\mu$.
$$f_{X_1, X_2, \ldots, X_n/\mu}(x_1, x_2, \ldots, x_n) = \frac{1}{2^n}\, e^{-\sum_{i=1}^{n}|x_i - \mu|},$$
$$L(\mathbf{x}/\mu) = \ln f_{\mathbf{X}/\mu}(\mathbf{x}/\mu) = -n \ln 2 - \sum_{i=1}^{n}|x_i - \mu|.$$
Maximizing $L(\mathbf{x}/\mu)$ is equivalent to minimizing $\sum_{i=1}^{n}|x_i - \mu|$, and the sum of absolute deviations is minimized by the sample median. Hence
$$\hat{\mu}_{\mathrm{MLE}} = \mathrm{median}(x_1, x_2, \ldots, x_n).$$
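A quick numerical confirmation that $\sum_i |x_i - \mu|$ is minimized at the sample median, by scanning $\mu$ over a grid (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.laplace(loc=1.5, scale=1.0, size=201)

grid = np.linspace(x.min(), x.max(), 5001)
cost = np.abs(x[None, :] - grid[:, None]).sum(axis=1)  # sum |x_i - mu| for each mu

mu_grid = grid[np.argmin(cost)]
print(mu_grid, np.median(x))    # grid minimizer vs. sample median
```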
Properties of MLE
(1) The MLE may be biased or unbiased. In Example 10, $\hat{\mu}_{\mathrm{MLE}}$ is unbiased, whereas $\hat{\sigma}^2_{\mathrm{MLE}}$ is a biased estimator.
(2) If an efficient estimator exists, the MLE is that efficient estimator. Suppose an efficient estimator $\hat{\theta}$ exists. Then
$$\frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} = c(\theta)(\hat{\theta} - \theta).$$
At $\theta = \hat{\theta}_{\mathrm{MLE}}$,
$$\left. \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right|_{\hat{\theta}_{\mathrm{MLE}}} = 0 \quad \Rightarrow \quad c(\hat{\theta}_{\mathrm{MLE}})(\hat{\theta} - \hat{\theta}_{\mathrm{MLE}}) = 0 \quad \Rightarrow \quad \hat{\theta} = \hat{\theta}_{\mathrm{MLE}}.$$
(3) The MLE is asymptotically unbiased and efficient. Thus for large $n$, the MLE is approximately efficient.
(4) Invariance property of the MLE
This is a remarkable property of the MLE not shared by other estimators. If $\hat{\theta}_{\mathrm{MLE}}$ is the MLE of $\theta$ and $h(\theta)$ is a function, then $h(\hat{\theta}_{\mathrm{MLE}})$ is the MLE of $h(\theta)$.