Estimation Theory 2
1. Maximum Likelihood Estimator (MLE)
Suppose $X_1, X_2, \ldots, X_n$ are random samples with the joint probability density function $f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta)$, which depends on an unknown nonrandom parameter $\theta$.

$f_{X_1, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta)$, viewed as a function of $\theta$, is called the likelihood function. If $X_1, X_2, \ldots, X_n$ are discrete, then the likelihood function is a joint probability mass function. We represent the concerned random variables and their values in vector notation by $\mathbf{X} = [X_1 \; X_2 \; \ldots \; X_n]'$ and $\mathbf{x} = [x_1 \; x_2 \; \ldots \; x_n]'$ respectively. Note that

$$L(\theta/\mathbf{x}) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$$

is the log-likelihood function. As functions of the random variables, the likelihood and the log-likelihood functions are themselves random variables.

The maximum likelihood estimator $\hat{\theta}_{MLE}$ is the estimator such that

$$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\hat{\theta}_{MLE}) \ge f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n/\theta) \quad \text{for all } \theta.$$
If the likelihood function is differentiable with respect to $\theta$, then $\hat{\theta}_{MLE}$ is given by

$$\left.\frac{\partial}{\partial\theta} f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)\right|_{\theta = \hat{\theta}_{MLE}} = 0
\quad\text{or}\quad
\left.\frac{\partial L(\theta/\mathbf{x})}{\partial\theta}\right|_{\theta = \hat{\theta}_{MLE}} = 0.$$

Thus the MLE is given by the solution of the likelihood equation above.
If we have $k$ unknown parameters given by $\boldsymbol{\theta} = [\theta_1 \; \theta_2 \; \ldots \; \theta_k]'$, then the MLE is given by the set of conditions

$$\left.\frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial\theta_i}\right|_{\theta_i = \hat{\theta}_{i,MLE}} = 0, \qquad i = 1, 2, \ldots, k.$$

Since $\ln(\cdot)$ is a monotonic function of its argument, it is convenient to express the MLE conditions in terms of the log-likelihood function. The conditions are now given by

$$\left.\frac{\partial L(\boldsymbol{\theta}/\mathbf{x})}{\partial\theta_1}\right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left.\frac{\partial L(\boldsymbol{\theta}/\mathbf{x})}{\partial\theta_2}\right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left.\frac{\partial L(\boldsymbol{\theta}/\mathbf{x})}{\partial\theta_k}\right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0.$$
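As a quick numerical sketch of these conditions, the log-likelihood can also be maximized directly over a grid of parameter values and compared against the analytical solution of the likelihood equation. The exponential model, data, and seed below are illustrative assumptions, not taken from the notes.

```python
import numpy as np

# Hypothetical example: X_i ~ Exponential(lambda) with pdf lambda*exp(-lambda*x).
# The likelihood equation dL/dlambda = n/lambda - sum(x_i) = 0 gives
# lambda_MLE = n / sum(x_i); here we also maximize L over a grid to confirm.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=5000)  # true lambda = 1/2

lams = np.linspace(0.05, 2.0, 39001)                # candidate lambda values
log_lik = len(x) * np.log(lams) - lams * x.sum()    # L(lambda / x) up to a constant
lam_grid = lams[np.argmax(log_lik)]                 # grid maximizer
lam_exact = len(x) / x.sum()                        # root of the likelihood equation

print(lam_grid, lam_exact)
```

The grid maximizer and the closed-form root of the likelihood equation agree to within the grid spacing, which is the point of the monotonicity remark above: maximizing $L$ and maximizing the likelihood give the same $\hat{\theta}_{MLE}$.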
Example 10:
Let $X_1, X_2, \ldots, X_n$ be an independent identically distributed sequence of $N(\mu, \sigma^2)$ random variables. Find the MLE for $\mu$ and $\sigma^2$.

$$f_{\mathbf{X}/\mu,\sigma^2}(x_1, x_2, \ldots, x_n/\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}$$

$$L(\mu, \sigma^2/\mathbf{x}) = \ln f_{\mathbf{X}/\mu,\sigma^2}(\mathbf{x}/\mu, \sigma^2) = -\frac{n}{2}\ln 2\pi - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

$$\left.\frac{\partial L}{\partial\mu}\right|_{\hat{\mu}_{MLE}} = 0 \;\Rightarrow\; \frac{1}{\hat{\sigma}^2_{MLE}}\sum_{i=1}^{n}(x_i - \hat{\mu}_{MLE}) = 0$$

$$\left.\frac{\partial L}{\partial\sigma^2}\right|_{\hat{\sigma}^2_{MLE}} = 0 \;\Rightarrow\; -\frac{n}{2\hat{\sigma}^2_{MLE}} + \frac{\sum_{i=1}^{n}(x_i - \hat{\mu}_{MLE})^2}{2\hat{\sigma}^4_{MLE}} = 0$$

Solving we get

$$\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{and}\quad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{\mu}_{MLE}\right)^2.$$
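The closed-form estimates above are easy to check numerically. The sketch below uses hypothetical simulated data (the true parameter values and seed are illustrative assumptions).

```python
import numpy as np

# Simulate N(mu, sigma^2) samples and evaluate the MLE formulas derived above.
rng = np.random.default_rng(1)
n, mu_true, sigma2_true = 10_000, 2.0, 9.0
x = rng.normal(mu_true, np.sqrt(sigma2_true), size=n)

mu_mle = np.sum(x) / n                       # (1/n) * sum(x_i)
sigma2_mle = np.sum((x - mu_mle) ** 2) / n   # (1/n) * sum((x_i - mu_mle)^2)

print(mu_mle, sigma2_mle)
```

Note that `sigma2_mle` divides by $n$ rather than $n - 1$, which is why it is a biased estimator of $\sigma^2$ (a point taken up again under Properties of MLE below).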
Example 11:
Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random samples with

$$f_X(x) = \frac{1}{2}\, e^{-|x - \theta|}, \qquad -\infty < x < \infty.$$

Show that $\mathrm{median}(X_1, X_2, \ldots, X_n)$ is the MLE for $\theta$.

$$f_{X_1, X_2, \ldots, X_n/\theta}(x_1, x_2, \ldots, x_n/\theta) = \frac{1}{2^n}\, e^{-\sum_{i=1}^{n}|x_i - \theta|}$$

$$L(\theta/\mathbf{x}) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) = -n\ln 2 - \sum_{i=1}^{n}|x_i - \theta|$$

$\sum_{i=1}^{n}|x_i - \theta|$ is minimized by $\theta = \mathrm{median}(x_1, x_2, \ldots, x_n)$. Hence

$$\hat{\theta}_{MLE} = \mathrm{median}(x_1, x_2, \ldots, x_n).$$
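A small numerical check of the key step, using hypothetical Laplace-distributed data: the sample median minimizes the sum of absolute deviations over any grid of candidate values of $\theta$.

```python
import numpy as np

# For Laplace samples, compare sum |x_i - theta| at the median against a grid.
rng = np.random.default_rng(2)
x = rng.laplace(loc=1.5, scale=1.0, size=501)   # odd n -> unambiguous median

def total_abs_dev(theta):
    # The quantity minimized by the MLE: sum_i |x_i - theta|
    return np.sum(np.abs(x - theta))

med = np.median(x)
grid = np.linspace(med - 3.0, med + 3.0, 1201)
dev_at_median = total_abs_dev(med)
dev_on_grid = min(total_abs_dev(t) for t in grid)

print(dev_at_median <= dev_on_grid + 1e-9)
```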
Properties of MLE
(1) The MLE may be biased or unbiased. In Example 10, $\hat{\mu}_{MLE}$ is unbiased whereas $\hat{\sigma}^2_{MLE}$ is a biased estimator.
(2) If an efficient estimator exists, the MLE is that efficient estimator. Suppose an efficient estimator $\hat{\theta}$ exists. Then

$$\frac{\partial L(\theta/\mathbf{x})}{\partial\theta} = c(\theta)\left(\hat{\theta} - \theta\right).$$

At $\theta = \hat{\theta}_{MLE}$,

$$\left.\frac{\partial L(\theta/\mathbf{x})}{\partial\theta}\right|_{\hat{\theta}_{MLE}} = 0 \;\Rightarrow\; c(\hat{\theta}_{MLE})\left(\hat{\theta} - \hat{\theta}_{MLE}\right) = 0 \;\Rightarrow\; \hat{\theta} = \hat{\theta}_{MLE}.$$

(3) The MLE is asymptotically unbiased and efficient. Thus for large $n$, the MLE is approximately efficient.
(4) Invariance property of the MLE
This is a remarkable property of the MLE, not shared by other estimators. If $\hat{\theta}_{MLE}$ is the MLE of $\theta$ and $h(\theta)$ is a function of $\theta$, then $h(\hat{\theta}_{MLE})$ is the MLE of $h(\theta)$.
(5) $\hat{\theta}_{MLE}$ is a consistent estimator of $\theta$.
Proof:
Suppose $\theta_0$ is the true value of $\theta$.

Given the iid observations $X_1, X_2, X_3, \ldots, X_n$, the sample average of the log-likelihood function is given by

$$L_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ln f_{X/\theta}(X_i).$$

$\hat{\theta}_{MLE}$ maximizes $L_n(\theta)$.

Let

$$L(\theta) = E_{\theta_0}\left[\ln f_{X/\theta}(X_i)\right] = \int \ln f_{X/\theta}(x_i)\, f_{X/\theta_0}(x_i)\, dx_i.$$

Note that the expectation is carried out with respect to the true value $\theta_0$.

According to the WLLN,

$$L_n(\theta) \xrightarrow{p} L(\theta). \tag{1}$$

Now we show that $L(\theta) \le L(\theta_0)$. Note that

$$L(\theta) - L(\theta_0) = E_{\theta_0}\ln f_{X/\theta}(X) - E_{\theta_0}\ln f_{X/\theta_0}(X) = E_{\theta_0}\ln\frac{f_{X/\theta}(X)}{f_{X/\theta_0}(X)} \le E_{\theta_0}\left(\frac{f_{X/\theta}(X)}{f_{X/\theta_0}(X)} - 1\right) \qquad (\because \ln t \le t - 1)$$

$$= \int f_{X/\theta}(x)\, dx - 1 = 1 - 1 = 0.$$

$$\therefore\; L(\theta) \le L(\theta_0).$$

In other words, $L(\theta)$ is maximum at $\theta_0$. Now, from (1), $L_n(\theta) \xrightarrow{p} L(\theta)$, so the maximizer $\hat{\theta}_{MLE}$ of $L_n(\theta)$ converges in probability to the maximizer $\theta_0$ of $L(\theta)$.
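A simulation sketch of the argument, with hypothetical data: for $N(\theta_0, 1)$ observations, the sample-average log-likelihood $L_n(\theta)$ evaluated on a grid peaks close to the true $\theta_0$ once $n$ is large.

```python
import numpy as np

# Average log-likelihood L_n(mu) for N(mu, 1) evaluated over a grid of mu,
# with data generated at the true value mu0 = 1.
rng = np.random.default_rng(3)
mu0, n = 1.0, 20_000
x = rng.normal(mu0, 1.0, size=n)

grid = np.linspace(-2.0, 4.0, 601)
L_n = [np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (x - m) ** 2) for m in grid]
mu_hat = grid[np.argmax(L_n)]   # grid maximizer of L_n, close to mu0 for large n

print(mu_hat)
```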
6. Bayesian Estimators
We may have some prior information about $\theta$, in the sense that some values of $\theta$ are more likely than others (a priori information). We can represent this prior information in the form of a prior density function $f(\theta)$.

In the following we omit the suffixes in the density functions just for notational simplicity.

The likelihood function will now be the conditional density $f(\mathbf{x}/\theta)$, and the joint density of $\mathbf{X}$ and $\theta$ is

$$f(\mathbf{x}, \theta) = f(\theta)\, f(\mathbf{x}/\theta).$$

Also we have the Bayes rule

$$f(\theta/\mathbf{x}) = \frac{f(\theta)\, f(\mathbf{x}/\theta)}{f(\mathbf{x})} = \frac{f(\theta)\, f(\mathbf{x}/\theta)}{\int_D f(\theta)\, f(\mathbf{x}/\theta)\, d\theta}$$

where $f(\theta/\mathbf{x})$ is the a posteriori density function.
Example: Let $X$ be a Gaussian random sample with unknown mean $\theta$ and variance 1. Given $\theta \sim N(0, 1)$, find the a posteriori PDF $f(\theta/x)$ for a single observation $x$.
Solution: We have

$$f(\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\theta^2}{2}}, \qquad f(x/\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x - \theta)^2}{2}}.$$

[Figure: the parameter $\theta$, with density $f(\theta)$, generates the observation $x$ through the conditional density $f(x/\theta)$.]
By the Bayes rule,

$$f(\theta/x) = \frac{f(\theta)\, f(x/\theta)}{\int_{-\infty}^{\infty} f(\theta)\, f(x/\theta)\, d\theta} = \frac{\frac{1}{2\pi}\, e^{-\frac{\theta^2}{2} - \frac{(x-\theta)^2}{2}}}{\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{\theta^2}{2} - \frac{(x-\theta)^2}{2}}\, d\theta}.$$

Completing the square in the exponent,

$$\frac{\theta^2}{2} + \frac{(x - \theta)^2}{2} = \left(\theta - \frac{x}{2}\right)^2 + \frac{x^2}{4},$$

so the $\theta$-independent factor $e^{-x^2/4}$ cancels in the ratio:

$$f(\theta/x) = \frac{e^{-\left(\theta - \frac{x}{2}\right)^2}}{\int_{-\infty}^{\infty} e^{-\left(\theta - \frac{x}{2}\right)^2}\, d\theta} = \frac{1}{\sqrt{2\pi \cdot \frac{1}{2}}}\, e^{-\frac{\left(\theta - \frac{x}{2}\right)^2}{2 \cdot \frac{1}{2}}}.$$

Thus the a posteriori density is Gaussian with mean $\dfrac{x}{2}$ and variance $\dfrac{1}{2}$.
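The closed-form posterior $N(x/2, 1/2)$ can be verified by direct numerical integration of the Bayes rule; the observation value below is an arbitrary illustrative assumption.

```python
import numpy as np

# Numerically normalize prior * likelihood and compare moments with N(x/2, 1/2).
x_obs = 1.8                                   # hypothetical single observation
theta = np.linspace(-10.0, 10.0, 200_001)
d_theta = theta[1] - theta[0]

prior = np.exp(-theta ** 2 / 2) / np.sqrt(2 * np.pi)
likelihood = np.exp(-(x_obs - theta) ** 2 / 2) / np.sqrt(2 * np.pi)
posterior = prior * likelihood
posterior /= posterior.sum() * d_theta        # Bayes-rule denominator f(x)

post_mean = np.sum(theta * posterior) * d_theta
post_var = np.sum((theta - post_mean) ** 2 * posterior) * d_theta

print(post_mean, post_var)   # close to x_obs/2 = 0.9 and 1/2
```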
The parameter $\theta$ is a random variable and the estimator $\hat{\theta}(\mathbf{X})$ is another random variable. The estimation error is $\hat{\theta} - \theta$.

We associate a cost function or loss function $C(\hat{\theta}, \theta)$ with every estimator $\hat{\theta}$. It represents the positive penalty for each wrong estimation; thus $C(\hat{\theta}, \theta)$ is a non-negative function.

The three most popular cost functions are:

Quadratic cost function: $C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$

Absolute cost function: $C(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$

Hit-or-miss cost function (also called the uniform cost function):

$$C(\hat{\theta}, \theta) = \begin{cases} 0, & |\hat{\theta} - \theta| \le \dfrac{\Delta}{2} \\[4pt] 1, & |\hat{\theta} - \theta| > \dfrac{\Delta}{2} \end{cases}$$

Since the cost is random, minimizing it means minimizing it on an average. The Bayesian risk function or average cost is

$$\bar{C} = E\, C(\hat{\theta}, \theta) = \int\!\!\int C(\hat{\theta}, \theta)\, f_{\mathbf{X},\theta}(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta.$$

The estimator seeks to minimize the Bayesian risk.
Case I. Quadratic Cost Function and the Minimum Mean-Square Error Estimator

$$C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$$

The estimation problem is: minimize

$$\int\!\!\int (\hat{\theta} - \theta)^2\, f_{\mathbf{X},\theta}(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta$$

with respect to $\hat{\theta}$. This is equivalent to minimizing

$$\int \left( \int (\hat{\theta} - \theta)^2\, f(\theta/\mathbf{x})\, d\theta \right) f(\mathbf{x})\, d\mathbf{x}.$$

Since $f(\mathbf{x})$ is always positive, the above integral will be minimum if the inner integral is minimum. This results in the problem: minimize

$$\int (\hat{\theta} - \theta)^2\, f(\theta/\mathbf{x})\, d\theta$$

with respect to $\hat{\theta}$. Differentiating and setting to zero,

$$\frac{\partial}{\partial\hat{\theta}} \int (\hat{\theta} - \theta)^2\, f(\theta/\mathbf{x})\, d\theta = 0$$

$$\Rightarrow \int 2(\hat{\theta} - \theta)\, f(\theta/\mathbf{x})\, d\theta = 0$$
$$\Rightarrow \hat{\theta} \int f(\theta/\mathbf{x})\, d\theta = \int \theta\, f(\theta/\mathbf{x})\, d\theta$$

$$\Rightarrow \hat{\theta} = \int \theta\, f(\theta/\mathbf{x})\, d\theta$$

Thus $\hat{\theta}$ is the conditional mean, i.e. the mean of the a posteriori density. Since we are minimizing the quadratic cost, it is also called the minimum mean-square error (MMSE) estimator.
Salient Points
Given the a priori density function $f(\theta)$ and the conditional PDF $f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$, we have to determine the a posteriori density $f(\theta/\mathbf{x})$. This is determined from the Bayes rule:

$$f(\theta/\mathbf{x}) = \frac{f(\theta)\, f(\mathbf{x}/\theta)}{f(\mathbf{x})} = \frac{f(\theta)\, f(\mathbf{x}/\theta)}{\int_D f(\theta)\, f(\mathbf{x}/\theta)\, d\theta}$$
Example: Let $X_1, X_2, \ldots, X_n$ be samples of $X \sim N(\theta, 1)$ with unknown mean $\theta \sim N(0, 1)$. Find $\hat{\theta}_{MMSE}$.

We are given

$$f(\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\theta^2}{2}}$$

$$f(\mathbf{x}/\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x_i - \theta)^2}{2}} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2}$$

Also,

$$f(\theta/\mathbf{x}) = \frac{f(\theta)\, f(\mathbf{x}/\theta)}{f(\mathbf{x})}$$

where

$$f(\mathbf{x}) = \int_{-\infty}^{\infty} f(\theta)\, f(\mathbf{x}/\theta)\, d\theta = \frac{1}{(2\pi)^{(n+1)/2}} \int_{-\infty}^{\infty} e^{-\frac{\theta^2}{2} - \frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2}\, d\theta.$$

Completing the square in $\theta$ (with $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$),

$$\frac{\theta^2}{2} + \frac{1}{2}\sum_{i=1}^{n}(x_i - \theta)^2 = \frac{n+1}{2}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2 - \frac{n^2\bar{x}^2}{2(n+1)} + \frac{1}{2}\sum_{i=1}^{n} x_i^2.$$

The $\theta$-independent factors cancel in the Bayes ratio, so

$$f(\theta/\mathbf{x}) = \frac{1}{\sqrt{2\pi \cdot \frac{1}{n+1}}}\, e^{-\frac{\left(\theta - \frac{n\bar{x}}{n+1}\right)^2}{2 \cdot \frac{1}{n+1}}},$$

i.e. $\theta$ given $\mathbf{X} = \mathbf{x}$ is $N\!\left(\dfrac{n\bar{x}}{n+1}, \dfrac{1}{n+1}\right)$. Hence

$$\hat{\theta}_{MMSE} = E(\theta/\mathbf{X} = \mathbf{x}) = \frac{n\bar{x}}{n+1} = \frac{1}{n+1}\sum_{i=1}^{n} x_i.$$
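The closed-form estimate can be cross-checked against a brute-force posterior mean computed on a grid; the simulated data and seed below are illustrative assumptions, not part of the original notes.

```python
import numpy as np

# Compare theta_MMSE = sum(x_i)/(n+1) with a numerically computed posterior mean
# for theta ~ N(0,1) and X_i ~ N(theta, 1).
rng = np.random.default_rng(4)
theta_true = rng.normal()                  # draw theta from its N(0,1) prior
n = 50
x = rng.normal(theta_true, 1.0, size=n)

theta_mmse = x.sum() / (n + 1)             # closed form derived above

grid = np.linspace(-10.0, 10.0, 20_001)    # numerical posterior over a theta grid
log_post = -grid ** 2 / 2 - 0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
w = np.exp(log_post - log_post.max())      # unnormalized posterior weights
post_mean = (grid * w).sum() / w.sum()

print(theta_mmse, post_mean)
```

Note the shrinkage: $\hat{\theta}_{MMSE} = \frac{n}{n+1}\bar{x}$ pulls the sample mean toward the prior mean 0, with the pull vanishing as $n \to \infty$.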