Upcoming SlideShare
×

# MLE Example

217 views

Published on

0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
217
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
2
0
Likes
0
Embeds 0
No embeds

No notes for slide

### MLE Example

1. 1. Example of MLE Computations, using R First of all, do you really need R to compute the MLE? Please note that MLE in many cases have explicit formula. Second of all, for some common distributions even though there are no explicit formula, there are standard (existing) routines that can compute MLE. Example of this catergory include Weibull distribution with both scale and shape parameters, logistic regression, etc. If you still cannot ﬁnd anything usable then the following notes may be useful. We start with a simple example so that we can cross check the result. Suppose the observations X1 , X2 , ..., Xn are from N (µ, σ 2 ) distribution (2 parameters: µ and σ 2 ). The log likelihood function is − (Xi − µ)2 − 1/2 log 2π − 1/2 log σ 2 + log dXi 2σ 2 (actually we do not have to keep the terms −1/2 log 2π and log dXi since they are constants. In R software we ﬁrst store the data in a vector called xvec xvec <- c(2,5,3,7,-3,-2,0) # or some other numbers then deﬁne a function (which is negative of the log lik) fn <- function(theta) { sum ( 0.5*(xvec - theta[1])^2/theta[2] + 0.5* log(theta[2]) ) } where there are two parameters: theta[1] and theta[2]. They are components of a vector theta. then we try to ﬁnd the max (actually the min of negative log lik) nlm(fn, theta <- c(0,1), hessian=TRUE) or optim(theta <- c(0,1), fn, hessian=TRUE) 1
2. 2. You may need to try several starting values (here we used c(0,1)) for the theta. ( i.e. theta[1]=0, theta[2]=1. ) Actual R output session: > xvec <- c(2,5,3,7,-3,-2,0) # you may try other values > fn # I have pre-defined fn function(theta) { sum( 0.5*(xvec-theta[1])^2/theta[2] + 0.5* log(theta[2]) ) } > nlm(fn, theta <- c(0,2), hessian=TRUE) # minimization \$minimum [1] 12.00132 \$estimate [1] 1.714284 11.346933 \$gradient [1] -3.709628e-07 -5.166134e-09 \$hessian [,1] [,2] [1,] 6.169069e-01 -4.566031e-06 [2,] -4.566031e-06 2.717301e-02 \$code [1] 1 \$iterations [1] 12 > mean(xvec) [1] 1.714286 > sum( (xvec -mean(xvec))^2 )/7 [1] 11.34694 > output1 <- nlm(fn, theta <- c(2,10), > solve(output1\$hessian) 2 # this checks out with estimate[1] # this also checks out w/ estimate[2] hessian=TRUE) # to compute the inverse of hessian # which is the approx. var-cor matrix
3. 3. [,1] [,2] [1,] 1.6209919201 3.028906e-04 [2,] 0.0003028906 3.680137e+01 > sqrt( diag(solve(output1\$hessian)) ) [1] 1.273182 6.066413 > 11.34694/7 [1] 1.620991 > sqrt(11.34694/7) [1] 1.273182 # st. dev. of mean checks out > optim( theta <- c(2,9), fn, hessian=TRUE) \$par [1] 1.713956 11.347966 # minimization, diff R function \$value [1] 12.00132 \$counts function gradient 45 NA \$convergence [1] 0 \$message NULL \$hessian [,1] [,2] [1,] 6.168506e-01 1.793543e-05 [2,] 1.793543e-05 2.717398e-02 Comment: We know long ago the variance of x can be estimated by s2 /n. ¯ (or replace s2 by the MLE of σ 2 ) (may be even this is news to you? then you need to review some basic stat). But how many of you know (or remember) the variance/standard deviation of the MLE of σ 2 (or s2 )? (by above calculation we know its standard 3
4. 4. deviation is approx. equal to 6.066413) How about the covariance between x and v? here it is approx. 0.0003028 ¯ (very small). Theory say they are independent, so the true covariance should equal to 0. Example of inverting the (Wilks) likelihood ratio test to get conﬁdence interval Suppose independent observations X1 , X2 , ..., Xn are from N (µ, σ 2 ) distribution (one parameter: σ). µ assumed known, for example µ = 2. The log likelihood function is − (Xi − µ)2 − 1/2 log 2π − 1/2 log σ 2 + log dXi 2σ 2 We know the log likelihood function is maximized when σ= (xi − µ)2 n This is the MLE of σ. The Wilks statistics is −2 log maxH0 lik = 2[log max Lik − log max Lik] H0 max lik In R software we ﬁrst store the data in a vector called xvec xvec <- c(2,5,3,7,-3,-2,0) # or some other numbers then deﬁne a function (which is negative of log lik) (and omit some constants) fn <- function(theta) { sum ( 0.5*(xvec - theta[1])^2/theta[2] + 0.5* log(theta[2]) ) } In R we can compute the Wilks statistics for testing H0 : σ = 1.5 vs Ha : σ = 1.5 as follows: assume we know µ = 2 then the MLE of σ is 4
5. 5. mleSigma <- sqrt( sum( (xvec - 2)^2 ) /length(xvec)) The Wilks statistics is WilksStat <- 2*( fn(c(2,1.5^2)) - fn(c(2,mleSigma^2)) ) The actual R session: > xvec <- c(2,5,3,7,-3,-2,0) > fn function(theta) { sum ( 0.5*(xvec-theta[1])^2/theta[2] + 0.5* log(theta[2]) ) } > mleSigma <- sqrt((sum((xvec - 2)^2))/length(xvec)) > mleSigma [1] 3.380617 > 2*( fn(c(2,1.5^2)) - fn(c(2,mleSigma^2)) ) [1] 17.17925 This is much larger then 3.84 ( = 5% signiﬁcance of a chi-square distribution), so we should reject the hypothesis of σ = 1.5. After some trial and error we ﬁnd > 2*( fn(c(2,2.1635^2)) - fn(c(2,mleSigma^2)) ) [1] 3.842709 > 2*( fn(c(2,6.37^2)) - fn(c(2,mleSigma^2)) ) [1] 3.841142 So the 95% conﬁdence interval for σ is (approximately) [2.1635, 6.37] We also see that the 95% conﬁdence Interval for σ 2 is [2.16352 , 6.372 ] sort of invariance property (for the conﬁdence interval). We point out that the conﬁdence interval from the Wald construction do not have invariance property. The Wald 95% conﬁdence interval for sigma is (using formula we derived in the midterm exam) 5
6. 6. 3.380617 +- 1.96*3.380617/sqrt(2*length(xvec)) = [1.609742, 5.151492] The Wald 95% conﬁdence interval for σ 2 is (homework) (3.380617)^2 +- 1.96* ... 6
7. 7. Deﬁne a function (the log lik of the multinomial distribution) > loglik <- function(x, p) { sum( x * log(p) ) } For the vector of observation x (integers) and probability proportion p (add up to one) We know the MLE of the p is just x/N where N is the total number of trials = sumxi . Therefore the −2[log lik(H0 ) − log lik(H0 + Ha )] is > -2*(loglik(c(3,5,8), c(0.2,0.3,0.5))-loglik(c(3,5,8),c(3/16,5/16,8/16))) [1] 0.02098882 > This is not signiﬁcant (not larger then 5.99) The cut oﬀ values are obtained as follows: > qchisq(0.95, df=1) [1] 3.841459 > qchisq(0.95, df=2) [1] 5.991465 > -2*(loglik(c(3,5,8),c(0.1,0.8,0.1))-lik(c(3,5,8),c(3/16,5/16,8/16))) [1] 20.12259 This is signiﬁcant, since it is larger then 5.99. Now use Pearson’s chi square: > chisq.test(x=c(3,5,8), p= c(0.2,0.3,0.5)) Chi-squared test for given probabilities data: c(3, 5, 8) X-squared = 0.0208, df = 2, p-value = 0.9896 Warning message: Chi-squared approximation may be incorrect in: chisq.test(x = c(3, 5, 8), p = c( 0.2, 0.3, 0.5)) 7
8. 8. > chisq.test(x=c( 3,5,8), p= c(0.1,0.8,0.1)) Chi-squared test for given probabilities data: c(3, 5, 8) X-squared = 31.5781, df = 2, p-value = 1.390e-07 Warning message: Chi-squared approximation may be incorrect in: chisq.test(x = c(3, 5, 8), p = c( 0.1, 0.8, 0.1)) 1 t-test and approximate Wilks test Use the same function we deﬁned before but now we always plug-in the MLE for the (nuisance parameter) σ 2 . As for the mean µ, we plug the MLE for one and plug the value speciﬁed in H0 in the other (numerator). > xvec <- c(2,5,3,7,-3,-2,0) > t.test(xvec, mu=1.2) One Sample t-test data: xvec t = 0.374, df = 6, p-value = 0.7213 alternative hypothesis: true mean is not equal to 1.2 95 percent confidence interval: -1.650691 5.079262 sample estimates: mean of x 1.714286 Now use Wilks likelihood ratio: > mleSigma <- sqrt((sum((xvec - mean(xvec) )^2))/length(xvec)) > mleSigma2 <- sqrt((sum((xvec - 1.2)^2))/length(xvec)) > 2*( fn(c(1.2,mleSigma2^2)) - fn(c(mean(xvec),mleSigma^2)) ) [1] 0.1612929 8
9. 9. > pchisq(0.1612929, df=1) [1] 0.3120310 P-value is 0.312031 9