These lecture notes discuss statistical estimation and the properties of estimators: unbiasedness, bias, mean squared error, relative efficiency, sufficiency, consistency, asymptotic unbiasedness, asymptotic normality, and best asymptotic normality. Examples illustrate estimators of parameters such as the variance of a distribution and the coefficients of a linear regression model.
1 Introduction

All models are wrong. Some models are useful. – George E. P. Box

1. Data Generating Process (DGP): the joint distribution of the data,
   $f(z_1, \dots, z_n; \theta)$,
   where the $z_i$ in general are vector-valued observations.
2. The theoretical (economic) model, being a simplification, is different from the DGP.
3. The DGP is unknown.
4. Statistical model of the data:
   (a) Provides a sufficiently good approximation to the DGP to make inference valid.
   (b) If the approximation is "bad" and inference is invalid, we say that the model is misspecified.
   (c) There may be several "valid" models, differing in "goodness".
5. If the parameters of the theoretical model can be uniquely determined from the parameters of the statistical model, we say that the theoretical model is identified.
6. In many cases we are only interested in a subset of the variables, $y_i$, and can write the DGP as
   $f(z_1, \dots, z_n; \theta) = f_1(y_1, \dots, y_n \mid x_1, \dots, x_n; \theta_1)\, f_2(x_1, \dots, x_n; \theta_2).$
   If $x_i$ is exogenous, $f_2$ can be ignored and it is sufficient to model $f_1$. Roughly speaking, this is the case when $\theta_2$ does not contain any information about $\theta_1$.

In what follows the DGP is assumed known and all these issues are ignored!
2 Small sample properties of general estimators (criteria)

Definition 1 An estimator, $\hat\theta$, of $\theta$ is a function of the data, $\hat\theta(Z_1, \dots, Z_n)$. As such it is a random variable and has a sampling variability.

Definition 2 An estimate of $\theta$ is the estimator evaluated at the current sample, $\hat\theta(z_1, \dots, z_n)$.
Definition 3 (Unbiased) An estimator $\hat\theta$ of $\theta$ is unbiased if $E(\hat\theta) = \theta$; $b(\hat\theta, \theta) = E(\hat\theta) - \theta$ is the bias of $\hat\theta$.

Example 1 Consider the estimator $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$ of $\sigma^2$, where the $X_i$ are uncorrelated, $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$. We have

$(X_i - \bar{X})^2 = (X_i - \mu + \mu - \bar{X})^2 = (X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2$

$E(X_i - \bar{X})^2 = E(X_i - \mu)^2 - 2E(X_i - \mu)(\bar{X} - \mu) + E(\bar{X} - \mu)^2 = \sigma^2 - \frac{2\sigma^2}{n} + \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2$

and it is clear that $E(\hat\sigma^2) = \frac{n-1}{n}\sigma^2$, with bias $b(\hat\sigma^2, \sigma^2) = -\sigma^2/n$.
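The downward bias $E(\hat\sigma^2) = \frac{n-1}{n}\sigma^2$ is easy to confirm by Monte Carlo. A minimal sketch using NumPy; the sample size, replication count, and $\sigma^2$ below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma2 = 5, 200_000, 4.0  # small n makes the bias visible

# reps independent samples of size n, variance sigma2
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
# the biased estimator: divide the centered sum of squares by n
s2_hat = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)

print(s2_hat.mean())            # close to (n-1)/n * sigma2 = 3.2, not 4.0
print((n - 1) / n * sigma2)
```

With $n = 5$ and $\sigma^2 = 4$ the simulated mean settles near $3.2$ rather than $4$, matching the bias $-\sigma^2/n = -0.8$.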
Definition 4 (MSE) The Mean Square Error (MSE) of an estimator, $\hat\theta$, is given by $MSE(\hat\theta, \theta) = E(\hat\theta - \theta)^2$.

Remark 1 Note that we have

$E(\hat\theta - \theta)^2 = E\left[\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta\right]^2 = E\left[\hat\theta - E(\hat\theta)\right]^2 + 2E\left[\hat\theta - E(\hat\theta)\right]\left[E(\hat\theta) - \theta\right] + \left[E(\hat\theta) - \theta\right]^2 = Var(\hat\theta) + 0 + b(\hat\theta, \theta)^2.$

That is, the MSE of an unbiased estimator is just the variance.
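The decomposition $MSE = Var + bias^2$ can be checked numerically. A sketch reusing the biased variance estimator of Example 1 (all constants arbitrary); note that the identity also holds exactly for the simulated draws themselves, not just in expectation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 5, 200_000, 4.0

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_hat = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)

mse = ((s2_hat - sigma2) ** 2).mean()                   # E(theta_hat - theta)^2
decomp = s2_hat.var() + (s2_hat.mean() - sigma2) ** 2   # Var + bias^2

print(mse, decomp)  # the two agree up to floating-point rounding
```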
Definition 5 (Relative efficiency) Let $\hat\theta_1$ and $\hat\theta_2$ be two alternative estimators of $\theta$. Then the ratio of the MSEs, $MSE(\hat\theta_1, \theta) / MSE(\hat\theta_2, \theta)$, is called the relative efficiency of $\hat\theta_1$ with respect to $\hat\theta_2$.

Definition 6 (UMVUE) An estimator $\hat\theta$ is a uniformly minimum variance unbiased estimator (UMVUE) if $E(\hat\theta) = \theta$ and, for any other unbiased estimator $\tilde\theta$, $Var(\hat\theta) \le Var(\tilde\theta)$ for all $\theta$.
Example 2 Consider the class, $\hat\mu = \sum_{i=1}^n w_i X_i$, of linear estimators of $\mu = E(X_i)$, where $Var(X_i) = \sigma^2$ and the $X_i$ are uncorrelated. Unbiasedness clearly requires that $\sum w_i = 1$, and the variance is given by

$Var(\hat\mu) = E\left[\sum_i w_i (X_i - \mu)\right]^2 = E\left[\sum_i \sum_j w_i w_j (X_i - \mu)(X_j - \mu)\right] = \sigma^2 \sum_i w_i^2.$

One unbiased estimator in this class is the familiar $\bar{X}$, which sets $w_i = 1/n$ and has variance $\sigma^2/n$. We will show that this is the UMVUE in the class of linear estimators. The first-order condition for minimizing $Var(\hat\mu)$ subject to the restriction $\sum w_i = 1$ is

$2\sigma^2 w_i = \lambda$

for $\lambda$ the Lagrange multiplier. That is, all the weights are equal; together with $\sum w_i = 1$ this gives $w_i = 1/n$.

Remark 2 The notion of minimizing the variance is suggestive. One can define a general class of estimators by requiring the estimator to minimize the sample analogue of the variance,

$\hat\mu = \arg\min_m\; n^{-1} \sum_{i=1}^n (X_i - m)^2,$

with FOC $-2n^{-1}\sum_{i=1}^n (X_i - \hat\mu) = 0$ and solution $\hat\mu = \frac{1}{n}\sum X_i$. This is the class of Least Squares estimators.
Example 3 Consider the linear regression model $y = X\beta + \varepsilon$, with $X$ an $n \times k$ matrix of regressors. The least squares estimator of $\beta$, $b$, is obtained by minimizing $q = e'e = (y - Xb)'(y - Xb)$. The FOC is

$\frac{\partial q}{\partial b'} = -2(y - Xb)'X = 0$

$y'X = b'X'X$

with solution

$b = (X'X)^{-1}X'y$

provided that $X'X$ has full rank (so the inverse is well-defined), i.e. that $X$ has rank $k$.
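A minimal numerical sketch of the closed-form solution; the design matrix, coefficients, and error variance below are invented for illustration, and the normal equations are solved directly rather than forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])    # illustrative true coefficients
y = X @ beta + rng.normal(size=n)    # errors with variance 1

# b = (X'X)^{-1} X'y, computed by solving the normal equations X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # close to beta
```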
Theorem 1 (Gauss-Markov) Assume that $X$ is non-stochastic and $E(\varepsilon) = 0$, $Var(\varepsilon) = \sigma^2 I$. Then $Var(b) = \sigma^2 (X'X)^{-1}$ and $b$ is the BLUE (Best Linear Unbiased Estimator) of $\beta$: that is, $b$ is the UMVUE in the class of linear estimators, $\tilde{b} = Ay$.

Proof. Write

$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$

so that

$Var(b) = E\left[(b - \beta)(b - \beta)'\right] = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} = (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$

To prove that $b$ is BLUE, let $\tilde{b} = Ay$ be an unbiased linear estimator of $\beta$. Defining $C = A - (X'X)^{-1}X'$ we have

$\tilde{b} = \left[C + (X'X)^{-1}X'\right]y = Cy + b = CX\beta + C\varepsilon + b,$

and unbiasedness requires $CX = 0$. Then

$Var(\tilde{b}) = E\left\{\left[C + (X'X)^{-1}X'\right]\varepsilon\varepsilon'\left[C + (X'X)^{-1}X'\right]'\right\} = \sigma^2 CC' + \sigma^2 CX(X'X)^{-1} + \sigma^2 (X'X)^{-1}X'C' + \sigma^2 (X'X)^{-1} = \sigma^2 CC' + \sigma^2 (X'X)^{-1}$

and the variance of $\tilde{b}$ exceeds the variance of $b$ by the positive semi-definite matrix $\sigma^2 CC'$. This implies that $Var(\gamma'\tilde{b}) = Var(\gamma'b) + \sigma^2 \gamma'CC'\gamma \ge Var(\gamma'b)$ for any linear combination $\gamma$.
Definition 7 (Sufficiency) Let $f(x; \theta)$ be the joint density of the data. $T(x)$ is said to be a sufficient statistic for $\theta$ if $g(x|T)$, the density of $x$ conditional on $T$, does not depend on $\theta$.

Remark 3 A sufficient statistic $T$ captures all the information about $\theta$ in the data. This means that we can base estimators on $T$ rather than the full sample.

Theorem 2 (Factorization theorem) Let $X_1, \dots, X_n$ be a random sample from $f(x; \theta)$. Then $T(x)$ is a sufficient statistic for $\theta$ iff

$f(x; \theta) = g(x) f(T(x); \theta)$

where $g$ does not depend on $\theta$.

Example 4 Let $X_i$ be iid Bernoulli with parameter $p$. $T = \sum_{i=1}^n X_i$ is then a sufficient statistic (i.e. the number of successes in $n$ trials). The joint pdf is given by

$f(x; p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{n - \sum x_i}$

and we can put $g(x) = 1$ and $f(T; p) = p^T (1-p)^{n-T}$ with $T = \sum_{i=1}^n X_i$.

Remark 4 Note that sufficient statistics are not unique and may differ in how good they are at reducing the data. In the previous example $T_2 = n - \sum X_i$ and $T_3 = (\sum X_i,\; n - \sum X_i)$ are clearly sufficient statistics as well.
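The factorization in Example 4 implies that two samples with the same $T$ have the same joint probability. A quick sketch (the sample values and $p$ are arbitrary):

```python
import numpy as np

p = 0.4
# two different samples of size 5 with the same T = sum(x) = 2
x1 = np.array([1, 1, 0, 0, 0])
x2 = np.array([0, 1, 0, 1, 0])

def joint_pmf(x, p):
    # f(x; p) = p^sum(x) * (1-p)^(n - sum(x))
    return p ** x.sum() * (1 - p) ** (len(x) - x.sum())

# equal: the pmf depends on the sample only through T
print(joint_pmf(x1, p), joint_pmf(x2, p))
```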
3 Large sample properties of general estimators (criteria)

Definition 8 (Consistency) An estimator $\hat\theta$ of $\theta$ is consistent if $\hat\theta \xrightarrow{p} \theta$.

Definition 9 (Asymptotically unbiased) An estimator $\hat\theta$ of $\theta$ is asymptotically unbiased if $n^{\gamma}(\hat\theta - \theta) \xrightarrow{d} Z$ for some $\gamma > 0$, where $Z$ is a non-degenerate random variable with $E(Z) = 0$.

Remark 5 The requirement $n^{\gamma}(\hat\theta - \theta) \xrightarrow{d} Z$, $\gamma > 0$, implies that $\hat\theta$ is a consistent estimator. Typically $\gamma = 1/2$ and $\hat\theta$ is referred to as a $\sqrt{n}$-consistent estimator.
Definition 10 (ARE) Let $\hat\theta_1$ and $\hat\theta_2$ be two estimators of $\theta$ such that $\sqrt{n}(\hat\theta_1 - \theta) \xrightarrow{d} N(0, \sigma_1^2(\theta))$ and $\sqrt{n}(\hat\theta_2 - \theta) \xrightarrow{d} N(0, \sigma_2^2(\theta))$; the asymptotic relative efficiency (ARE) of $\hat\theta_1$ relative to $\hat\theta_2$ is given by $\sigma_1^2(\theta) / \sigma_2^2(\theta)$, where $\sigma_1^2(\theta) = \lim_{n\to\infty} n\,Var(\hat\theta_1)$ and $\sigma_2^2(\theta) = \lim_{n\to\infty} n\,Var(\hat\theta_2)$.

Definition 11 (Best asymptotically normal (BAN)) $\hat\theta$ is said to be asymptotically efficient if

1. $\hat\theta \xrightarrow{p} \theta$ for all $\theta \in \Theta$;
2. $\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2(\theta))$;
3. there is no other estimator, $\tilde\theta$, fulfilling 1) and 2) with $\tilde\sigma^2(\theta) < \sigma^2(\theta)$.
Example 5 Consider again the linear regression model $y = X\beta + \varepsilon$. If we add the assumption that $\lim_{n\to\infty} n^{-1}X'X = Q$, or $\operatorname{plim} n^{-1}X'X = \lim n^{-1}E(X'X) = Q$ for $X$ stochastic, with $Q$ a positive definite matrix, then the OLS estimator $b$ is consistent. We prove this for the case of fixed $X$. We have

$b = \beta + \left(\frac{X'X}{n}\right)^{-1} \frac{X'\varepsilon}{n}.$

By assumption $n^{-1}X'X \to Q$, and

$\frac{X'\varepsilon}{n} = \frac{\sum x_i \varepsilon_i}{n}$

looks like something a law of large numbers could apply to. We have $E(x_i \varepsilon_i) = 0$ and $Var(X'\varepsilon) = E(X'\varepsilon\varepsilon'X) = E(X'\sigma^2 I X) = \sigma^2 X'X$. This immediately gives

$\lim_{n\to\infty} Var(X'\varepsilon/n) = \lim_{n\to\infty} \frac{1}{n}\, n^{-1}\sigma^2 X'X = \lim_{n\to\infty} \frac{1}{n} \cdot \lim_{n\to\infty} n^{-1}\sigma^2 X'X = \lim_{n\to\infty} \frac{1}{n}\, \sigma^2 Q = 0$

and $\operatorname{plim} n^{-1}X'\varepsilon = 0$ by the Markov LLN. It follows that $\operatorname{plim} b = \beta$.
If in addition $E|\lambda' x_i \varepsilon_i|^{2+\delta} \le B < \infty$ for $\lambda'\lambda = 1$, then $b$ is also asymptotically normal. We will use this to establish that the condition

$\lim_{n\to\infty} \frac{\sum_{i=1}^n E|\lambda' x_i \varepsilon_i|^{2+\delta}}{\left(\sum_{i=1}^n Var(\lambda' x_i \varepsilon_i)\right)^{(2+\delta)/2}} = 0$

for the Liapunov theorem holds. Since the numerator is dominated by $nB$ and $\lim_{n\to\infty} n^{-1}\sum Var(\lambda' x_i \varepsilon_i) = \sigma^2 \lambda' \left[\lim n^{-1}\sum x_i x_i'\right] \lambda = \sigma^2 \lambda' Q \lambda > 0$, we have

$\lim_{n\to\infty} \frac{\sum E|\lambda' x_i \varepsilon_i|^{2+\delta}}{\left(\sum Var(\lambda' x_i \varepsilon_i)\right)^{(2+\delta)/2}} \le \lim_{n\to\infty} \frac{nB}{\left(\sum Var(\lambda' x_i \varepsilon_i)\right)^{(2+\delta)/2}} = \lim_{n\to\infty} \frac{n^{-\delta/2} B}{\left(n^{-1}\sum Var(\lambda' x_i \varepsilon_i)\right)^{(2+\delta)/2}} = \frac{\lim_{n\to\infty} n^{-\delta/2} B}{\left(\lim_{n\to\infty} n^{-1}\sum Var(\lambda' x_i \varepsilon_i)\right)^{(2+\delta)/2}} = 0.$

We also have that $\lim_{n\to\infty} \sqrt{n}(\mu_n - \mu) = 0$ trivially holds, because $\mu_n = E(\sum \lambda' x_i \varepsilon_i / n) = 0$ for all $n$, and thus $\lim_{n\to\infty} \mu_n = 0 = \mu$. The Liapunov CLT now gives

$\sqrt{n} \sum (\lambda' x_i \varepsilon_i / n) = \sqrt{n}\, \lambda'(X'\varepsilon/n) \xrightarrow{d} N(0, \sigma^2 \lambda' Q \lambda).$

Applying the Cramér-Wold device gives $\sqrt{n}(X'\varepsilon/n) \xrightarrow{d} N(0, \sigma^2 Q)$. Using Cramér's theorem then gives

$\sqrt{n}(b - \beta) = \left(n^{-1}X'X\right)^{-1} \sqrt{n}(X'\varepsilon/n) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$

since $\lim_{n\to\infty} (n^{-1}X'X)^{-1} = Q^{-1}$.
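The limiting distribution $\sqrt{n}(b - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$ can be checked by simulation. A sketch for a single stochastic regressor with $Q = E(x_i^2) = 1$ and $\sigma^2 = 1$, so the limit is $N(0, 1)$; sample size and replication count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta, sigma2 = 200, 20_000, 1.5, 1.0

x = rng.normal(size=(reps, n))                      # Q = E(x_i^2) = 1
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
y = beta * x + eps
b = (x * y).sum(axis=1) / (x * x).sum(axis=1)       # scalar OLS, one b per replication

z = np.sqrt(n) * (b - beta)
print(z.mean(), z.var())  # approximately N(0, sigma^2 / Q) = N(0, 1)
```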
4 Maximum likelihood

Definition 12 (Likelihood) The likelihood is the data density viewed as a function of the parameters,

$L(\theta; x) = f(x; \theta).$

The likelihood is a random variable since it depends on the data.

Definition 13 (MLE) We define the maximum likelihood estimator (MLE) as

$\hat\theta = \arg\max_{\theta \in \Theta} L(\theta; x)$

where $x = (x_1, \dots, x_n)$ denotes the data, and $x_i$ and $\theta$ may be vectors.

Remark 6 Alternatively, the MLE can be defined as the solution to the FOC

$\frac{\partial L(\theta; x)}{\partial \theta} = 0.$

This definition has two problems: the likelihood may have local maxima, i.e. there are multiple solutions to the FOC, and the derivative may not be well defined. Despite these shortcomings we will, for simplicity, rely on this definition of the MLE for much of what follows.
Example 6 Suppose that $X_i$, $i = 1, \dots, n$, are iid $U(0, \theta)$. We have

$f(x) = \begin{cases} 1/\theta & 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$

and the likelihood is given by

$L(\theta; x) = \theta^{-n}\, I\left(X_{(n)} \le \theta\right)$

where $X_{(n)}$ is the $n$th order statistic, i.e. $X_{(n)} = \max(X_1, \dots, X_n)$. It is clear that the FOC $-n\theta^{-(n+1)} = 0$ will not provide a sensible answer. On the other hand it is easily seen, since $L(\theta; x)$ is decreasing in $\theta$, that the likelihood is maximized by $\hat\theta = X_{(n)}$.
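That the maximizer is the sample maximum, and not a root of the FOC, is easy to see numerically. A sketch over a parameter grid ($\theta$, $n$, and the grid are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n = 2.0, 50
x = rng.uniform(0.0, theta, size=n)

def log_lik(t, x):
    # l(theta; x) = -n ln(theta) for theta >= X_(n), and -inf below the sample maximum
    return -len(x) * np.log(t) if t >= x.max() else -np.inf

grid = np.linspace(0.01, 4.0, 4000)
mle = grid[np.argmax([log_lik(t, x) for t in grid])]
print(mle, x.max())  # the grid maximizer sits at (just above) X_(n)
```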
Remark 7 For independent data we can write the likelihood as

$L(\theta; x) = f(x_1, \dots, x_n; \theta) = \prod_{i=1}^n f_i(x_i; \theta)$

and, conveniently, the log-likelihood as

$\ln L(\theta; x) = l(\theta; x) = \sum_{i=1}^n \ln f_i(x_i; \theta).$

This decomposition turns out to be crucial in the derivation of many of the properties of MLEs. For dependent data we can, somewhat less conveniently, write

$L(\theta; x) = f(x_1; \theta) f(x_2|x_1; \theta) \cdots f(x_n|x_1, \dots, x_{n-1}; \theta) = \prod_{i=1}^n f(x_i | x_j,\, j < i; \theta)$

$l(\theta; x) = \sum_{i=1}^n \ln f(x_i | x_j,\, j < i; \theta)$

and the derivations below go through with relatively small changes.
Definition 14 (Score) The derivative of the log-likelihood,

$s(\theta; x) = \frac{\partial l(\theta; x)}{\partial \theta},$

is referred to as the score vector.

Lemma 1 The score vector evaluated at the true parameter values, $\theta_0$, has expectation zero.

Proof. Since $L(\theta; x)$ is the density of the data we have

$1 = \int L(\theta_0; x)\, dx.$

Differentiate both sides w.r.t. $\theta$:

$0 = \frac{\partial}{\partial \theta} \int L(\theta_0; x)\, dx = \int \frac{\partial L(\theta_0; x)}{\partial \theta}\, dx = \int \frac{1}{L(\theta_0; x)} \frac{\partial L(\theta_0; x)}{\partial \theta} L(\theta_0; x)\, dx = \int \frac{\partial l(\theta_0; x)}{\partial \theta} L(\theta_0; x)\, dx = E\left[s(\theta_0; x)\right].$
Definition 15 (Fisher Information) The information matrix is the variance-covariance matrix of the score vector evaluated at the true parameter values $\theta_0$:

$I(\theta) = E\left[s(\theta_0; x)\, s(\theta_0; x)'\right] = E\left[\frac{\partial l(\theta_0; x)}{\partial \theta} \frac{\partial l(\theta_0; x)}{\partial \theta'}\right].$

Remark 8 Note the use of the convention that the derivative w.r.t. the column vector $\theta$ is a column vector and the derivative w.r.t. the row vector $\theta'$ is a row vector.

Remark 9 The Fisher information is a measure of the information about $\theta$ we can, on average, expect to find in a sample of given size.
Theorem 3 (Information matrix equality)

$I(\theta) = -E\left[\frac{\partial^2 l(\theta_0; x)}{\partial \theta\, \partial \theta'}\right] = E\left[\frac{\partial l(\theta_0; x)}{\partial \theta} \frac{\partial l(\theta_0; x)}{\partial \theta'}\right] = Var(s(\theta_0; x))$

Proof. Write

$0 = \int \frac{\partial l(\theta_0; x)}{\partial \theta} L(\theta_0; x)\, dx$

and differentiate both sides:

$0 = \int \frac{\partial l(\theta_0; x)}{\partial \theta} \frac{\partial L(\theta_0; x)}{\partial \theta'}\, dx + \int \frac{\partial^2 l(\theta_0; x)}{\partial \theta\, \partial \theta'} L(\theta_0; x)\, dx = \int \frac{\partial l(\theta_0; x)}{\partial \theta} \frac{\partial l(\theta_0; x)}{\partial \theta'} L(\theta_0; x)\, dx + \int \frac{\partial^2 l(\theta_0; x)}{\partial \theta\, \partial \theta'} L(\theta_0; x)\, dx.$

That is,

$E\left[\frac{\partial l(\theta_0; x)}{\partial \theta} \frac{\partial l(\theta_0; x)}{\partial \theta'}\right] = -E\left[\frac{\partial^2 l(\theta_0; x)}{\partial \theta\, \partial \theta'}\right].$
Remark 10 For iid data we can write the information as

$I(\theta) = -nE\left[\frac{\partial^2 \ln f(x_i; \theta_0)}{\partial \theta\, \partial \theta'}\right] = nE\left[\frac{\partial \ln f(x_i; \theta_0)}{\partial \theta} \frac{\partial \ln f(x_i; \theta_0)}{\partial \theta'}\right].$
Condition 1 We have assumed that $\frac{\partial}{\partial \theta} \int L(\theta_0; x)\, dx = \int \frac{\partial L(\theta_0; x)}{\partial \theta}\, dx$ holds. This is not necessarily the case. Roughly speaking, the requirement for this to hold is that the distribution isn't too fat-tailed and that the domain of $x$ does not depend on $\theta$. Sufficient conditions for this and the Cramér-Rao theorem below (Theorem 5) are:

1. The parameter space $\Theta$, $\theta \in \Theta$, is an open rectangle, or we can restrict the parameter space to an open rectangle.
2. The domain of $x$ does not depend on $\theta$.
3. The score vector $s$ has finite expectation and variance $\forall \theta \in \Theta$.
Example 7 (Example 6 continued) With the uniform likelihood we have $l(x; \theta) = -n \ln \theta$ and

$\frac{\partial l(x; \theta)}{\partial \theta} = -\frac{n}{\theta}, \qquad \frac{\partial^2 l(x; \theta)}{\partial \theta^2} = \frac{n}{\theta^2},$

and it is clear that both the information matrix equality and the lemma fail to hold. This should not be surprising since the domain of $X_i$ depends on $\theta$.
Example 8 Suppose that $X_i \sim NID(\mu, \sigma^2)$, $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / 2\sigma^2}$, with likelihood

$L(\mu, \sigma^2; x) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left(-\sum_{i=1}^n (x_i - \mu)^2 / 2\sigma^2\right)$

$l(\mu, \sigma^2; x) = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2$

with

$\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (x_i - \mu)^2,$

yielding the familiar estimates $\hat\mu = \bar{x}$, $\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2$.

It is easily verified that $E\left(\frac{\partial l}{\partial \mu}\right) = E\left(\frac{\partial l}{\partial \sigma^2}\right) = 0$. Furthermore

$E\left(\frac{\partial l}{\partial \mu}\right)^2 = E\left[\frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}\right]^2 = \frac{1}{\sigma^4} E\left[\sum_{i=1}^n \sum_{j=1}^n (x_i - \mu)(x_j - \mu)\right] = \frac{n\sigma^2}{\sigma^4} = \frac{n}{\sigma^2}$

and

$E\left(\frac{\partial l}{\partial \sigma^2}\right)^2 = E\left[-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (x_i - \mu)^2\right]^2 = E\left[\frac{n^2}{4\sigma^4} - \frac{n}{2\sigma^6} \sum_{i=1}^n (x_i - \mu)^2 + \frac{1}{4\sigma^8} \sum_{i=1}^n \sum_{j=1}^n (x_i - \mu)^2 (x_j - \mu)^2\right].$

Using

$E(x_i - \mu)^2 (x_j - \mu)^2 = \begin{cases} 3\sigma^4 & i = j \\ \sigma^4 & i \ne j \end{cases}$

by independence, this equals

$\frac{n^2}{4\sigma^4} - \frac{n^2}{2\sigma^4} + \frac{\left[3n + n(n-1)\right]\sigma^4}{4\sigma^8} = \frac{n}{2\sigma^4}.$

Finally,

$E\left(\frac{\partial l}{\partial \mu} \frac{\partial l}{\partial \sigma^2}\right) = E\left[\frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu)\right]\left[-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^n (x_j - \mu)^2\right] = E\left[-\frac{n}{2\sigma^4} \sum_{i=1}^n (x_i - \mu) + \frac{1}{2\sigma^6} \sum_{i=1}^n \sum_{j=1}^n (x_i - \mu)(x_j - \mu)^2\right] = 0$

and the information matrix is given by

$I(\mu, \sigma^2) = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}.$

To verify that the information matrix equality holds we evaluate

$E\left(\frac{\partial^2 l}{\partial \mu^2}\right) = E\left(\frac{\sum_{i=1}^n (-1)}{\sigma^2}\right) = -\frac{n}{\sigma^2}$

$E\left(\frac{\partial^2 l}{\partial (\sigma^2)^2}\right) = E\left(\frac{n}{2\sigma^4} - \frac{1}{\sigma^6} \sum_{i=1}^n (x_i - \mu)^2\right) = \frac{n}{2\sigma^4} - \frac{n\sigma^2}{\sigma^6} = -\frac{n}{2\sigma^4}$

$E\left(\frac{\partial^2 l}{\partial \mu\, \partial \sigma^2}\right) = E\left(-\frac{1}{\sigma^4} \sum_{i=1}^n (x_i - \mu)\right) = 0$

and it is clear that $E\left[\frac{\partial l}{\partial \theta} \frac{\partial l}{\partial \theta'}\right] = -E\left[\frac{\partial^2 l}{\partial \theta\, \partial \theta'}\right]$ holds.
5 Small sample optimality results

Remark 11 Maximum likelihood estimators are functions of sufficient statistics rather than the full sample. To see this, note that if $T$ is a sufficient statistic we can write the likelihood as (recall the Factorization theorem)

$L(x; \theta) = g(x) f(T; \theta) \implies l(x; \theta) = \ln g(x) + \ln f(T; \theta)$

where $g(x)$ is a function of the data only and $f(T; \theta)$ is the marginal density of $T$. Maximizing $\ln f(T; \theta)$ w.r.t. $\theta$ will obviously give the same result as maximizing $l(x; \theta)$.
Theorem 4 (Rao-Blackwell) Let the density of the data be indexed by the parameter $\theta$, let $T$ be a sufficient statistic for $\theta$ and let $t(x)$ be an unbiased estimator of $u(\theta)$. Define the new estimator $\hat\theta = E(t(x)|T)$. Then

1. $\hat\theta$ is an unbiased estimator of $u(\theta)$;
2. $Var(\hat\theta) \le Var(t)$.

Proof. We must first establish that $\hat\theta$ can be used as an estimator, i.e. that it does not depend on $\theta$ and can be computed from the sample. To see this, note that $t(x)$ is a function of the sample and, since $T$ is a sufficient statistic, $g(x|T)$ does not depend on $\theta$. Consequently $\hat\theta = E(t(x)|T) = \int t(x)\, g(x|T)\, dx$ is independent of $\theta$. To show part 1, note that $E(\hat\theta) = E\left[E(t(x)|T)\right] = E(t(x)) = u(\theta)$ by the law of iterated expectations. For part 2 we have, from theorem 5.6 in Ramanathan, that $Var(X) = E\left[Var(X|Y)\right] + Var\left[E(X|Y)\right]$; setting $t = X$ and $\hat\theta = E(X|Y)$ it is clear that part 2 must hold.
Remark 12 Rao-Blackwellization provides a general way of obtaining a reasonable estimator: find an unbiased estimator (which by no means has to be a good estimator) and a sufficient statistic, and construct the new estimator using the Rao-Blackwell theorem. In some cases this will even be an optimal estimator in the sense that it is a UMVUE.
Example 9 Consider again the case of iid Bernoulli data with parameter $p$. Suppose we take $t(X) = X_1$. Clearly this is an unbiased estimator of $p$: $E(X_1) = p$ and $Var(X_1) = p(1-p)$. The sufficient statistic is $T = \sum X_i$. Calculating $\hat{p} = E(X_1|T)$ is a combinatorial problem: there are in total $n! / \left[T!\,(n-T)!\right]$ equally likely permutations of the $T$ ones and $n - T$ zeros given $T$. Of these there are $(n-1)! / \left[(T-1)!\,(n-T)!\right]$ permutations where $X_1 = 1$. This gives

$P(X_1 = 1 | T) = \frac{(n-1)!\, T!}{n!\, (T-1)!} = \frac{T}{n}$

and $\hat{p} = T/n$ with $E(\hat{p}) = E(T)/n = p$ and $Var(\hat{p}) = Var(T)/n^2 = p(1-p)/n$.
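The variance reduction from Rao-Blackwellization is easy to check by simulation. A sketch comparing the crude estimator $t(X) = X_1$ with $\hat{p} = T/n$ ($p$, $n$, and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.3, 10, 200_000

x = rng.binomial(1, p, size=(reps, n))
t_raw = x[:, 0]              # crude unbiased estimator X_1
p_hat = x.mean(axis=1)       # Rao-Blackwellized estimator T/n

print(t_raw.mean(), p_hat.mean())   # both approximately p = 0.3
print(t_raw.var(), p_hat.var())     # approx p(1-p) = 0.21 vs p(1-p)/n = 0.021
```

Both estimators are unbiased, but conditioning on the sufficient statistic cuts the variance by a factor of $n$.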
Definition 16 (Exponential family) A distribution characterized by a $k$-dimensional parameter vector $\theta$ is said to belong to the exponential family if its density or probability function can be written on the form
$$f(x) = C(\theta) \exp\left[\sum_{i=1}^{k} q_i(\theta) T_i(x)\right] h(x).$$

Remark 13 It follows from the factorization theorem that $(T_1, \ldots, T_k)$ are sufficient statistics for $\theta$.
Remark 14 The exponential family is a large class of distributions, containing among others the binomial, normal, geometric, exponential and Poisson distributions.
Example 10 Consider the random variable $X$ with the normal pdf $f(x) = (2\pi\sigma^2)^{-0.5} e^{-0.5(x-\mu)^2/\sigma^2}$. To deduce that this pdf belongs to the exponential family first note that $\theta = (\mu, \sigma^2)'$ and write
$$(2\pi\sigma^2)^{-0.5} e^{-0.5(x-\mu)^2/\sigma^2} = \frac{e^{-0.5\mu^2/\sigma^2}}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x} \cdot 1 = C(\theta)\, e^{q_1(\theta)T_1(x) + q_2(\theta)T_2(x)}\, h(x)$$
where $C(\theta) = \frac{e^{-0.5\mu^2/\sigma^2}}{\sqrt{2\pi\sigma^2}}$, $q_1(\theta) = -\frac{1}{2\sigma^2}$, $T_1(x) = x^2$, $q_2(\theta) = \frac{\mu}{\sigma^2}$, $T_2(x) = x$, and $h(x) = 1$.
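The factorization can be verified numerically: the product $C(\theta)\,e^{q_1 T_1(x) + q_2 T_2(x)}\,h(x)$ should reproduce the usual normal pdf at every $x$. The parameter values below are arbitrary.

```python
import numpy as np

# Numerical check of the exponential-family factorization of the N(mu, s2)
# density: C(theta) * exp(q1*T1(x) + q2*T2(x)) * h(x) equals the pdf.
mu, s2 = 1.5, 2.0

def npdf(x):
    # the usual normal density formula
    return (2 * np.pi * s2) ** -0.5 * np.exp(-0.5 * (x - mu) ** 2 / s2)

C = np.exp(-0.5 * mu ** 2 / s2) / np.sqrt(2 * np.pi * s2)
q1, q2 = -1 / (2 * s2), mu / s2          # q1 pairs with T1(x)=x^2, q2 with T2(x)=x

x = np.linspace(-4.0, 6.0, 101)
factorized = C * np.exp(q1 * x ** 2 + q2 * x) * 1.0   # h(x) = 1
print(np.max(np.abs(factorized - npdf(x))))            # essentially zero
```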
In many cases it is not possible to establish the existence of a UMVUE. In those cases it is of interest to know how good the estimator at hand is. Is it worth the effort to try to find a better estimator? To answer this question we need to know how far off we are from the best possible case.
Theorem 5 (Cramér-Rao) Let $\hat{\theta}$ be an unbiased estimator of the $k$-dimensional parameter vector $\theta$ and suppose that the regularity conditions 1 hold. Then $Var(\hat{\theta}) - I^{-1}(\theta)$ is a positive semi-definite matrix and we write $Var(\hat{\theta}) \geq I^{-1}(\theta)$.
Proof. We have $\theta = E(\hat{\theta}) = \int \hat{\theta}\, L(\theta; \mathbf{x})\, d\mathbf{x}$ and differentiate both sides w.r.t. $\theta'$:
$$\frac{\partial \theta}{\partial \theta'} = \mathbf{I} = \int \hat{\theta}\, \frac{\partial L(\theta; \mathbf{x})}{\partial \theta'}\, d\mathbf{x} = \int \hat{\theta}\, \frac{1}{L(\theta; \mathbf{x})} \frac{\partial L(\theta; \mathbf{x})}{\partial \theta'}\, L(\theta; \mathbf{x})\, d\mathbf{x} = \int \hat{\theta}\, \frac{\partial l(\theta; \mathbf{x})}{\partial \theta'}\, L(\theta; \mathbf{x})\, d\mathbf{x} = \int \hat{\theta}\, s(\theta; \mathbf{x})'\, L(\theta; \mathbf{x})\, d\mathbf{x} = Cov(\hat{\theta}, s)$$
since $E(s) = 0$, where $s$ is the score vector. The variance of $(\hat{\theta}', s')'$ is then
$$Var\begin{pmatrix} \hat{\theta} \\ s \end{pmatrix} = \begin{pmatrix} Var(\hat{\theta}) & \mathbf{I} \\ \mathbf{I} & I(\theta) \end{pmatrix}.$$
Note that any variance matrix is positive semi-definite and hence the variance of the linear combination $[\mathbf{I},\, -I^{-1}(\theta)]\,(\hat{\theta}', s')'$ is positive semi-definite. This variance is given by
$$\begin{pmatrix} \mathbf{I} & -I^{-1}(\theta) \end{pmatrix} \begin{pmatrix} Var(\hat{\theta}) & \mathbf{I} \\ \mathbf{I} & I(\theta) \end{pmatrix} \begin{pmatrix} \mathbf{I} \\ -I^{-1}(\theta) \end{pmatrix} = Var(\hat{\theta}) - I^{-1}(\theta) \geq 0$$
which establishes the result.
Remark 15 The inverse information matrix I1 () provides a lower bound for the
variance of an unbiased estimator and is referred to as the Cramér-Rao lower bound.
Remark 16 In the scalar parameter case the Cramér-Rao lower bound reduces to $Var(\hat{\theta}) \geq I(\theta)^{-1}$.
Remark 17 The notation $Var(\hat{\theta}) \geq I^{-1}(\theta)$ is justified in the vector valued parameter case by noting that $a'\left[Var(\hat{\theta}) - I^{-1}(\theta)\right]a \geq 0$, or $a'Var(\hat{\theta})a \geq a'I^{-1}(\theta)a$, for an arbitrary vector $a$ when $Var(\hat{\theta}) - I^{-1}(\theta)$ is positive semi-definite. That is, there is no linear combination $a'\hat{\theta}$ of any unbiased estimator $\hat{\theta}$ with smaller variance than $a'I^{-1}(\theta)a$.
Remark 18 There is no guarantee that there is an unbiased estimator that attains
the Cramér-Rao lower bound.
Example 11 The information for the parameters $(\mu, \sigma^2)$ with iid normal data was obtained in example 8 as
$$I(\mu, \sigma^2) = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}$$
and the Cramér-Rao lower bound is given by
$$I^{-1}(\mu, \sigma^2) = \begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix}.$$
It is clear that $\bar{x}$ attains the lower bound but $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ does not, because $Var(s^2) = \frac{2\sigma^4}{n-1}$, which follows from noting that
$$\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2(n-1).$$
Clearly $Var(s^2)$ is greater than the Cramér-Rao lower bound for any finite $n$.
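The gap between $Var(s^2)$ and the bound can be seen in a simulation; $n$, $\sigma^2$ and the number of replications below are arbitrary choices.

```python
import numpy as np

# Monte Carlo: for iid N(mu, s2) data the sample variance s^2 has variance
# 2*s2^2/(n-1), strictly above the Cramer-Rao bound 2*s2^2/n.
rng = np.random.default_rng(1)
n, s2, reps = 10, 4.0, 400_000

x = rng.normal(0.0, np.sqrt(s2), size=(reps, n))
sample_var = x.var(axis=1, ddof=1)        # the unbiased s^2 for each sample

print(sample_var.var())                   # approx 2*s2^2/(n-1) = 3.556
print(2 * s2 ** 2 / (n - 1))              # theoretical Var(s^2)
print(2 * s2 ** 2 / n)                    # Cramer-Rao lower bound = 3.2
```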
Theorem 6 Suppose that $t$ is an unbiased estimator of $\theta$ that attains the Cramér-Rao lower bound. Then $t$ is the MLE of $\theta$.
Proof. From the proof of the Cramér-Rao theorem we have that $Var(t) - I^{-1}(\theta) = Var\left([\mathbf{I},\, -I^{-1}(\theta)]\,(t', s')'\right)$ if $t$ is an unbiased estimator. By assumption $Var(t) - I^{-1}(\theta) = 0$, so $[\mathbf{I},\, -I^{-1}(\theta)]\,(t', s')'$ must be constant and there is an exact linear relation between $t$ and $s$. Since $t$ is unbiased the linear relation has the form $t = A(\theta)\, s(\theta; \mathbf{x}) + \theta$, or $s(\theta; \mathbf{x}) = A^{-1}(\theta)(t - \theta)$. Setting the score to zero we obtain the MLE as $\hat{\theta} = t$.
Remark 19 This is a rather strong optimality result for MLEs, but it should not be taken to imply that the MLE is always unbiased or that it always attains the Cramér-Rao lower bound. In particular it does not imply that an MLE is UMVUE.
Example 12 Consider again the case of iid normal data. The MLE of $\sigma^2$ is $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ with $E(\hat{\sigma}^2) = \frac{n-1}{n}\sigma^2$ (biased) and $Var(\hat{\sigma}^2) = \frac{2\sigma^4(n-1)}{n^2}$.
6 Large sample optimality results
Theorem 7 (Consistency of MLE) Subject to the regularity conditions 1 the MLE $\hat{\theta}_n$ is consistent, $\hat{\theta}_n \xrightarrow{p} \theta_0$, the true parameter value.

Theorem 8 (Asymptotic normality of MLE) Let $\Sigma^{-1} = \lim \frac{1}{n} I(\theta)$. If the regularity conditions 1 hold and if in addition the statistical model is identified and $l(\theta; \mathbf{x})$ is twice continuously differentiable, then the asymptotic distribution of the MLE, $\hat{\theta}$, is normal,
$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{d} N(0, \Sigma).$$
Proof. We will again, for simplicity, assume that the data is iid. Note that this implies that
$$I(\theta) = n E\left[\frac{\partial \ln f(X_i; \theta_0)}{\partial \theta} \frac{\partial \ln f(X_i; \theta_0)}{\partial \theta'}\right] = -n E\left[\frac{\partial^2 \ln f(X_i; \theta_0)}{\partial \theta\, \partial \theta'}\right].$$
That is,
$$\Sigma^{-1} = E\left[\frac{\partial \ln f(X_i; \theta_0)}{\partial \theta} \frac{\partial \ln f(X_i; \theta_0)}{\partial \theta'}\right] = Var\left[\frac{\partial \ln f(X_i; \theta_0)}{\partial \theta}\right]$$
in this case. By the mean value theorem we can write, for some value $\bar{\theta}$ between $\theta_0$ and $\hat{\theta}_n$,
$$s_n(\theta_0; \mathbf{x}) = s_n\left(\hat{\theta}_n; \mathbf{x}\right) + \frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'}\left(\theta_0 - \hat{\theta}_n\right) = \frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'}\left(\theta_0 - \hat{\theta}_n\right)$$
since the MLE $\hat{\theta}_n$ sets the score to zero. Alternatively we can write this as
$$\left(\theta_0 - \hat{\theta}_n\right) = \left[\frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'}\right]^{-1} s_n(\theta_0; \mathbf{x})$$
provided that $\frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'}$ has full rank. Since
$$s_n(\theta_0; \mathbf{x}) = \sum_{i=1}^{n} \frac{\partial \ln f(X_i; \theta_0)}{\partial \theta}$$
where $f(X_i; \theta_0)$ and $\frac{\partial \ln f(X_i; \theta_0)}{\partial \theta}$ are iid random variables, we have by the (multivariate) Lindeberg-Lévy CLT that
$$\frac{1}{\sqrt{n}}\, s_n(\theta_0; \mathbf{x}) \xrightarrow{d} N\left(0, \Sigma^{-1}\right).$$
Secondly,
$$\frac{\partial s_n(\theta_0; \mathbf{x})}{\partial \theta'} = \sum_{i=1}^{n} \frac{\partial^2 \ln f(X_i; \theta_0)}{\partial \theta\, \partial \theta'}$$
is a sum of iid random matrices and
$$\frac{1}{n} \frac{\partial s_n(\theta_0; \mathbf{x})}{\partial \theta'} \xrightarrow{p} -\Sigma^{-1}$$
by the Khinchine WLLN. In addition, $\hat{\theta}_n \xrightarrow{p} \theta_0$ implies $\bar{\theta} \xrightarrow{p} \theta_0$ and
$$\frac{1}{n} \frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'} \xrightarrow{p} -\Sigma^{-1}$$
by the Slutsky theorem. Note that this implies $-\Sigma\, \frac{1}{n} \frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'} \xrightarrow{p} \mathbf{I}$. Next, write
$$\frac{1}{n} \frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'}\, \sqrt{n}\left(\theta_0 - \hat{\theta}_n\right) = \frac{1}{\sqrt{n}}\, s_n(\theta_0; \mathbf{x}).$$
Since $-\Sigma\, \frac{1}{n} \frac{\partial s_n(\bar{\theta}; \mathbf{x})}{\partial \theta'} \xrightarrow{p} \mathbf{I}$ we have that
$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{d} \Sigma\, \frac{1}{\sqrt{n}}\, s_n(\theta_0; \mathbf{x}) \xrightarrow{d} N(0, \Sigma)$$
which establishes the result.
Remark 20 The variance of the limiting distribution for the MLE is the inverse of the limit of the average information. That is, asymptotically the MLE attains the Cramér-Rao lower bound. This implies that the MLE is Best Asymptotic Normal, i.e. there is no other asymptotically normal estimator whose limiting distribution has a smaller variance. This provides a strong rationale for the use of maximum likelihood.

Remark 21 Note the crucial role that the information matrix equality plays in giving us a simple form for the variance of the limiting distribution.
Example 13 For normal data, $X_i$ iid $N(\mu, \sigma^2)$, the information matrix is given by
$$I(\mu, \sigma^2) = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}.$$
It follows that
$$\sqrt{n}\left[\begin{pmatrix} \hat{\mu} \\ \hat{\sigma}^2 \end{pmatrix} - \begin{pmatrix} \mu \\ \sigma^2 \end{pmatrix}\right] \xrightarrow{d} N(0, \Sigma)$$
for
$$\Sigma = \lim n I^{-1}(\mu, \sigma^2) = \begin{pmatrix} \sigma^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}.$$
From exercise 5 in the asymptotics lecture notes we deduce that $\sqrt{n}\left(\hat{\sigma}^2_n - \sigma^2\right) \xrightarrow{d} N\left(0, (\kappa - 1)\sigma^4\right)$ where $\kappa = E(X_i - \mu)^4/\sigma^4 = 3$ for normal data.
Example 14 Suppose that $X_i$, $i = 1, \ldots, n$, is iid Bernoulli with parameter $p$. The loglikelihood is
$$l(p; \mathbf{x}) = T \ln p + (n - T) \ln(1 - p)$$
for $T = \sum_{i=1}^{n} x_i$. The score is
$$\frac{\partial l(p; \mathbf{x})}{\partial p} = \frac{T}{p} - \frac{n - T}{1 - p}.$$
Setting the score to zero and solving for $p$ gives the MLE as $\hat{p} = \frac{T}{n}$. We obtain the Fisher information as
$$I(p) = -E\left[\frac{\partial^2 l(p; \mathbf{x})}{\partial p^2}\right] = E\left[\frac{T}{p^2} + \frac{n - T}{(1 - p)^2}\right] = \frac{np}{p^2} + \frac{n(1 - p)}{(1 - p)^2} = \frac{n}{p} + \frac{n}{1 - p} = \frac{n}{p(1 - p)}.$$
Since the regularity conditions hold it follows that $\hat{p}$ is consistent and that
$$\sqrt{n}(\hat{p} - p) \xrightarrow{d} N(0, p(1 - p)).$$
The results are easily verified by applying a suitable LLN and CLT to $\hat{p} = \sum_{i=1}^{n} x_i/n$. A common rule of thumb for when the asymptotic distribution provides a good approximation to the exact finite sample distribution is that $np(1 - p) \geq 9$. Noting that $T \sim Bin(n, p)$, the approximation can also be checked against the exact binomial distribution.
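Such a check can be done directly with the exact binomial probabilities; the values of $n$, $p$ and the cutoff below are arbitrary illustrations (here $np(1-p) = 21$, comfortably above the rule of thumb).

```python
import math

# Compare the exact Bin(n, p) probability P(p_hat <= k/n) with the
# N(p, p(1-p)/n) approximation implied by the asymptotic result.
n, p = 100, 0.3

def binom_cdf(k):
    # exact P(T <= k) for T ~ Bin(n, p)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

k = 35                                    # the event p_hat <= 0.35
exact = binom_cdf(k)
approx = normal_cdf((k / n - p) / math.sqrt(p * (1 - p) / n))
print(exact, approx)                      # the two are close for np(1-p) >= 9
```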
7 When the form of the likelihood is unknown (optional)

1. It generally is unknown.

2. We can't expect to get exact small sample results.

   (a) Must rely on asymptotic results.

   (b) In special cases we may be able to obtain the small sample bias and variance of the estimator.

3. Maximum likelihood is out of the question.

4. Maximize the wrong likelihood, on purpose or out of ignorance: Quasi Maximum Likelihood (QML). The QMLE can, under more restrictive conditions than above, be shown to be consistent and asymptotically normal. The major difference is that the information matrix equality doesn't hold for the QMLE and we get
$$\sqrt{n}\left(\hat{\theta}_{QML} - \theta_0\right) \xrightarrow{d} N\left(0, A^{-1} B A^{-1}\right)$$
for
$$A = \operatorname{plim} \frac{1}{n} \frac{\partial s_n(\theta_0; \mathbf{x})}{\partial \theta'}, \qquad B = \operatorname{plim} \frac{1}{n}\, s_n(\theta_0; \mathbf{x})\, s_n(\theta_0; \mathbf{x})'.$$

5. Estimators that don't rely on the likelihood.

   (a) Least squares.

   (b) Generalized Method of Moments (GMM). GMM specifies a set of $k$ moment conditions $E[g_n(\theta_0; \mathbf{x})] = 0$, where $\theta$ is a $k$-dimensional parameter vector, and minimizes $g_n(\theta; \mathbf{x})'\, g_n(\theta; \mathbf{x})$. It is possible to show, under more restrictive conditions than above, that the GMM estimator is consistent and asymptotically normal,
$$\sqrt{n}\left(\hat{\theta}_{GMM} - \theta_0\right) \xrightarrow{d} N(0, V)$$
where $V^{-1} = \lim \frac{1}{n} Var(g_n(\theta_0; \mathbf{x}))$.
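The sandwich form $A^{-1}BA^{-1}$ can be illustrated with a deliberately misspecified likelihood. The example below is a sketch with arbitrary distributional choices (not from the notes): a $N(\mu, 1)$ likelihood is fitted to data that is really exponential, so the information matrix equality $-A = B$ fails while the sandwich still gives the right asymptotic variance of the QMLE.

```python
import numpy as np

# QML sandwich sketch: the score of the (wrong) N(mu, 1) likelihood is
# s_n(mu) = sum(x_i - mu), so the QMLE is the sample mean.  Here A = -1
# while B = Var(x_i) = 4, so -A != B, but A^{-1} B A^{-1} = 4 is the
# correct asymptotic variance of sqrt(n)(mu_qml - mu_0).
rng = np.random.default_rng(2)
n = 100_000
x = rng.exponential(scale=2.0, size=n)    # true data: mean 2, variance 4

mu_qml = x.mean()                          # the QMLE of the mean
A = -1.0                                   # plim (1/n) d s_n / d mu
B = np.mean((x - mu_qml) ** 2)             # (1/n) sum of squared scores
sandwich = B / A**2                        # A^{-1} B A^{-1}, approx 4
print(mu_qml, sandwich)                    # approx 2 and 4; note -A = 1 != B
```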
Remark 22 We know that the MLE attains the Cramér-Rao lower bound asymptotically, and it should be clear that we in general suffer from a loss in efficiency by using estimators other than the MLE.

Remark 23 Note that Least Squares and ML are special cases of GMM. This is seen by setting the FOCs of LS or ML as the GMM moment conditions, e.g. $E[s_n(\theta_0; \mathbf{x})] = 0$ for ML.
8 Worked exercises

8.1 Exercises

1. Exercise 8.1 (b)-(e) in Ramanathan.

2. Exercise 8.2 in Ramanathan.

3. Exercise 8.9 (a)-(c) in Ramanathan. In addition, obtain $E(\hat{\theta})$ and $Var(\hat{\theta})$ where $\hat{\theta}$ is the MLE of $\theta$.

4. Consider the regression model $\mathbf{y} = \mathbf{x}\beta + \mathbf{z}\gamma + \boldsymbol{\varepsilon}$ where $\beta$ and $\gamma$ are scalars. In addition we are told that the $\varepsilon_i$ are iid with $E(\varepsilon_i) = 0$ and $Var(\varepsilon_i) = \sigma^2$, the $x_i$ are iid, $x_i$ and $\varepsilon_i$ are independent of each other, $\frac{\mathbf{x}'\mathbf{x}}{n} \xrightarrow{p} c = E(x_i^2) \neq 0$ and $\frac{\mathbf{x}'\mathbf{z}}{n} \xrightarrow{p} d \neq 0$.

   (a) Suppose that $\gamma$ is known. Define the estimator $b = \frac{\mathbf{x}'(\mathbf{y} - \gamma\mathbf{z})}{\mathbf{x}'\mathbf{x}}$ and obtain the limiting distribution of $b$.

   (b) Suppose instead that $\gamma$ is replaced by an estimator $\tilde{\gamma}$, independent of $\boldsymbol{\varepsilon}$, with $\sqrt{n}(\tilde{\gamma} - \gamma) \xrightarrow{d} N(0, 1)$. Define the estimator $\tilde{e} = \frac{\mathbf{x}'(\mathbf{y} - \tilde{\gamma}\mathbf{z})}{\mathbf{x}'\mathbf{x}}$ and obtain the limiting distribution of $\tilde{e}$.

   (c) Are $b$ and $\tilde{e}$ consistent estimators of $\beta$?
8.2 Solutions

1. $f(x; \theta) = k\theta^x$, a discrete geometric distribution, i.e. $k = 1 - \theta$.

   (b) We have
$$L(\mathbf{x}; \theta) = \prod_{i=1}^{n} (1 - \theta)\theta^{x_i} = (1 - \theta)^n \theta^{\sum_{i=1}^{n} x_i}$$
and it is clear from the factorization theorem that $\sum_{i=1}^{n} x_i$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ are sufficient statistics.

   (a) We have
$$\frac{\partial l(\mathbf{x}; \theta)}{\partial \theta} = -\frac{n}{1 - \theta} + \frac{\sum_{i=1}^{n} x_i}{\theta}, \qquad \frac{\partial^2 l(\mathbf{x}; \theta)}{\partial \theta^2} = -\frac{n}{(1 - \theta)^2} - \frac{\sum_{i=1}^{n} x_i}{\theta^2}.$$
Since $E(x_i) = \frac{\theta}{1 - \theta}$ we have
$$I(\theta) = -E\left[\frac{\partial^2 l(\mathbf{x}; \theta)}{\partial \theta^2}\right] = \frac{n}{(1 - \theta)^2} + \frac{n}{(1 - \theta)\theta} = \frac{n}{\theta(1 - \theta)^2}.$$
It is easy to verify that the outer product of the score form of the information matrix gives the same result,
$$E\left[\left(\frac{\partial l(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right] = E\left[\left(-\frac{n}{1 - \theta} + \frac{\sum_{i=1}^{n} x_i}{\theta}\right)^2\right] = \frac{n^2}{(1 - \theta)^2} - \frac{2n\, E\left(\sum_{i=1}^{n} x_i\right)}{\theta(1 - \theta)} + \frac{E\left(\sum_{i=1}^{n} x_i\right)^2}{\theta^2}.$$
Using independence,
$$E\left(\sum_{i=1}^{n} x_i\right)^2 = E\left(\sum_{i=1}^{n} x_i^2\right) + E\left(\sum_{i=1}^{n} \sum_{j \neq i} x_i x_j\right) = n\left[\frac{\theta}{(1 - \theta)^2} + \frac{\theta^2}{(1 - \theta)^2}\right] + n(n - 1)\frac{\theta^2}{(1 - \theta)^2},$$
so that
$$E\left[\left(\frac{\partial l(\mathbf{x}; \theta)}{\partial \theta}\right)^2\right] = \frac{n^2}{(1 - \theta)^2} - \frac{2n^2}{(1 - \theta)^2} + \frac{n(1 + \theta)}{\theta(1 - \theta)^2} + \frac{n(n - 1)}{(1 - \theta)^2} = \frac{n}{\theta(1 - \theta)^2}.$$
   (b) Setting the score to zero we have
$$\frac{\partial l(\mathbf{x}; \theta)}{\partial \theta} = -\frac{n}{1 - \theta} + \frac{\sum_{i=1}^{n} x_i}{\theta} = 0 \quad \Longleftrightarrow \quad \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\theta}{1 - \theta}$$
with the solution
$$\hat{\theta} = \frac{\bar{x}}{1 + \bar{x}}.$$

   (c) Since the $x_i$ are iid we have $\bar{x} \xrightarrow{p} E(x_i) = \frac{\theta}{1 - \theta}$ by the Khinchine WLLN. It follows from the Slutsky theorem that $\hat{\theta} = g(\bar{x}) = \frac{\bar{x}}{1 + \bar{x}} \xrightarrow{p} g\left(\frac{\theta}{1 - \theta}\right) = \theta$.
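The consistency argument can be illustrated by simulation; the value of $\theta$ and the sample sizes below are arbitrary choices.

```python
import numpy as np

# Simulation sketch of the consistency of the geometric MLE
# theta_hat = xbar/(1 + xbar) for the pmf f(x; theta) = (1 - theta) theta^x,
# x = 0, 1, 2, ...  Note numpy's geometric sampler counts trials starting
# at 1 with success probability p, so x = rng.geometric(1 - theta) - 1
# has the pmf above.
rng = np.random.default_rng(3)
theta = 0.4

for n in (100, 10_000, 1_000_000):
    x = rng.geometric(1 - theta, size=n) - 1
    xbar = x.mean()
    est = xbar / (1 + xbar)
    print(n, est)                          # approaches theta = 0.4 as n grows
```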
2. $f(x; \theta) = \theta x^{\theta - 1}$ for $0 \leq x \leq 1$ and $\theta \geq 0$.

   (a) $\int_0^1 \theta x^{\theta - 1}\, dx = \left[x^\theta\right]_0^1 = 1$. It follows that
$$E(x) = \int_0^1 x\, \theta x^{\theta - 1}\, dx = \theta \int_0^1 x^\theta\, dx = \frac{\theta}{\theta + 1}.$$

   (b) $-\frac{1}{\theta^2} = \frac{\partial}{\partial \theta} \frac{1}{\theta} = \frac{\partial}{\partial \theta} \int_0^1 x^{\theta - 1}\, dx = \int_0^1 \frac{\partial}{\partial \theta} e^{(\theta - 1)\ln x}\, dx = \int_0^1 \ln x\, x^{\theta - 1}\, dx$. It follows that
$$E(\ln x) = \theta \int_0^1 \ln x\, x^{\theta - 1}\, dx = -\frac{1}{\theta}.$$

   (c)
$$\frac{2}{\theta^3} = \frac{\partial^2}{\partial \theta^2} \frac{1}{\theta} = \frac{\partial^2}{\partial \theta^2} \int_0^1 x^{\theta - 1}\, dx = \int_0^1 \frac{\partial}{\partial \theta} \ln x\, e^{(\theta - 1)\ln x}\, dx = \int_0^1 (\ln x)^2 e^{(\theta - 1)\ln x}\, dx = \int_0^1 (\ln x)^2 x^{\theta - 1}\, dx.$$
Which gives
$$E(\ln x)^2 = \theta \int_0^1 (\ln x)^2 x^{\theta - 1}\, dx = \frac{2}{\theta^2}$$
and
$$Var(\ln x) = E\left(\ln x - E(\ln x)\right)^2 = E\left[(\ln x)^2 - 2\ln x\, E(\ln x) + [E(\ln x)]^2\right] = E(\ln x)^2 - [E(\ln x)]^2 = \frac{2}{\theta^2} - \frac{1}{\theta^2} = \frac{1}{\theta^2}.$$
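These moments are easy to confirm by simulation, drawing from $f(x;\theta) = \theta x^{\theta-1}$ by inverting the cdf $F(x) = x^\theta$; the value of $\theta$ below is arbitrary.

```python
import numpy as np

# Check E(ln x) = -1/theta and Var(ln x) = 1/theta^2 for the density
# f(x; theta) = theta x^(theta-1) on (0, 1).  Inversion of the cdf
# F(x) = x^theta gives x = U^(1/theta) for U ~ Uniform(0, 1).
rng = np.random.default_rng(4)
theta = 2.5
u = rng.uniform(size=1_000_000)
lx = np.log(u ** (1 / theta))              # ln x for simulated draws

print(lx.mean(), -1 / theta)               # both approx -0.4
print(lx.var(), 1 / theta ** 2)            # both approx 0.16
```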
   (d) We have the random sample $x_1, \ldots, x_n$. Independence gives the joint density as
$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta - 1}.$$
The likelihood is thus $L(\theta; x_1, \ldots, x_n) = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta - 1}$. It follows from the factorization theorem (8.1) that $T_1 = \prod_{i=1}^{n} x_i$ is a sufficient statistic since we can factorize the likelihood into the function $h(T_1; \theta) = \theta^n T_1^{\theta - 1}$, depending only on $\theta$ and $T_1$, and the function $g(\mathbf{x}) = 1$, which does not depend on $\theta$ or $T_1$. The factorization theorem is if and only if; that is, $T_2 = \sum_{i=1}^{n} x_i$ is a sufficient statistic only if we can factorize the likelihood correspondingly for $T_2$. Inspection of the likelihood function shows that this is impossible and consequently $T_2$ is not a sufficient statistic for $\theta$. $T_3 = \sum_{i=1}^{n} \ln x_i$, on the other hand, is a sufficient statistic.
   (e) $\ln L(\theta; \mathbf{x}) = n\ln(\theta) + (\theta - 1)\sum_{i=1}^{n} \ln x_i$ and
$$\frac{\partial \ln L}{\partial \theta} = \frac{n}{\theta} + \sum_{i=1}^{n} \ln x_i.$$
Setting the derivative to zero yields $\hat{\theta} = -\frac{n}{\sum_{i=1}^{n} \ln x_i}$. To verify that this is a maximum we need to show that the second derivative is negative at $\hat{\theta}$; we have $\frac{\partial^2 \ln L}{\partial \theta^2} = -\frac{n}{\theta^2} < 0$, so $\hat{\theta}$ is a maximum, with $1/\hat{\theta} = -\frac{1}{n}\sum_{i=1}^{n} \ln x_i$. It is easy to establish that this guess is correct for $\widehat{g(\theta)} = g(\hat{\theta})$ when $g$ is a monotone function (it holds for non-monotone functions as well, but is trickier to show).
Define
$$Z_n = \frac{\overline{\ln x} - E\left(\overline{\ln x}\right)}{\sqrt{Var\left(\overline{\ln x}\right)}}.$$
We then have $Z_n \xrightarrow{d} N(0, 1)$ since the $\ln x_i$ are independent with $Var(\ln x_i) = \frac{1}{\theta^2} < \infty$ and thus fulfill the conditions of the Lindeberg-Lévy CLT. Comment: we have (for this estimator) verified the claim that ML estimators are asymptotically normally distributed.
Comment: The mean and variance we obtained shouldn't be too surprising. The distribution of $x$ is an exponential distribution with a shift in the location. That is, if $y$ is exponentially distributed with parameter $\lambda$, then $x$ is obtained as $x = y + \alpha$.

   (b) The likelihood is given by $L = \prod_{i=1}^{n} \frac{1}{\lambda} e^{-(x_i - \alpha)/\lambda}$.
The expectation $E\left(\sum_{i=1}^{n}(x_i - \alpha)\right)^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} E[(x_i - \alpha)(x_j - \alpha)]$ is a little bit tricky. For $i \neq j$ we have independence and $E[(x_i - \alpha)(x_j - \alpha)] = E(x_i - \alpha)E(x_j - \alpha) = \lambda^2$, and there are $n(n - 1)$ terms with $i \neq j$. This leaves $n$ terms with $i = j$, where we have $E(x_i - \alpha)^2 = 2\lambda^2$.

Comment: The reason for using the score form of the information matrix is that the information matrix equality $E(SS') = E\left[\frac{\partial \ln L}{\partial \theta}\left(\frac{\partial \ln L}{\partial \theta}\right)'\right] = -E\left[\frac{\partial^2 \ln L}{\partial \theta\, \partial \theta'}\right]$ for $\theta = (\alpha, \lambda)'$ doesn't hold for this likelihood. When establishing that $E\left[\frac{\partial \ln L}{\partial \theta}\left(\frac{\partial \ln L}{\partial \theta}\right)'\right] = -E\left[\frac{\partial^2 \ln L}{\partial \theta\, \partial \theta'}\right]$ we needed to interchange the order of integration and differentiation. That is we needed, for example, that $\frac{\partial^2}{\partial \alpha^2} \int L(\alpha, \lambda; \mathbf{x})\, d\mathbf{x}$ could be evaluated under the integral sign.
as $\frac{1}{n}\sum_{i=1}^{n}(x_i - \alpha)$, provided $\alpha$ is known. $S_1$ is obviously of little use for obtaining the MLE of $\alpha$. Instead we need to look at the likelihood function itself, writing this as $L = \frac{1}{\lambda^n} e^{-\sum_{i=1}^{n}(x_i - \alpha)/\lambda}$; it is clear that the likelihood is an increasing function of $\alpha$. On the other hand we have the condition $x_i \geq \alpha$; that is, the likelihood of observing a value of $x$ smaller than $\alpha$ is zero. The value of $\alpha$ maximizing the likelihood is thus the smallest value of $x_i$ in the sample, or the first order statistic; denote this by $x_{(1)}$. We have $T_1 = \hat{\alpha} = x_{(1)}$ and $T_2 = \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - x_{(1)}\right)$.
extra. From p. 137 in Ramanathan we get the density of the first order statistic as
$$f_{x_{(1)}}(x) = n\left[1 - F_x(x)\right]^{n-1} f_x(x).$$
We obtain the distribution function of $x$ as $F_x(x) = \int_{\alpha}^{x} \frac{1}{\lambda} e^{-(z - \alpha)/\lambda}\, dz = 1 - e^{-(x - \alpha)/\lambda}$.
4. (a) We have
$$b = \frac{\mathbf{x}'(\mathbf{y} - \gamma\mathbf{z})}{\mathbf{x}'\mathbf{x}} = \frac{\mathbf{x}'(\mathbf{x}\beta + \boldsymbol{\varepsilon})}{\mathbf{x}'\mathbf{x}} = \frac{\mathbf{x}'\mathbf{x}\beta + \mathbf{x}'\boldsymbol{\varepsilon}}{\mathbf{x}'\mathbf{x}} = \beta + \frac{\mathbf{x}'\boldsymbol{\varepsilon}}{\mathbf{x}'\mathbf{x}}$$
where $\frac{\mathbf{x}'\mathbf{x}}{n} \xrightarrow{p} c$. In addition, $\frac{1}{n}\mathbf{x}'\boldsymbol{\varepsilon} = \frac{1}{n}\sum_{i=1}^{n} x_i\varepsilon_i$, a sample average which a CLT might apply to. By assumption we have $E(x_i\varepsilon_i) = E(x_i)E(\varepsilon_i) = 0$ and $Var(x_i\varepsilon_i) = E(x_i^2\varepsilon_i^2) = E(x_i^2)\,\sigma^2 = \sigma^2 c < \infty$. Since $x_i$ and $\varepsilon_i$ are iid, $x_i\varepsilon_i$ is iid as well and the conditions for the Lindeberg-Lévy CLT hold. That is,
$$\frac{1}{\sqrt{n}}\mathbf{x}'\boldsymbol{\varepsilon} \xrightarrow{d} N\left(0, \sigma^2 c\right).$$
Write
$$\sqrt{n}(b - \beta) = \frac{n^{-1/2}\mathbf{x}'\boldsymbol{\varepsilon}}{\mathbf{x}'\mathbf{x}/n}$$
and it follows that
$$\sqrt{n}(b - \beta) \xrightarrow{d} N\left(0, \sigma^2/c\right).$$
   (b) We have
$$\tilde{e} = \frac{\mathbf{x}'(\mathbf{y} - \tilde{\gamma}\mathbf{z})}{\mathbf{x}'\mathbf{x}} = \beta + \frac{\mathbf{x}'\boldsymbol{\varepsilon}}{\mathbf{x}'\mathbf{x}} + (\gamma - \tilde{\gamma})\frac{\mathbf{x}'\mathbf{z}/n}{\mathbf{x}'\mathbf{x}/n}$$
so that
$$\sqrt{n}(\tilde{e} - \beta) = \frac{n^{-1/2}\mathbf{x}'\boldsymbol{\varepsilon}}{\mathbf{x}'\mathbf{x}/n} + \sqrt{n}(\gamma - \tilde{\gamma})\frac{\mathbf{x}'\mathbf{z}/n}{\mathbf{x}'\mathbf{x}/n}$$
where the first term converges in distribution to $N(0, \sigma^2/c)$ and the second term to $N\left(0, \frac{d^2}{c^2}\right)$ since $\operatorname{plim} \frac{\mathbf{x}'\mathbf{z}/n}{\mathbf{x}'\mathbf{x}/n} = d/c$. Note that these limiting distributions are the same as for $\frac{n^{-1/2}\mathbf{x}'\boldsymbol{\varepsilon}}{c}$ and $\sqrt{n}(\gamma - \tilde{\gamma})\frac{d}{c}$, and hence the second limit does not depend on $\mathbf{x}$ and $\mathbf{z}$. By independence of $\boldsymbol{\varepsilon}$ and $\tilde{\gamma}$ it follows that $\sqrt{n}(\tilde{e} - \beta)$ converges in distribution to the sum of two independent normal random variables. That is,
$$\sqrt{n}(\tilde{e} - \beta) \xrightarrow{d} N\left(0, \frac{\sigma^2}{c} + \frac{d^2}{c^2}\right).$$
   (c) In both cases we have convergence in distribution when scaling by $\sqrt{n}$. It follows from corollary 2 in the Asymptotics lecture notes that $(b - \beta) \xrightarrow{p} 0$ and $(\tilde{e} - \beta) \xrightarrow{p} 0$, so both estimators are consistent.
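The consistency of both estimators can be checked in a quick simulation. The data-generating process below is an arbitrary choice consistent with the exercise's assumptions (here $c = E(x_i^2) = 2$ and $d = E(x_i z_i) = 2$), not part of the exercise itself.

```python
import numpy as np

# Simulation of exercise 4: y = x*beta + z*gamma + eps.  b uses the true
# gamma; e_tilde plugs in a noisy gamma_tilde with
# sqrt(n)(gamma_tilde - gamma) ~ N(0, 1).  Both should converge to beta.
rng = np.random.default_rng(6)
beta, gamma, sigma, n = 1.0, 0.5, 1.0, 1_000_000

x = rng.normal(1.0, 1.0, size=n)           # E(x^2) = c = 2
z = x + rng.normal(size=n)                 # x'z/n -> d = E(x^2) = 2
eps = rng.normal(0.0, sigma, size=n)
y = beta * x + gamma * z + eps

b = x @ (y - gamma * z) / (x @ x)          # gamma known
gamma_tilde = gamma + rng.normal() / np.sqrt(n)   # noisy, independent of eps
e_tilde = x @ (y - gamma_tilde * z) / (x @ x)
print(b, e_tilde)                          # both approx beta = 1
```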