Lecture notes Statistics
Estimation

Rickard Sandberg, e-mail: rickard.sandberg@hhs.se

January 22, 2010
1 Introduction

All models are wrong. Some models are useful. – George E. P. Box

1. Data Generating Process (DGP), the joint distribution of the data
$$f(z_1, \dots, z_n; \theta)$$
where the $z_i$ in general are vector-valued observations.

2. The theoretical (economic) model, being a simplification, is different from the DGP.

3. The DGP is unknown.

4. Statistical model of the data.

(a) Provides a sufficiently good approximation to the DGP to make inference valid.

(b) If the approximation is "bad" and inference is invalid we say that the model is misspecified.

(c) There may be several "valid" models, differing in "goodness".

5. If the parameters of the theoretical model can be uniquely determined from the parameters of the statistical model we say that the theoretical model is identified.

6. In many cases we are only interested in a subset of the variables, $y_i$, and can write the DGP as
$$f(z_1, \dots, z_n; \theta) = f_1(y_1, \dots, y_n \mid x_1, \dots, x_n; \theta_1)\, f_2(x_1, \dots, x_n; \theta_2).$$
If $x_i$ is exogenous, $f_2$ can be ignored and it is sufficient to model $f_1$. Roughly speaking, this is the case when $\theta_2$ does not contain any information about $\theta_1$.

In what follows the DGP is assumed known and all these issues are ignored!
2 Small sample properties of general estimators (criteria)

Definition 1 An estimator, $\hat\theta$, of $\theta$ is a function of the data, $\hat\theta(Z_1, \dots, Z_n)$. As such it is a random variable and has a sampling variability.

Definition 2 An estimate of $\theta$ is the estimator evaluated at the current sample, $\hat\theta(z_1, \dots, z_n)$.
Definition 3 (Unbiased) An estimator $\hat\theta$ of $\theta$ is unbiased if $E(\hat\theta) = \theta$. $b(\hat\theta, \theta) = E(\hat\theta) - \theta$ is the bias of $\hat\theta$.
Example 1 Consider the estimator $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$ of $\sigma^2$, where the $X_i$ are uncorrelated, $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$. We have
$$(X_i - \bar X)^2 = (X_i - \mu + \mu - \bar X)^2 = (X_i - \mu)^2 - 2(X_i - \mu)(\bar X - \mu) + (\mu - \bar X)^2$$
$$E(X_i - \bar X)^2 = E(X_i - \mu)^2 - 2E\left[(X_i - \mu)(\bar X - \mu)\right] + E(\mu - \bar X)^2 = \sigma^2 - 2\frac{\sigma^2}{n} + \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2$$
and it is clear that $E(\hat\sigma^2) = \frac{n-1}{n}\sigma^2$ with $b(\hat\sigma^2, \sigma^2) = -\frac{\sigma^2}{n}$.
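The bias is easy to check numerically. The following is a minimal Monte Carlo sketch (added here for illustration, not part of the original notes; the normal distribution and the specific numbers are arbitrary assumptions, since Example 1 only requires uncorrelated $X_i$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma2, reps = 10, 1.0, 4.0, 200_000

# reps independent samples of size n; rows are samples
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

# sigma2_hat = (1/n) * sum_i (X_i - Xbar)^2, computed per sample
sigma2_hat = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)

print(sigma2_hat.mean())       # approximately (n-1)/n * sigma2 = 3.6
print((n - 1) / n * sigma2)    # the theoretical value from Example 1
```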
Definition 4 (MSE) The Mean Square Error (MSE) of an estimator, $\hat\theta$, is given by $MSE(\hat\theta, \theta) = E(\hat\theta - \theta)^2$.
Remark 1 Note that we have
$$E(\hat\theta - \theta)^2 = E\left(\hat\theta - E(\hat\theta) + E(\hat\theta) - \theta\right)^2 = E\left(\hat\theta - E(\hat\theta)\right)^2 + 2E\left[\hat\theta - E(\hat\theta)\right]\left(E(\hat\theta) - \theta\right) + \left(E(\hat\theta) - \theta\right)^2 = Var(\hat\theta) + 0 + b(\hat\theta, \theta)^2.$$
That is, the MSE of an unbiased estimator is just the variance.
Definition 5 (Relative efficiency) Let $\hat\theta_1$ and $\hat\theta_2$ be two alternative estimators of $\theta$. Then the ratio of the MSEs, $MSE(\hat\theta_1, \theta)/MSE(\hat\theta_2, \theta)$, is called the relative efficiency of $\hat\theta_1$ with respect to $\hat\theta_2$.
Definition 6 (UMVUE) An estimator $\hat\theta$ is a uniformly minimum variance unbiased estimator (UMVUE) if $E(\hat\theta) = \theta$ and, for any other unbiased estimator $\theta^*$, $Var(\hat\theta) \le Var(\theta^*)$ for all $\theta$.
Example 2 Consider the class, $\hat\mu = \sum_{i=1}^n w_i X_i$, of linear estimators of $\mu = E(X_i)$, where $Var(X_i) = \sigma^2$ and the $X_i$ are uncorrelated. Unbiasedness clearly requires that $\sum w_i = 1$ and the variance is given by
$$Var(\hat\mu) = E\left(\sum_i w_i (X_i - \mu)\right)^2 = E\sum_i \sum_j w_i w_j (X_i - \mu)(X_j - \mu) = \sigma^2 \sum_i w_i^2.$$

One unbiased estimator in this class is the familiar $\bar X$, which sets $w_i = 1/n$ and has variance $\sigma^2/n$. We will show that this is the UMVUE in the class of linear estimators. The first order condition for minimizing $Var(\hat\mu)$ subject to the restriction $\sum w_i = 1$ is
$$2w_i = \lambda$$
for $\lambda$ the Lagrange multiplier. That is, all the weights are equal; together with $\sum w_i = 1$ this gives $w_i = 1/n$.
Remark 2 The notion of minimizing the variance is suggestive. One can define a general class of estimators by requiring the estimator to minimize the sample analogue of the variance,
$$\hat\mu = \arg\min_{\mu}\; n^{-1} \sum_{i=1}^n (X_i - \mu)^2,$$
with FOC $-2n^{-1}\sum_{i=1}^n (X_i - \mu) = 0$ and solution $\hat\mu = \frac{1}{n}\sum X_i$. This is the class of Least Squares estimators.
Example 3 Consider the linear regression model
$$y_i = \beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$$
or in matrix notation
$$y = X\beta + \varepsilon.$$
The least squares estimator of $\beta$, $b$, is obtained by minimizing $q = e'e = (y - Xb)'(y - Xb)$. The FOC is
$$\frac{\partial q}{\partial b'} = -2(y - Xb)'X = 0, \qquad y'X = b'X'X,$$
with solution
$$b = (X'X)^{-1}X'y$$
provided that $X'X$ has full rank (so the inverse is well-defined), i.e. that $X$ has rank $k$.
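As a concrete illustration, here is a short numpy sketch of the estimator (added for illustration; the simulated design and coefficients are arbitrary assumptions). Solving the normal equations $X'Xb = X'y$ is numerically preferable to forming $(X'X)^{-1}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
beta = np.array([1.0, 2.0, -0.5])   # illustrative true coefficients

# design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ beta + rng.normal(scale=0.5, size=n)   # y = X beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)          # solves X'X b = X'y
print(b)                                       # close to beta
```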
Theorem 1 (Gauss-Markov) Assume that $X$ is non-stochastic and $E(\varepsilon) = 0$, $Var(\varepsilon) = \sigma^2 I$. Then $Var(b) = \sigma^2(X'X)^{-1}$ and $b$ is the BLUE (Best Linear Unbiased Estimator) of $\beta$. That is, $b$ is the UMVUE in the class of linear estimators, $\tilde b = Ay$.

Proof. Write
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon.$$
This immediately gives $E(b) = \beta$ and
$$Var(b) = E(b - \beta)(b - \beta)' = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} = (X'X)^{-1}X'\sigma^2 I X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$
To prove that $b$ is BLUE, let $\tilde b = Ay$ be an unbiased linear estimator of $\beta$. Defining $C = A - (X'X)^{-1}X'$ we have
$$\tilde b = \left[C + (X'X)^{-1}X'\right]y = Cy + b = CX\beta + C\varepsilon + \beta + (X'X)^{-1}X'\varepsilon.$$
Clearly
$$E(\tilde b) = CX\beta + CE(\varepsilon) + \beta = CX\beta + \beta$$
and unbiasedness implies that $CX = 0$. The variance is then
$$Var(\tilde b) = E(\tilde b - \beta)(\tilde b - \beta)' = E\left[C\varepsilon + (X'X)^{-1}X'\varepsilon\right]\left[C\varepsilon + (X'X)^{-1}X'\varepsilon\right]' = \sigma^2 CC' + \sigma^2 CX(X'X)^{-1} + \sigma^2 (X'X)^{-1}X'C' + \sigma^2 (X'X)^{-1} = \sigma^2 CC' + \sigma^2 (X'X)^{-1}$$
(using $CX = 0$) and the variance of $\tilde b$ exceeds the variance of $b$ by the positive semi-definite matrix $\sigma^2 CC'$. This implies that $Var(\lambda'\tilde b) = Var(\lambda' b) + \sigma^2 \lambda' CC' \lambda \ge Var(\lambda' b)$ for any linear combination $\lambda$.
Definition 7 (Sufficiency) Let $f(x; \theta)$ be the joint density of the data. $T(x)$ is said to be a sufficient statistic for $\theta$ if $g(x \mid T)$, the density of $x$ conditional on $T$, does not depend on $\theta$.

Remark 3 A sufficient statistic $T$ captures all the information about $\theta$ in the data. This means that we can base estimators on $T$ rather than the full sample.

Theorem 2 (Factorization theorem) Let $X_1, \dots, X_n$ be a random sample from $f(x; \theta)$. Then $T(x)$ is a sufficient statistic for $\theta$ iff
$$f(x; \theta) = g(x) f(T(x); \theta)$$
where $g$ does not depend on $\theta$.
Example 4 Let $X_i$ be iid Bernoulli with parameter $p$. $T = \sum_{i=1}^n X_i$ is then a sufficient statistic (i.e. the number of successes in $n$ trials). The joint pdf is given by
$$f(x; p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{n - \sum x_i} = g(x) f(T; p)$$
and we can put $g(x) = 1$ and $f(T; p) = p^T(1-p)^{n-T}$ with $T = \sum_{i=1}^n X_i$.

Remark 4 Note that sufficient statistics are not unique and may differ in how good they are at reducing the data. In the previous example $T_2 = n - \sum X_i$ and $T_3 = \left(\sum X_i, n - \sum X_i\right)$ are clearly sufficient statistics as well.
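Sufficiency can also be checked by simulation. The sketch below (an illustration, not from the notes; $n$ and the two values of $p$ are arbitrary) conditions on $T = 2$ and shows that the $\binom{4}{2} = 6$ arrangements of the successes are equally likely regardless of $p$, i.e. $g(x \mid T)$ is free of $p$:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
n, reps = 4, 200_000

for p in (0.3, 0.7):
    x = (rng.random((reps, n)) < p).astype(int)   # iid Bernoulli(p) samples
    sel = x[x.sum(axis=1) == 2]                   # condition on T = 2
    freq = Counter(map(tuple, sel))
    # each of the C(4,2) = 6 arrangements has conditional frequency ~ 1/6,
    # independently of p
    print(p, sorted(round(v / len(sel), 3) for v in freq.values()))
```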
3 Large sample properties of general estimators (criteria)

Definition 8 (Consistency) An estimator $\hat\theta$ of $\theta$ is consistent if $\hat\theta \xrightarrow{p} \theta$.

Definition 9 (Asymptotically unbiased) An estimator $\hat\theta$ of $\theta$ is asymptotically unbiased if $n^{\delta}(\hat\theta - \theta) \xrightarrow{d} Z$ for some $\delta > 0$, where $Z$ is a non-degenerate random variable with $E(Z) = 0$.

Remark 5 The requirement $n^{\delta}(\hat\theta - \theta) \xrightarrow{d} Z$, $\delta > 0$, implies that $\hat\theta$ is a consistent estimator. Typically $\delta = 1/2$ and $\hat\theta$ is referred to as a $\sqrt n$-consistent estimator.

Definition 10 (ARE) Let $\hat\theta_1$ and $\hat\theta_2$ be two estimators of $\theta$ such that $\sqrt n(\hat\theta_1 - \theta) \xrightarrow{d} N(0, \sigma_1^2(\theta))$ and $\sqrt n(\hat\theta_2 - \theta) \xrightarrow{d} N(0, \sigma_2^2(\theta))$; the asymptotic relative efficiency (ARE) of $\hat\theta_1$ relative to $\hat\theta_2$ is given by $\sigma_1^2(\theta)/\sigma_2^2(\theta)$, where $\sigma_1^2(\theta) = \lim_{n\to\infty} n\,Var(\hat\theta_1)$ and $\sigma_2^2(\theta) = \lim_{n\to\infty} n\,Var(\hat\theta_2)$.

Definition 11 (Best asymptotically normal (BAN)) $\hat\theta$ is said to be asymptotically efficient if

1. $\hat\theta \xrightarrow{p} \theta$ for all $\theta \in \Theta$;
2. $\sqrt n(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2(\theta))$;
3. There is no other estimator, $\theta^*$, fulfilling 1) and 2) with $\sigma_{\theta^*}^2(\theta) < \sigma^2(\theta)$.

Example 5 Consider again the linear regression model $y = X\beta + \varepsilon$. If we add the assumption that $\lim_{n\to\infty} n^{-1}X'X = Q$, or $\operatorname{plim} n^{-1}X'X = \lim n^{-1}E(X'X) = Q$ for $X$ stochastic, with $Q$ a positive definite matrix, we have that the OLS estimator $b$ is consistent. We prove this in the case of $X$ being fixed. We have that
$$b = \beta + \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}.$$
By assumption $n^{-1}X'X \to Q$, and $X'\varepsilon/n = \sum x_i\varepsilon_i/n$ looks like something a law of large numbers could apply to. We have $E(x_i\varepsilon_i) = 0$ and
$$Var(X'\varepsilon) = E(X'\varepsilon\varepsilon'X) = E(X'\sigma^2 I X) = \sigma^2 X'X.$$
This immediately gives
$$\lim_{n\to\infty} Var(X'\varepsilon/n) = \lim_{n\to\infty}\frac{1}{n}\, n^{-1}\sigma^2 X'X = \lim_{n\to\infty}\frac{1}{n}\cdot\lim_{n\to\infty} n^{-1}\sigma^2 X'X = \lim_{n\to\infty}\frac{1}{n}\,\sigma^2 Q = 0$$
and $\operatorname{plim} n^{-1}X'\varepsilon = 0$ by the Markov LLN. It follows that
$$\operatorname{plim} b = \beta + Q^{-1}\cdot 0 = \beta.$$
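Before turning to asymptotic normality, consistency is easy to visualize. The sketch below (added for illustration; the design is an arbitrary assumption) shows the OLS error shrinking as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, -2.0])   # illustrative true coefficients

for n in (50, 500, 5_000, 50_000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, np.abs(b - beta).max())   # estimation error decreases with n
```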
If in addition $E|\lambda' x_i\varepsilon_i|^{2+\delta} \le B < \infty$ for $\lambda'\lambda = 1$, then $b$ is also asymptotically normal. We will use this to establish that the condition
$$\lim_{n\to\infty}\frac{\left[\sum_{i=1}^n E|\lambda' x_i\varepsilon_i|^{2+\delta}\right]^2}{\left(\sum_{i=1}^n Var(\lambda' x_i\varepsilon_i)\right)^{2+\delta}} = 0$$
for the Liapunov theorem holds. Since the numerator is dominated by $n^2 B^2$ and $\lim n^{-1}\sum Var(\lambda' x_i\varepsilon_i) = \sigma^2\lambda'\left[\lim n^{-1}\sum x_i x_i'\right]\lambda = \sigma^2\lambda' Q\lambda > 0$, we have
$$\lim_{n\to\infty}\frac{\left[\sum E|\lambda' x_i\varepsilon_i|^{2+\delta}\right]^2}{\left(\sum Var(\lambda' x_i\varepsilon_i)\right)^{2+\delta}} \le \lim_{n\to\infty}\frac{n^2 B^2}{\left(\sum Var(\lambda' x_i\varepsilon_i)\right)^{2+\delta}} = \lim_{n\to\infty}\frac{n^{-\delta} B^2}{\left(n^{-1}\sum Var(\lambda' x_i\varepsilon_i)\right)^{2+\delta}} = \frac{\lim_{n\to\infty} n^{-\delta} B^2}{\left(\lim_{n\to\infty} n^{-1}\sum Var(\lambda' x_i\varepsilon_i)\right)^{2+\delta}} = 0.$$
We also have that $\lim_{n\to\infty}\sqrt n(\mu_n - \mu) = 0$ trivially holds, because $\mu_n = E(\lambda' x_i\varepsilon_i/n) = 0$ holds for all $i$ and thus $\lim_{n\to\infty}\mu_n = 0 = \mu$. The Liapunov CLT now gives that
$$\sqrt n\sum(\lambda' x_i\varepsilon_i/n) = \sqrt n\,\lambda'(X'\varepsilon/n) \xrightarrow{d} N(0, \sigma^2\lambda' Q\lambda).$$
Applying the Cramér-Wold device gives $\sqrt n(X'\varepsilon/n) \xrightarrow{d} N(0, \sigma^2 Q)$. Using Cramér's theorem then gives
$$\sqrt n(b - \beta) = \left(n^{-1}X'X\right)^{-1}\sqrt n(X'\varepsilon/n) \xrightarrow{d} N\left(0, \sigma^2 Q^{-1}\right)$$
since $\lim_{n\to\infty}(n^{-1}X'X)^{-1} = Q^{-1}$.

4 Maximum likelihood

Definition 12 (Likelihood) The likelihood is the data density viewed as a function of the parameters, $L(\theta; x) = f(x; \theta)$. The likelihood is a random variable since it depends on the data.

Definition 13 (MLE) We define the maximum likelihood estimator (MLE) as
$$\hat\theta = \arg\max_{\theta\in\Theta} L(\theta; x)$$
where $x = (x_1, \dots, x_n)$ denotes the data, and $x_i$ and $\theta$ may be vectors.

Remark 6 Alternatively, the MLE can be defined as the solution to the FOC
$$\frac{\partial L(\theta; x)}{\partial\theta} = 0.$$
This definition has two problems: the likelihood may have local maxima, i.e. there are multiple solutions to the FOC, and the derivative may not be well defined. Despite these shortcomings we will, for simplicity, rely on this as the definition of the MLE for much of what follows.
Example 6 Suppose that $X_i$, $i = 1, \dots, n$, are iid $U(0, \theta)$. We have
$$f(x) = \begin{cases}\frac{1}{\theta} & 0 \le x \le \theta \\ 0 & \text{otherwise}\end{cases}$$
and the likelihood is given by
$$L(\theta; x) = \frac{1}{\theta^n} I\left(X_{(n)} \le \theta\right)$$
where $X_{(n)}$ is the $n$th order statistic, i.e. $X_{(n)} = \max(X_1, \dots, X_n)$. It is clear that the FOC $-n\theta^{-(n+1)} = 0$ will not provide a sensible answer. On the other hand it is easily seen, since $L(\theta; x)$ is decreasing in $\theta$, that the likelihood is maximized by $\hat\theta = X_{(n)}$.

Remark 7 For independent data we can write the likelihood as
$$L(\theta; x) = f(x_1, \dots, x_n; \theta) = \prod_{i=1}^n f_i(x_i; \theta)$$
and, conveniently, the log-likelihood as
$$\ln L(\theta; x) = l(\theta; x) = \sum_{i=1}^n \ln f_i(x_i; \theta).$$
This decomposition turns out to be crucial in the derivation of many of the properties of MLEs. For dependent data we can, somewhat less conveniently, write
$$L(\theta; x) = f(x_1; \theta) f(x_2 \mid x_1; \theta)\cdots f(x_n \mid x_1, \dots, x_{n-1}; \theta) = \prod_{i=1}^n f(x_i \mid x_j, j < i; \theta)$$
$$l(x; \theta) = \sum_{i=1}^n \ln f(x_i \mid x_j, j < i; \theta)$$
and the derivations below go through with relatively small changes.

Definition 14 (Score) The derivative of the log-likelihood,
$$s(\theta; x) = \frac{\partial l(\theta; x)}{\partial\theta},$$
is referred to as the score vector.

Lemma 1 The score vector evaluated at the true parameter values, $\theta_0$, has expectation zero.

Proof. Since $L(\theta; x)$ is the density of the data we have
$$1 = \int L(\theta_0; x)\, dx.$$
Differentiate both sides w.r.t. $\theta$:
$$0 = \frac{\partial}{\partial\theta}\int L(\theta_0; x)\, dx = \int\frac{\partial L(\theta_0; x)}{\partial\theta}\, dx = \int\frac{1}{L(\theta_0; x)}\frac{\partial L(\theta_0; x)}{\partial\theta} L(\theta_0; x)\, dx = \int\frac{\partial l(\theta_0; x)}{\partial\theta} L(\theta_0; x)\, dx = E\left[s(\theta_0; x)\right].$$

Definition 15 (Fisher Information) The information matrix is the variance-covariance matrix of the score vector evaluated at the true parameter values $\theta_0$,
$$I(\theta) = E\left[s(\theta_0; x) s(\theta_0; x)'\right] = E\left[\frac{\partial l(\theta_0; x)}{\partial\theta}\frac{\partial l(\theta_0; x)}{\partial\theta'}\right].$$

Remark 8 Note the use of the convention that the derivative w.r.t. the column vector $\theta$ is a column vector and the derivative w.r.t. the row vector $\theta'$ is a row vector.

Remark 9 The Fisher information is a measure of the information about $\theta$ we, on average, can expect to find in a sample of given size.

Theorem 3 (Information matrix equality)
$$I(\theta) = -E\left[\frac{\partial^2 l(\theta_0; x)}{\partial\theta\partial\theta'}\right] = E\left[\frac{\partial l(\theta_0; x)}{\partial\theta}\frac{\partial l(\theta_0; x)}{\partial\theta'}\right] = Var(s(\theta_0; x))$$

Proof. Write
$$0 = \int\frac{\partial l(\theta_0; x)}{\partial\theta} L(\theta_0; x)\, dx$$
and differentiate both sides:
$$0 = \int\frac{\partial l(\theta_0; x)}{\partial\theta}\frac{\partial L(\theta_0; x)}{\partial\theta'}\, dx + \int\frac{\partial^2 l(\theta_0; x)}{\partial\theta\partial\theta'} L(\theta_0; x)\, dx = \int\frac{\partial l(\theta_0; x)}{\partial\theta}\frac{\partial l(\theta_0; x)}{\partial\theta'} L(\theta_0; x)\, dx + \int\frac{\partial^2 l(\theta_0; x)}{\partial\theta\partial\theta'} L(\theta_0; x)\, dx.$$
That is,
$$E\left[\frac{\partial l(\theta_0; x)}{\partial\theta}\frac{\partial l(\theta_0; x)}{\partial\theta'}\right] = -E\left[\frac{\partial^2 l(\theta_0; x)}{\partial\theta\partial\theta'}\right].$$

Remark 10 For iid data we can write the information as
$$I(\theta) = -nE\left[\frac{\partial^2\ln f(x_i; \theta_0)}{\partial\theta\partial\theta'}\right] = nE\left[\frac{\partial\ln f(x_i; \theta_0)}{\partial\theta}\frac{\partial\ln f(x_i; \theta_0)}{\partial\theta'}\right].$$
Condition 1 We have assumed that $\frac{\partial}{\partial\theta}\int L(\theta_0; x)\, dx = \int\frac{\partial L(\theta_0; x)}{\partial\theta}\, dx$ holds. This is not necessarily the case. Roughly speaking, the requirement for this to hold is that the distribution isn't too fat-tailed and that the domain of $x$ does not depend on $\theta$. Sufficient conditions for this and the Cramér-Rao theorem below (theorem 5) are that

1. The parameter space $\Theta$, $\theta \in \Theta$, is an open rectangle, or we can restrict the parameter space to an open rectangle.
2. The domain of $x$ does not depend on $\theta$.
3. The score vector $s$ has finite expectation and variance $\forall\theta\in\Theta$.

Example 7 (Example 6 continued) With the Uniform likelihood we have $l(x; \theta) = -n\ln(\theta)$ and
$$\frac{\partial l(x; \theta)}{\partial\theta} = -\frac{n}{\theta}, \qquad \frac{\partial^2 l(x; \theta)}{\partial\theta^2} = \frac{n}{\theta^2},$$
and it is clear that both the information matrix equality and the lemma fail to hold. This should not be surprising since the domain of $X_i$ depends on $\theta$.

Example 8 Suppose that $X_i \sim NID(\mu, \sigma^2)$, $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/2\sigma^2}$, with likelihood
$$L(\mu, \sigma^2; x) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\sum_{i=1}^n (x_i - \mu)^2/2\sigma^2\right)$$
$$l(\mu, \sigma^2; x) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2$$
with
$$\frac{\partial l}{\partial\mu} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma^2}, \qquad \frac{\partial l}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2,$$
yielding the familiar estimates $\hat\mu = \bar x$, $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$. It is easily verified that $E\left(\frac{\partial l}{\partial\mu}\right) = E\left(\frac{\partial l}{\partial\sigma^2}\right) = 0$. Furthermore
$$E\left(\frac{\partial l}{\partial\mu}\right)^2 = E\left(\frac{\sum_i (x_i - \mu)}{\sigma^2}\right)^2 = \frac{1}{\sigma^4}E\left[\sum_i\sum_j (x_i - \mu)(x_j - \mu)\right] = \frac{n\sigma^2}{\sigma^4} = \frac{n}{\sigma^2}$$
$$E\left(\frac{\partial l}{\partial\sigma^2}\right)^2 = E\left[-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_i (x_i - \mu)^2\right]^2 = E\left[\frac{n^2}{4\sigma^4} - \frac{n}{2\sigma^6}\sum_i (x_i - \mu)^2 + \frac{1}{4\sigma^8}\sum_i\sum_j (x_i - \mu)^2(x_j - \mu)^2\right]$$
$$= \frac{n^2}{4\sigma^4} - \frac{n^2}{2\sigma^4} + \left[3n + n(n-1)\right]\frac{\sigma^4}{4\sigma^8} = \frac{n}{2\sigma^4},$$
using $E(x_i - \mu)^2(x_j - \mu)^2 = 3\sigma^4$ if $i = j$ and $\sigma^4$ if $i \ne j$, by independence, and
$$E\left(\frac{\partial l}{\partial\mu}\frac{\partial l}{\partial\sigma^2}\right) = E\left[\frac{1}{\sigma^2}\sum_i (x_i - \mu)\left(-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_j (x_j - \mu)^2\right)\right] = E\left[-\frac{n}{2\sigma^4}\sum_i (x_i - \mu) + \frac{1}{2\sigma^6}\sum_i\sum_j (x_i - \mu)(x_j - \mu)^2\right] = 0,$$
and the information matrix is given by
$$I(\mu, \sigma^2) = \begin{pmatrix}\frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4}\end{pmatrix}.$$
To verify that the information matrix equality holds we evaluate
$$E\left(\frac{\partial^2 l}{\partial\mu^2}\right) = E\left(-\sum_i\frac{1}{\sigma^2}\right) = -\frac{n}{\sigma^2}$$
$$E\left(\frac{\partial^2 l}{\partial(\sigma^2)^2}\right) = E\left(\frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_i (x_i - \mu)^2\right) = \frac{n}{2\sigma^4} - \frac{n\sigma^2}{\sigma^6} = -\frac{n}{2\sigma^4}$$
$$E\left(\frac{\partial^2 l}{\partial\mu\partial\sigma^2}\right) = E\left(-\frac{1}{\sigma^4}\sum_i (x_i - \mu)\right) = 0$$
and it is clear that $E\left[\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta'}\right] = -E\left[\frac{\partial^2 l}{\partial\theta\partial\theta'}\right]$ holds.

5 Small sample optimality results

Remark 11 Maximum likelihood estimators are functions of sufficient statistics rather than the full sample. To see this, note that if $T$ is a sufficient statistic we can write the likelihood as (recall the Factorization theorem)
$$L(x; \theta) = g(x) f(T; \theta) \implies l(x; \theta) = \ln g(x) + \ln f(T; \theta)$$
where $g(x)$ is a function of the data only and $f(T; \theta)$ is the marginal density of $T$. Maximizing $\ln f(T; \theta)$ w.r.t. $\theta$ will obviously give the same result as maximizing $l(x; \theta)$.
Theorem 4 (Rao-Blackwell) Let the density of the data be indexed by the parameter $\theta$, $T$ be a sufficient statistic for $\theta$ and $t(x)$ be an unbiased estimator of $u(\theta)$. Define the new estimator $\hat\theta = E(t(x) \mid T)$. Then

1. $\hat\theta$ is an unbiased estimator of $u(\theta)$;
2. $Var(\hat\theta) \le Var(t)$.

Proof. We must first establish that $\hat\theta$ can be used as an estimator, i.e. that it does not depend on $\theta$ and can be computed from the sample. To see this note that $t(x)$ is a function of the sample, and since $T$ is a sufficient statistic $g(x \mid T)$ does not depend on $\theta$. Consequently $\hat\theta = E(t(x) \mid T) = \int t(x) g(x \mid T)\, dx$ is independent of $\theta$. To show part 1 note that $E(\hat\theta) = E[E(t(x) \mid T)] = E(t(x)) = u(\theta)$ by the law of iterated expectations. For part 2 we have from theorem 5.6 in Ramanathan that $Var(X) = E[Var(X \mid Y)] + Var[E(X \mid Y)]$; setting $t = X$ and $\hat\theta = E(X \mid Y)$ it is clear that part 2 must hold.

Remark 12 Rao-Blackwellization provides a general way of obtaining a reasonable estimator. Find an unbiased estimator (which by no means has to be a good estimator) and a sufficient statistic, and construct the new estimator using the Rao-Blackwell theorem. In some cases this will even be an optimal estimator in the sense that it is an UMVUE. A simulation sketch follows after the next two items.

Example 9 Consider again the case with iid Bernoulli data with parameter $p$. Suppose we take $t(X) = X_1$. Clearly this is an unbiased estimator of $p$; $E(X_1) = p$ and $Var(X_1) = p(1-p)$. The sufficient statistic is $T = \sum X_i$. Calculating $\hat p = E(X_1 \mid T)$ is a combinatorial problem; there are in total $n!/[T!(n-T)!]$ equally likely permutations of the $T$ ones and $n - T$ zeros given $T$. Of these there are $(n-1)!/[(T-1)!(n-T)!]$ permutations where $X_1 = 1$. This gives
$$P(X_1 = 1 \mid T) = \frac{(n-1)!\, T!}{n!\, (T-1)!} = \frac{T}{n}$$
and $\hat p = T/n$ with $E(\hat p) = E(T)/n = p$ and $Var(\hat p) = Var(T)/n^2 = p(1-p)/n$.

Definition 16 (Exponential family) A distribution characterized by a $k$-dimensional parameter vector $\theta$ is said to belong to the exponential family if its density or probability function can be written on the form
$$f(x) = C(\theta)\exp\left[\sum_{i=1}^k q_i(\theta) T_i(x)\right] h(x).$$

Remark 13 It follows from the factorization theorem that $(T_1, \dots, T_k)$ are sufficient statistics for $\theta$.
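A simulation sketch of Example 9 (an illustration, not part of the notes; $n$, $p$ and the replication count are arbitrary assumptions): Rao-Blackwellizing $t(X) = X_1$ with $T = \sum X_i$ reproduces $\hat p = T/n$ and shrinks the variance from $p(1-p)$ to $p(1-p)/n$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, reps = 10, 0.3, 400_000

x = (rng.random((reps, n)) < p).astype(int)
t = x[:, 0]          # the crude unbiased estimator t(X) = X_1
T = x.sum(axis=1)
p_hat = T / n        # the Rao-Blackwellized estimator E(X_1 | T) = T/n

print(t[T == 3].mean(), 3 / n)       # empirical E(X_1 | T=3) vs T/n
print(t.var(), p * (1 - p))          # Var(t) = p(1-p)
print(p_hat.var(), p * (1 - p) / n)  # Var(p_hat) = p(1-p)/n
```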
Remark 14 The exponential family is a large class of distributions, containing among others the binomial, normal, geometric, exponential and Poisson distributions.

Example 10 Consider the random variable $X$ with the normal pdf $f(x) = (2\pi\sigma^2)^{-0.5}e^{-0.5(x-\mu)^2/\sigma^2}$. To deduce that this pdf belongs to the exponential family, first note that $\theta = (\mu, \sigma^2)'$ and write
$$(2\pi\sigma^2)^{-0.5}e^{-0.5(x-\mu)^2/\sigma^2} = \frac{e^{-0.5\mu^2/\sigma^2}}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2}}\cdot 1 = C(\theta)\, e^{q_1(\theta)T_1(x) + q_2(\theta)T_2(x)}\, h(x)$$
where $C(\theta) = \frac{e^{-0.5\mu^2/\sigma^2}}{\sqrt{2\pi\sigma^2}}$, $q_1(\theta) = -\frac{1}{2\sigma^2}$, $T_1(x) = x^2$, $q_2(\theta) = \frac{\mu}{\sigma^2}$, $T_2(x) = x$, and $h(x) = 1$.

In many cases it is not possible to establish the existence of an UMVUE. In those cases it is of interest to know how good the estimator at hand is. Is it worth the effort to try to find a better estimator? To answer this question we need to know how far off we are from the best possible case.

Theorem 5 (Cramér-Rao) Let $\hat\theta$ be an unbiased estimator of the $k$-dimensional parameter vector $\theta$ and suppose that the regularity conditions 1 hold. Then $Var(\hat\theta) - I^{-1}(\theta)$ is a positive semi-definite matrix, and we write $Var(\hat\theta) \ge I^{-1}(\theta)$.

Proof. We have $\theta = E(\hat\theta) = \int\hat\theta L(\theta; x)\, dx$; differentiate both sides w.r.t. $\theta$:
$$\frac{\partial\theta}{\partial\theta'} = I = \int\hat\theta\frac{\partial L(\theta; x)}{\partial\theta'}\, dx = \int\hat\theta\frac{1}{L(\theta; x)}\frac{\partial L(\theta; x)}{\partial\theta'} L(\theta; x)\, dx = \int\hat\theta\frac{\partial l(\theta; x)}{\partial\theta'} L(\theta; x)\, dx = \int\hat\theta\, s(\theta; x)' L(\theta; x)\, dx = Cov(\hat\theta, s)$$
since $E(s) = 0$, where $s$ is the score vector. The variance of $(\hat\theta', s')'$ is then
$$Var\begin{pmatrix}\hat\theta \\ s\end{pmatrix} = \begin{pmatrix}Var(\hat\theta) & I \\ I & I(\theta)\end{pmatrix}.$$
Note that any variance matrix is positive semi-definite and hence the variance of the linear combination $\left[I, -I^{-1}(\theta)\right](\hat\theta', s')'$ is positive semi-definite. This variance is given by
$$\begin{pmatrix}I & -I^{-1}(\theta)\end{pmatrix}\begin{pmatrix}Var(\hat\theta) & I \\ I & I(\theta)\end{pmatrix}\begin{pmatrix}I \\ -I^{-1}(\theta)\end{pmatrix} = Var(\hat\theta) - I^{-1}(\theta) \ge 0,$$
which establishes the result.

Remark 15 The inverse information matrix $I^{-1}(\theta)$ provides a lower bound for the variance of an unbiased estimator and is referred to as the Cramér-Rao lower bound.

Remark 16 In the scalar parameter case the Cramér-Rao lower bound reduces to $Var(\hat\theta) \ge 1/I(\theta)$.

Remark 17 The notation $Var(\hat\theta) \ge I^{-1}(\theta)$ is justified in the vector-valued parameter case by noting that $a'\left[Var(\hat\theta) - I^{-1}(\theta)\right]a \ge 0$, or $a'\,Var(\hat\theta)\, a \ge a' I^{-1}(\theta)\, a$, for an arbitrary vector $a$ when $Var(\hat\theta) - I^{-1}(\theta)$ is positive semi-definite. That is, there is no linear combination $a'\hat\theta$ of any unbiased estimator $\hat\theta$ with smaller variance than $a' I^{-1}(\theta)\, a$.

Remark 18 There is no guarantee that there is an unbiased estimator that attains the Cramér-Rao lower bound.

Example 11 The information for the parameters $(\mu, \sigma^2)$ with iid normal data was obtained in example 8 as
$$I(\mu, \sigma^2) = \begin{pmatrix}\frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4}\end{pmatrix}$$
and the Cramér-Rao lower bound is given by
$$I^{-1}(\mu, \sigma^2) = \begin{pmatrix}\frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n}\end{pmatrix}.$$
It is clear that $\bar x$ attains the lower bound, but $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2$ does not, because $Var(s^2) = \frac{2\sigma^4}{n-1}$, which follows from noting that $\sum_{i=1}^n (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n-1)$. Clearly $Var(s^2)$ is greater than the Cramér-Rao lower bound for any finite $n$.

Theorem 6 Suppose that $t$ is an unbiased estimator of $\theta$ that attains the Cramér-Rao lower bound. Then $t$ is the MLE of $\theta$.
Proof. From the proof of the Cramér-Rao theorem we have that
$$Var(t) - I^{-1}(\theta) = Var\left(\left[I, -I^{-1}(\theta)\right](t', s')'\right)$$
if $t$ is an unbiased estimator. By assumption $Var(t) - I^{-1}(\theta) = 0$, so $\left[I, -I^{-1}(\theta)\right](t', s')'$ must be constant and there is an exact linear relation between $t$ and $s$. Since $t$ is unbiased the linear relation has the form $t = A(\theta)s(\theta; x) + \theta$, or $s(\theta; x) = A^{-1}(\theta)(t - \theta)$. Setting the score to zero we obtain the MLE as $\hat\theta = t$.

Remark 19 This is a rather strong optimality result for MLEs, but it should not be taken to imply that the MLE always is unbiased or that it always attains the Cramér-Rao lower bound. In particular it does not imply that a MLE is UMVUE.

Example 12 Consider again the case of iid normal data. The MLE of $\sigma^2$ is $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$ with $E(\hat\sigma^2) = \frac{n-1}{n}\sigma^2$ (biased) and $Var(\hat\sigma^2) = \frac{2\sigma^4(n-1)}{n^2}$.

6 Large sample optimality results

Theorem 7 (Consistency of MLE) Subject to the regularity conditions 1 the MLE $\hat\theta_n$ is consistent, $\hat\theta_n \xrightarrow{p} \theta_0$, the true parameter value.

Theorem 8 (Asymptotic normality of MLE) Let $\Sigma^{-1} = \lim\frac{1}{n}I(\theta)$. If the regularity conditions 1 hold and if in addition the statistical model is identified and $l(\theta; x)$ is twice continuously differentiable, then the asymptotic distribution of the MLE, $\hat\theta$, is normal,
$$\sqrt n\left(\hat\theta_n - \theta_0\right) \xrightarrow{d} N(0, \Sigma).$$

Proof. We will again, for simplicity, assume that the data is iid. Note that this implies that
$$I(\theta) = nE\left[\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta}\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta'}\right] = -nE\left[\frac{\partial^2\ln f(X_i; \theta_0)}{\partial\theta\partial\theta'}\right].$$
That is,
$$\Sigma^{-1} = E\left[\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta}\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta'}\right] = Var\left(\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta}\right)$$
in this case. By the mean value theorem we can write, for some value $\theta^*$ between $\theta_0$ and $\hat\theta_n$,
$$s_n(\theta_0; x) = s_n(\hat\theta_n; x) + \frac{\partial s_n(\theta^*; x)}{\partial\theta'}\left(\theta_0 - \hat\theta_n\right) = \frac{\partial s_n(\theta^*; x)}{\partial\theta'}\left(\theta_0 - \hat\theta_n\right)$$
since the MLE $\hat\theta_n$ sets the score to zero. Alternatively we can write this as
$$\theta_0 - \hat\theta_n = \left(\frac{\partial s_n(\theta^*; x)}{\partial\theta'}\right)^{-1} s_n(\theta_0; x)$$
provided that $\frac{\partial s_n(\theta^*; x)}{\partial\theta'}$ has full rank. Since
$$s_n(\theta_0; x) = \sum_{i=1}^n\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta},$$
where $f(X_i; \theta_0)$ and $\frac{\partial\ln f(X_i; \theta_0)}{\partial\theta}$ are iid random variables, we have by the (multivariate) Lindeberg-Lévy CLT that
$$\frac{1}{\sqrt n} s_n(\theta_0; x) \xrightarrow{d} N\left(0, \Sigma^{-1}\right).$$
Secondly,
$$\frac{\partial s_n(\theta_0; x)}{\partial\theta'} = \sum_{i=1}^n\frac{\partial^2\ln f(X_i; \theta_0)}{\partial\theta\partial\theta'}$$
is a sum of iid random matrices and
$$\frac{1}{n}\frac{\partial s_n(\theta_0; x)}{\partial\theta'} \xrightarrow{p} -\Sigma^{-1}$$
by the Khinchine WLLN. In addition, $\hat\theta_n \xrightarrow{p} \theta_0$ implies $\theta^* \xrightarrow{p} \theta_0$ and
$$\frac{1}{n}\frac{\partial s_n(\theta^*; x)}{\partial\theta'} \xrightarrow{p} -\Sigma^{-1}$$
by the Slutsky theorem. Note that this implies $-\frac{1}{n}\frac{\partial s_n(\theta^*; x)}{\partial\theta'}\,\Sigma \xrightarrow{p} I$. Next, write
$$-\frac{1}{n}\frac{\partial s_n(\theta^*; x)}{\partial\theta'}\,\sqrt n\left(\hat\theta_n - \theta_0\right) = \frac{1}{\sqrt n} s_n(\theta_0; x).$$
Since $-\frac{1}{n}\frac{\partial s_n(\theta^*; x)}{\partial\theta'} \xrightarrow{p} \Sigma^{-1}$ we have that
$$\sqrt n\left(\hat\theta_n - \theta_0\right) \xrightarrow{d} \Sigma\cdot\frac{1}{\sqrt n} s_n(\theta_0; x) \xrightarrow{d} N(0, \Sigma),$$
which establishes the result.

Remark 20 The variance of the limiting distribution for the MLE is the inverse of the limit of the average information. That is, asymptotically the MLE attains the Cramér-Rao lower bound. This implies that the MLE is Best Asymptotically Normal, i.e. there is no other asymptotically normal estimator whose limiting distribution has a smaller variance. This provides a strong rationale for the use of maximum likelihood.

Remark 21 Note the crucial role that the information matrix equality plays in giving us a simple form for the variance of the limiting distribution.
Example 13 For normal data, $X_i$ iid $N(\mu, \sigma^2)$, the information matrix is given by
$$I(\mu, \sigma^2) = \begin{pmatrix}\frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4}\end{pmatrix}.$$
It follows that
$$\sqrt n\begin{pmatrix}\hat\mu - \mu \\ \hat\sigma^2 - \sigma^2\end{pmatrix} \xrightarrow{d} N(0, \Sigma) \quad\text{for}\quad \Sigma = \lim nI^{-1}(\mu, \sigma^2) = \begin{pmatrix}\sigma^2 & 0 \\ 0 & 2\sigma^4\end{pmatrix}.$$
From exercise 5 in the asymptotics lecture notes we deduce that $\sqrt n(\hat\sigma_n^2 - \sigma^2) \xrightarrow{d} N(0, (\kappa - 1)\sigma^4)$, where $\kappa = E(X_i - \mu)^4/\sigma^4 = 3$ for normal data.

Example 14 Suppose that $X_i$, $i = 1, \dots, n$, is iid Bernoulli with parameter $p$. The loglikelihood is
$$l(p; x) = T\ln p + (n - T)\ln(1 - p) \quad\text{for } T = \sum_{i=1}^n x_i.$$
The score is
$$\frac{\partial l(p; x)}{\partial p} = \frac{T}{p} - \frac{n - T}{1 - p}.$$
Setting the score to zero and solving for $p$ gives the MLE as $\hat p = T/n$. We obtain the Fisher information as
$$I(p) = -E\left[\frac{\partial^2 l(p; x)}{\partial p^2}\right] = E\left[\frac{T}{p^2} + \frac{n - T}{(1 - p)^2}\right] = \frac{np}{p^2} + \frac{n(1 - p)}{(1 - p)^2} = \frac{n}{p} + \frac{n}{1 - p} = \frac{n}{p(1 - p)}.$$
Since the regularity conditions hold, it follows that $\hat p$ is consistent and that $\sqrt n(\hat p - p) \xrightarrow{d} N(0, p(1 - p))$. The results are easily verified by applying a suitable LLN and CLT to $\hat p = \sum_{i=1}^n x_i/n$. A common rule of thumb for when the asymptotic distribution provides a good approximation to the exact finite sample distribution is that $np(1 - p) \ge 9$; noting that $T \sim Bin(n, p)$, exact finite-sample calculations are also available.
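Example 14 is easily checked by simulation. A sketch (added for illustration; the particular $n$, $p$ and replication count are arbitrary, chosen so that $np(1-p) \ge 9$):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 400, 0.25, 100_000   # n*p*(1-p) = 75 >= 9

# MLE p_hat = T/n in each replication
p_hat = (rng.random((reps, n)) < p).mean(axis=1)
z = np.sqrt(n) * (p_hat - p)

print(z.mean(), z.var())          # approximately 0 and p(1-p) = 0.1875
```

7 When the form of the likelihood is unknown (optional)

1. It generally is unknown.

2. We can't expect to get exact small sample results.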
(a) Must rely on asymptotic results.

(b) In special cases we may be able to obtain the small sample bias and variance of the estimator.

3. Maximum likelihood is out of the question.

4. Maximize the wrong likelihood, on purpose or out of ignorance: Quasi Maximum Likelihood (QML). The QMLE can, under more restrictive conditions than above, be shown to be consistent and asymptotically normal. The major difference is that the information matrix equality doesn't hold for the QMLE and we get
$$\sqrt n\left(\hat\theta_{QML} - \theta_0\right) \xrightarrow{d} N\left(0, A^{-1}BA^{-1}\right)$$
for
$$A = \operatorname{plim}\frac{1}{n}\frac{\partial s_n(\theta_0; x)}{\partial\theta'}, \qquad B = \operatorname{plim}\frac{1}{n} s_n(\theta_0; x) s_n(\theta_0; x)'.$$

5. Estimators that don't rely on the likelihood:

(a) Least squares.

(b) Generalized Method of Moments (GMM). GMM specifies a set of $k$ moment conditions, $E[g_n(\theta_0; x)] = 0$, where $\theta$ is a $k$-dimensional parameter vector, and minimizes $g_n(\theta; x)' g_n(\theta; x)$. It is possible to show, under more restrictive conditions than above, that the GMM estimator is consistent and asymptotically normal,
$$\sqrt n\left(\hat\theta_{GMM} - \theta_0\right) \xrightarrow{d} N(0, V)$$
where $V^{-1} = \lim\frac{1}{n}Var(g_n(\theta_0; x))$.

Remark 22 We know that the MLE attains the Cramér-Rao lower bound asymptotically, and it should be clear that we in general suffer from a loss in efficiency by using estimators other than the MLE.

Remark 23 Note that Least Squares and ML are special cases of GMM. This is seen by setting the FOCs of LS or ML as the GMM moment conditions, e.g. $E[s_n(\theta_0; x)] = 0$ for ML.
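To make the GMM recipe concrete, here is a minimal sketch (an illustration under assumed moment conditions, not an example from the notes; it uses scipy's generic minimizer): estimating $\theta = (\mu, \sigma^2)$ from the $k = 2$ moment conditions $E[x - \mu] = 0$ and $E[(x - \mu)^2 - \sigma^2] = 0$ by minimizing $g_n(\theta)' g_n(\theta)$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.normal(2.0, 1.5, size=5_000)   # data from an assumed DGP

def g_n(theta):
    mu, sigma2 = theta
    # sample analogues of E[x - mu] = 0 and E[(x - mu)^2 - sigma2] = 0
    return np.array([(x - mu).mean(), ((x - mu) ** 2 - sigma2).mean()])

def objective(theta):
    g = g_n(theta)
    return g @ g                        # g_n(theta)' g_n(theta)

res = minimize(objective, x0=np.array([0.0, 1.0]), method="Nelder-Mead")
print(res.x)                            # close to (2.0, 1.5**2 = 2.25)
```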
8 Worked exercises

8.1 Exercises

1. Exercise 8.1 (b)-(e) in Ramanathan.

2. Exercise 8.2 in Ramanathan.

3. Exercise 8.9 (a)-(c) in Ramanathan. In addition, obtain $E(\hat\theta)$ and $Var(\hat\theta)$ where $\hat\theta$ is the MLE of $\theta$.

4. Consider the regression model $y = x\alpha + \beta z + \varepsilon$, $E(\varepsilon) = 0$, $Var(\varepsilon) = \sigma^2 I$, where $\alpha$ and $\beta$ are scalars. In addition we are told that the $\varepsilon_i$ are iid, the $x_i$ are iid with $E(x_i^2) = c$, $x_i$ and $\varepsilon_i$ are independent of each other, $x'x/n \xrightarrow{p} c = E(x_i^2) \ne 0$ and $x'z/n \xrightarrow{p} d \ne 0$.

(a) Suppose $\beta$ is known and define the estimator $\hat\alpha = \frac{x'(y - \beta z)}{x'x}$. Obtain the limiting distribution of $\hat\alpha$.

(b) Suppose $\beta$ is unknown but that we are given an estimator $\tilde\beta$ of $\beta$ which is independent of $\varepsilon$ and $x$ and satisfies $\sqrt n(\tilde\beta - \beta) \xrightarrow{d} N(0, 1)$. Define the estimator $\tilde\alpha = \frac{x'(y - \tilde\beta z)}{x'x}$ and obtain the limiting distribution of $\tilde\alpha$.

(c) Are $\hat\alpha$ and $\tilde\alpha$ consistent estimators of $\alpha$?

8.2 Solutions

1. $f(x; \theta) = k\theta^x$, a discrete geometric distribution, i.e. $k = 1 - \theta$.

(b) We have
$$L(x; \theta) = \prod_{i=1}^n (1 - \theta)\theta^{x_i} = (1 - \theta)^n\theta^{\sum_{i=1}^n x_i}$$
and it is clear from the factorization theorem that $\sum_{i=1}^n x_i$ and $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$ are sufficient statistics.

(c) We have
$$\frac{\partial l(x; \theta)}{\partial\theta} = -\frac{n}{1 - \theta} + \frac{\sum_{i=1}^n x_i}{\theta}, \qquad \frac{\partial^2 l(x; \theta)}{\partial\theta^2} = -\frac{n}{(1 - \theta)^2} - \frac{\sum_{i=1}^n x_i}{\theta^2}.$$
Since $E(x_i) = \frac{\theta}{1 - \theta}$ we have
$$I(\theta) = -E\left[\frac{\partial^2 l(x; \theta)}{\partial\theta^2}\right] = \frac{n}{(1 - \theta)^2} + \frac{n}{\theta(1 - \theta)} = \frac{n}{\theta(1 - \theta)^2}.$$
It is easy to verify that the outer product (score) form of the information matrix gives the same result:
$$E\left(\frac{\partial l(x; \theta)}{\partial\theta}\right)^2 = E\left(-\frac{n}{1 - \theta} + \frac{\sum x_i}{\theta}\right)^2 = \frac{n^2}{(1 - \theta)^2} - \frac{2nE(\sum x_i)}{\theta(1 - \theta)} + \frac{E(\sum x_i)^2}{\theta^2}.$$
By independence, $E(\sum x_i) = \frac{n\theta}{1 - \theta}$ and
$$E\left(\sum x_i\right)^2 = \sum_i E(x_i^2) + \sum_i\sum_{j\ne i} E(x_i)E(x_j) = \frac{n(\theta + \theta^2)}{(1 - \theta)^2} + \frac{n(n - 1)\theta^2}{(1 - \theta)^2},$$
so that
$$E\left(\frac{\partial l(x; \theta)}{\partial\theta}\right)^2 = \frac{n^2}{(1 - \theta)^2} - \frac{2n^2}{(1 - \theta)^2} + \frac{n(1 + \theta)}{\theta(1 - \theta)^2} + \frac{n(n - 1)}{(1 - \theta)^2} = \frac{n}{\theta(1 - \theta)^2}.$$

(d) Setting the score to zero we have
$$-\frac{n}{1 - \theta} + \frac{\sum_{i=1}^n x_i}{\theta} = 0 \iff \frac{\sum_{i=1}^n x_i}{n} = \frac{\theta}{1 - \theta}$$
with the solution $\hat\theta = \frac{\bar x}{1 + \bar x}$.

(e) Since the $x_i$ are iid we have $\bar x \xrightarrow{p} E(x_i) = \frac{\theta}{1 - \theta}$ by the Khinchine WLLN. It follows from the Slutsky theorem that $\hat\theta = g(\bar x) = \frac{\bar x}{1 + \bar x} \xrightarrow{p} g\left(\frac{\theta}{1 - \theta}\right) = \theta$.

2. $f(x; \theta) = \theta x^{\theta - 1}$ for $0 \le x \le 1$ and $\theta > 0$.

(a) $\int_0^1\theta x^{\theta - 1}\, dx = x^\theta\big|_0^1 = 1$. It follows that $\int_0^1 x^\theta\, dx = \frac{1}{\theta + 1}$ and hence that $E(x) = \int_0^1 x\,\theta x^{\theta - 1}\, dx = \frac{\theta}{\theta + 1}$.

(b) $$-\frac{1}{\theta^2} = \frac{\partial}{\partial\theta}\frac{1}{\theta} = \frac{\partial}{\partial\theta}\int_0^1 x^{\theta - 1}\, dx = \int_0^1\frac{\partial}{\partial\theta} e^{(\theta - 1)\ln x}\, dx = \int_0^1\ln x\cdot x^{\theta - 1}\, dx.$$
It follows that $E(\ln x) = \theta\int_0^1\ln x\cdot x^{\theta - 1}\, dx = -\frac{1}{\theta}$.

(c) $$\frac{2}{\theta^3} = \frac{\partial^2}{\partial\theta^2}\frac{1}{\theta} = \frac{\partial^2}{\partial\theta^2}\int_0^1 x^{\theta - 1}\, dx = \int_0^1\frac{\partial}{\partial\theta}\ln x\, e^{(\theta - 1)\ln x}\, dx = \int_0^1(\ln x)^2 e^{(\theta - 1)\ln x}\, dx = \int_0^1(\ln x)^2 x^{\theta - 1}\, dx,$$
which gives $E(\ln x)^2 = \theta\int_0^1(\ln x)^2 x^{\theta - 1}\, dx = \frac{2}{\theta^2}$ and
$$Var(\ln x) = E\left(\ln x - E(\ln x)\right)^2 = E(\ln x)^2 - \left[E(\ln x)\right]^2 = \frac{2}{\theta^2} - \frac{1}{\theta^2} = \frac{1}{\theta^2}.$$

(d) We have the random sample $x_1, \dots, x_n$. Independence gives the joint density as
$$f(x_1, \dots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta) = \theta^n\left(\prod_{i=1}^n x_i\right)^{\theta - 1}.$$
The likelihood is thus $L(\theta; x_1, \dots, x_n) = \theta^n\left(\prod_{i=1}^n x_i\right)^{\theta - 1}$. It follows from the factorization theorem (8.1) that $T_1 = \prod_{i=1}^n x_i$ is a sufficient statistic, since we can factorize the likelihood into the function $h(T_1; \theta) = \theta^n T_1^{\theta - 1}$, depending only on $\theta$ and $T_1$, and the function $g(x) = 1$, which does not depend on $\theta$ and $T_1$. The factorization theorem is if and only if; that is, $T_2 = \sum_{i=1}^n x_i$ is a sufficient statistic only if we can factorize the likelihood correspondingly for $T_2$. Inspection of the likelihood function shows that this is impossible and consequently that $T_2$ is not a sufficient statistic for $\theta$. $T_3 = \sum_{i=1}^n\ln x_i$, on the other hand, is a sufficient statistic.

(e) $\ln L(\theta; x) = n\ln(\theta) + (\theta - 1)\sum_{i=1}^n\ln x_i$ and
$$\frac{\partial\ln L}{\partial\theta} = \frac{n}{\theta} + \sum_{i=1}^n\ln x_i.$$
Setting the derivative to zero yields
$$\hat\theta = -\frac{n}{\sum_{i=1}^n\ln x_i}.$$
To verify that this is a maximum we need to show that the second derivative is negative at $\hat\theta$. We have $\frac{\partial^2\ln L}{\partial\theta^2} = -\frac{n}{\theta^2}$ and
$$\left.\frac{\partial^2\ln L}{\partial\theta^2}\right|_{\theta = \hat\theta} = -\frac{\left(\sum_{i=1}^n\ln x_i\right)^2}{n} \le 0.$$

(f) Let $\beta = 1/\theta$; a reasonable guess for the MLE of $\beta$ is $\hat\beta = 1/\hat\theta = -\frac{\sum_{i=1}^n\ln x_i}{n}$. It is easy to establish that this guess is correct for $\beta = g(\theta)$ when $g$ is a monotone function (it holds for non-monotone functions as well, but is trickier to show). By monotonicity the inverse function $\theta = g^{-1}(\beta)$ exists and we can write the loglikelihood for $\beta$ as $\ln L(g^{-1}(\beta); x)$. The first order condition is thus given by
$$\frac{\partial\ln L}{\partial\beta} = \frac{\partial\ln L}{\partial\theta}\frac{\partial\theta}{\partial\beta} = 0.$$
By monotonicity $\frac{\partial\theta}{\partial\beta} \ne 0$ and the FOC simplifies to $\frac{\partial\ln L}{\partial\theta} = 0$. Since there is only one value of $\theta$ for which $\frac{\partial\ln L}{\partial\theta} = 0$ it follows that the MLE of $\beta$ is $\hat\beta = g(\hat\theta)$. Furthermore,
$$E(\hat\beta) = -\frac{\sum_{i=1}^n E(\ln x_i)}{n} = \frac{\sum_{i=1}^n 1/\theta}{n} = \frac{\sum_{i=1}^n\beta}{n} = \beta,$$
so $\hat\beta$ is an unbiased estimator of $\beta$.

(g) $$Var(\hat\beta) = \frac{1}{n^2}Var\left(\sum_{i=1}^n\ln x_i\right) = \left\{\text{the } x_i \text{ are independent, hence the }\ln x_i\text{ are independent and specifically uncorrelated}\right\} = \frac{1}{n^2}\sum_{i=1}^n Var(\ln x_i) = \frac{1}{n^2}\frac{n}{\theta^2} = \frac{\beta^2}{n}.$$
The Fisher information is given by $I(\beta) = -E\left[\frac{\partial^2\ln L}{\partial\beta^2}\right]$. With $\theta = 1/\beta$,
$$\frac{\partial\ln L}{\partial\beta} = \frac{\partial\ln L}{\partial\theta}\frac{\partial\theta}{\partial\beta} = \left(\frac{n}{\theta} + \sum_{i=1}^n\ln x_i\right)\left(-\frac{1}{\beta^2}\right) = -\frac{n}{\beta} - \frac{\sum_{i=1}^n\ln x_i}{\beta^2}, \qquad \frac{\partial^2\ln L}{\partial\beta^2} = \frac{n}{\beta^2} + \frac{2\sum_{i=1}^n\ln x_i}{\beta^3}.$$
We have
$$I(\beta) = -E\left[\frac{\partial^2\ln L}{\partial\beta^2}\right] = -\frac{n}{\beta^2} - \frac{2\sum_{i=1}^n E(\ln x_i)}{\beta^3} = -\frac{n}{\beta^2} + \frac{2n\beta}{\beta^3} = \frac{n}{\beta^2}.$$
The Cramér-Rao lower bound is given by $1/I(\beta) = \frac{\beta^2}{n}$ and it is clear that the CRLB is attained in this case.

(h) Let
$$Z_n = \frac{\hat\beta - E(\hat\beta)}{\sqrt{Var(\hat\beta)}}.$$
We then have $Z_n \xrightarrow{d} N(0, 1)$ since the $-\ln x_i$ are independent with $Var(\ln x_i) = \beta^2 < \infty$ and thus fulfill the conditions of the Lindeberg-Lévy CLT.
Comment: we have (for this estimator) verified the claim that ML estimators are asymptotically normally distributed. To see that the result is in accordance with theorem 8.12, note that $I(\beta)/n = 1/\beta^2$; using Slutsky (theorem 7.1) we thus have
$$\sqrt{\frac{n}{I(\beta)}}\, Z_n = \sqrt n\left(\hat\beta - \beta\right) \xrightarrow{d} N(0, \beta^2).$$
Finally we may wonder about the MLE of $\theta$. To obtain the asymptotic distribution of $\hat\theta$ we use the Delta Rule. $\theta = g(\beta) = 1/\beta$ is a function with continuous derivative at $\beta$ and it follows that
$$\sqrt n\left(\hat\theta - \theta\right) = \sqrt n\left(g(\hat\beta) - g(\beta)\right) \xrightarrow{d} N(0, \theta^2).$$
Note that $\theta^2 = \left[\lim\frac{I(\theta)}{n}\right]^{-1}$.

3. We have the density $f(x; \theta, \beta) = \frac{1}{\beta}e^{-(x - \theta)/\beta}$, $x \ge \theta$, $\beta > 0$.

(a) $$\int_\theta^\infty e^{-(x - \theta)/\beta}\, dx = -\beta e^{-(x - \theta)/\beta}\Big|_\theta^\infty = \beta.$$
Differentiating both sides with respect to $\beta$ gives
$$1 = \int_\theta^\infty\frac{\partial}{\partial\beta} e^{-(x - \theta)/\beta}\, dx = \int_\theta^\infty\frac{x - \theta}{\beta^2} e^{-(x - \theta)/\beta}\, dx.$$
It follows that $E(x - \theta) = \int_\theta^\infty (x - \theta)\frac{1}{\beta}e^{-(x - \theta)/\beta}\, dx = \beta$. Differentiating once more we have
$$0 = \frac{\partial}{\partial\beta}\int_\theta^\infty\frac{x - \theta}{\beta^2} e^{-(x - \theta)/\beta}\, dx = \int_\theta^\infty\left(-\frac{2(x - \theta)}{\beta^3} + \frac{(x - \theta)^2}{\beta^4}\right) e^{-(x - \theta)/\beta}\, dx = -\frac{2}{\beta^2}E(x - \theta) + \frac{1}{\beta^3}E(x - \theta)^2$$
and $E(x - \theta)^2 = 2\beta^2$. It follows that
$$Var(x) = E(x - E(x))^2 = E(x - \theta - \beta)^2 = E(x - \theta)^2 - 2\beta E(x - \theta) + \beta^2 = \beta^2.$$
Comment: The mean and variance we obtained shouldn't be too surprising. The distribution of $x$ is an exponential distribution with a shift in the location. That is, if $y$ is exponentially distributed with parameter $\beta$, then $x$ is obtained as $x = y + \theta$.

(b) The likelihood is given by $L = \prod_{i=1}^n\frac{1}{\beta}e^{-(x_i - \theta)/\beta} = \frac{1}{\beta^n}e^{-\sum_{i=1}^n (x_i - \theta)/\beta}$ and the loglikelihood as $\ln L = -n\ln\beta - \frac{1}{\beta}\sum_{i=1}^n (x_i - \theta)$. This gives the elements of the score vector as
$$S_1 = \frac{\partial\ln L}{\partial\theta} = \frac{n}{\beta}, \qquad S_2 = \frac{\partial\ln L}{\partial\beta} = -\frac{n}{\beta} + \frac{1}{\beta^2}\sum_{i=1}^n (x_i - \theta).$$
Using the score form of the information matrix we have
$$I(\theta, \beta) = E\begin{pmatrix}S_1^2 & S_1 S_2 \\ S_2 S_1 & S_2^2\end{pmatrix} = \begin{pmatrix}\frac{n^2}{\beta^2} & -\frac{n^2}{\beta^2} + \frac{n}{\beta^3}E\sum_i (x_i - \theta) \\ -\frac{n^2}{\beta^2} + \frac{n}{\beta^3}E\sum_i (x_i - \theta) & \frac{n^2}{\beta^2} - \frac{2n}{\beta^3}E\sum_i (x_i - \theta) + \frac{1}{\beta^4}E\left(\sum_i (x_i - \theta)\right)^2\end{pmatrix} = \begin{pmatrix}\frac{n^2}{\beta^2} & 0 \\ 0 & \frac{n}{\beta^2}\end{pmatrix}.$$
The expectation $E\left(\sum_i (x_i - \theta)\right)^2 = \sum_i\sum_j E\left[(x_i - \theta)(x_j - \theta)\right]$ is a little bit tricky. For $i \ne j$ we have independence and $E\left[(x_i - \theta)(x_j - \theta)\right] = E(x_i - \theta)E(x_j - \theta) = \beta^2$, and there are $n(n - 1)$ terms with $i \ne j$. This leaves $n$ terms with $i = j$ where we have $E(x_i - \theta)^2 = 2\beta^2$.
Comment: The reason for using the score form of the information matrix is that the information matrix equality $E(SS') = -E\left[\frac{\partial^2\ln L}{\partial\gamma\partial\gamma'}\right]$ for $\gamma = (\theta, \beta)'$ doesn't hold for this likelihood. When establishing that equality we needed to interchange the order of integration and differentiation; that is we needed, for example, that $\frac{\partial^2}{\partial\theta^2}\int_\theta^\infty L(\theta, \beta; x)\, dx = \int_\theta^\infty\frac{\partial^2}{\partial\theta^2} L(\theta, \beta; x)\, dx$, which doesn't hold since $\theta$ is a limit of integration.

(c) Setting $S_2 = 0$ we have $n\beta = \sum_{i=1}^n (x_i - \theta)$ and the MLE of $\beta$ as $\frac{1}{n}\sum_{i=1}^n (x_i - \theta)$, provided $\theta$ is known. $S_1$ is obviously of little use for obtaining the MLE of $\theta$. Instead we need to look at the likelihood function itself; writing this as
$$L = \frac{1}{\beta^n} e^{n\theta/\beta} e^{-\sum_i x_i/\beta}$$
it is clear that the likelihood is an increasing function of $\theta$. On the other hand we have the condition $x_i \ge \theta$; that is, the likelihood of observing a value of $x$ smaller than $\theta$ is zero. The value of $\theta$ maximizing the likelihood is thus the smallest value of $x_i$ in the sample, the first order statistic, denoted $x_{(1)}$. We have $T_1 = \hat\theta = x_{(1)}$ and $T_2 = \hat\beta = \frac{1}{n}\sum_{i=1}^n\left(x_i - x_{(1)}\right)$.

extra: From p. 137 in Ramanathan we get the density of the first order statistic as $f_{x_{(1)}}(x) = n\left[1 - F_x(x)\right]^{n-1} f_x(x)$. We obtain the distribution function of $x$ as $F_x(x) = \int_\theta^x\frac{1}{\beta}e^{-(y - \theta)/\beta}\, dy = 1 - e^{-(x - \theta)/\beta}$ and we have
$$f_{x_{(1)}}(x) = n\left[e^{-(x - \theta)/\beta}\right]^{n-1}\frac{1}{\beta}e^{-(x - \theta)/\beta} = \frac{n}{\beta}e^{-n(x - \theta)/\beta},$$
a shifted exponential distribution with parameters $\theta$ and $\beta/n$. It follows that $E(T_1) = E\left(x_{(1)}\right) = \theta + \beta/n \ne \theta$ and $Var(T_1) = \beta^2/n^2$.

4. The regression model is $y = x\alpha + \beta z + \varepsilon$.

(a) We have
$$\hat\alpha = \frac{x'(y - \beta z)}{x'x} = \frac{x'(x\alpha + \beta z + \varepsilon - \beta z)}{x'x} = \frac{x'(x\alpha + \varepsilon)}{x'x} = \alpha + \frac{x'\varepsilon}{x'x}$$
where $x'x/n \xrightarrow{p} c$. In addition, $\frac{1}{n}x'\varepsilon = \frac{1}{n}\sum_{i=1}^n x_i\varepsilon_i$, a sample average which a CLT might apply to. By assumption we have $E(x_i\varepsilon_i) = E(x_i)E(\varepsilon_i) = 0$ and $Var(x_i\varepsilon_i) = E(x_i^2\varepsilon_i^2) = E(x_i^2)\sigma^2 = \sigma^2 c < \infty$. Since $x_i$ and $\varepsilon_i$ are iid, $x_i\varepsilon_i$ is iid as well and the conditions for the Lindeberg-Lévy CLT hold. That is,
$$\frac{1}{\sqrt n}x'\varepsilon \xrightarrow{d} N(0, \sigma^2 c).$$
Write
$$\sqrt n(\hat\alpha - \alpha) = \frac{n^{-1/2}x'\varepsilon}{x'x/n}$$
and it follows that $\sqrt n(\hat\alpha - \alpha) \xrightarrow{d} N(0, \sigma^2/c)$.

(b) We have
$$\tilde\alpha = \frac{x'\left(y - \tilde\beta z\right)}{x'x} = \frac{x'\left(x\alpha + \beta z + \varepsilon - \tilde\beta z\right)}{x'x} = \alpha + \frac{x'\varepsilon}{x'x} - \left(\tilde\beta - \beta\right)\frac{x'z}{x'x}.$$
This gives
$$\sqrt n(\tilde\alpha - \alpha) = \frac{n^{-1/2}x'\varepsilon}{x'x/n} - \sqrt n\left(\tilde\beta - \beta\right)\frac{x'z/n}{x'x/n},$$
where the first term converges in distribution to $N(0, \sigma^2/c)$ and the second to $N(0, d^2/c^2)$, since $\operatorname{plim}\frac{x'z/n}{x'x/n} = d/c$ and $\sqrt n(\tilde\beta - \beta) \xrightarrow{d} N(0, 1)$. Note that these limiting distributions do not depend on $x$ and $z$. By independence of $\varepsilon$ and $\tilde\beta$ it follows that $\sqrt n(\tilde\alpha - \alpha)$ converges in distribution to the sum of two independent normal random variables. That is,
$$\sqrt n(\tilde\alpha - \alpha) \xrightarrow{d} N\left(0, \frac{\sigma^2}{c} + \frac{d^2}{c^2}\right).$$

(c) In both cases we have convergence in distribution when scaling by $\sqrt n$. It follows from corollary 2 in the Asymptotics lecture notes that $\hat\alpha - \alpha \xrightarrow{p} 0$ and $\tilde\alpha - \alpha \xrightarrow{p} 0$, and the estimators are consistent.
  • 86. ) = E @2lnL @
  • 87. 2 ; @lnL @
  • 88. = @lnL @ @ @
  • 89. = n + Xn i=1 lnxi ! 1
  • 90. 2 = = 1
  • 91. = n
  • 94. 2 = n
  • 95. 2 + 2 Pn i=1 lnxi
  • 96. 3 : We have I(
  • 97. ) = E @2lnL @
  • 98. 2 = E n
  • 99. 2 2 Pn i=1 lnxi
  • 100. 3 = n
  • 101. 2 2 Pn i=1 E (lnxi)
  • 102. 3 = n
  • 103. 2 2 Pn i=1 E (lnxi)
  • 104. 3 = n
  • 105. 2 + 2n
  • 106.
  • 107. 3 = n
  • 108. 2 : The Cramér-Rao lower bound is given by 1=I(
  • 109. ) =
  • 110. 2 n and it is clear that the CRLB is attained in this case. (h) Let Zn = b
  • 111. E(b
  • 112. ) r V ar b
  • 113. = lnx E lnx q V ar lnx : We then have Zn d! N(0; 1) since the lnxi are independent and V ar(lnxi) =
  • 114. 2 1 and thus ful…lls the conditions of the Lindeberg-Lévy CLT. Comment: we have (for this estimator) veri…ed the claim that ML esti- mators are asymptotically normally distributed. To see that the result is in accordance with theorem 8.12 we write Zn = q
  • 115. b
  • 116. V ar(b
  • 118.
  • 120. ) and let (
  • 122. ) n = 1
  • 123. 2 : Using Slutsky (theorem 7.1) we thus have that q n I(
  • 124. )Zn = pn b
  • 125.
  • 127. ) where z s N(0; 1) or pn b
  • 128.
  • 129. d! N 0;(
  • 131. 2): Finally we may wonder about the MLE of : To obtain the asymptotic distribution of we use the Delta Rule. = g (
  • 132. ) = 1=
  • 133. is a function with continuous derivative at
  • 134. and it follows that pn b = pn g b
  • 135. g (
  • 136. ) d! N(0; 2): 21
  • 137. Note that 2 = h limI() n i 1 : 3. We have the density f(x; ;
  • 138. ) = 1
  • 139. e(x)=
  • 140. ; x ;
  • 141. 0: (a) Z 1 e(x)=
  • 142. dx =
  • 143. e(x)=
  • 144.
  • 145.
  • 146. 1 = 0 (
  • 147. ) =
  • 148. : Di¤erentiating both sides with respect to
  • 150. @ 1 = 1 = @
  • 151. @
  • 152. e(x)=
  • 153. dx = Z 1 @ @
  • 154. e(x)=
  • 155. dx = Z 1 x
  • 157. dx: It follows that E(x ) = R 1 x
  • 158. e(x)=
  • 159. dx =
  • 161. . Di¤erentiating once more we have 0 = @ @
  • 162. Z 1 x
  • 164. dx = Z 1 @ @
  • 165. x
  • 167. dx = Z 1 2 x
  • 169. + (x )2
  • 171. ! dx = 2
  • 172. 2E (x ) + 1
  • 173. 3E(x )2 and E(x)2 = 2
  • 175. 2: It follows that V ar(x) = E (x E (x))2 = E (x
  • 176. )2 = E (x )2 2
  • 177. E (x ) +
  • 178. 2 =
  • 179. 2: Comment: The mean and variance we obtained shouldn’t be too surpris- ing. The distribution of x is an exponential distribution with a shift in the location. That is if y is exponentially distributed with parameter
  • 180. ; then x is obtained as x = y + : (b) The likelihood is given by L = Qn i=1 1
  • 181. e(xi)=
  • 182. = 1
  • 183. n e Pn i=1(xi)=
  • 184. and the loglikelihood as lnL = nln
  • 185. 1
  • 186. Pn i=1 (xi ) : This gives the ele- ments of the score vector as S1 = @lnL @ = n
  • 187. S2 = @lnL @
  • 188. = n
  • 189. + 1
  • 190. 2 Xn i=1 (xi ) Using the score form of information matrix we have I(;
  • 191. ) = E S2 1 S1S2 S2S1 S2 2 # = E n2
  • 192. 2 n2
  • 193. 2 + n
  • 194. 3 Pn i=1 (xi ) n2
  • 195. 2 + n
  • 196. 3 Pn i=1 (xi ) n2
  • 197. 2 2 n
  • 198. 3 Pn i=1 (xi ) + 1 Pn
  • 199. 4 ( i=1 (xi ))2 # = n2
  • 200. 2 n2
  • 201. 2 + n2
  • 202. 2 n2
  • 203. 2 + n2
  • 204. 2
  • 206. 2+2n
  • 207. 2
  • 208. 4 n2
  • 209. 2 2n2 # = n2
  • 210. 2 0 0 n
  • 211. 2 # 22
  • 212. Pn The expectation E ( i=1 (xi ))2 = Pn i=1 Pn j=1 E [(xi ) (xj )] is a little bit tricky. For i6= j we have independence and E [(xi ) (xj )] = E (xi )E (xj ) =
  • 213. 2 and there are n (n 1) terms with i6= j: This leaves n terms with i = j where we have E (xi )2 = 2
  • 214. 2: Comment: The reason for using the score form of the information hmatrix is that the information matrix equality E (SS0) = E @lnL @ @lnL @ 0 i = E @2lnL @@0 for = (;
  • 215. )0 doesn’t hold for this likelihood. When estab- lishing that E h@lnL @ @lnL @ 0 i = E @2lnL @@0 we needed to interchange the order of integration and di¤erentiation. That is we needed, for exam- ple, that @2 @2 R 1 L(;
  • 216. ; x)dx = R 1 @2 @2L(;
  • 217. ; x)dx; which doesn’t hold since is a limit of integration. (c) Setting S2 = 0, we have n
  • 218. = Pn i=1 (xi ) and the MLE of
  • 219. as 1 n Pn i=1 (xi ), provided is known. S1 is obviously of little use for ob- taining the MLE of . Instead we need to look at the likelihood function itself, writing this as L = 1
  • 220. n en=
  • 222. it is clear that the likelihood is an increasing function of : On the other hand we have the condition xi ; that is, the likelihood of observing a value of x smaller than is zero. The value of maximizing the likelihood is thus the smallest value of xi in the sample or the …rst order statistic, denote this by x(1): We have T1 = b = x(1) and T2 = b
  • 223. = 1 n Pn i=1 xi x(1) : extra. From p. 137 in Ramanathan we get the density of the …rst order statistic as fx(1)(x) = n [1 Fx(x)]n1 fx(x): We obtain the distribution function of x as Fx(x) = R x 1
  • 224. e(y)=
  • 225. dy = 1 e(x)=
  • 226. and we have fx(1)(x) = n e(x)=
  • 227. n1 1
  • 228. e(x)=
  • 229. = n
  • 230. en(x)=
  • 231. ; a shifted exponential distribution with parameters and
  • 232. =n: It follows that E(T1) = E x(1) = +
  • 233. =n6= and V ar(T1) =
  • 234. 2=n2: 4. The regression model is y = x +
  • 235. z + : (a) We have b = x0 (y
  • 236. z) x0x = x0 (x +
  • 237. z +
  • 238. z) x0x = x0 (x + ) x0x = x0x + x0 x0x = + x0 x0x where x0x n p! c: In addition, 1 nx0 =1 n Pn i=1 xii; a sample average which a CLT might apply to. By assumption we have E (xii) = E (xi)E (i) = 0 23
  • 239. and V ar (xii) = E (x2i 2i ) = E (x2i ) 2 = 2c 1: Since xi and i are iid, xii is iid as well and the conditions for the Lindeberg-Lévy CLT holds. That is, 1 pn x0 d! N 0; 2c : Write pn (b ) = n1=2x0 x0x n and it follows that pn (b ) d! N 0; 2=c : (b) We have e = x0 ye
  • 240. z x0x = x0 x +
  • 241. z + e
  • 242. z x0x = x0 x +
  • 243. e
  • 244. z x0x = x0x + x0
  • 245. e
  • 246. x0z x0x = + x0 x0x
  • 247. e
  • 248. x0z x0x : This gives pn (e ) = n1=2x0 x0x=n pn
  • 249. e
  • 250. x0z=n x0x=n where the …rst term converges in distribution to N (0; 2=c) and the sec- ond term to N 0; c d 2 since plim x0z=n x0x=n = c=d: Note that these limit- c and pn ing distributions are the same as for pnx0
  • 251. e
  • 252. c d and hence does not depend on x and z. By independence of and e
  • 253. it follows that pn (e ) converges in distribution to the sum of two independent normal random variables. That is, pn (e ) d! N 0; 2 c + c2 d2 : (c) In both cases we have convergence in distribution when scaling by pn: It follows from corollary 2 in the Asymptotics lecture notes that (b ) p! 0 and (e ) p! 0 and the estimators are consistent. 24